JP2002366146A - Encoding method for sound signal

Info

Publication number: JP2002366146A
Application number: JP2001178319A
Authority: JP
Inventors: Toshio Motegi; 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2001-06-13
Filing date: 2001-06-13
Publication date: 2002-12-20
Anticipated expiration: 2021-06-13
Also published as: JP4601865B2

Abstract

PROBLEM TO BE SOLVED: To provide a sound-signal encoding method that can process a multi-channel sound signal, such as a stereophonic signal, with a computational load comparable to that of a single-channel (monaural) sound signal, while also suppressing the amount of encoded data. SOLUTION: For a sound signal consisting of left and right channels, a plurality of unit sections are set on the time axis of each channel (S1a, S1b). Frequency analysis is performed on each unit section to compute phoneme data consisting of a frequency, an intensity value, a section start time corresponding to the start point of the unit section, and a section end time corresponding to its end point (S2a, S2b). Phoneme data from the two channels that share the same time and the same frequency are then integrated, and integrated phoneme data carrying an intensity ratio derived from the intensity values of the source phoneme data are generated (S3).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an acoustic-signal encoding technique suitable for producing the various kinds of audio content provided through broadcast media (radio, television), communication media (CS video and audio distribution, Internet music distribution, online karaoke), and packaged media (CD, MD, cassette, video, LD, CD-ROM, game cartridges, solid-state memory media for portable music players). It is likewise suitable for MIDI transmission of music content including vocals intended for dedicated portable music players, mobile phones, PHS handsets, pagers, and the like, of audio material from literary works such as kabuki, noh, sutra chanting, and poetry, and of audio teaching material for language education.

[0002]

[Prior Art] A time-series signal, of which an acoustic signal is a typical example, contains a plurality of periodic signals as its components. Techniques for analyzing which periodic signals a given time-series signal contains have therefore long been known. Fourier analysis, for example, is widely used to analyze the frequency components contained in a given time-series signal.

[0003] Such time-series analysis techniques can also be used to encode an acoustic signal. With the spread of computers, it has become easy to sample an original analog acoustic signal at a predetermined sampling frequency, quantize the signal intensity at each sampling instant, and capture it as digital data. If a technique such as Fourier analysis is applied to the captured digital data to extract the frequency components contained in the original signal, the original signal can be encoded as codes that indicate those frequency components.

[0004] Meanwhile, the MIDI (Musical Instrument Digital Interface) standard, which grew out of the idea of encoding the sounds of electronic musical instruments, has come into widespread use with the spread of personal computers. Code data conforming to the MIDI standard (hereinafter, MIDI data) is basically data describing the operations of an instrumental performance, namely which key of the instrument was played and how hard; the MIDI data itself does not contain the actual sound waveform. Reproducing the actual sound therefore requires a separate MIDI sound source that stores the waveforms of instrument sounds. Nevertheless, the high coding efficiency of MIDI has attracted attention, and MIDI encoding and decoding techniques are now widely adopted in software for playing instruments, practicing instruments, composing music, and so on with a personal computer.

[0005] Proposals have therefore been made to analyze a time-series signal, typified by an acoustic signal, by a predetermined technique so as to extract the periodic signals that constitute it, and to encode the extracted periodic signals as MIDI data. For example, JP-A-10-247099, JP-A-11-73199, JP-A-11-73200, JP-A-11-95753, JP-A-2000-99009, JP-A-2000-99092, JP-A-2000-99093, JP-A-2000-261322, JP-A-2001-5450, and JP-A-2001-148633 propose various methods that analyze the constituent frequencies of an arbitrary time-series signal and create MIDI data from the analysis result.

[0006]

[Problems to Be Solved by the Invention] The MIDI encoding schemes proposed in the above publications have made it possible to encode efficiently the acoustic signals obtained from performance recordings and the like. However, these conventional techniques are methods for processing an acoustic signal consisting of a single channel. They are suited to monaural signals, but they cannot necessarily encode a two-channel signal such as a stereo signal efficiently. A multi-channel signal can of course be processed, but because each channel is processed independently, the computation must be repeated for every channel and the load becomes enormous. The amount of code data produced by the encoding also grows with the number of channels.

[0007] In view of the above, an object of the present invention is to provide an acoustic-signal encoding method that can process a multi-channel acoustic signal, such as a stereo signal, with roughly the same computational load as a single-channel signal such as a monaural signal, and that can also keep the amount of code data small.

[0008]

[Means for Solving the Problems] To solve the above problem, the present invention is characterized as follows. For an acoustic signal given as a plurality of independent pieces of waveform information, a plurality of unit sections are set on the time axis, and frequency analysis is performed on each unit section to calculate phoneme data consisting of a frequency, an intensity value, a section start time corresponding to the start point of the unit section, and a section end time corresponding to its end point. This calculation is performed for all unit sections of every input channel. Among all the phoneme data thus obtained, items having the same time and the same frequency are integrated across the input channels, and integrated phoneme data is created with an intensity ratio attached on the basis of the intensity values of the source phoneme data. The code data is obtained as the set of this integrated phoneme data.

[0009] According to the present invention, for an acoustic signal having a plurality of input channels, such as a stereo signal, phoneme data is created for each input channel, the phoneme data of the channels is integrated, and an intensity ratio is attached. The acoustic signals from the plurality of input channels can therefore be handled as code data of a single integrated channel, which reduces both the data amount and the processing load. Because the intensity-ratio information is attached, the intensity balance of the original input channels is not lost.
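The integration step can be illustrated with a minimal Python sketch for two channels. The `Phoneme` record and the `integrate` function are hypothetical names. Summing the two intensities and defining the ratio as left / (left + right) are assumptions made only for illustration; the text here states only that an intensity ratio based on the source intensity values is attached.

```python
from collections import namedtuple

# Hypothetical record for one item of phoneme data (field names are assumptions).
Phoneme = namedtuple("Phoneme", "start end note intensity")

def integrate(left, right):
    """Merge phoneme data sharing the same section and frequency across two channels.

    Returns (start, end, note, intensity, left_ratio) tuples.
    Assumptions of this sketch: the integrated intensity is the sum of the
    channel intensities, and left_ratio = left / (left + right).
    """
    merged = {}
    for channel_index, channel in enumerate((left, right)):
        for p in channel:
            key = (p.start, p.end, p.note)
            pair = merged.setdefault(key, [0.0, 0.0])
            pair[channel_index] += p.intensity
    result = []
    for (start, end, note), (l, r) in sorted(merged.items()):
        total = l + r
        ratio = l / total if total > 0 else 0.5
        result.append((start, end, note, total, ratio))
    return result
```

With this layout, a sound present in only one channel still yields a single integrated item, and its ratio (0.0 or 1.0) records which channel it came from.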

[0010]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings.

[0011] (Basic principle of the acoustic-signal encoding method) First, the basic principle of the acoustic-signal encoding method according to the present invention is described. Since this principle is disclosed in the publications cited above, only an outline is given here.

[0012] Suppose that an analog acoustic signal is given as a time-series signal, as shown in FIG. 1(a). In the example of FIG. 1, the signal is plotted with time t on the horizontal axis and amplitude (intensity) on the vertical axis. The first step is to capture this analog signal as digital acoustic data. This can be done with the conventional PCM technique: the analog signal is sampled at a predetermined sampling frequency and its amplitude is converted to digital data with a predetermined number of quantization bits. For convenience of explanation, the waveform of the PCM-digitized acoustic data is represented here by the same waveform as the analog signal in FIG. 1(a).

[0013] Next, a plurality of unit sections are set on the time axis of the acoustic signal to be analyzed. In the example shown in FIG. 1(a), six times t1 to t6 are defined at equal intervals on the time axis t, and five unit sections d1 to d5 are set with these times as their start and end points. In the example of FIG. 1 all unit sections have the same length, but the length may differ from section to section. Alternatively, the sections may be set so that adjacent unit sections partially overlap on the time axis.
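As a rough illustration of section setting, the following Python sketch lists sample-index ranges for equal-length unit sections. The function name `unit_sections` and the `hop` parameter are assumptions introduced here; a hop smaller than the section length produces the partially overlapping sections mentioned above.

```python
def unit_sections(num_samples, length, hop):
    """Return (start, end) sample-index pairs for equal-length unit sections.

    hop == length gives adjacent, non-overlapping sections;
    hop < length gives partially overlapping sections.
    """
    if length <= 0 or hop <= 0:
        raise ValueError("length and hop must be positive")
    sections = []
    start = 0
    while start + length <= num_samples:
        sections.append((start, start + length))
        start += hop
    return sections
```

Variable-length sections, which the text also allows, would need a list of boundaries rather than a fixed length and are not covered by this sketch.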

[0014] Once the unit sections have been set, representative frequencies are selected for the acoustic signal in each unit section (hereinafter, the section signal). Each section signal normally contains various frequency components; for example, the components with a large share of the intensity can be selected as representative frequencies. The representative frequency is typically the so-called fundamental frequency, but overtone frequencies such as the formant frequencies of speech, or the peak frequency of a noise source, may also be treated as representative frequencies. A single representative frequency may be selected, but for some acoustic signals selecting several allows more accurate encoding. FIG. 1(b) shows an example in which three representative frequencies are selected for each unit section and each representative frequency is encoded as one representative code (drawn as a musical note in the figure for convenience). Three tracks T1, T2, and T3 are provided to hold the representative codes (notes), so that the three representative codes selected for each unit section can be stored in different tracks.

[0015] For example, the representative codes n(d1,1), n(d1,2), and n(d1,3) selected for unit section d1 are stored in tracks T1, T2, and T3, respectively. Each of these codes is a MIDI note number. A MIDI note number takes one of 128 values from 0 to 127, each corresponding to one key of a piano keyboard. Specifically, if 440 Hz is selected as a representative frequency, this frequency corresponds to note number n = 69 (the "la" note, denoted A3 here, near the center of the piano keyboard), so n = 69 is selected as the representative code. Note that FIG. 1(b) is a conceptual diagram that shows the representative codes obtained by the above method in the form of musical notes; in practice, intensity data is also attached to each note. Track T1, for example, holds data e(d1,1), e(d2,1), ... indicating intensity together with the note numbers n(d1,1), n(d2,1), ... indicating pitch. The intensity data is determined by the degree to which each representative-frequency component was contained in the original section signal; specifically, it is determined from the correlation value between the section signal and a periodic function having the representative frequency. In the conceptual diagram of FIG. 1(b), the horizontal position of a note indicates the position of its unit section on the time axis; in practice, data giving this position as an exact numerical value is attached to each note.
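The mapping from a frequency to a note number can be sketched in Python as follows. It uses the standard equal-tempered relation in which 440 Hz is note 69 and an octave spans twelve note numbers; the function name `note_number` and the clamping to the range 0 to 127 are choices made for this sketch.

```python
import math

def note_number(freq_hz):
    """Nearest MIDI note number for a frequency in Hz, clamped to 0..127."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    n = round(69 + 12 * math.log2(freq_hz / 440.0))
    return max(0, min(127, n))
```

For instance, 440 Hz maps to 69 and one octave higher (880 Hz) maps to 81.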

[0016] The format used to encode the acoustic signal need not be MIDI, but because MIDI is the most widely used format of this kind, MIDI code data is preferable in practice. In the MIDI format, "note-on" and "note-off" data occur with "delta-time" data interposed between them. Note-on data specifies a note number N and a velocity V and instructs that a particular sound start playing; note-off data specifies a note number N and a velocity V and instructs that a particular sound stop playing. Delta-time data indicates a predetermined time interval. The velocity V is a parameter indicating, for example, the speed at which a piano key is pressed down (note-on velocity) or released (note-off velocity), and thus expresses the strength of the operation that starts or ends a particular sound.

[0017] In the method described above, J note numbers n(di,1), n(di,2), ..., n(di,J) are obtained as representative codes for the i-th unit section di, and intensities e(di,1), e(di,2), ..., e(di,J) are obtained for each of them. MIDI code data can then be created as follows. The note numbers n(di,1), n(di,2), ..., n(di,J) can be used directly as the note number N written in the note-on or note-off data. The velocity V written in the note-on or note-off data can be a value obtained by normalizing the intensities e(di,1), e(di,2), ..., e(di,J) by a predetermined method. The delta-time data can be set according to the length of each unit section.
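A minimal Python sketch of this event construction for one unit section follows. The text leaves the normalization method open, so the linear scaling to 1..127 in `to_velocity` is an assumption, as are the function names, the tuple layout, and the use of ticks for the section length. The sketch produces abstract events, not MIDI file bytes.

```python
def to_velocity(intensity, max_intensity):
    """Scale an intensity to a MIDI velocity in 1..127 (linear scaling is an assumption)."""
    if max_intensity <= 0:
        return 1
    return max(1, min(127, round(127 * intensity / max_intensity)))

def section_events(notes, intensities, section_ticks, max_intensity):
    """Events for one unit section as (delta_time, kind, note, velocity) tuples."""
    velocities = [to_velocity(e, max_intensity) for e in intensities]
    events = []
    # All notes of the section start together, so every note-on has delta time 0.
    for n, v in zip(notes, velocities):
        events.append((0, "note_on", n, v))
    # The first note-off carries the section length; the rest follow with delta 0.
    for i, (n, v) in enumerate(zip(notes, velocities)):
        events.append((section_ticks if i == 0 else 0, "note_off", n, v))
    return events
```

Calling `section_events` once per unit section and concatenating the results gives a single event stream in which each section's notes sound for exactly the section length.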

[0018] (Specific method of obtaining the correlation with a periodic function) In the method based on the basic principle described above, one or more representative frequencies are selected for each section signal, and the section signal is represented by periodic signals having these representative frequencies. A representative frequency is, literally, a frequency that represents the signal components in the unit section. As described later, there are two specific ways to select the representative frequencies: one uses the short-time Fourier transform and the other uses generalized harmonic analysis. The basic idea is the same in both: a plurality of periodic functions with different frequencies are prepared in advance, the periodic functions that correlate strongly with the section signal in the unit section are found among them, and the frequencies of those strongly correlated periodic functions are selected as representative frequencies. Selecting the representative frequencies thus involves computing the correlation between the prepared periodic functions and the section signal in the unit section. A specific method of obtaining this correlation is described next.

[0019] Assume that trigonometric functions such as those shown in FIG. 2 are prepared as the periodic functions. They consist of pairs of a sine function and a cosine function with the same frequency, one pair being defined for each of 128 standard frequencies f(0) to f(127). Here, the pair of sine and cosine functions with the same frequency is defined as the periodic function for that frequency; that is, the periodic function for a particular frequency consists of a sine function and a cosine function. The periodic function is defined as a sine/cosine pair because the correlation value of a periodic function with a signal is affected by phase. The variables F and k in the trigonometric functions of FIG. 2 correspond to the sampling frequency F and the sample number k of the section signal X. For example, the sine wave for frequency f(0) is written sin(2πf(0)k/F); given any sample number k, it yields the amplitude of the periodic function at the same time position as the k-th sample of the section signal.

[0020] An example is given here in which the 128 standard frequencies f(0) to f(127) are defined by the expression shown in FIG. 3. That is, the n-th standard frequency f(n) (0 ≤ n ≤ 127) is defined by [Equation 1] below.

[0021] [Equation 1] $f(n) = 440 \times 2^{\gamma(n)}$, where $\gamma(n) = (n - 69)/12$

[0022] Defining the standard frequencies by this expression is convenient when the signal is finally encoded as MIDI data, because the 128 standard frequencies f(0) to f(127) then form a geometric progression and coincide with the frequencies of the note numbers used in MIDI data. The 128 standard frequencies f(0) to f(127) shown in FIG. 2 are therefore equally spaced (in semitone steps, in MIDI terms) on a logarithmic frequency axis.
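The standard-frequency table can be generated directly from [Equation 1]. The short Python sketch below does so; the names `standard_frequency` and `STANDARD_FREQUENCIES` are introduced here for illustration only.

```python
def standard_frequency(n):
    """f(n) = 440 * 2 ** ((n - 69) / 12), the n-th standard frequency in Hz."""
    return 440.0 * 2.0 ** ((n - 69) / 12)

# The 128 standard frequencies f(0) .. f(127).
STANDARD_FREQUENCIES = [standard_frequency(n) for n in range(128)]
```

Consecutive entries differ by the constant factor 2 ** (1/12), which is the geometric progression (one semitone per step) described above.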

[0023] Next, the way to obtain the correlation of each periodic function with the section signal of an arbitrary section is described concretely. Suppose that a section signal X is given for a unit section d, as shown in FIG. 4. The unit section d has section length L and has been sampled at sampling frequency F, giving w sample values in total, numbered 0, 1, 2, 3, ..., k, ..., w−2, w−1 as illustrated (the w-th sample, drawn as a white circle, is taken to be the first sample of the adjacent unit section to the right). For any sample number k, an amplitude value X(k) is thus given as digital data. In the short-time Fourier transform, X(k) is normally multiplied sample by sample by a window function W(k) whose weight is close to 1 at the center and close to 0 at both ends. That is, X(k) × W(k) is treated as X(k) in the correlation calculation below; a cosine-shaped Hamming window is commonly used as the window function. Although w is written as if it were a constant in the following description, in general it is desirable to vary it with n and set it to the largest integer multiple of F/f(n) that does not exceed the section length L.
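The two practical points in this paragraph, the choice of w and the window function, can be sketched in Python as follows. The function names are hypothetical, the section length is assumed to be expressed in samples, and the 0.54 / 0.46 coefficients are the usual Hamming definition rather than values given in the text.

```python
import math

def samples_for_frequency(section_len, sample_rate, freq):
    """Sample count covering the largest whole number of periods within section_len.

    F / f(n) is the period in samples; the result is truncated to an integer.
    Returns 0 when not even one period fits in the section.
    """
    period = sample_rate / freq
    periods = int(section_len // period)
    return int(periods * period)

def hamming(w):
    """Hamming window of w points (w >= 2): 1.0 at the center, 0.08 at both ends."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (w - 1)) for k in range(w)]
```

Restricting the correlation to a whole number of periods keeps the sine and cosine terms close to orthogonal over the summed range, which is presumably why the text recommends making w depend on n.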

[0024] The principle of obtaining the correlation between this section signal X and a sine function Rn having the n-th standard frequency f(n) is as follows. The correlation value A(n) can be defined by the first expression in FIG. 5. Here X(k) is the amplitude of sample number k in the section signal X, as shown in FIG. 4, and sin(2πf(n)k/F) is the amplitude of the sine function Rn at the same position on the time axis. The first expression can be regarded as the inner product of the amplitude vector of the section signal X and the amplitude vector of the sine function Rn, taken over the dimensions k = 0 to w−1 corresponding to all samples in the unit section d.

[0025] Similarly, the second expression in FIG. 5 gives the correlation between the section signal X and a cosine function having the n-th standard frequency f(n); this correlation value is B(n). Both the first expression for A(n) and the second expression for B(n) are finally multiplied by 2/w, which normalizes the correlation values. Since, as noted above, w is generally varied with n, this coefficient also depends on n.

[0026] The effective correlation value between the section signal X and the standard periodic function having standard frequency f(n) can be expressed, as in the third expression of FIG. 5, by E(n), the square root of the sum of the squares of the correlation value A(n) with the sine function and the correlation value B(n) with the cosine function. If the frequencies of the standard periodic functions with large effective correlation values are selected as representative frequencies, the section signal X can be encoded with those representative frequencies.

[0027] In other words, one or more standard frequencies whose correlation value E(n) is at or above a predetermined criterion are selected as representative frequencies. The criterion may be absolute: for example, a threshold is set and every standard frequency f(n) whose correlation value E(n) exceeds it is selected. It may instead be relative: for example, the standard frequencies are ranked by the magnitude of E(n) and the top Q are selected.
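Both selection rules are simple to state in code. In the Python sketch below, `e` is assumed to be a list in which `e[n]` holds E(n); the function names are introduced here for illustration.

```python
def select_by_threshold(e, threshold):
    """Absolute rule: indices n whose E(n) exceeds the threshold."""
    return [n for n, value in enumerate(e) if value > threshold]

def select_top_q(e, q):
    """Relative rule: indices of the q largest E(n), strongest first."""
    order = sorted(range(len(e)), key=lambda n: e[n], reverse=True)
    return order[:q]
```

The absolute rule can return any number of frequencies, including none for a quiet section, whereas the relative rule always returns exactly Q (when at least Q candidates exist), which suits a fixed number of tracks.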

[0028] (Generalized harmonic analysis) This section describes generalized harmonic analysis, a technique useful for encoding acoustic signals according to the present invention. As already explained, encoding an acoustic signal involves selecting several representative frequencies with high correlation values for the section signal in each unit section. Generalized harmonic analysis allows the representative frequencies to be selected with higher accuracy. Its basic principle is as follows.

[0029] Suppose that a signal S(j) exists for a unit section d, as shown in FIG. 6(a). Here j is a parameter for the iteration described below (j = 1 to J). First, the correlation values of S(j) with all 128 periodic functions shown in FIG. 2 are obtained. The frequency of the one periodic function that gives the largest correlation value is selected as a representative frequency, and the periodic function having that frequency is extracted as an element function. Next, a contained signal G(j) such as that shown in FIG. 6(b) is defined. G(j) is obtained by multiplying the extracted element function by its correlation value with S(j), used as the amplitude. For example, if a sine/cosine pair is used as the periodic function as in FIG. 2 and frequency f(n) is selected as the representative frequency, G(j) is the sum of the sine function A(n) sin(2πf(n)k/F) with amplitude A(n) and the cosine function B(n) cos(2πf(n)k/F) with amplitude B(n) (FIG. 6(b) shows only one of the two functions for convenience of illustration). Because A(n) and B(n) are the normalized correlation values given by the expressions in FIG. 5, G(j) is the signal component with frequency f(n) contained in S(j).

[0030] Once the contained signal G(j) has been obtained, the difference signal S(j+1) is obtained by subtracting G(j) from S(j). FIG. 6(c) shows the difference signal S(j+1) obtained in this way. This difference signal S(j+1) is the signal made up of the components that remain after the signal component of frequency f(n) has been removed from the original signal S(j). The parameter j is then incremented by 1 so that the difference signal S(j+1) is treated as the new signal S(j), and the same processing is repeated J times while j is incremented by 1 from j = 1 to J. This selects J representative frequencies.
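
The iteration in paragraphs [0029] and [0030] can be sketched as follows. This is a minimal illustration, not the procedure of FIG. 5 itself: the normalization factor 2/K, the use of sqrt(A² + B²) as the "correlation value" that is maximized, and the names `correlate` and `generalized_harmonic_analysis` are assumptions made for the example.

```python
import math

def correlate(signal, freq, fs):
    """Return (A, B), the sine and cosine correlations of signal at freq.

    Assumption: normalization by 2/K, so a pure sinusoid of amplitude a
    spanning whole periods yields a correlation of a.
    """
    K = len(signal)
    w = 2.0 * math.pi * freq / fs
    a = 2.0 / K * sum(x * math.sin(w * k) for k, x in enumerate(signal))
    b = 2.0 / K * sum(x * math.cos(w * k) for k, x in enumerate(signal))
    return a, b

def generalized_harmonic_analysis(signal, freqs, fs, J):
    """Select J representative frequencies by repeated correlate-and-subtract."""
    residual = list(signal)   # S(1) is the section signal itself
    selected = []             # one (frequency, amplitude) pair per iteration
    for _ in range(J):
        # Correlate the current signal S(j) with every candidate frequency
        # and keep the candidate with the largest combined correlation.
        best_f, best_a, best_b = max(
            ((f, *correlate(residual, f, fs)) for f in freqs),
            key=lambda t: t[1] ** 2 + t[2] ** 2,
        )
        w = 2.0 * math.pi * best_f / fs
        # Contained signal G(j) = A sin(...) + B cos(...); S(j+1) = S(j) - G(j).
        residual = [
            x - (best_a * math.sin(w * k) + best_b * math.cos(w * k))
            for k, x in enumerate(residual)
        ]
        selected.append((best_f, math.hypot(best_a, best_b)))
    return selected, residual
```

Calling `generalized_harmonic_analysis(section, freqs, fs, J)` returns the selected (frequency, amplitude) pairs together with the final difference signal, so the size of what remains unexplained can be inspected.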

[0031] The J contained signals G(1) to G(J) output as a result of this correlation calculation are the signals that make up the original section signal X. To encode the original section signal X, information indicating the frequencies of these J contained signals and information indicating their amplitudes (intensities) may be used as the code data. Although J has been described as the number of representative frequencies, it may equal the number of standard frequencies f(n), that is, J = 128, and this is the usual practice when the purpose is to obtain a frequency spectrum.

[0032] Once a predetermined number of frequencies has been selected for each unit section in this way, the section signal X in that unit section can be encoded as a predetermined number of code data by creating code data that contain four items of information: "information indicating pitch" corresponding to each frequency in the selected group, "information indicating sound intensity" corresponding to the signal intensity of each selected frequency, "information indicating the sounding start time" corresponding to the start point of the unit section, and "information indicating the sounding end time" corresponding to the start point of the unit section that follows. If MIDI data are created as the code data, the note number may be used as the pitch information, the velocity as the intensity information, the note-on time as the sounding start time, and the note-off time as the sounding end time.

[0033] (Method of Encoding an Acoustic Signal According to the Present Invention) The basic principle described so far, which the present invention shares with the prior art, can be summarized as follows. Unit sections are set on the original acoustic signal, and signal intensities corresponding to a plurality of frequencies are calculated for each unit section. One or more representative frequencies are then selected, using the prepared periodic functions, on the basis of the signal intensities obtained. The acoustic signal is encoded by creating code data consisting of pitch information corresponding to each selected representative frequency, intensity information corresponding to the intensity of that representative frequency, a sounding start time corresponding to the start point of the unit section, and a sounding end time corresponding to the end point of the unit section.

[0034] The acoustic signal encoding method of the present invention modifies the above basic principle so that all of the frequencies corresponding to the prepared periodic functions are used, on the basis of the signal intensities obtained. Data consisting of each of these frequencies, the intensity of each frequency, the section start time corresponding to the start point of the unit section, and the section end time corresponding to the end point of the unit section are defined as "phoneme data", and the final encoded data are obtained by further processing these phoneme data.

[0035] The acoustic signal encoding method of the present invention is described below with reference to the flowchart in FIG. 7. First, an acoustic signal consisting of a plurality of channels is supplied. Here, a two-channel stereo acoustic signal is used as an example. Unit sections are then set over the entire time axis of the acoustic signals of the left and right channels (step S1; shown as S1a and S1b in the figure). The technique used in step S1 is as explained with reference to FIG. 1(a) in the basic principle above.

[0036] Next, frequency analysis is performed on the acoustic signal of each unit section, that is, on the section signal, to calculate the intensity value corresponding to each frequency, and phoneme data consisting of four items of information (frequency, intensity value, and the start and end points of the unit section) are calculated (step S2; shown as S2a and S2b in the figure). Specifically, the correlation intensity of the section signal is computed against each of the 128 periodic functions shown in FIG. 2, and the four items consisting of the frequency of the periodic function, the computed correlation intensity, and the start and end points of the unit section are defined as phoneme data. In this embodiment, however, representative frequencies are not selected as in the basic principle above; instead, phoneme data are obtained for all of the prepared periodic functions. Performing step S2 for all unit sections yields a group of phoneme data [m, n] (0 ≤ m ≤ M−1, 0 ≤ n ≤ N−1), where N is the total number of periodic functions (N = 128 in the example above) and M is the total number of unit sections set on the acoustic signal. In other words, a phoneme data group consisting of M × N phoneme data is obtained.
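
Step S2 can be illustrated with the following sketch, which builds the M × N phoneme data group for one channel. It assumes non-overlapping unit sections of equal length, times expressed in seconds, and sqrt(A² + B²) with 2/K normalization as the correlation intensity; the `Phoneme` class and `analyze_channel` function are hypothetical names introduced only for this example.

```python
import math
from dataclasses import dataclass

@dataclass
class Phoneme:
    freq: float        # frequency of the periodic function
    intensity: float   # correlation intensity of the section signal
    start: float       # section start time in seconds
    end: float         # section end time in seconds

def analyze_channel(samples, fs, section_len, freqs):
    """Return phonemes[m][n] for M unit sections and N frequencies."""
    M = len(samples) // section_len   # assumption: non-overlapping sections
    phonemes = []
    for m in range(M):
        seg = samples[m * section_len:(m + 1) * section_len]
        row = []
        for f in freqs:               # every prepared frequency is kept
            w = 2.0 * math.pi * f / fs
            a = 2.0 / section_len * sum(x * math.sin(w * k) for k, x in enumerate(seg))
            b = 2.0 / section_len * sum(x * math.cos(w * k) for k, x in enumerate(seg))
            row.append(Phoneme(f, math.hypot(a, b),
                               m * section_len / fs,
                               (m + 1) * section_len / fs))
        phonemes.append(row)
    return phonemes
```

For a stereo signal the function would be called once per channel, giving two groups of the same shape.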

[0037] The processing in step S2 is performed on the acoustic signal of each channel, so a phoneme data group, that is, a set of phoneme data, is obtained for each channel. Once the phoneme data groups have been obtained, the group obtained from the left-channel acoustic signal and the group obtained from the right-channel acoustic signal are integrated (step S3). Specifically, phoneme data belonging to the same unit section and having the same frequency are integrated with each other. The intensity value given to the resulting integrated phoneme data is the average of the intensity values of the two source phoneme data. In addition, balance information calculated by [Equation 2] below is attached to the integrated phoneme data.

[0038] [Equation 2] Val = E_L / (E_L + E_R)

[0039] In [Equation 2], E_L is the intensity value of the left-channel phoneme data and E_R is the intensity value of the right-channel phoneme data. That is, the higher the value of the balance information given by [Equation 2], the higher the intensity of the left-channel signal. The integrated phoneme data therefore consist of the frequency, the intensity of that frequency, the section start time corresponding to the start point of the unit section, the section end time corresponding to the end point of the unit section, and the balance information. At this stage, in each unit section, data indicating a priority mark are attached as an attribute to a predetermined number of integrated phoneme data, taken in descending order of intensity value. In view of the polyphony of MIDI sound sources, this predetermined number is preferably about 16. Through the integration in step S3, the two phoneme data groups obtained from the left- and right-channel acoustic signals are integrated into a single integrated phoneme data group. Since each original phoneme data group and the integrated phoneme data group contain almost the same amount of data, the total amount of data is halved.
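
The integration of step S3 can be sketched as below. The averaging of intensities, the balance of [Equation 2], and the marking of the strongest phonemes in each unit section follow the description; the treatment of a pair with E_L + E_R = 0 as centered (Val = 0.5), the tuple layout of the inputs, and the names `IntegratedPhoneme` and `integrate` are assumptions of this example.

```python
from dataclasses import dataclass

@dataclass
class IntegratedPhoneme:
    freq: float
    intensity: float        # average of the left and right intensities
    start: float
    end: float
    balance: float          # Val = E_L / (E_L + E_R)
    priority: bool = False  # priority mark

def integrate(left, right, top_k=16):
    """Merge two [m][n] groups of (freq, intensity, start, end) tuples."""
    merged = []
    for row_l, row_r in zip(left, right):
        row = []
        for (f, e_l, t0, t1), (_, e_r, _, _) in zip(row_l, row_r):
            total = e_l + e_r
            # Assumption: a silent pair (E_L + E_R = 0) is treated as centered.
            val = e_l / total if total > 0 else 0.5
            row.append(IntegratedPhoneme(f, total / 2.0, t0, t1, val))
        # Attach the priority mark to the top_k strongest phonemes
        # of this unit section.
        for p in sorted(row, key=lambda p: p.intensity, reverse=True)[:top_k]:
            p.priority = True
        merged.append(row)
    return merged
```

The result has the same M × N shape as each input group, which is where the halving of the total data amount comes from.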

[0040] This integrated phoneme data group can itself be used as the target code data, but the amount of data can be reduced further by performing steps S4 and S5 below. Once the integrated phoneme data group, that is, the set of integrated phoneme data, has been obtained, integrated phoneme data whose intensity value does not reach a predetermined value are deleted from the group, and the remaining integrated phoneme data are extracted as valid phoneme data having a valid intensity value (step S4). Integrated phoneme data whose intensity value does not reach the predetermined value are deleted in step S4 in order to remove phonemes whose signal level is almost 0 and which are judged to contain no actual sound. The predetermined value is therefore set to a level at which sound is regarded as not actually present.

[0041] Once the valid phoneme data group, that is, the set of valid phoneme data, has been obtained in this way, a plurality of valid phoneme data that have the same frequency and are consecutive in time are connected into one connected phoneme datum (step S5). FIG. 8 is a conceptual diagram illustrating the connection of valid phoneme data. FIG. 8(a) shows the integrated phoneme data group before connection. In FIG. 8(a), each rectangle in the grid represents an integrated phoneme datum. The shaded rectangles are integrated phoneme data that were deleted in step S4 because their intensity value did not reach the predetermined value, and the other rectangles are valid phoneme data. Step S5 connects valid phoneme data that have the same frequency (the same note number) and are consecutive in the time t direction, so performing the connection processing on the valid phoneme data group in FIG. 8(a) yields the connected phoneme data group shown in FIG. 8(b). For example, the valid phoneme data A1, A2, and A3 in FIG. 8(a) are connected to give the connected phoneme data A shown in FIG. 8(b). The newly obtained connected phoneme data A are given the following values: as the frequency, the frequency common to A1, A2, and A3; as the intensity value, the largest of the intensity values of A1, A2, and A3; as the start time, the section start time t1 of the first valid phoneme data A1; as the end time, the section end time t4 of the last valid phoneme data A3; and as the balance information, the balance information of whichever of A1, A2, and A3 has the largest intensity value. Valid phoneme data and connected phoneme data both consist of five items of information (frequency (note number), intensity value, start time, end time, and balance information), so merging three valid phoneme data into one connected phoneme datum reduces the amount of data to one third. When the data are finally MIDI-encoded, this means that they are expressed as one long note rather than three short notes.

[0042] Furthermore, in step S5, a priority mark is attached to the connected phoneme data if a priority mark was attached to the valid phoneme datum that has the largest intensity value among the valid phoneme data from which it was formed. For example, suppose that in FIG. 8(a) the intensity value of valid phoneme data A2 is the largest among A1, A2, and A3. In this case, if A2 carries a priority mark, the connected phoneme data A are given a priority mark; if A2 does not carry one, the connected phoneme data A are not given a priority mark even if A1 or A3 carries one.
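
Steps S4 and S5, including the priority-mark rule of paragraph [0042], can be sketched together as follows. The grid layout `grid[m][n]`, the `Phoneme` class, and the `connect` function are hypothetical; a phoneme is treated as valid when its intensity is at least `threshold`, which is one reading of "reaches the predetermined value".

```python
from dataclasses import dataclass

@dataclass
class Phoneme:
    freq: float
    intensity: float
    start: float
    end: float
    balance: float
    priority: bool = False

def connect(grid, threshold):
    """grid[m][n]: integrated phonemes by unit section m and frequency index n."""
    connected = []
    n_freqs = len(grid[0]) if grid else 0
    for n in range(n_freqs):
        run = []                       # current run of consecutive valid phonemes
        for m in range(len(grid) + 1): # one extra pass closes a trailing run
            p = grid[m][n] if m < len(grid) else None
            if p is not None and p.intensity >= threshold:
                run.append(p)          # step S4: the phoneme is valid
                continue
            if run:                    # step S5: close the run as one phoneme
                peak = max(run, key=lambda q: q.intensity)
                connected.append(Phoneme(
                    peak.freq, peak.intensity,   # largest intensity in the run
                    run[0].start, run[-1].end,   # first start, last end
                    peak.balance, peak.priority  # inherited from the peak only
                ))
                run = []
    return connected
```

Because balance and priority are copied from the strongest member only, a run whose peak is unmarked stays unmarked even when weaker members carried the mark, as in the A1, A2, A3 example.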

[0043] Once the connected phoneme data group has been obtained as described above, the connected phoneme data that carry no priority mark are deleted from the group to obtain the final code data (step S6). An ordinary MIDI sound source is limited to a polyphony of 16 to 64 simultaneous notes, so the phonemes obtained by the analysis must be fitted to this limit. Conventionally, a predetermined number of phonemes were extracted in each unit section, in descending order of intensity, before the connection processing of step S5. In the present invention, the extraction is performed at this point, after the phoneme data have been connected. The connected phoneme data are extracted according to the presence or absence of the priority mark, which was attached in step S3 to a predetermined number (about 16 in this embodiment) of integrated phoneme data in each unit section and carried over to the connected phoneme data in step S5, so the connected phoneme data that finally remain number no more than the predetermined number at any given time. Keeping the number of connected phoneme data present at the same time at or below the predetermined number in this way allows the code data to be used without waste when an ordinary MIDI sound source is used. If steps S4 and S5 are not performed, the set of integrated phoneme data becomes the target code data. However, since connected phoneme data are obtained by deleting invalid data from the integrated phoneme data and connecting only the valid data, the term "integrated phoneme data" in the broad sense also includes "connected phoneme data".
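
Step S6 amounts to a filter on the priority mark. The sketch below also includes a small helper that measures the number of simultaneously sounding phonemes in the result, which may be useful for checking the outcome against the polyphony of a particular sound source. The dictionary keys and both function names are assumptions of this example.

```python
def select_priority(connected):
    """Step S6: keep only connected phonemes that carry a priority mark."""
    return [p for p in connected if p["priority"]]

def max_polyphony(notes):
    """Largest number of notes sounding at the same time.

    A note ending at time t is counted as finished before a note
    starting at the same time t begins.
    """
    events = []
    for p in notes:
        events.append((p["start"], 1))    # note begins
        events.append((p["end"], -1))     # note ends
    events.sort(key=lambda e: (e[0], e[1]))  # at equal times, ends sort first
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak
```

A typical use would be `kept = select_priority(connected)` followed by `max_polyphony(kept)`.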

[0044] When MIDI data are created as the code data, the "frequency" is converted to a note number, the "intensity of the frequency" to a velocity, the "section start time corresponding to the start point of the section" to a note-on time, the "section end time corresponding to the end point of the section" to a note-off time, and the balance information to a panpot. The panpot is a control parameter in the MIDI standard that indicates the volume ratio. Specifically, the balance information calculated by [Equation 2] is multiplied by 100 so that it can take a value from 0 to 100, where "0" is the strongest left, "50" is the center, and "100" is the strongest right. When an ordinary MIDI sound source reads a panpot from the MIDI data, it sends acoustic signals of different intensities to the left and right, according to that panpot, when decoding the MIDI data. Using the code data encoded by the present invention therefore allows stereo reproduction without loss of the left-right balance, even though data that originally consisted of two channels, left and right, have been integrated into one.
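
The conversions named in paragraph [0044] can be sketched as below, with several assumptions that the text does not state. The frequency-to-note mapping assumes equal temperament with note 69 at 440 Hz (the relation of FIG. 3 is not reproduced in this section). The velocity scaling is a hypothetical linear map. Val from [Equation 2] grows with left-channel strength, whereas the text assigns 0 to the left extreme; this sketch assumes the value is inverted as 1 − Val so that the result agrees with the stated "0 is left" convention. Finally, the standard MIDI pan controller (control change 10) takes values from 0 to 127 with 64 as the center, so a rescaling from the 0 to 100 range is shown as well.

```python
import math

def freq_to_note(freq):
    """Assumption: equal temperament, note number 69 = 440 Hz."""
    return max(0, min(127, round(69 + 12 * math.log2(freq / 440.0))))

def intensity_to_velocity(intensity, full_scale=1.0):
    """Assumption: linear scaling into the MIDI velocity range 1..127."""
    return max(1, min(127, round(127 * intensity / full_scale)))

def balance_to_pan100(val):
    """Map Val to 0..100 with 0 = left extreme.

    Assumption: 1 - Val is used because Val is large when the left
    channel is strong, while the text places the left extreme at 0.
    """
    return round(100 * (1.0 - val))

def pan100_to_cc10(pan100):
    """Rescale 0..100 to the 0..127 range of MIDI control change 10 (pan)."""
    return round(pan100 * 127 / 100)
```

If the intended convention is instead that larger panpot values mean left, the inversion in `balance_to_pan100` should simply be dropped.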

[0045] (Application to Sound Source Separation) As described above, the encoding method according to the present invention makes it possible to efficiently encode an acoustic signal consisting of a plurality of channels, such as a stereo acoustic signal. By applying the present invention, code data for each sound source can also be obtained from a stereo acoustic signal that contains a plurality of sound sources. That is, sound sources can be separated from a stereo acoustic signal in which a plurality of sound sources are mixed. This technique is described in detail below.

[0046] Sound source separation uses the balance information contained in each connected phoneme datum. Ordinarily, when a piece of music is recorded in stereo, the performance of each sound source, such as vocals and instruments, is recorded separately, and at mixing time it is decided whether each sound source is recorded on the left or on the right. There are established rules for which side each instrument is recorded on. For example, vocals are recorded with nearly equal signal intensity on the left and right so that they are heard from the center. For the other instruments as well, it is established which are recorded with a stronger signal intensity on the left and which with a stronger signal intensity on the right. The present invention uses these rules together with the balance information described above.

[0047] Specifically, when encoding in step S6, the balance information of each connected phoneme datum is read, and the connected phoneme data group is classified into a plurality of groups according to preset thresholds. For example, by placing the data whose balance information Val is near "0.5" in one group and decoding the connected phoneme data of that group, the vocals alone can be reproduced.
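
The classification of paragraph [0047] can be sketched as a three-way split on Val. The thresholds 0.4 and 0.6, the group names, and the dictionary key `balance` are hypothetical values chosen for illustration; the text only states that the thresholds are preset and that values near 0.5 correspond to centered sources such as vocals.

```python
def group_by_balance(phonemes, low=0.4, high=0.6):
    """Classify connected phonemes by balance Val = E_L / (E_L + E_R).

    Assumption: low and high are illustrative thresholds; Val above
    high means the left channel dominates, Val below low the right.
    """
    groups = {"left": [], "center": [], "right": []}
    for p in phonemes:
        val = p["balance"]
        if val > high:
            groups["left"].append(p)
        elif val < low:
            groups["right"].append(p)
        else:
            groups["center"].append(p)   # e.g. vocals mixed to the center
    return groups
```

Decoding only `groups["center"]` would then correspond to reproducing the centered sources alone.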

[0048] When MIDI data are created as this code data, the grouped connected phoneme data are recorded, group by group, on channels of the MIDI standard. If a panpot is specified for each channel, that channel is always played with the same left-right balance. The MIDI panpot can be attached to each unit of MIDI event information (delta time, note number, velocity) corresponding to a connected phoneme datum, but it can also be attached per channel so that it affects all of the MIDI event information recorded on that channel. Setting the panpot per channel can also reduce the amount of data compared with attaching it to each unit of MIDI event information.

[0049] A preferred embodiment of the present invention has been described above. The encoding method is, of course, executed by a computer or the like. Specifically, a program for executing the steps shown in the flowchart of FIG. 7 in the order described is installed on a computer. An acoustic signal consisting of a plurality of channels is digitized by PCM or a similar method and loaded into the computer, steps S1 to S5 are performed, and code data in MIDI or another format are output from the computer. In the case of MIDI data, for example, the output code data are reproduced as sound using a MIDI sequencer and a MIDI sound source.

【００５０】[0050]

[Effects of the Invention] As described above, according to the present invention, a plurality of unit sections are set on the time axis of an acoustic signal given as a plurality of independent items of waveform information, and frequency analysis is performed for each unit section to calculate phoneme data consisting of a frequency, an intensity value, a section start time corresponding to the start point of the unit section, and a section end time corresponding to the end point of the unit section. For all of the phoneme data obtained by performing this phoneme data calculation for all unit sections in each input channel, phoneme data having the same time and the same frequency are integrated across the input channels, and integrated phoneme data to which an intensity ratio based on the intensity values of the source phoneme data is added are created, giving code data that are a set of integrated phoneme data. The acoustic signals from a plurality of input channels can therefore be handled as the code data of a single integrated channel, which reduces both the amount of data and the processing load.

[0051] In addition, when the code data of each output channel are decoded, sound sources can be assigned according to the intensity ratio, taking into account the rules for the left-right balance of sound sources. The method can therefore also be used to separate the sound sources of an acoustic signal in which sound sources are mixed.

[Brief description of the drawings]

FIG. 1 is a diagram showing the basic principle of the acoustic signal encoding method of the present invention.

FIG. 2 is a diagram showing an example of the periodic functions used in the present invention.

FIG. 3 is a diagram showing the relational expression between the frequency of each periodic function shown in FIG. 2 and the MIDI note number n.

FIG. 4 is a diagram showing the method of calculating the correlation between the signal to be analyzed and a periodic signal.

FIG. 5 is a diagram showing the formulas used for the correlation calculation shown in FIG. 4.

FIG. 6 is a diagram showing the basic technique of generalized harmonic analysis.

FIG. 7 is a flowchart of the acoustic signal encoding method of the present invention.

FIG. 8 is a conceptual diagram illustrating the connection of valid phoneme data.

[Explanation of symbols]

A(n), B(n) ... correlation values
d, d1 to d5 ... unit sections
E(n) ... correlation value
G(j) ... contained signal
n, n1 to n6 ... note numbers
S(j), S(j+1) ... difference signals
X, X(k) ... section signal

Claims

[Claims]

1. A method of encoding an acoustic signal, comprising: a phoneme data calculation step of setting a plurality of unit sections on a time axis for an acoustic signal given as a plurality of independent items of waveform information, performing frequency analysis for each of the unit sections, and calculating phoneme data composed of a frequency, an intensity value, a section start time corresponding to the start point of the unit section, and a section end time corresponding to the end point of the unit section; and a phoneme data integration step of integrating, across the input channels, phoneme data having the same time and the same frequency among all the phoneme data obtained by performing the phoneme data calculation step for all unit sections in each input channel, and creating integrated phoneme data to which an intensity ratio based on the intensity values of the plurality of source phoneme data is added, whereby code data that are a set of the integrated phoneme data are obtained.

2. A method of encoding and decoding an acoustic signal, comprising: a phoneme data calculation step of setting a plurality of unit sections on a time axis for an acoustic signal given as a plurality of independent items of waveform information, performing frequency analysis for each of the unit sections, and calculating phoneme data composed of a frequency, an intensity value, a section start time corresponding to the start point of the unit section, and a section end time corresponding to the end point of the unit section; a phoneme data integration step of integrating, across the channels, phoneme data having the same time and the same frequency among all the phoneme data obtained by performing the phoneme data calculation step for all unit sections in each input channel, and creating integrated phoneme data to which an intensity ratio based on the intensity values of the plurality of source phoneme data is added; and a decoding step of controlling, by means of the intensity ratio of the integrated phoneme data, the volume balance between output channels of a sound source device used for reproduction, and reproducing the acoustic signal from a plurality of output channels.

3. The method of encoding and decoding an acoustic signal according to claim 2, further comprising a grouping step of grouping the integrated phoneme data on the basis of the intensity ratio of the integrated phoneme data and adding an intensity ratio to each grouped integrated phoneme data group, wherein the decoding step reproduces the acoustic signal by assigning a predetermined sound source to each grouped integrated phoneme data group on the basis of the intensity ratio.

4. The method of encoding and decoding an acoustic signal according to claim 2 or 3, wherein the code data are MIDI data composed of a note number, a velocity, and a delta time, and the intensity ratio is panpot information.

5. A program for causing a computer to execute: a phoneme data calculation step of setting a plurality of unit sections on a time axis for an acoustic signal given as a plurality of independent items of waveform information, performing frequency analysis for each of the unit sections, and calculating phoneme data composed of a frequency, an intensity value, a section start time corresponding to the start point of the unit section, and a section end time corresponding to the end point of the unit section; and a phoneme data integration step of integrating, across the input channels, phoneme data having the same time and the same frequency among all the phoneme data obtained by performing the phoneme data calculation step for all unit sections in each input channel, and creating integrated phoneme data to which an intensity ratio based on the intensity values of the plurality of source phoneme data is added.