JP2002023741A

JP2002023741A - Synthesizer for acoustic signal

Info

Publication number: JP2002023741A
Application number: JP2000200831A
Authority: JP
Inventors: Toshio Motegi; 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2000-07-03
Filing date: 2000-07-03
Publication date: 2002-01-25

Abstract

PROBLEM TO BE SOLVED: To provide an acoustic signal synthesizer capable of reproducing a wider range of acoustic signals than before while still using standard code data conforming to a predetermined standard such as the MIDI standard. SOLUTION: A general-purpose standard data format (FIG. 3(a)) and an extended data format capable of handling high-precision data (FIG. 3(b)) are provided. In the extended data format, the number of channels available for recording each single note of a chord is increased, the note number can be specified more finely by means of a fine pitch field, and velocity can be recorded with many more levels. This makes it possible to reproduce high-precision sounds that cannot be reproduced with the conventional standard data format.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an acoustic signal encoding technique. The technique is suited to producing audio content delivered through broadcast media (radio, television), communication media (CS video/audio distribution, Internet music distribution, online karaoke), and package media (CD, MD, cassette, video, LD, CD-ROM, game cassettes, solid-state memory media for portable music players). It is also suited to MIDI transmission of several kinds of content: music including vocals for dedicated portable music players, mobile phones, PHS handsets and pagers; voice recordings of literary works such as kabuki, noh, sutra chanting and poetry; and spoken language-teaching materials.

[0002]

[Description of the Related Art] A time-series signal, typified by an acoustic signal, contains a plurality of periodic signals as its components. Methods for analyzing which periodic signals a given time-series signal contains have therefore long been known. For example, Fourier analysis is widely used to analyze the frequency components contained in a given time-series signal.

[0003] Such time-series signal analysis methods also make it possible to encode an acoustic signal. With the spread of computers, capturing an analog acoustic signal (the original sound) as digital data has become easy: the signal is sampled at a predetermined sampling frequency and the signal strength at each sample is quantized. Applying a technique such as Fourier analysis to this digital data extracts the frequency components contained in the original sound signal. The original signal can then be encoded by codes representing each frequency component.
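The following sketch illustrates extracting frequency components from sampled data. It computes a naive discrete Fourier transform in pure Python and reports the strongest bins. The function names, the 8 kHz sampling rate and the two-tone test signal are illustrative assumptions, not taken from the patent. A practical encoder would use an FFT with windowing.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns |X[k]| for k = 0 .. N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        mags.append(abs(acc))
    return mags

def dominant_frequencies(samples, sample_rate, count=2):
    """Return the `count` strongest frequencies in Hz (DC bin skipped)."""
    mags = dft_magnitudes(samples)
    bins = sorted(range(1, len(mags)), key=lambda k: mags[k], reverse=True)
    return sorted(k * sample_rate / len(samples) for k in bins[:count])

# 440 Hz + 880 Hz sampled at 8000 Hz for 0.1 s (800 samples, 10 Hz per bin).
rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / rate)
          + 0.5 * math.sin(2 * math.pi * 880 * t / rate)
          for t in range(800)]
print(dominant_frequencies(signal, rate))  # [440.0, 880.0]
```

Both test tones complete a whole number of cycles in the window, so each one falls exactly on a bin. Arbitrary frequencies would leak into neighbouring bins.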

[0004] Meanwhile, the MIDI (Musical Instrument Digital Interface) standard has also come into wide use with the spread of personal computers. It grew out of the idea of encoding the sounds of electronic musical instruments. Code data according to the MIDI standard (hereinafter, MIDI data) basically describes the operations of a performance, such as which key of the instrument was played and with what strength. The MIDI data itself contains no actual sound waveform.

Reproducing actual sound therefore requires a separate MIDI sound source that stores the waveforms (distorted waveform patterns) of instrument sounds. Even so, MIDI has attracted attention for its high encoding efficiency. MIDI-based encoding and decoding techniques are now widely adopted in personal-computer software for instrument performance, instrument practice, composition and the like.
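The sketch below makes concrete that MIDI data records performance operations rather than waveforms. It builds standard 3-byte Note On and Note Off channel messages and converts a note number to a frequency. The conversion uses the common equal-temperament convention (note 69 = A4 = 440 Hz). The helper names are hypothetical; the byte layout follows the MIDI 1.0 channel-message format.

```python
def _check(channel, note, velocity):
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("data bytes must be 0-127")

def note_on(channel, note, velocity):
    """Note On: status byte 0x9n (n = channel), then note number and velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note, velocity=64):
    """Note Off: status byte 0x8n, then note number and release velocity."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])

def note_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

msg = note_on(0, 60, 100) + note_off(0, 60)
print(msg.hex())                 # 903c64803c40
print(round(note_to_hz(60), 2))  # 261.63
```

Six bytes describe a complete key press and release. The waveform heard depends entirely on the sound source that receives them.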

[0005] Accordingly, proposals have been made to analyze a time-series signal, typified by an acoustic signal, by a predetermined method, extract the periodic signals that are its components, and encode them using MIDI data. Various methods for analyzing the component frequencies of an arbitrary time-series signal and creating MIDI data from the results are proposed in the following documents:

- JP-A-10-247099, JP-A-11-73199, JP-A-11-73200, JP-A-11-95753, JP-A-2000-99009 and JP-A-2000-99093;
- the specifications of Japanese Patent Application Nos. 11-58431, 11-177875 and 11-329297.

[0006]

[Problems to Be Solved by the Invention] Current MIDI sound source devices are basically designed to reproduce music (instrument sounds). The encoding methods proposed so far can faithfully convert various acoustic signals, such as natural sounds and voice, into MIDI codes. Even so, reproduction quality is limited, particularly for voice and natural sounds rich in noise.

Moreover, music often mixes instrument sounds with voice such as singing. The singing part is usually not expressed in MIDI but handled separately as waveform information. Music expressed in MIDI has the advantage that its tempo, pitch and timbre can be changed freely. Special measures are therefore needed to give the same capability to a singing part performed in synchronization with that music. For example, in online karaoke the backing chorus has come to be transmitted as waveform information, separately from the music information transmitted in MIDI format.

Problems remain, however:

- Signal processing of the vocals still degrades reproduction quality noticeably. This matters little for a backing chorus but makes the approach hard to use for the main vocal.
- A line with large transmission capacity is required in addition to the MIDI used for transmitting the music.

[0007] Nevertheless, transmitting in MIDI code and reproducing with a MIDI sound source remains attractive because, as noted above, tempo, pitch and timbre can be changed freely. Applying it therefore offers great benefits. The MIDI encoding format, however, has three problems:

1. Standard MIDI code restricts frequency to semitone steps and intensity to 128 levels. This makes faithful reproduction of voice and natural sounds difficult.
2. A GM-standard MIDI sound source is limited to at most 16 channels and a polyphony of 32 or 64 simultaneous notes. This is insufficient for expressing acoustic signals in general, including orchestral music.
3. Various kinds of noise are an important means of expression in voice, natural sounds and biological signals. A typical wavetable (WAVE table) MIDI sound source can emit such sounds as one-shots but is not suited to synthesizing and processing them for expression.
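A short sketch can quantify the first restriction. Snapping an arbitrary frequency to the nearest MIDI note leaves a residual error of up to ±50 cents. An intensity between 0 and 1 collapses onto 128 integer levels. The function names and the linear intensity mapping are assumptions made for illustration.

```python
import math

def hz_to_note(freq):
    """Nearest MIDI note number and the residual error in cents
    (100 cents = 1 semitone), using A4 = note 69 = 440 Hz."""
    exact = 69 + 12 * math.log2(freq / 440.0)
    note = round(exact)
    return note, (exact - note) * 100

def quantize_level(x, levels=128):
    """Map an intensity in [0, 1] linearly onto integer levels 0 .. levels-1."""
    return min(levels - 1, int(x * levels))

# A tone 30 cents above A4 is stored as plain A4; the 30 cents are lost.
note, cents = hz_to_note(440.0 * 2 ** (30 / 1200))
print(note, round(cents, 1))                          # 69 30.0
# Two intensities that differ by 0.003 share one of the 128 levels.
print(quantize_level(0.500), quantize_level(0.503))  # 64 64
```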

[0008] In view of the above points, an object of the present invention is to provide an acoustic signal synthesizer capable of reproducing a wider range of acoustic signals than before. It should do so while still using standard code data conforming to a predetermined standard such as the MIDI standard.

[0009]

[Means for Solving the Problems] To solve the above problems, the present invention provides an acoustic signal synthesizer that processes an acoustic code. The acoustic code is composed of a plurality of phoneme data items, each consisting of a sound start time, a sound end time, sound frequency information, sound intensity information and timbre identification information. The code can take a plurality of data formats. The synthesizer comprises the following means:

- phoneme data acquisition means for taking in the acoustic code from outside, recognizing its data format, and extracting each phoneme data item according to that format;
- waveform component generation means for preparing basic distorted waveform data that has a waveform shape based on the timbre identification information and a length based on the time from the sound start time to the sound end time;
- time axis expansion/contraction means for scaling the basic distorted waveform data along the time axis based on the sound frequency information;
- amplitude axis expansion/contraction means for scaling the basic distorted waveform data along the amplitude axis based on the sound intensity information;
- waveform storage means consisting of two buffer memories for holding the basic distorted waveform data after it has been scaled along the time and amplitude axes;
- memory mode switching means for defining mutually exclusive modes for the two buffer memories, one used for writing and the other for reading, and switching these modes at predetermined time intervals;
- output waveform synthesis means for writing the basic distorted waveform data into the write buffer memory, adding it to any waveform data already written within the address range of the write buffer memory that corresponds to the sound start time and the sound end time;
- synthesized signal reproduction means for outputting the contents of the read buffer memory to the outside in time series as an acoustic signal.

According to the present invention, acoustic codes in a plurality of data formats can be input. The synthesizer recognizes the data format of the input code and extracts each phoneme data item according to that format. It obtains basic distorted waveform data from each phoneme data item and transforms it into waveform data. It then combines the result with the waveform data obtained in the same way from other items and outputs the combined signal. A general-purpose data format can therefore be used, and a more precise special data format can also be handled. Any acoustic signal can thus be faithfully reproduced, retaining the advantages of general-purpose data without being bound by its constraints.

[0010]

[Embodiments of the Invention] Embodiments of the present invention are described in detail below with reference to the drawings. First, the basic concept of the invention is explained using the MIDI encoding format as an example. FIG. 1 is a conceptual diagram of an extended MIDI encoding/decoding scheme. A major feature of this scheme is that both sides are split in two:

- the encoding system is divided into standard-compliant MIDI encoding and extended MIDI encoding;
- the decoding system is divided into a standard-compliant MIDI sound source and an extended MIDI sound source.

The encoding applied depends on the source acoustic signal:

- music (instrument sound) receives standard-compliant MIDI encoding, as before;
- natural sounds, biological sounds or noise receive either standard-compliant or extended MIDI encoding, depending on the characteristics of the acoustic signal;
- voice receives extended MIDI encoding.

[0011] The code data obtained by encoding is stored, edited and transmitted in accordance with its respective standard and then reaches the decoding system. In decoding, standard-compliant MIDI data can be decoded by both the standard-compliant MIDI sound source and the extended MIDI sound source. Extended MIDI data is decoded only by the extended MIDI sound source. Encoding in extended MIDI and decoding with the extended MIDI sound source in this way enables faithful reproduction, especially of natural sounds, biological sounds, noise and voice. The present invention relates in particular to this decoding part.

[0012] Next, the specific configuration of the acoustic signal synthesizer corresponding to the decoding system in FIG. 1 is described. FIG. 2 is a functional block diagram showing the configuration of the acoustic signal synthesizer according to the present invention. It comprises the following means:

- Phoneme data acquisition means 1 receives an acoustic code, such as MIDI data, and acquires phoneme data from it. The acoustic code is a set of phoneme data arranged in time series, so each phoneme data item is obtained by extracting the items from the code in order.
- Waveform component generation means 2 takes from a waveform storage means (not shown), such as a MIDI sound source, the waveform corresponding to the timbre identification information in the phoneme data. It repeatedly concatenates copies of this waveform to span the sounding time from the sound start time to the sound end time, producing basic distorted waveform data.
- Time axis expansion/contraction means 3 changes the frequency of the basic distorted waveform data based on the sound frequency information.
- Amplitude axis expansion/contraction means 4 changes the amplitude of the basic distorted waveform data based on the sound intensity information.
- Output waveform synthesis means 5 writes the data into the address range of the write buffer memory corresponding to the sound start time and the sound end time, adding it to the waveform data already written there.
- Waveform storage means 6 stores the synthesized output waveform. It has two buffer memories whose write and read roles are switched by memory mode switching means 7.
- Synthesized signal reproduction means 8 reads the output waveform from whichever buffer memory is currently serving for reading and reproduces it as an acoustic signal.

[0013] The acoustic code input to the acoustic signal synthesizer is described here. FIG. 3(a) shows the data format when the MIDI standard is applied to the acoustic code. The acoustic code need not be in MIDI format. However, MIDI is the most widespread encoding format of this kind, so using MIDI-format code data is preferable in practice.

In the MIDI format, "note ON" and "note OFF" data occur with "delta time" data interposed between them:

- "Note ON" data designates a specific note number N and velocity V and instructs the start of performance of a specific sound.
- "Note OFF" data designates a specific note number N and velocity V and instructs the end of performance of that sound.
- "Delta time" data indicates a predetermined time interval.

Velocity V indicates, for example, the speed at which a piano key is pressed (velocity at note ON) or released (velocity at note OFF). It thus expresses the strength of the operation that starts or ends the performance of a specific sound.

In the MIDI standard, as shown in FIG. 3(a), these data form a set consisting of a note ON/note OFF pair. This set with timbre identification information added constitutes phoneme data, and a collection of phoneme data is an acoustic code. In FIG. 3(a) the fields map as follows:

- the delta times at note ON and note OFF give the sound start time and sound end time of the phoneme data;
- the note number gives the sound frequency information;
- the velocity gives the sound intensity information;
- the channel number corresponds to the timbre identification information and specifies the timbre of each tone (single note or chord) sounded simultaneously.

The standard MIDI specification allows 16 channels, numbered 0 to 15. In this embodiment they are limited to the 15 channels 0 to 14, so that standard data can be distinguished from the extended format described below.
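The sketch below pairs note ON and note OFF events into phoneme data. It assumes, as in Standard MIDI Files, that each delta time is the interval since the previous event, so start and end times are running sums. It also follows the common MIDI convention that a note ON with velocity 0 acts as a note OFF. The event tuple layout and the `Phoneme` record are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Phoneme:
    start: int     # sound start time (ticks, running sum of delta times)
    end: int       # sound end time (ticks)
    note: int      # sound frequency information (note number)
    velocity: int  # sound intensity information (note ON velocity)
    channel: int   # timbre identification information

def pair_notes(events):
    """events: iterable of (delta_ticks, kind, channel, note, velocity),
    kind being "on" or "off". Returns phonemes ordered by start time."""
    now = 0
    pending = {}  # (channel, note) -> (start tick, velocity)
    out = []
    for delta, kind, ch, note, vel in events:
        now += delta
        key = (ch, note)
        if kind == "on" and vel > 0:
            pending[key] = (now, vel)
        elif key in pending:  # "off", or "on" with velocity 0
            start, v = pending.pop(key)
            out.append(Phoneme(start, now, note, v, ch))
    return sorted(out, key=lambda p: p.start)

events = [(0, "on", 0, 60, 100), (480, "off", 0, 60, 64),
          (0, "on", 0, 64, 90), (240, "on", 1, 67, 80),
          (240, "off", 0, 64, 0), (0, "on", 1, 67, 0)]
result = pair_notes(events)
print([(p.start, p.end, p.note) for p in result])
# [(0, 480, 60), (480, 960, 64), (720, 960, 67)]
```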

[0014] The present invention uses standard MIDI format data as shown in FIG. 3(a) together with extended MIDI format data, whose specification extends the standard. FIG. 3(b) shows the extended MIDI data format. It differs from the standard MIDI format as follows:

- Channel number: the channel number is fixed at 15. The MIDI data decoding means can therefore tell extended from standard MIDI format by checking whether the channel number is 15.
- Extended channel number: this field specifies the channel in which each single note of a chord is recorded, making 128 channels available instead of the 15 channels of the standard MIDI format.
- Fine pitch: the note number is the same as in the standard format, but a fine pitch field is added. The smallest unit of the note number is a semitone, whereas the fine pitch can be set in units of 1/100 semitone.
- Velocity: the velocity has an upper byte and a lower byte, so it can be set with 128 times the precision of standard MIDI.
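The sketch below illustrates the precision gained by the extended fields. The text states only that fine pitch is set in 1/100-semitone units and that velocity has an upper and a lower byte. The concrete interpretation here is an assumption: fine pitch is an upward offset of 0 to 99 cents, and two 7-bit bytes combine into a 14-bit value.

```python
def extended_frequency(note, fine_pitch):
    """Frequency for a note number plus a fine pitch in 1/100-semitone steps
    (assumed: 0-99, an upward offset), with A4 = note 69 = 440 Hz."""
    if not (0 <= note <= 127 and 0 <= fine_pitch <= 99):
        raise ValueError("note 0-127, fine pitch 0-99")
    return 440.0 * 2 ** ((note - 69 + fine_pitch / 100) / 12)

def extended_velocity(upper, lower):
    """Combine upper and lower 7-bit velocity bytes into one value 0-16383,
    i.e. 128 x 128 levels versus the 128 levels of standard MIDI."""
    return ((upper & 0x7F) << 7) | (lower & 0x7F)

print(round(extended_frequency(69, 50), 2))  # 452.89 (A4 + 50 cents)
print(extended_velocity(127, 127))           # 16383
```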

[0015] The following describes how the acoustic signal synthesizer shown in FIG. 2 processes the MIDI data shown in FIG. 3 as an input acoustic code. When an acoustic code is input, phoneme data acquisition means 1 checks its channel number. A channel number of 0 to 14 identifies the standard MIDI format, and a channel number of 15 identifies the extended MIDI format. The means then extracts each phoneme data item from the acoustic code according to the recognized data format.
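The sketch below shows the format check performed by phoneme data acquisition means 1. Only the rule itself comes from the text: channel 15 means extended, channels 0 to 14 mean standard. The byte layout assumed for an extended record is hypothetical: [0x9F, extended channel, note, fine pitch, velocity upper, velocity lower].

```python
def parse_note_on(msg: bytes) -> dict:
    """Recognize the data format of a Note On record from its channel and
    extract the phoneme fields accordingly."""
    status = msg[0]
    if (status & 0xF0) != 0x90:
        raise ValueError("not a Note On record")
    channel = status & 0x0F
    if channel == 15:
        # Extended format (hypothetical layout): five data bytes follow.
        ext_channel, note, fine, upper, lower = msg[1:6]
        return {"format": "extended", "timbre": ext_channel, "note": note,
                "cents": fine, "intensity": (upper << 7) | lower,
                "intensity_max": 16383}
    # Standard format: channels 0-14, note number and velocity.
    note, velocity = msg[1:3]
    return {"format": "standard", "timbre": channel, "note": note,
            "cents": 0, "intensity": velocity, "intensity_max": 127}

std = parse_note_on(bytes([0x93, 60, 100]))
ext = parse_note_on(bytes([0x9F, 42, 69, 50, 100, 0]))
print(std["format"], std["timbre"])     # standard 3
print(ext["format"], ext["intensity"])  # extended 12800
```

Returning `intensity_max` alongside the value lets later stages normalize intensity from either format onto one scale.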

[0016] Next, waveform component generation means 2 creates basic distorted waveform data from the sound start time, sound end time and timbre identification information of the extracted phoneme data. It first uses the timbre identification information to look up a timbre definition table held by waveform component generation means 2, converting it into the corresponding timbre code. It then uses the timbre code to look up a spectrum table, also held by waveform component generation means 2, and obtains the corresponding basic distorted waveform data.

FIG. 4 illustrates how the basic distorted waveform data is obtained from the spectrum table. The two graphs at the upper left are the spectrum data for the input timbre codes, with frequency on the horizontal axis and intensity on the vertical axis. They show that timbre 1 and timbre 2 each consist of five frequencies, including the fundamental frequency, which is drawn with a thick line. Sine-wave synthesis is then performed from this spectrum data by summing sine waves at each of these frequencies, with amplitudes corresponding to their intensities. This yields basic distorted waveform data such as that shown at the upper right of the figure.

This basic distorted waveform data covers only a very short time (several tens of milliseconds). Copies of it are therefore arranged one after another along the time axis to cover the span from the sound start time to the sound end time. The result is basic distorted waveform data matching the sounding time of the phoneme data.
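The spectrum-table route can be sketched as additive synthesis followed by tiling. The following are all assumptions for illustration: the table contents, the 8 kHz output rate, the 100 Hz reference fundamental, and the restriction to harmonic partials (so that a fragment of whole periods repeats without a seam). FIG. 4 shows five partials per timbre but gives no values.

```python
import math

SAMPLE_RATE = 8000  # assumed output rate (Hz)
REF_F0 = 100.0      # assumed reference fundamental of each fragment (Hz)

# Hypothetical spectrum table: timbre code -> [(harmonic number, intensity)].
SPECTRUM_TABLE = {
    1: [(1, 1.0), (2, 0.5), (3, 0.3), (4, 0.2), (5, 0.1)],
    2: [(1, 1.0), (3, 0.6), (5, 0.4), (7, 0.2), (9, 0.1)],
}

def basic_fragment(timbre_code, periods=4):
    """Sum of sines at REF_F0 * h with amplitude a. The fragment holds a
    whole number of periods, so copies can be butted end to end."""
    n = round(periods * SAMPLE_RATE / REF_F0)  # 4 periods = 40 ms here
    partials = SPECTRUM_TABLE[timbre_code]
    return [sum(a * math.sin(2 * math.pi * REF_F0 * h * t / SAMPLE_RATE)
                for h, a in partials)
            for t in range(n)]

def tile_to_duration(fragment, seconds):
    """Repeat the fragment along the time axis to cover `seconds`."""
    n = round(seconds * SAMPLE_RATE)
    return [fragment[i % len(fragment)] for i in range(n)]

frag = basic_fragment(1)
wave = tile_to_duration(frag, 0.25)  # sound end time - start time = 250 ms
print(len(frag), len(wave))          # 320 2000
```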

[0017] Next, time axis expansion/contraction means 3 stretches or compresses the basic distorted waveform data along the time axis, based on the sound frequency information. Specifically, it changes the frequency of the basic distorted waveform data to match the frequency given by the phoneme data. Amplitude axis expansion/contraction means 4 then scales the resulting distorted waveform data along the amplitude axis, based on the sound intensity information. Specifically, it scales the amplitude values so that the maximum amplitude of the distorted waveform data equals the sound intensity given by the phoneme data.

The amplitude is not, however, set to its maximum abruptly at the sound start time. Rise control makes the amplitude increase gradually, as shown in the output synthesized waveform in the lower part of FIG. 4. The time from the sound start time until the maximum amplitude is reached can be changed by a setting. Similarly, fall control makes the amplitude decrease gradually at the sound end time instead of dropping abruptly to zero. In fall control, the amplitude stays at its maximum until the sound end time and is reduced gradually only from the sound end time onward.

Although time-axis scaling is performed before amplitude-axis scaling here, the order may be reversed, or the two may be performed simultaneously.
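The sketch below covers both scaling steps and the rise/fall control. The technique is a swapped-in one: pitch changes by reading a periodic waveform faster or slower with linear interpolation, and the envelope is linear. The patent does not specify either method. The function names, the 80-sample attack and the 400-sample release are assumptions.

```python
import math

def time_scale(wave, ratio, out_len):
    """Change pitch by reading a periodic waveform `ratio` times faster
    (ratio = target f0 / reference f0), with linear interpolation.
    Indices wrap around, so `wave` must contain whole periods."""
    n = len(wave)
    out = []
    for i in range(out_len):
        pos = (i * ratio) % n
        k = int(pos)
        frac = pos - k
        out.append(wave[k] * (1.0 - frac) + wave[(k + 1) % n] * frac)
    return out

def amplitude_scale(wave, peak):
    """Scale so that the maximum absolute sample equals `peak`."""
    m = max(abs(x) for x in wave) or 1.0
    return [x * peak / m for x in wave]

def envelope(wave, sound_len, attack, release):
    """Linear rise over `attack` samples, full level up to sample `sound_len`
    (the sound end time), then a linear fall over `release` samples after it.
    Requires attack > 0, release > 0 and len(wave) >= sound_len + release."""
    out = []
    for i in range(sound_len + release):
        if i < attack:
            gain = i / attack
        elif i < sound_len:
            gain = 1.0
        else:
            gain = 1.0 - (i - sound_len) / release
        out.append(wave[i] * gain)
    return out

# Reference waveform: 4 periods of a 100 Hz sine at 8 kHz (320 samples).
ref = [math.sin(2 * math.pi * t / 80) for t in range(320)]
sound_len, release = 2000, 400  # 250 ms note plus a 50 ms fall after it
pitched = time_scale(ref, 220.0 / 100.0, sound_len + release)  # now 220 Hz
note = envelope(amplitude_scale(pitched, 0.8), sound_len, attack=80, release=release)
```

As in the text, the fall starts at the sound end time, so the rendered note is `release` samples longer than the nominal sounding time.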

[0018] Output waveform synthesis means 5 then writes the obtained distorted waveform data into the write buffer memory of waveform storage means 6. It first reads the distorted waveform data already recorded in the write buffer memory and adds the new distorted waveform data to it. The combined distorted waveform data is then recorded back into the write buffer memory.

The end of one distorted waveform may lie close to the start of the next, as in the output synthesized waveform in the lower part of FIG. 4. In that case the two waveforms partly overlap, so the sound changes smoothly in pitch, intensity and timbre.
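The read-add-write step is ordinary overlap-add mixing. The minimal sketch below uses a Python list to stand in for the write buffer memory. The function name, and the choice to drop samples that would land past the end of the buffer, are assumptions.

```python
def mix_into(buffer, wave, start):
    """Overlap-add: for each address from `start` on, read the sample already
    stored, add the new sample, and write the sum back."""
    end = min(len(buffer), start + len(wave))
    for i in range(start, end):
        buffer[i] += wave[i - start]
    return buffer

buf = [0.0] * 8
mix_into(buf, [1.0, 1.0, 1.0, 1.0], 0)  # first note
mix_into(buf, [0.5, 0.5, 0.5, 0.5], 2)  # second note overlaps its tail
print(buf)  # [1.0, 1.0, 1.5, 1.5, 0.5, 0.5, 0.0, 0.0]
```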

[0019] Waveform storage means 6 has two buffer memories, A and B. Memory mode switching means 7 alternates their write and read roles at predetermined time intervals. The distorted waveform data recorded in the read buffer memory is sent to synthesized signal reproduction means 8 at predetermined time intervals and reproduced as an acoustic signal.

The switching time interval T is set no smaller than the maximum sounding time (sound end time minus sound start time) of the phoneme data. Each buffer memory has capacity for data spanning twice the switching time interval T. Only the first-half T is valid data: after the write buffer memory becomes the read buffer memory, synthesized signal reproduction means 8 reads only the first-half T area. Writing into the second-half T area of the buffer memory is nevertheless permitted.

As described above, existing data is read before new data is written to the write buffer memory. Data whose acoustic signal synthesis is complete must therefore be erased. This initialization proceeds as follows:

1. Both buffer memories are completely cleared in the initial state.
2. When the read buffer memory is switched to become the write buffer memory, the second-half T data of the new read buffer memory (which holds the contents of the previous write buffer memory) is copied over the first-half T area of the new write buffer memory.
3. The second-half T area of the new write buffer memory is cleared.

As a result, no data is lost even when a phoneme is timed to sound across a switching time. In that case the data spills over into the second-half T area of the write buffer memory, and the spilled data is used for waveform synthesis in the next cycle.
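The switching and initialization procedure can be sketched as follows. The class and method names are hypothetical, and times are measured in samples. Following the text, each note must start within the current cycle (offset < T) and last at most T, so nothing is written past 2T.

```python
class DoubleBuffer:
    """Two buffers of 2*T samples each. Notes are mixed into the write
    buffer; only its first T samples are played after a switch, and the
    second half catches notes that spill past the cycle boundary."""

    def __init__(self, T):
        self.T = T
        self.write = [0.0] * (2 * T)  # both cleared in the initial state
        self.read = [0.0] * (2 * T)

    def mix(self, wave, offset):
        """Overlap-add `wave` at `offset` samples into the current cycle."""
        if not 0 <= offset < self.T:
            raise ValueError("offset must lie within the current cycle")
        end = min(2 * self.T, offset + len(wave))
        for i in range(offset, end):
            self.write[i] += wave[i - offset]

    def switch(self):
        """Swap roles and return the T samples to play. The new write buffer
        gets the spilled second half of the buffer just filled as its first
        half, and its second half is cleared."""
        self.read, self.write = self.write, self.read
        T = self.T
        self.write[:T] = self.read[T:]
        self.write[T:] = [0.0] * T
        return self.read[:T]

db = DoubleBuffer(T=4)
db.mix([1.0, 1.0, 1.0, 1.0], offset=2)  # note crosses the cycle boundary
out1 = db.switch()
out2 = db.switch()
print(out1, out2)  # [0.0, 0.0, 1.0, 1.0] [1.0, 1.0, 0.0, 0.0]
```

The note's last two samples, written into the second half, come out at the start of the next cycle rather than being lost.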

【００２０】[0020]

In the embodiment above, the waveform component generating means 2 has a spectrum table: it refers to the spectrum table by timbre code to extract predetermined spectrum data, and obtains the basic distortion waveform data by synthesizing sine waves from that spectrum data. Alternatively, the waveform component generating means 2 may have a basic waveform definition table and obtain the basic distortion waveform data by referring to that table by timbre code. Specifically, waveform data is prepared in the general-purpose WAVE format, and a table associating each waveform with a timbre code is prepared as the basic waveform definition table.
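A minimal sketch of the spectrum-table route, assuming the spectrum data are relative amplitudes of integer harmonics of a fixed reference period. `SPECTRUM_TABLE`, its contents, and `basic_waveform_from_spectrum` are hypothetical; the source does not specify the table layout.

```python
import math

# Hypothetical spectrum table: timbre code -> relative amplitudes of
# harmonics 1, 2, 3, ... (values are illustrative, not from the source).
SPECTRUM_TABLE: dict[int, list[float]] = {
    0: [1.0],                 # pure sine
    1: [1.0, 0.5, 0.25],      # fundamental plus decaying overtones
}


def basic_waveform_from_spectrum(timbre_code: int, num_samples: int,
                                 samples_per_cycle: int) -> list[float]:
    """Sum sine harmonics from the spectrum table into one basic waveform.

    The fundamental period is `samples_per_cycle` samples; in this sketch the
    actual pitch is applied later by time-axis scaling.
    """
    amps = SPECTRUM_TABLE[timbre_code]
    peak = sum(abs(a) for a in amps) or 1.0   # normalize so that |x| <= 1
    out = []
    for n in range(num_samples):
        phase = 2.0 * math.pi * n / samples_per_cycle
        x = sum(a * math.sin((k + 1) * phase) for k, a in enumerate(amps))
        out.append(x / peak)
    return out
```

Normalizing by the sum of the harmonic amplitudes keeps every timbre within the same peak range, so the later amplitude-axis scaling alone decides loudness; this is a design choice of the sketch, not something the source states.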

【００２１】[0021]

[Effects of the Invention] As described above, according to the present invention, the acoustic signal synthesizing device handles acoustic codes that consist of a plurality of phoneme data, each made up of a sounding start time, a sounding end time, sound frequency information, sound intensity information, and timbre identification information, and that can take any of a plurality of data formats. The device comprises: phoneme data acquisition means for taking in the acoustic code from outside, recognizing its data format, and extracting each phoneme data according to that format; waveform component generating means for preparing basic distortion waveform data whose shape is based on the timbre identification information and whose length is based on the time from the sounding start time to the sounding end time; time axis expansion/contraction means for scaling the basic distortion waveform data along the time axis based on the sound frequency information; amplitude axis expansion/contraction means for scaling the basic distortion waveform data along the amplitude axis based on the sound intensity information; waveform storage means consisting of two buffer memories for holding the basic distortion waveform data scaled along both axes; memory mode switching means for defining mutually exclusive modes, in which one of the two buffer memories is used for writing and the other for reading, and for switching these modes at a predetermined time interval; output waveform synthesizing means for writing the basic distortion waveform data into the write buffer memory, within the address range corresponding to the sounding start time and the sounding end time, while adding it to the waveform data already written there; and synthesized signal reproducing means for outputting the contents of the read buffer memory to the outside in time series as an acoustic signal. As a result, a general-purpose data format can be used while special, higher-precision data formats can also be handled, so that any acoustic signal can be faithfully reproduced, keeping the advantages of general-purpose data without being bound by its constraints.
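The time axis and amplitude axis expansion/contraction steps can be illustrated with a short resampling sketch. It assumes the basic waveform was built at a reference frequency `ref_freq`, that time-axis scaling means reading it at a step of `freq / ref_freq` with linear interpolation, and that intensity arrives as a linear `gain`; the function `scale_phoneme` and all three assumptions are hypothetical, not taken from the source.

```python
def scale_phoneme(basic: list[float], ref_freq: float, freq: float,
                  gain: float, out_len: int) -> list[float]:
    """Time-axis scale by freq/ref_freq (linear interpolation), then amplitude-scale by gain."""
    step = freq / ref_freq            # > 1 raises the pitch (reads faster), < 1 lowers it
    out = []
    for n in range(out_len):
        pos = n * step
        i = int(pos)
        frac = pos - i
        # Stop with an error instead of reading past the end of the basic waveform.
        if i >= len(basic) or (frac > 0.0 and i + 1 >= len(basic)):
            raise ValueError("basic waveform too short for the requested length")
        nxt = basic[i + 1] if frac > 0.0 else basic[i]
        x = basic[i] + (nxt - basic[i]) * frac   # linear interpolation between samples
        out.append(gain * x)                     # amplitude-axis scaling
    return out
```

Linear interpolation is chosen only for brevity; a real synthesizer would likely use a higher-quality interpolator to reduce aliasing when `step` is large.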

[Brief description of the drawings]

FIG. 1 is a conceptual diagram of an extended MIDI encoding/decoding system.

FIG. 2 is a functional block diagram showing the configuration of the acoustic signal synthesizing device.

FIG. 3 is a diagram showing an example of the data formats of acoustic codes that the acoustic signal synthesizing device can handle.

FIG. 4 is a diagram showing how basic distortion waveform data is created and the output waveform is synthesized when a spectrum table is used.

[Explanation of symbols]

1: Phoneme data acquisition means
2: Waveform component generating means
3: Time axis expansion/contraction means
4: Amplitude axis expansion/contraction means
5: Output waveform synthesizing means
6: Waveform storage means
7: Memory mode switching means
8: Synthesized signal reproducing means

Claims

1. An acoustic signal synthesizing device comprising: phoneme data acquisition means for taking in from outside an acoustic code that consists of a plurality of phoneme data, each including a sounding start time, a sounding end time, sound frequency information, sound intensity information, and timbre identification information, and that can take any of a plurality of data formats, recognizing which data format the acoustic code uses, and extracting each phoneme data according to that data format; waveform component generating means for preparing basic distortion waveform data having a waveform shape based on the timbre identification information and a length based on the time from the sounding start time to the sounding end time; time axis expansion/contraction means for scaling the basic distortion waveform data in the time axis direction based on the sound frequency information; amplitude axis expansion/contraction means for scaling the basic distortion waveform data in the amplitude axis direction based on the sound intensity information; waveform storage means comprising two buffer memories for holding the basic distortion waveform data scaled in the time axis and amplitude axis directions; memory mode switching means for defining mutually exclusive modes in which one of the two buffer memories of the waveform storage means is used for writing and the other for reading, and for switching these modes at a predetermined time interval; output waveform synthesizing means for writing the basic distortion waveform data into the write buffer memory, within the address range corresponding to the sounding start time and the sounding end time, while adding it to the waveform data already written there; and synthesized signal reproducing means for outputting the contents of the read buffer memory to the outside in time series as an acoustic signal.

2. The acoustic signal synthesizing device according to claim 1, wherein the waveform component generating means attenuates the amplitude near both ends of the basic distortion waveform so that its rise and fall become smooth.
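Claim 2 does not specify the attenuation curve. One common choice, used here purely as an assumption, is a raised-cosine ramp of `ramp` samples at each end; `apply_edge_fades` and the ramp length are hypothetical.

```python
import math


def apply_edge_fades(wave: list[float], ramp: int) -> list[float]:
    """Attenuate both ends with raised-cosine ramps of `ramp` samples (hypothetical choice)."""
    n = len(wave)
    ramp = min(ramp, n // 2)          # never let the two ramps overlap
    out = list(wave)                  # leave the caller's waveform untouched
    for i in range(ramp):
        g = 0.5 - 0.5 * math.cos(math.pi * i / ramp)   # 0 at the edge, rising toward 1
        out[i] *= g                   # smooth rise
        out[n - 1 - i] *= g           # mirrored smooth fall
    return out
```

Starting each note at exactly zero amplitude avoids the click that an abrupt step would produce when waveforms are overlap-added in the buffer memory.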

3. The acoustic signal synthesizing device according to claim 1 or 2, wherein the waveform component generating means includes a timbre definition table and converts the timbre identification information into a predetermined timbre code by referring to the timbre definition table.

4. The acoustic signal synthesizing device according to claim 3, wherein the waveform component generating means includes a basic waveform definition table and obtains the basic distortion waveform data by referring to the basic waveform definition table based on the timbre code.
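Paragraph [0020] mentions preparing waveforms in the WAVE format for the basic waveform definition table of claim 4. A minimal sketch using Python's standard `wave` and `struct` modules might look like this; `load_basic_waveform`, `BasicWaveformTable`, the restriction to 16-bit mono PCM, and the `update` method (a loose illustration of the table update data in claim 6) are all assumptions.

```python
import struct
import wave


def load_basic_waveform(path: str) -> list[float]:
    """Read a 16-bit mono PCM WAVE file and return samples scaled to [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch supports 16-bit mono PCM only")
        frames = wf.readframes(wf.getnframes())
    count = len(frames) // 2
    samples = struct.unpack(f"<{count}h", frames)   # WAVE PCM is little-endian
    return [s / 32768.0 for s in samples]


class BasicWaveformTable:
    """Hypothetical basic waveform definition table: timbre code -> WAVE file path."""

    def __init__(self, paths: dict[int, str]) -> None:
        self._paths = dict(paths)
        self._cache: dict[int, list[float]] = {}

    def lookup(self, timbre_code: int) -> list[float]:
        """Return the basic waveform for a timbre code, loading it on first use."""
        if timbre_code not in self._cache:
            self._cache[timbre_code] = load_basic_waveform(self._paths[timbre_code])
        return self._cache[timbre_code]

    def update(self, timbre_code: int, path: str) -> None:
        """Replace one table entry (hypothetical analogue of table update data)."""
        self._paths[timbre_code] = path
        self._cache.pop(timbre_code, None)
```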

5. The acoustic signal synthesizing device according to claim 3, wherein the waveform component generating means includes a spectrum table, extracts predetermined spectrum data by referring to the spectrum table based on the timbre code, and obtains the basic distortion waveform data by synthesizing sine waves based on the spectrum data.

6. The acoustic signal synthesizing device according to any one of claims 3 to 5, wherein control information and table update data for updating the contents of each of the timbre definition table, the basic waveform definition table, and the spectrum table are included as part of the acoustic code.

7. The acoustic signal synthesizing device according to claim 1, wherein all or some of the sound frequency information, the sound intensity information, and the timbre identification information can be set in a plurality of setting modes having different precisions, and the setting mode is switched based on the timbre identification information.

8. The acoustic signal synthesizing device according to claim 1, wherein the sound frequency information is described by a note number defined by the MIDI standard, the sound intensity information by one of the 128 velocity values defined by the MIDI standard, the timbre identification information by one of the 16 channels defined by the MIDI standard, the sounding start time and the sounding end time by delta times defined by the SMF standard, and each phoneme data by a note-on event and a note-off event defined by the MIDI standard.
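The MIDI/SMF representation of claim 8 can be turned into phoneme parameters roughly as follows. The equal-temperament formula (note 69 = A4 = 440 Hz), the rule that a note-on with velocity 0 acts as a note-off, and the tick-to-seconds conversion for a ticks-per-quarter-note division are standard MIDI/SMF conventions; the linear velocity-to-gain mapping, the event tuple layout, and all function names are assumptions for illustration.

```python
def note_to_hz(note: int, a4_hz: float = 440.0) -> float:
    """MIDI note number -> frequency in 12-tone equal temperament (note 69 = A4)."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


def velocity_to_gain(velocity: int) -> float:
    """Map MIDI velocity 0..127 to a linear gain (the linear mapping is an assumption)."""
    if not 0 <= velocity <= 127:
        raise ValueError("velocity must be in 0..127")
    return velocity / 127.0


def ticks_to_seconds(ticks: int, division: int,
                     tempo_us_per_quarter: int = 500_000) -> float:
    """Convert SMF delta-time ticks to seconds for a ticks-per-quarter-note division."""
    return ticks * tempo_us_per_quarter / (division * 1_000_000)


def pair_events(events: list[tuple[int, str, int, int, int]]
                ) -> list[tuple[int, int, int, int, int]]:
    """Pair note-on/note-off events into phonemes.

    events: (absolute_tick, kind, channel, note, velocity), kind is "on" or "off".
    Returns (start_tick, end_tick, channel, note, velocity) tuples.
    """
    open_notes: dict[tuple[int, int], tuple[int, int]] = {}
    phonemes = []
    for tick, kind, ch, note, vel in sorted(events, key=lambda e: e[0]):
        if kind == "on" and vel > 0:
            open_notes[(ch, note)] = (tick, vel)
        elif (ch, note) in open_notes:        # note-off, or note-on with velocity 0
            start, v = open_notes.pop((ch, note))
            phonemes.append((start, tick, ch, note, v))
    return phonemes
```

With a division of 480 ticks per quarter note and the SMF default tempo of 500,000 microseconds per quarter note, a delta time of 480 ticks corresponds to 0.5 seconds.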

9. The acoustic signal synthesizing device according to claim 8, wherein the note number, the velocity, and the channel each have two setting modes, a standard format and an extended format, and the setting mode is switched according to the value of the channel.

10. The acoustic signal synthesizing device according to claim 1, wherein each buffer memory of the waveform storage means has a capacity twice the address range read by the synthesized signal reproducing means, and the output waveform synthesizing means writes over the entire area of the write buffer memory, with the data in the unread area of the read buffer memory being transferred to the addresses of the write buffer memory that correspond to the read area.