JPH0258100A

JPH0258100A - Voice encoding and decoding method, voice encoder, and voice decoder

Info

Publication number: JPH0258100A
Application number: JP63208201A
Authority: JP
Inventors: Kazunori Ozawa; 一範小澤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-08-24
Filing date: 1988-08-24
Publication date: 1990-02-27
Anticipated expiration: 2013-12-02
Also published as: JP2829978B2

Abstract

PURPOSE: To obtain synthesized speech whose quality does not degrade even in speech transients or vowel-to-vowel transitions, by dividing a frame of the sound source signal in a voiced section into pitch periods and representing it by the multipulses of one representative pitch section together with an amplitude correction coefficient and a phase correction coefficient for each of the other sections. CONSTITUTION: On the transmitting side, a discrete speech signal is input, and a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch are extracted for each frame; the frame section is divided into pitch sections according to the pitch information, and the sound source signal is output as a combination of the multipulses of one section and correction information relating to those multipulses. On the receiving side, the driving sound source signal is restored and a synthesized speech signal is generated using the spectral parameter. That is, in a voiced section a representative section is searched for in each frame, and the amplitudes and positions of its multipulses and the amplitude and phase correction coefficients of the other pitch sections are transmitted as sound source information, while the spectral parameters of the synthesis filter and the pitch parameter are sent as auxiliary information.

Description

Detailed Description of the Invention

[Field of Industrial Application]

The present invention relates to a speech encoding/decoding method, a speech encoder, and a speech decoder. In particular, it relates to a speech encoding/decoding method, and to apparatus for that encoding and decoding, that allow speech signals to be encoded and decoded with high quality at low bit rates, particularly about 4.8 kb/s or below, with a relatively small amount of computation.

[Conventional technology]

As a method for encoding speech signals at a low bit rate of about 4.8 kb/s, the pitch interpolation multipulse method disclosed in, for example, the specifications of Japanese Patent Application Nos. 59-272435 and 60-178911 is known. In this method, the transmitting side extracts, from the speech signal of each frame, a spectral parameter representing the spectral characteristics of the speech signal and a pitch parameter. In a voiced section, the frame is divided into pitch sections, the sound source signal of the frame is represented by multipulses for one of those pitch sections (the representative section), and the amplitudes and positions of the multipulses in the representative section are transmitted together with the spectral and pitch parameters. In an unvoiced section, the sound source of one frame is represented by a small number of multipulses and a noise signal, and the amplitudes and positions of the multipulses and the gain and index of the noise signal are transmitted.

On the receiving side, in a voiced section, the pulses of the pitch sections other than the representative section are restored by interpolating between the multipulses of the representative section and the multipulses of the adjacent frame, and the driving sound source signal of the frame is thereby restored.

In an unvoiced section, the sound source signal of the frame is restored using the multipulses and the index and gain of the noise signal.

The restored driving sound source signal is then input to a synthesis filter that uses the spectral parameter, and a synthesized speech signal is output.

[Problem to be solved by the invention]

However, in the conventional method described above, the driving sound source signal of a frame in a voiced section is restored by interpolating between the multipulses of representative sections. In frames where the characteristics of the speech signal are changing, such as vowel-to-vowel transitions in vowel sequences or voiced transients, the driving sound source signal restored by interpolation is therefore badly degraded, and the quality of the synthesized speech suffers as a result. Portions where the speech characteristics change greatly are known to be very important for phoneme perception and for the perception of naturalness, yet the conventional method cannot adequately restore the information in these portions, so the degraded sound quality is a serious problem.

An object of the present invention is to solve the above problems and to provide a speech encoding/decoding method that achieves good sound quality even at low bit rates with a relatively small amount of computation, together with a speech encoder and a speech decoder suited to that method.

[Means to solve the problem]

The speech encoding/decoding method of the present invention is characterized as follows. On the transmitting side, a discrete speech signal is input, a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch are extracted from the speech signal for each frame, the frame section is divided into pitch sections according to the pitch information, and the sound source signal of the speech signal is represented either by the multipulses of one of the pitch sections together with correction information relating to those multipulses, or by a combination of noise and a pulse train. On the receiving side, the driving sound source signal of the frame is restored using the pitch parameter together with either the multipulses of the one pitch section and the correction information relating to them or the combination of noise and pulse train, and a synthesized speech signal is obtained using the spectral parameter.

The speech encoder of the present invention is characterized by comprising: parameter calculation means for extracting and encoding, for each frame of an input discrete speech signal, a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch; sound source signal calculation means for dividing the frame section into pitch sections according to the pitch parameter and for determining and encoding, as the sound source signal of the speech signal for each frame section, either the multipulses of one of the pitch sections together with correction information for correcting at least one of the amplitude and the phase of those multipulses in the other pitch sections, or a combination of noise and a pulse train; and a multiplexer that combines and outputs the output code of the parameter calculation means and the output code of the sound source signal calculation means.

Furthermore, the speech decoder of the present invention is characterized by comprising: means for separating and decoding a code representing the spectral parameter, a code representing the pitch parameter, and a code representing the sound source signal; driving signal restoration means that either divides the frame into pitch sections according to the decoded pitch parameter, generates multipulses for one pitch section, and generates pulses in the other pitch sections using correction information for correcting at least one of the amplitude and the phase of those multipulses, thereby restoring the driving sound source signal of the frame, or restores the driving sound source signal of the frame using a combination of noise and a pulse train; and a synthesis filter that obtains and outputs synthesized speech using the driving sound source and the decoded spectral parameter.

[Effect]

According to the present invention, the sound source signal of a voiced section can be represented by dividing the frame into pitch periods and using the multipulses of one pitch section (the representative section) together with correction information for the other pitch sections. The correction information is preferably an amplitude correction coefficient and a phase correction coefficient.

Processing the sound source signal in this way is effective in avoiding the degradation of the driving sound source signal that occurs in the conventional method, and makes it possible to obtain synthesized speech of good quality even in portions where the speech characteristics change greatly. In sections other than voiced sections, the sound source signal can be represented by a combination of noise and multipulses, so good synthesized speech is also obtained for a variety of consonants.

[Example]

Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the speech encoding/decoding method according to the present invention and of the encoder and decoder for it. FIG. 2 is a diagram for explaining the representative section in a voiced frame, the multipulses of the representative section, and the amplitude and phase correction coefficients.

As shown in FIG. 1, the transmission system for encoding and decoding speech signals consists of an encoder on the transmitting side and a decoder on the receiving side.

In this embodiment, the transmitting side includes a buffer memory 110, a pitch analysis circuit 130, a pitch encoding circuit 150, a K-parameter calculation circuit 140 for the K parameters that serve as the spectral parameters, and a K-parameter encoding circuit 160.

It further includes an impulse response calculation circuit 170, an autocorrelation function calculation circuit 180, a subtracter 190, a weighting circuit 200, a cross-correlation function calculation circuit 210, a sound source signal calculation circuit 220, an encoding circuit 230, and a multiplexer 260, as well as an amplitude/phase correction coefficient calculation circuit 270, a noise memory 225, a driving signal restoration circuit 283, a synthesis filter 281, and an interpolation circuit 282.

On the transmitting side, a speech signal is supplied to the input terminal denoted by reference numeral 100, and the encoded output is sent to the receiving side via the multiplexer 260, which receives the outputs of the pitch encoding circuit 150, the K-parameter encoding circuit 160, and the encoding circuit 230.

As shown in FIG. 1, the receiving side includes a demultiplexer 290, a sound source decoding circuit 300, a noise memory 310, a decoding circuit 315, a pitch decoding circuit 320, and a K-parameter decoding circuit 330, together with a driving signal restoration circuit 340, an interpolation circuit 335, and a synthesis filter circuit 350. The encoded output from the transmitting side is supplied to the demultiplexer 290, and the synthesized speech is taken out through the output terminal 360.

The encoding and decoding of the speech signal input to the input terminal 100 proceed as follows. On the transmitting side, a discrete speech signal is input, a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch are extracted from the speech signal for each frame, the frame section is divided into pitch sections according to the pitch information, and the sound source signal of the speech signal is represented either by the multipulses of one of the pitch sections together with correction information relating to those multipulses, or by a combination of noise and a pulse train. On the receiving side, the driving sound source signal of the frame is restored using the pitch parameter together with either the multipulses of the one pitch section and the correction information relating to them or the combination of noise and pulse train, and a synthesized speech signal is obtained using the spectral parameter.

The principle is first explained below with reference to the example shown in FIG. 2.

In the speech encoding/decoding method and apparatus according to the present invention shown in FIG. 1, in a voiced section the frame section is divided into pitch sections, one per pitch period. Multipulses are determined for one of those pitch sections (the representative section), and for each of the other pitch sections in the same frame an amplitude correction coefficient c_k and a phase correction coefficient d_k relative to those multipulses are determined. For each frame, the position of the representative section within the frame, the amplitudes and positions of the multipulses of the representative section, and the amplitude correction coefficients c_k and phase correction coefficients d_k of the other pitch sections in the same frame are transmitted as sound source information, and the spectral parameters, the pitch parameter, and voiced/unvoiced decision information are transmitted as auxiliary information. The representative section may be found by searching for the section that yields the best synthesized speech signal, or it may be fixed within the frame. The former gives better sound quality but requires more computation.

The methods of determining the amplitude correction coefficient c_k and the phase correction coefficient d_k, and of searching for the representative section, are described below. Let T be the average pitch period determined for the frame. FIGS. 2(a) and 2(b) show one frame of the speech waveform divided into subframe sections of length T. The case in which the representative section is searched for is described here.
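
The division of a frame into subframes of length T can be sketched as follows. This is a minimal illustration, not the circuit of the embodiment: the function name, the sample-index representation, and the handling of a final section shorter than T are assumptions, since the text does not say what happens when the frame length is not a multiple of T.

```python
def split_into_pitch_sections(frame_len, pitch_period):
    """Return (start, end) sample ranges, end exclusive, one per pitch section.

    Assumption: a trailing remainder shorter than pitch_period is kept
    as its own, shorter section.
    """
    if pitch_period <= 0:
        raise ValueError("pitch_period must be positive")
    sections = []
    start = 0
    while start < frame_len:
        end = min(start + pitch_period, frame_len)
        sections.append((start, end))
        start = end
    return sections


# Example: a 160-sample frame with T = 40 gives four equal subframes.
sections = split_into_pitch_sections(160, 40)
```

Each returned range is one candidate for the representative section in the search described next.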

Suppose, for example, that subframe 2 is a candidate for the representative section. The amplitudes and positions of a predetermined number L of multipulses are determined for this subframe. A known way of determining the multipulses uses the cross-correlation function Φ_xh and the autocorrelation function R_hh. It is described, for example, in the specifications of the aforementioned patent applications and in Araseki, Ozawa, Ono, and Ochiai, "Multi-pulse Excited Speech Coder Based on Maximum Cross-correlation Search Algorithm" (GLOBECOM 83, IEEE Global Telecommunications Conference, paper 23.3, 1983) (Reference 1), so a detailed description is omitted here.
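
As a rough illustration of the correlation-based search mentioned above, the following sketch places pulses one at a time at the peak of the cross-correlation and removes each pulse's contribution using the autocorrelation. It follows the greedy form commonly described for multipulse coders; the function and argument names are hypothetical, and the details of Reference 1 are not reproduced in this document.

```python
def multipulse_search(phi, rhh, num_pulses):
    """Greedy multipulse search (illustrative sketch).

    phi[m]  : cross-correlation between the weighted speech and the
              weighted impulse response at candidate position m
    rhh[t]  : autocorrelation of the weighted impulse response at lag t,
              with rhh[0] > 0
    Returns a list of (position, amplitude) pairs.
    """
    residual = list(phi)
    pulses = []
    for _ in range(num_pulses):
        # Position of the largest remaining correlation magnitude.
        m = max(range(len(residual)), key=lambda n: abs(residual[n]))
        g = residual[m] / rhh[0]
        pulses.append((m, g))
        # Remove the contribution of the new pulse from the correlation.
        for n in range(len(residual)):
            lag = abs(n - m)
            if lag < len(rhh):
                residual[n] -= g * rhh[lag]
    return pulses
```

With an impulse response that is a unit impulse (rhh = [1.0]), the search simply picks the largest correlation values in order, which makes a convenient sanity check.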

Let the amplitudes and positions of the multipulses of the representative section be g_i and m_i (i = 1 to L), respectively. These are shown in FIG. 2(c). The amplitude correction coefficient c_k and the phase correction coefficient d_k for a section k other than the representative section can be determined so as to minimize the weighted error power E_k between the speech x̂_k(n) synthesized for section k from these multipulses and the synthesis filter, and the speech x_k(n) of that section. The weighted error power E_k is given by the following equation (1).

$$E_k = \sum_n \left[ \left( x_k(n) - \hat{x}_k(n) \right) * w(n) \right]^2 \tag{1}$$

$$\hat{x}_k(n) = c_k \sum_{i=1}^{L} g_i \, h(n - m_i - T - d_k) \tag{2}$$

Here, w(n) is the impulse response of the perceptual weighting filter, although this filter may be omitted, h(n) is the impulse response of the synthesis filter used to synthesize the speech, and * denotes convolution.

c_k and d_k can be determined so as to minimize equation (1). To do this, for example, d_k is first fixed, equation (1) is partially differentiated with respect to c_k, and the result is set to 0, which gives the following equation.

$$c_k = \frac{\sum_n x_{wk}(n) \, \hat{x}_{wk}(n)}{\sum_n \hat{x}_{wk}(n)^2} \tag{3}$$

Here, x_wk(n) and x̂_wk(n) are, respectively,

$$x_{wk}(n) = x_k(n) * w(n) \tag{4a}$$

$$\hat{x}_{wk}(n) = \sum_{i=1}^{L} g_i \, h_w(n - m_i - T - d_k) \tag{4b}$$

where h_w(n) = h(n) * w(n) is the impulse response of the perceptually weighted synthesis filter.

Accordingly, E_k in equation (1) is minimized by evaluating equation (3) for various values of d_k and selecting the combination of d_k and c_k that gives the smallest error.
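
The search over d_k can be sketched as below. For each candidate d_k, the c_k that sets the derivative of the squared error to zero is the ratio of a cross term to an energy term, and the pair with the smallest error is kept. The sketch assumes integer sample shifts, a caller-supplied list of candidate d_k values, and a weighted impulse response given as a list; `base_shift` stands for the T term in the shifted response. All names are illustrative.

```python
def shifted_response(pulses, hw, shift, length):
    """x_hat_w(n) = sum_i g_i * hw(n - m_i - shift), for 0 <= n < length."""
    out = [0.0] * length
    for m, g in pulses:
        for j, h in enumerate(hw):
            n = m + shift + j
            if 0 <= n < length:
                out[n] += g * h
    return out


def fit_correction(xw, pulses, hw, base_shift, d_candidates):
    """Return (c, d, error) minimising sum_n (xw(n) - c * x_hat_w(n))**2.

    Returns None if every candidate shift yields an all-zero response.
    """
    best = None
    for d in d_candidates:
        xh = shifted_response(pulses, hw, base_shift + d, len(xw))
        energy = sum(v * v for v in xh)
        if energy == 0.0:
            continue
        # Least-squares amplitude for this fixed shift.
        c = sum(a * b for a, b in zip(xw, xh)) / energy
        err = sum((a - c * b) ** 2 for a, b in zip(xw, xh))
        if best is None or err < best[2]:
            best = (c, d, err)
    return best
```

Fixing d_k and solving for c_k in closed form keeps the search one-dimensional, which is in line with the stated aim of a relatively small amount of computation.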

In this way, c_k and d_k are determined for the pitch sections other than the representative section, and the error power E for the whole frame is obtained from the following equation (5).

$$E = \sum_{k=1}^{N} E_k \tag{5}$$

Here, N is the number of subframes contained in the frame. The weighted error power E_2 of the representative pitch section itself (subframe section 2 in the example of FIG. 2) is, however, obtained from the following equation.
- (5) Here, N is the number of subframes included in the frame. However, the weighting of the representative pitch section (in the example of FIG. 2, the subframe section ■), the error power E2 is determined by the following equation.

$$E_2 = \sum_n \left[ \left( x_2(n) - \sum_{i=1}^{L} g_i \, h(n - m_i) \right) * w(n) \right]^2 \tag{6}$$

The representative pitch section can be found by evaluating equations (1) to (6) for every candidate representative pitch section and taking the section that minimizes the value of equation (5). For the case where the search selected subframe 3 as the representative pitch section, FIG. 2(c) shows the multipulses of the representative section and an example in which the sound source v_k(n) of each k-th section other than the representative section (k = 1, 2, 4, 5 in FIG. 2(c)) is generated according to the following equation.

$$v_k(n) = c_k \sum_{i=1}^{L} g_i \, \delta(n - m_i - T - d_k) \tag{7}$$

With the method described above, in a voiced section a representative section is searched for in each frame; the amplitudes and positions of the multipulses of the representative section and the amplitude and phase correction coefficients c_k and d_k of the other pitch sections are transmitted as sound source information; and the spectral parameters of the synthesis filter and the pitch parameter are transmitted as auxiliary information. This solves the problems of the conventional method and provides good sound quality even at about 4.8 kb/s.
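
Equation (7) places scaled, shifted copies of the representative multipulses in the other pitch sections. A minimal sketch follows. It assumes that section k is reached from representative section j by a shift of (k - j)T samples, which generalizes the single T written in equation (7); this generalization, the integer d_k, and all names are assumptions made for illustration.

```python
def restore_excitation(frame_len, pulses, rep_index, pitch_period, corrections):
    """Rebuild one voiced frame of the sound source signal (sketch).

    pulses      : (m_i, g_i) pairs, positions in frame coordinates,
                  all lying inside the representative section
    rep_index   : index j of the representative section
    corrections : {k: (c_k, d_k)} for every other section k
    """
    v = [0.0] * frame_len
    # Representative section: the multipulses themselves.
    for m, g in pulses:
        v[m] += g
    # Other sections: amplitude-scaled, phase-shifted copies.
    for k, (c, d) in corrections.items():
        shift = (k - rep_index) * pitch_period + d  # assumed (k - j)T + d_k
        for m, g in pulses:
            n = m + shift
            if 0 <= n < frame_len:
                v[n] += c * g
    return v
```

Pulses whose shifted position falls outside the frame are simply dropped here; how the embodiment treats that case is not stated in the text.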

In an unvoiced section, on the other hand, the sound source is represented by a combination of multipulses and noise. For the specific configuration, reference may be made to the specification of the aforementioned Japanese Patent Application No. 60-178911 and the like.

Next, the encoding and decoding processes are described concretely, including the operation of each element on the transmitting and receiving sides in FIG. 1.

In FIG. 1, on the transmitting side, a speech signal is input from the input terminal 100 and one frame of the speech signal is stored in the buffer memory 110. The pitch analysis circuit 130 calculates the average pitch period T from the speech signal of the frame. A method based on the autocorrelation method, for example, is known for this; for details, reference may be made to the pitch extraction circuits of the aforementioned patent applications. Other well-known methods (for example, the cepstrum method, the SIFT method, and the modified correlation method) can also be used. The pitch encoding circuit 150 quantizes the average pitch period T with a predetermined number of bits and outputs the resulting code to the multiplexer 260. It also decodes this code and outputs the resulting average pitch period T' to the sound source signal calculation circuit 220, the interpolation circuit 282, and the driving signal restoration circuit 283.
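
A bare-bones autocorrelation pitch estimator might look like the following. The lag range, the absence of windowing or centre clipping, and the normalized peak returned alongside the lag are assumptions for illustration; the cited applications describe the actual pitch extraction circuit.

```python
def estimate_pitch_period(x, min_lag, max_lag):
    """Return (lag, normalized_peak) from a plain autocorrelation search.

    lag is the delay in samples with the largest autocorrelation in
    [min_lag, max_lag]; normalized_peak is that value divided by the
    frame energy (0.0 for an all-zero frame).
    """
    if min_lag < 1 or max_lag < min_lag:
        raise ValueError("need 1 <= min_lag <= max_lag")
    energy = sum(v * v for v in x)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        if r > best_r:
            best_lag, best_r = lag, r
    peak = best_r / energy if energy > 0.0 else 0.0
    return best_lag, peak


# Example: an impulse train with a period of 8 samples.
train = [1.0 if n % 8 == 0 else 0.0 for n in range(64)]
lag, peak = estimate_pitch_period(train, 4, 12)
```

Restricting the lag range to plausible pitch periods is what keeps the search from locking onto multiples of the true period in this simple form.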

The K-parameter calculation circuit 140 performs well-known LPC analysis on the speech signal of the frame and calculates K parameters up to a predetermined order M as parameters representing the spectral characteristics of the frame. For the specific method, reference may be made to the K-parameter calculation circuits of the aforementioned patent applications. The K parameters are identical to PARCOR coefficients. The K-parameter encoding circuit 160 quantizes the K parameters with a predetermined number of quantization bits and outputs the resulting code l_k to the multiplexer 260. It also decodes this code, converts the result into linear prediction coefficients a_i' (i = 1 to M), and outputs them to the weighting circuit 200, the interpolation circuit 282, and the impulse response calculation circuit 170. For the methods of encoding the K parameters and of converting K parameters into linear prediction coefficients, reference may be made to the specifications of the aforementioned patent applications.
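
The conversion from K parameters to linear prediction coefficients is usually done with the step-up recursion, sketched here. The sign convention A(z) = 1 + a_1 z^-1 + ... + a_M z^-M, with k_i equal to the last coefficient of the order-i predictor, is an assumption; the cited specifications may use the opposite sign.

```python
def k_to_lpc(k):
    """Step-up recursion: K (PARCOR) parameters -> a_1..a_M.

    Assumed convention: A(z) = 1 + sum_i a_i z^-i and
    a_j(i) = a_j(i-1) + k_i * a_(i-j)(i-1), a_i(i) = k_i.
    """
    a = []
    for ki in k:
        prev = a
        order = len(prev)
        a = [prev[j] + ki * prev[order - 1 - j] for j in range(order)] + [ki]
    return a
```

For example, K parameters (0.5, 0.2) give a_1 = 0.5 + 0.2 * 0.5 and a_2 = 0.2 under this convention.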

The impulse response calculation circuit 170 uses the linear prediction coefficients to calculate the impulse response h_w(n) of the perceptually weighted synthesis filter and outputs it to the autocorrelation function calculation circuit 180. The autocorrelation function calculation circuit 180 calculates the autocorrelation function R_hh(n) of the impulse response up to a predetermined lag and outputs it. For the operation of the impulse response calculation circuit 170 and the autocorrelation function calculation circuit 180, reference may be made to the specifications of the aforementioned patent applications.
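
The two quantities computed by circuits 170 and 180 can be illustrated as follows. The sketch assumes an all-pole synthesis filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ..., and models perceptual weighting by scaling a_i by gamma^i, a common choice in multipulse coders that this document does not itself specify.

```python
def impulse_response(a, length, gamma=1.0):
    """First `length` samples of the impulse response of 1 / A(z / gamma).

    a     : [a_1, ..., a_M] with A(z) = 1 + sum_i a_i z^-i (assumed sign)
    gamma : assumed bandwidth-expansion factor; 1.0 means no weighting
    """
    aw = [ai * gamma ** (i + 1) for i, ai in enumerate(a)]
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for i, ai in enumerate(aw):
            if n - 1 - i >= 0:
                acc -= ai * h[n - 1 - i]
        h.append(acc)
    return h


def autocorrelation(h, max_lag):
    """R_hh(lag) = sum_n h(n) * h(n + lag) for lag = 0..max_lag."""
    return [
        sum(h[n] * h[n + lag] for n in range(len(h) - lag))
        for lag in range(max_lag + 1)
    ]
```

Truncating the impulse response to a fixed length is itself an approximation, since an all-pole filter has an infinitely long response.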

The subtracter 190 subtracts one frame of the output of the synthesis filter 281 from the speech signal x(n) of the frame and outputs the result to the weighting circuit 200. The weighting circuit 200 passes the subtraction result through a perceptual weighting filter whose impulse response is w(n) and outputs the resulting weighted signal x_w(n). For the weighting method, reference may be made to the aforementioned patent applications.

The cross-correlation function calculation circuit 210 receives the weighted signal x_w(n) and the impulse response h_w(n), calculates the cross-correlation function Φ_xh up to a predetermined lag, and outputs it. For this calculation, reference may be made to the specifications of the aforementioned patent applications.
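
A direct-form sketch of the cross-correlation follows, assuming the definition phi_xh(m) = sum_n x_w(n) h_w(n - m) for non-negative lags m. The exact definition used by circuit 210 is given in the cited specifications, not here, and the function name is hypothetical.

```python
def cross_correlation(xw, hw, max_lag):
    """phi_xh(m) = sum_j xw(m + j) * hw(j) for m = 0..max_lag.

    Terms that would read past the end of xw are skipped.
    """
    return [
        sum(xw[m + j] * hw[j] for j in range(len(hw)) if m + j < len(xw))
        for m in range(max_lag + 1)
    ]
```

The lag with the largest magnitude marks the position where a single pulse filtered by h_w best matches the weighted signal.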

The sound source signal calculation circuit 220 compares the pitch gain P_g with a predetermined threshold T_h to make the voiced/unvoiced decision: the frame is judged voiced when P_g > T_h and unvoiced when P_g < T_h. In a voiced section, as explained in the description of the principle above, the frame is first divided into subframes, one per pitch period, using the decoded average pitch period T', and the positions and amplitudes of the multipulses are determined as the sound source signal for a pitch section that is a candidate for the representative pitch section.

Next, the amplitude/phase correction coefficient calculation circuit 270 calculates, according to equations (3), (4a), and (4b), the amplitude correction coefficient c_k and the phase correction coefficient d_k of the multipulses for generating the sound source signal in each other pitch section k, and outputs these values to the sound source signal calculation circuit 220. Based on equations (1), (5), and (6), the sound source signal calculation circuit 220 calculates the error power E of the whole frame for several candidate sections and selects the pitch section that minimizes E as the representative section. It then outputs the information P indicating the subframe number of the representative section, the amplitudes g_i and positions m_i (i = 1 to L) of the multipulses of the representative section, and the amplitude correction coefficients c_k and phase correction coefficients d_k of the other pitch sections.
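
The selection of the representative section amounts to an exhaustive minimization of the frame error E. The sketch below takes the per-section error computations as caller-supplied functions, since their details depend on the filters in use; the function names and signatures are hypothetical.

```python
def choose_representative(num_sections, rep_error, other_error):
    """Pick the section index that minimises the whole-frame error.

    rep_error(j)      : weighted error of section j when it is the
                        representative section
    other_error(j, k) : minimised weighted error E_k of section k when
                        section j is the representative section
    Returns (best_index, best_total_error).
    """
    best = None
    for j in range(num_sections):
        total = rep_error(j) + sum(
            other_error(j, k) for k in range(num_sections) if k != j
        )
        if best is None or total < best[1]:
            best = (j, total)
    return best
```

Evaluating only a subset of candidates, or fixing the representative section in advance, trades sound quality for less computation, as noted in the description of the principle.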

In an unvoiced section, on the other hand, the sound source signal is represented by a predetermined number of multipulses and a noise signal. Several kinds of noise signal are stored in advance in the noise memory 225, and the index identifying the kind of noise and its gain are determined. These calculations are performed for each subframe obtained by dividing the frame into sections of a predetermined length. For the specific method, reference may be made to the specification of the aforementioned Japanese Patent Application No. 60-178911. In this case, what is transmitted as the sound source signal is the amplitudes and positions of the multipulses and the index and gain of the noise signal.

The encoding circuit 230 encodes the amplitudes g_i and positions m_i of the multipulses of the representative section with a predetermined number of bits and outputs them. It also encodes the information P indicating the subframe of the representative section, the amplitude correction coefficients c_k, and the phase correction coefficients d_k with a predetermined number of bits and outputs them to the multiplexer 260. In addition, it decodes these codes and outputs the results to the driving signal restoration circuit 283.

In a voiced section, the drive signal restoration circuit 283 divides the frame using the average pitch period T' in the same way as the sound source signal calculation circuit 220. Using the information Pl indicating the subframe of the representative section and the decoded amplitudes and positions of the multipulses of the representative section, it generates the multipulses in the representative section. In the pitch sections other than the representative section, it uses the multipulses of the representative section together with the decoded amplitude correction coefficient and the decoded phase correction coefficient to restore the sound source signal over the sample index n in accordance with equation (7).
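The voiced-section restoration described above can be sketched as follows. This is only an illustration under stated assumptions: equation (7) is taken to mean that the representative-section pulses are copied into every other pitch section, with amplitudes scaled by ck and positions shifted by dk. The function name, argument layout, and integer sample shifts are hypothetical and do not come from the specification.

```python
def restore_voiced_excitation(frame_len, pitch, rep_index, pulses,
                              amp_corr, phase_corr):
    """Rebuild one frame of excitation from the representative pitch section.

    pulses      -- list of (position_within_section, amplitude) pairs
    amp_corr[k] -- assumed amplitude correction c_k for pitch section k
    phase_corr[k] -- assumed phase correction d_k (sample shift) for section k
    Entries at rep_index are ignored: that section is reproduced as-is.
    """
    excitation = [0.0] * frame_len
    n_sections = (frame_len + pitch - 1) // pitch  # last section may be partial
    for k in range(n_sections):
        start = k * pitch
        if k == rep_index:
            gain, shift = 1.0, 0
        else:
            gain, shift = amp_corr[k], phase_corr[k]
        for pos, amp in pulses:
            n = start + pos + shift
            if 0 <= n < frame_len:  # drop pulses shifted outside the frame
                excitation[n] += gain * amp
    return excitation


example = restore_voiced_excitation(
    frame_len=12, pitch=4, rep_index=1,
    pulses=[(0, 1.0), (2, -0.5)],
    amp_corr=[0.5, 1.0, 2.0], phase_corr=[1, 0, -1])
```

In this example the middle pitch section carries the transmitted pulses, while the first and last sections are derived from it with one gain and one shift each, so only two correction values per section have to be sent.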

In an unvoiced section, on the other hand, the circuit generates the multipulses, uses the noise signal index to read the noise signal from the noise memory 225, and multiplies it by the gain to restore the sound source signal. For details of the restoration method in unvoiced sections, see the specification of the above-mentioned Japanese Patent Application No. Sho 60-178911.
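A minimal sketch of the unvoiced-section restoration, assuming that the gain-scaled noise entry and the multipulses are simply added sample by sample. The codebook contents and all names are hypothetical; the actual procedure is the one given in the cited application.

```python
def restore_unvoiced_excitation(subframe_len, pulses, noise_codebook,
                                noise_index, noise_gain):
    """Combine multipulses with one gain-scaled entry of the noise memory."""
    noise = noise_codebook[noise_index]
    excitation = [noise_gain * noise[n] for n in range(subframe_len)]
    for pos, amp in pulses:
        excitation[pos] += amp  # superimpose each pulse on the noise
    return excitation


# Toy noise memory with two stored sequences (values are illustrative only).
codebook = [[1.0, -1.0, 1.0, -1.0],
            [0.5, 0.5, 0.5, 0.5]]
unvoiced = restore_unvoiced_excitation(4, [(2, 3.0)], codebook, 1, 2.0)
```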

In a voiced section, the interpolation circuit 282 first converts the linear prediction coefficients to K parameters, interpolates the K parameters for each subframe section of pitch period T', converts the result back to linear prediction coefficients, and outputs them. No interpolation is performed in unvoiced sections.
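The interpolation step can be illustrated as below. This sketch assumes linear interpolation between the previous and current frames' K parameters and uses the standard step-up recursion for a predictor polynomial written as A(z) = 1 + a1 z^-1 + ... + ap z^-p. The sign convention and function names are assumptions, not taken from the specification.

```python
def interpolate_k(prev_k, curr_k, n_subframes):
    """Linearly interpolate K parameters; the last subframe equals curr_k."""
    result = []
    for s in range(1, n_subframes + 1):
        w = s / n_subframes
        result.append([(1.0 - w) * p + w * c for p, c in zip(prev_k, curr_k)])
    return result


def k_to_lpc(k):
    """Step-up recursion: reflection (K) parameters -> predictor coefficients."""
    a = []
    for i, ki in enumerate(k):
        # a_j(new) = a_j(old) + k_i * a_{i-j}(old), then append k_i itself
        a = [a[j] + ki * a[i - 1 - j] for j in range(i)] + [ki]
    return a


per_subframe_k = interpolate_k([0.0, 0.4], [0.4, 0.0], n_subframes=2)
per_subframe_lpc = [k_to_lpc(k) for k in per_subframe_k]
```

One commonly cited reason for interpolating in the K-parameter domain is that values with magnitude below 1 stay below 1 under linear interpolation, which keeps the resulting synthesis filter stable.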

The synthesis filter 281 receives the restored sound source signal and the linear prediction coefficients and computes one frame of synthesized speech. It also computes one frame of the influence signal on the next frame and outputs it to the subtracter 190. For the method of calculating the influence signal, see, for example, the specification of Japanese Patent Application No. Sho 57-231605.
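As a rough illustration of this step, the sketch below assumes an all-pole synthesis filter 1/A(z) with A(z) = 1 + a1 z^-1 + ... + ap z^-p, and assumes that the influence signal is the filter output that continues into the next frame when the input is zero. Both assumptions, and the function names, are illustrative; the method actually used is the one in the cited application.

```python
def synthesize(excitation, lpc, state=None):
    """All-pole filtering: y(n) = x(n) - sum_j a_j * y(n - j).

    state holds the last len(lpc) outputs, most recent first.
    Returns (output samples, updated state).
    """
    history = list(state) if state is not None else [0.0] * len(lpc)
    output = []
    for x in excitation:
        y = x - sum(a * h for a, h in zip(lpc, history))
        output.append(y)
        history = [y] + history[:-1]
    return output, history


def influence_signal(lpc, state, frame_len):
    """Zero-input response of the filter over the next frame."""
    ringing, _ = synthesize([0.0] * frame_len, lpc, state)
    return ringing


speech, filter_state = synthesize([1.0, 0.0, 0.0], lpc=[-0.5])
carry_over = influence_signal([-0.5], filter_state, frame_len=2)
```

Subtracting `carry_over` from the next frame's input, as the subtracter 190 does with the influence signal, leaves only the part of the signal that the new frame's excitation has to account for.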

The multiplexer 260 combines and outputs the code representing the sound source signal, the code indicating voiced or unvoiced, the code representing the subframe of the representative section (in voiced sections), the code for the average pitch period, and the code representing the K parameters.

This concludes the description of the transmitting-side operation of this embodiment.

Thus, the speech encoding processing on the transmitting side according to the present invention can be realized by a speech encoding device comprising: a parameter calculation circuit that extracts and encodes, for each frame, a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch from the input discrete speech signal; a sound source signal calculation circuit that divides the frame section into pitch sections according to the pitch parameter and, as the sound source signal of the speech signal for each frame, obtains and encodes either a multipulse for one of the pitch sections together with correction information for correcting at least one of the amplitude and phase of that multipulse in the other pitch sections, or a combination of noise and a pulse train; and a multiplexer circuit that combines and outputs the output code of the parameter calculation circuit and the output code of the sound source signal calculation circuit.

The corresponding speech decoding processing on the receiving side can be realized by a speech decoding device comprising: a circuit that separates, with a demultiplexer, the code representing the spectral parameter, the code representing the pitch parameter, and the code representing the sound source signal, and decodes them; a drive signal restoration circuit that divides the frame into pitch sections according to the decoded pitch parameter and restores the driving sound source signal of the frame either by generating a multipulse for one pitch section and generating pulses in the other pitch sections using correction information that corrects at least one of the amplitude and phase of that multipulse, or by using a combination of noise and a pulse train; and a synthesis filter that obtains and outputs synthesized speech using the driving sound source and the decoded spectral parameter.

That is, in the case of FIG. 1, on the receiving side the demultiplexer 290 first receives the combined codes and separates and outputs the code representing the sound source signal, the code indicating voiced or unvoiced, the code representing the subframe of the representative section (in voiced sections), the code for the average pitch period, and the code representing the K parameters.
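The combining and separating of codes can be pictured as fixed-width bit packing. The sketch below is purely illustrative: the field names, their order, and their bit widths are hypothetical (the specification only says each quantity is coded with a predetermined number of bits), and the K-parameter and correction-coefficient codes are left out for brevity.

```python
# Hypothetical field layout: (name, width in bits). Not from the specification.
FIELDS = [("voiced", 1), ("pitch", 7), ("rep_subframe", 3), ("excitation", 16)]


def multiplex(values):
    """Pack the per-frame codes into one integer, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def demultiplex(word):
    """Inverse of multiplex: peel fields off starting from the low bits."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


frame_codes = {"voiced": 1, "pitch": 40, "rep_subframe": 2, "excitation": 0xBEEF}
packed = multiplex(frame_codes)
unpacked = demultiplex(packed)
```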

The sound source decoding circuit 300 decodes the code representing the sound source signal and outputs the result to the drive signal restoration circuit 340. The pitch decoding circuit 320 decodes the average pitch period and outputs it to the drive signal restoration circuit 340 and the interpolation circuit 335. The decoding circuit 315 receives the codes representing the amplitude correction coefficient and the phase correction coefficient, decodes them, and outputs them. It also decodes and outputs the code representing the subframe of the representative section.

The K parameter decoding circuit 330 decodes the code representing the K parameters and outputs the result to the interpolation circuit 335.

In addition to the decoded sound source information, the drive signal restoration circuit 340 receives the voiced/unvoiced information and, for a voiced frame, the decoded average pitch period, the decoded amplitude correction coefficient, the decoded phase correction coefficient, and the decoded subframe position of the representative section. It performs the same operation as the drive signal restoration circuit 283 on the transmitting side, restoring and outputting one frame of the driving sound source signal. The noise memory 310 has the same configuration as the noise memory 225 on the transmitting side.

The interpolation circuit 335 performs the same operation as the interpolation circuit 282 on the transmitting side: in voiced sections it linearly interpolates the K parameters at every decoded average pitch period, then converts them to linear prediction coefficients and outputs them.

The synthesis filter circuit 350 receives the restored driving sound source signal of the frame and the linear prediction coefficients, computes one frame of synthesized speech x(n), and outputs it through the terminal 360.

For the operation of the synthesis filter, see the synthesis filter disclosed in the specification of the above-mentioned Japanese Patent Application No. Sho 57-231605.

This concludes the description of the receiving side of this embodiment.

The embodiment described above is only one configuration of the present invention, and various modifications are possible.

For example, in the above embodiment the sound source signal outside voiced sections is represented by a small number of multipulses and a noise signal, but it can also be represented by the well-known stochastic coding method. For details of this method, see, for example, Schroeder and Atal, "Code-excited linear prediction (CELP): High-quality speech at very low bit rates" (ICASSP, pp. 937-940, 1985) (Reference 2). Furthermore, the noise signals stored in the noise memories 225 and 310 may be white noise signals having a predetermined probability density (for example, a Gaussian distribution), or they may be computed in advance by training on prediction residual signals obtained by applying prediction to a large amount of speech. For the latter method, see, for example, Makhoul et al., "Vector Quantization in Speech Coding" (Proc. IEEE, vol. 73, no. 11, pp. 1551-1588, 1985) (Reference 3).

In the embodiment, the speech signal of each frame is classified into two types, voiced and unvoiced, and a different sound source signal is used for each, but the number of classes may be increased. For example, phonetic knowledge may be used to classify frames into vowels, nasals, fricatives, plosives, and so on, with a different sound source signal for each class.

In the embodiment, K parameters are encoded as the spectral parameters and LPC analysis is used as the analysis method, but other well-known parameters such as LSP, cepstrum, improved cepstrum, generalized cepstrum, and mel-cepstrum can also be used as the spectral parameters, with the analysis method best suited to each parameter. Other well-known methods can also be used for the parameters interpolated in the interpolation circuits 282 and 335 and for the interpolation method itself. For a specific interpolation method, see, for example, the paper by Atal et al. entitled "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave" (J. Acoust. Soc. Am., pp. 637-655, 1971) (Reference 4).

Furthermore, in the embodiment the amplitude correction coefficient ck and the phase correction coefficient dk are obtained and transmitted for the pitch sections other than the representative section in voiced sections, but a configuration that does not transmit the phase correction coefficients is also possible, by interpolating the decoded average pitch period T' for each pitch section. Likewise, instead of transmitting the amplitude correction coefficient for each pitch section, the values obtained for each pitch section may be approximated by a least-squares curve or a least-squares straight line, and the coefficients of that curve or line transmitted. These measures reduce the amount of information needed to transmit the correction information.
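The least-squares straight-line variant can be sketched as follows, assuming the amplitude correction coefficients are indexed by pitch-section number 0, 1, 2, ... within the frame. The function names are hypothetical. Only the slope and intercept would be transmitted, and the decoder would regenerate one coefficient per pitch section from them.

```python
def fit_line(values):
    """Ordinary least-squares fit of values[k] ~ intercept + slope * k."""
    n = len(values)
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxx = sum((k - mean_x) ** 2 for k in range(n))
    sxy = sum((k - mean_x) * (v - mean_y) for k, v in enumerate(values))
    slope = sxy / sxx if sxx else 0.0  # a single section has no slope
    return slope, mean_y - slope * mean_x


def evaluate_line(slope, intercept, n_sections):
    """Decoder side: regenerate one coefficient per pitch section."""
    return [intercept + slope * k for k in range(n_sections)]


slope, intercept = fit_line([1.0, 1.5, 2.0, 2.5])
approx_coeffs = evaluate_line(slope, intercept, 4)
```

With this scheme the number of transmitted values stays at two per frame regardless of how many pitch sections the frame contains, at the cost of not following coefficient sequences that deviate from a straight line.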

As the subframe division method, the frame is divided at every pitch period T starting from the left end of the frame, as shown in FIG. 2(b), but it is also possible to divide continuously from the previous frame, or to use a division method such as those disclosed in the above-mentioned Japanese Patent Application No. Sho 59-272435 and Japanese Patent Application No. Sho 60-178911.

To reduce the amount of computation substantially, the representative section in voiced sections may instead be fixed to a predetermined section within the frame (for example, the pitch section nearest the center of the frame, or the pitch section with the greatest power within the frame), so that no search for the representative section is performed. In this case the calculations of equations (5) and (6) for each candidate section become unnecessary, which greatly reduces the computation, but the sound quality is degraded.
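One of the fixed choices mentioned above, taking the pitch section with the greatest power, could look like the following sketch. It assumes power is measured as the sum of squared samples over each complete pitch section of the frame; a trailing partial section is ignored. The function name and these details are assumptions.

```python
def max_power_section(frame, pitch):
    """Return the index of the complete pitch section with the largest energy."""
    best_index, best_energy = 0, -1.0
    for k in range(len(frame) // pitch):
        section = frame[k * pitch:(k + 1) * pitch]
        energy = sum(s * s for s in section)
        if energy > best_energy:  # first section wins on ties
            best_index, best_energy = k, energy
    return best_index


rep = max_power_section([0.0, 1.0, 0.0, 2.0, 2.0, 0.0, 1.0, 0.0, 0.0], pitch=3)
```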

To reduce the computation further, the calculation of the influence signal can also be omitted on the transmitting side. The drive signal restoration circuit 283, interpolation circuit 282, synthesis filter 281, and subtracter 190 on the transmitting side then become unnecessary, which reduces the computation, but again the sound quality is degraded.

On the receiving side, an adaptive postfilter operating on at least one of the pitch and the spectral envelope may be added after the synthesis filter circuit 350 to shape the quantization noise so that the output is perceptually easier to listen to. For the configuration of an adaptive postfilter, see, for example, Kroon et al., "A Class of Analysis-by-Synthesis Predictive Coders for High Quality at Rates between 4.8 and 16 kb/s" (IEEE JSAC, vol. 6, no. 2, pp. 353-363, 1988) (Reference 5).

As is well known in the field of digital signal processing, the autocorrelation function corresponds to the power spectrum on the frequency axis and the cross-correlation function corresponds to the cross-power spectrum, so these functions can also be computed from the respective spectra. For these calculation methods, see the book by Oppenheim et al. entitled "Digital Signal Processing" (Prentice-Hall, 1975) (Reference 6).
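The correspondence between the autocorrelation function and the power spectrum can be checked numerically with a small sketch. It uses a naive discrete Fourier transform for clarity (a practical implementation would use an FFT) and zero-pads the signal to twice its length so that the circular correlation implied by the transform equals the ordinary autocorrelation. All names are illustrative.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse scaled by 1/N)."""
    n = len(x)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        result.append(acc / n if inverse else acc)
    return result


def autocorr_direct(x, max_lag):
    """Time-domain definition: r(lag) = sum_n x(n) * x(n + lag)."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def autocorr_via_power_spectrum(x, max_lag):
    """Inverse transform of |X(k)|^2; valid for max_lag < len(x)."""
    padded = list(x) + [0.0] * len(x)  # zero-padding avoids wrap-around terms
    power = [abs(c) ** 2 for c in dft(padded)]
    r = dft(power, inverse=True)
    return [r[lag].real for lag in range(max_lag + 1)]


signal = [1.0, 2.0, 3.0, 4.0]
r_time = autocorr_direct(signal, 3)
r_freq = autocorr_via_power_spectrum(signal, 3)
```

The cross-correlation case follows the same pattern, with the product of one spectrum and the complex conjugate of the other in place of the squared magnitude.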

[Effect of the Invention]

As described above, according to the present invention, the sound source signal of a voiced section can be represented by dividing the frame into pitch periods and using the multipulse of one pitch section (the representative section) together with correction information, in particular amplitude and phase correction coefficients, for the other pitch sections. This has the significant effect that synthesized speech with almost no degradation in quality can be obtained not only in steady vowel sections but also in parts where the speech characteristics important to phoneme perception and perceived naturalness are changing (voiced transitions and changes between vowels). Furthermore, outside voiced sections the sound source signal can be represented by a combination of noise and multipulses, which has the significant effect that good synthesized speech can be obtained for a variety of consonants.

It is also possible to provide a speech encoding device and a speech decoding device suited to an encoding/decoding method with such good sound quality.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the speech encoding/decoding method, speech encoding device, and speech decoding device of the present invention. FIG. 2 is an explanatory diagram, provided for explaining the present invention, showing the representative section in a voiced frame, the multipulses of the representative section, and the amplitude and phase correction coefficients.

110: buffer memory
130: pitch analysis circuit
140: K parameter calculation circuit
150: pitch encoding circuit
160: K parameter encoding circuit
170: impulse response calculation circuit
180: autocorrelation function calculation circuit
190: subtracter
200: weighting circuit
210: cross-correlation function calculation circuit
220: sound source signal calculation circuit
225, 310: noise memory
230: encoding circuit
260: multiplexer
270: amplitude/phase correction coefficient calculation circuit
281, 350: synthesis filter
282, 335: interpolation circuit
283, 340: drive signal restoration circuit
290: demultiplexer
300: sound source decoding circuit
315: decoding circuit
320: pitch decoding circuit
330: K parameter decoding circuit

Claims

[Claims]

(1) A speech encoding/decoding method characterized in that: on the transmitting side, a discrete speech signal is input, a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch are extracted from the speech signal for each frame, the frame section is divided into pitch sections according to the pitch parameter, and the sound source signal of the speech signal is represented either by a multipulse for one of the pitch sections together with correction information relating to that multipulse, or by a combination of noise and a pulse train; and, on the receiving side, the driving sound source signal of the frame is restored using the pitch parameter together with either the multipulse of the one pitch section and the correction information relating to the multipulse or the combination of noise and a pulse train, and a synthesized speech signal is obtained using the spectral parameter.

(2) A speech encoding device comprising: parameter calculation means for extracting and encoding, for each frame, a spectral parameter representing the spectral envelope and a pitch parameter representing the pitch from an input discrete speech signal; sound source signal calculation means for dividing the frame section into pitch sections according to the pitch parameter and, as the sound source signal of the speech signal for each frame, obtaining and encoding either a multipulse for one of the pitch sections together with correction information for correcting at least one of the amplitude and phase of that multipulse in the other pitch sections, or a combination of noise and a pulse train; and a multiplexer for combining and outputting the output code of the parameter calculation means and the output code of the sound source signal calculation means.

(3) A speech decoding device comprising: means for separating and decoding a code representing a spectral parameter, a code representing a pitch parameter, and a code representing a sound source signal; drive signal restoring means for dividing a frame into pitch sections according to the decoded pitch parameter and restoring the driving sound source signal of the frame either by generating a multipulse for one pitch section and generating pulses in the other pitch sections using correction information for correcting at least one of the amplitude and phase of that multipulse, or by using a combination of noise and a pulse train; and a synthesis filter for obtaining and outputting synthesized speech using the driving sound source and the decoded spectral parameter.