JPH0377999B2

Info

Publication number: JPH0377999B2
Application number: JP58195744A
Authority: JP
Inventors: Satoru Taguchi
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1983-10-19
Filing date: 1983-10-19
Publication date: 1991-12-12
Also published as: JPS6087400A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a multi-pulse speech coding/decoding apparatus. Vocoders are well known: the analysis side analyzes an input speech signal and extracts the spectral envelope information and the excitation (sound source) information that together make up its speech information, and this information is sent over a transmission path to the synthesis side, which reproduces the input speech signal.

The spectral envelope information mentioned above represents the spectral distribution of the vocal tract system that produces the input speech signal. It is normally expressed by LPC coefficients obtained by LPC analysis, equal in number to the analysis order, for example α parameters, κ parameters, or LSP parameters. The excitation information represents the fine structure of the spectral envelope. It is what remains after the spectral distribution information is removed from the input speech signal, known as the residual signal, and it contains the excitation strength, the pitch period, and the voiced/unvoiced decision of the input speech signal. It is also well known that these items are usually extracted via the autocorrelation coefficients of each analysis frame of the input speech signal.

On the synthesis side of the vocoder, the spectral envelope information is used as the coefficients of an LPC synthesizer, which normally uses an all-pole digital filter to form an approximate vocal tract system. The excitation information is used as the driving excitation of this digital filter, and the filter synthesizes the input speech signal.

A conventional LPC vocoder of this kind can synthesize speech even at low bit rates of about 4 kbit/s or less and is widely used, but it has the drawback that high-quality speech synthesis is difficult even at high bit rates. The cause lies in the simple excitation model it assumes: a voiced sound is approximated by a single impulse train at the extracted pitch period, and an unvoiced sound, which has random periodicity, is approximated by white noise. The excitation information of the input speech signal is therefore not extracted faithfully, and the waveform information of the input speech signal contained in the excitation is neither analyzed nor synthesized.
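The two-state excitation model criticized above can be sketched in a few lines. This is only an illustration: the function name `simple_excitation`, the default pitch period of 80 samples, the gain, and the use of Gaussian noise are assumptions and do not come from the patent.

```python
import random


def simple_excitation(num_samples, voiced, pitch_period=80, gain=1.0, seed=0):
    """Classic LPC vocoder excitation (illustrative names and defaults).

    Voiced frames get one impulse every `pitch_period` samples; unvoiced
    frames get zero-mean white noise. No waveform detail is preserved.
    """
    if voiced:
        # A single periodic impulse train, zeros everywhere else.
        return [gain if n % pitch_period == 0 else 0.0
                for n in range(num_samples)]
    rng = random.Random(seed)
    # Unvoiced: Gaussian white noise scaled by the gain.
    return [gain * rng.gauss(0.0, 1.0) for _ in range(num_samples)]
```

Whatever the actual shape of the residual, this model can only emit a pulse train or noise, which is why the waveform information is lost.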

The multi-pulse speech coding/decoding apparatus has recently become well known as one type of speech coding/decoding apparatus that addresses this problem of not transmitting the waveform: it transmits waveform information and uses it to synthesize the input speech signal.

FIG. 1 is a block diagram showing the basic configuration of a conventional multi-pulse vocoder.

The LPC synthesizer 1 has an all-pole digital filter that simulates the vocal tract. Its coefficients are the LPC coefficients obtained by the LPC analyzer 2, which analyzes, frame by frame, the input speech signal x(n) (n = 1, 2, 3, ...) received via the input terminal 2001. The excitation pulse generator 3 derives from the excitation information of the input speech signal a sequence of several impulses, that is, a multi-pulse driving excitation sequence V(n), and supplies it to the LPC synthesizer 1 as its driving excitation.

The LPC synthesizer 1 uses the LPC coefficients thus received as the coefficients of its synthesis filter, normally an all-pole digital filter, and, driven by the multi-pulse excitation, outputs the synthesized signal $\hat{x}(n)$. Because the multi-pulse excitation carries waveform information of the input speech signal, the LPC synthesizer 1 synthesizes the input speech signal including that waveform information.
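As a minimal sketch of what such an all-pole synthesis filter does, the recursion below drives the filter with an excitation sequence. The sign convention (inverse filter $1 - \sum_k a_k Z^{-k}$, hence a plus sign in the recursion) is taken from the weighting-filter formula of the description; the function name `lpc_synthesize` and the zero initial state are assumptions.

```python
def lpc_synthesize(excitation, a):
    """All-pole filter: y(n) = v(n) + sum_{k=1..p} a[k-1] * y(n-k).

    `excitation` is the driving sequence v(n) (for example multi-pulses),
    `a` holds the p predictor coefficients. The filter starts from rest.
    """
    y = []
    for n, v in enumerate(excitation):
        acc = v
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                # Feedback from previously synthesized samples.
                acc += a[k - 1] * y[n - k]
        y.append(acc)
    return y
```

For a first-order filter with `a = [0.5]`, a single unit pulse produces the decaying sequence 1, 0.5, 0.25, ..., which is the impulse response of the filter.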

The synthesized signal $\hat{x}(n)$ output from the LPC synthesizer 1 is then subtracted from the input speech signal x(n) in the subtracter 4, and the resulting error e(n) is sent to the perceptual weighter 5.

The perceptual weighter 5 applies perceptual weighting to the error e(n) with a weighting filter having the characteristic W(Z) given by equation (1) below, and sends the result to the squared-error minimizer 6.

$$W(Z) = \frac{1 - \sum_{k=1}^{p} a_k Z^{-k}}{1 - \sum_{k=1}^{p} a_k \gamma^{k} Z^{-k}} \tag{1}$$

In equation (1), $a_k$ are the LPC coefficients used as the coefficients of the all-pole digital filter of the LPC synthesizer 1, p is the filter order and hence the LPC analysis order, and γ is the weighting coefficient. Z is the variable of the Z-transform transfer function $H(Z^{-1})$ of the all-pole digital filter, with $Z = \exp(j\lambda)$, where $\lambda = 2\pi \Delta T f$, ΔT is the sampling period used for the analysis frames, and f is the frequency.

The weighting coefficient γ in equation (1) is set within the range 0 < γ < 1.

W(Z) in equation (1) varies between W(Z) = 1 for γ = 1 and W(Z) = 1 − P(Z) for γ = 0, where P(Z) denotes the sum $\sum_{k=1}^{p} a_k Z^{-k}$ in the numerator of equation (1). The value of γ is set within the above range according to how strongly the excessive levels that appear in the formant regions of the frequency spectrum of the error e(n) are to be suppressed. It thus performs the perceptual weighting of the signal to be synthesized, and its optimum value is normally chosen beforehand by listening tests.
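The weighting filter of equation (1) can be applied in the time domain with the difference equation y(n) = x(n) − Σ a_k x(n−k) + Σ a_k γ^k y(n−k). The sketch below is one straightforward way to do this; the function name `perceptual_weight`, the zero initial state, and the default γ = 0.8 (the example value the description gives for the embodiment) are choices made for illustration.

```python
def perceptual_weight(x, a, gamma=0.8):
    """Filter x(n) through W(Z) = A(Z) / A(Z/gamma), A(Z) = 1 - sum a_k Z^-k.

    The numerator acts on past inputs, the denominator (with coefficients
    a_k * gamma**k) acts on past outputs. The filter starts from rest.
    """
    y = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a) + 1):
            if n - k >= 0:
                acc -= a[k - 1] * x[n - k]                 # numerator of W(Z)
                acc += a[k - 1] * (gamma ** k) * y[n - k]  # denominator of W(Z)
        y.append(acc)
    return y
```

With `gamma=1.0` the numerator and denominator cancel and the signal passes unchanged; with `gamma=0.0` only the inverse filter 1 − P(Z) remains, matching the two limits stated above.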

The error e(n) weighted in this way is sent to the squared-error minimizer 6 in order to determine the optimum time positions and amplitudes of the driving excitation sequence V(n), that is, of the multi-pulses, output by the excitation pulse generator 3. The minimizer computes the squared error ε of equation (2) below, and the driving excitation sequence V(n) is chosen so as to minimize ε.

$$\varepsilon = \sum_{n=1}^{N} \left[ e(n) * w(n) \right]^{2} \tag{2}$$

In equation (2), the symbol $*$ denotes convolution with the weighting filter of the perceptual weighter 5, and N is the length of the interval over which the multi-pulses are computed.

This processing is repeated for every pulse of the multi-pulse sequence, so that analysis by synthesis is carried out pulse by pulse. This is the so-called Analysis-by-Synthesis method (hereinafter abbreviated as the A-b-S method). As is clear from the above, the A-b-S method runs a loop of pulse generation, squared-error calculation, and pulse position and amplitude adjustment for each individual pulse. Although it is an effective technique in the low-bit-rate region, it therefore has the drawback that the amount of computation is enormous.

The A-b-S method is described in, for example, B. S. Atal et al., "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates", Proc. ICASSP 82, pp. 614-617 (1982).

To overcome this drawback of the conventional A-b-S method, the following algorithm, which computes the optimum multi-pulses efficiently on the basis of correlation calculations, has recently been introduced.

The input speech signal x(n) is divided into processing frames of N samples each, and the multi-pulses are computed collectively for each frame.

Suppose that one analysis frame contains K excitation pulses, and that the i-th pulse lies at time position $m_i$ from the frame edge and has amplitude $g_i$. The driving excitation d(n) of the LPC synthesis filter is then given by equation (3).

$$d(n) = \sum_{i=1}^{K} g_i \, \delta_{n, m_i} \tag{3}$$

In equation (3), $\delta_{n,m_i}$ is the Kronecker delta: $\delta_{n,m_i} = 1$ for $n = m_i$ and $\delta_{n,m_i} = 0$ for $n \neq m_i$.
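Equation (3) simply places each amplitude at its time position and leaves every other sample at zero. A minimal sketch follows; the function name `build_excitation` and the 0-based positions are assumptions (the description counts positions from 1).

```python
def build_excitation(frame_length, pulses):
    """Build d(n) for one frame from (position, amplitude) pairs.

    Positions are 0-based sample indices inside the frame. Amplitudes
    falling on the same position add up, as the sum in equation (3) implies.
    """
    d = [0.0] * frame_length
    for position, amplitude in pulses:
        d[position] += amplitude
    return d
```

For example, `build_excitation(8, [(2, 2.0), (5, -1.0)])` gives a frame that is zero except for 2.0 at index 2 and -1.0 at index 5.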

The LPC synthesis filter is driven by this excitation d(n) and outputs the synthesized signal $\hat{x}(n)$.

Taking an all-pole digital filter as the LPC synthesis filter and expressing its transfer function by the impulse response h(n) (0 ≤ n ≤ M − 1), the synthesized signal $\hat{x}(n)$ is given by equation (4).

$$\hat{x}(n) = \sum_{l=0}^{M-1} d(l)\, h(n-l) \tag{4}$$

In equation (4), d(l) is the driving excitation. Next, let $e_w(n)$ be the weighted error obtained by applying the perceptual correction to the error between the input speech signal x(n) and the synthesized signal $\hat{x}(n)$. Then $e_w(n)$ is given by equation (5).

$$e_w(n) = \{x(n) - \hat{x}(n)\} * w(n) \tag{5}$$

The squared error follows from equation (5) and can be written as equation (6).

$$\sum_{n=1}^{M} e_w^{2}(n) = \sum_{n=1}^{M} \left[ \{x(n) - \hat{x}(n)\} * w(n) \right]^{2} \tag{6}$$

In equation (6), M is the number of samples in the interval over which the error is minimized, chosen for example as one analysis frame length. The multi-pulses, as the optimum excitation pulse train, are obtained by finding the $g_i$ that minimize equation (6). From equations (3), (4), and (6), $g_i$ is derived as equation (7).

$$g_i(m_i) = \frac{\displaystyle\sum_{n=1}^{M} x_w(n)\, h_w(n-m_i) - \sum_{l=1}^{i-1} g_l \sum_{n=1}^{M} h_w(n-m_l)\, h_w(n-m_i)}{\displaystyle\sum_{n=1}^{M} h_w(n-m_i)\, h_w(n-m_i)} \tag{7}$$

In equation (7), $x_w(n)$ denotes $x(n) * w(n)$ and $h_w(n)$ denotes $h(n) * w(n)$. The first term of the numerator on the right-hand side is the cross-correlation function $\phi_{hx}(m_i)$ between $x_w(n)$ and $h_w(n)$ at lag $m_i$. The inner sum of the second term, $\sum_{n=1}^{M} h_w(n-m_l)\, h_w(n-m_i)$, is the covariance function $\phi_{hh}(m_l, m_i)$ of $h_w(n)$ ($1 \le m_l, m_i \le M$). The covariance function $\phi_{hh}(m_l, m_i)$ is equal to the autocorrelation function $R_{hh}(|m_l - m_i|)$, so equation (7) can be written as equation (8).

$$g_i(m_i) = \frac{\phi_{hx}(m_i) - \displaystyle\sum_{l=1}^{i-1} g_l\, R_{hh}(|m_l - m_i|)}{R_{hh}(0)} \tag{8}$$

According to equation (8), if a pulse is placed at time position $m_i$, its optimum amplitude is $g_i(m_i)$. In equation (8), $1 \le m_i \le M$.

In other words, for a given excitation pulse, the amplitude is computed with equation (8) at each candidate time position, and the position giving the largest absolute amplitude yields the pulse that minimizes the squared error of equation (6). Repeating this procedure yields the required number of excitation pulses.
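The search just described can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the cited paper: positions are 0-based, the weighted impulse response `hw` is treated as zero outside its stored length, the covariance is replaced by the autocorrelation as in the text, and all function names are hypothetical.

```python
def cross_correlation(xw, hw):
    """phi(m) = sum_n xw[n] * hw[n - m] for every position m in the frame."""
    frame = len(xw)
    return [sum(xw[n] * hw[n - m]
                for n in range(m, min(frame, m + len(hw))))
            for m in range(frame)]


def autocorrelation(hw, max_lag):
    """R(lag) = sum_j hw[j] * hw[j + lag], zero beyond the response length."""
    length = len(hw)
    return [sum(hw[j] * hw[j + lag] for j in range(length - lag))
            if lag < length else 0.0
            for lag in range(max_lag + 1)]


def search_multipulse(xw, hw, num_pulses):
    """Pick pulses one at a time; each maximizes |g(m)| over all positions."""
    frame = len(xw)
    phi = cross_correlation(xw, hw)
    r = autocorrelation(hw, frame - 1)
    pulses = []  # list of (position, amplitude)
    for _ in range(num_pulses):
        best_m, best_g = 0, 0.0
        for m in range(frame):
            # Numerator: cross-correlation minus the contribution of
            # the pulses already chosen.
            num = phi[m] - sum(g * r[abs(ml - m)] for ml, g in pulses)
            g_m = num / r[0]
            if abs(g_m) > abs(best_g):
                best_m, best_g = m, g_m
        pulses.append((best_m, best_g))
    return pulses
```

The correlations are computed once per frame; each additional pulse costs only one pass over the candidate positions, which is where the saving over the A-b-S loop comes from.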

This algorithm is described in detail in Ozawa, Araseki, and Ono, "Study of a Multi-Pulse Driven Speech Coding Method", Institute of Electronics and Communication Engineers, Technical Group on Communication Systems, March 1983.

Because multi-pulses generated with this algorithm can be computed from a cross-correlation function, an autocorrelation function, and a maximum search alone, the configuration becomes very simple, and a multi-pulse speech coding/decoding apparatus with a greatly reduced amount of computation can be realized.

However, even a multi-pulse speech coding/decoding apparatus improved in this way still has the following drawback.

Because the LPC coefficients are calculated, the multi-pulses determined, and the synthesis carried out frame by frame, the coefficients of the speech synthesis filter on the synthesis side, which are the LPC coefficients, change once per frame. As a result, the transient behavior of the filter has a marked effect, and the synthesis filter produces abnormal waveforms that impair the naturalness of the speech. Interpolating the LPC coefficients on the synthesis side resolves the abnormal-waveform problem, but the LPC coefficients before interpolation, which the analysis side used to determine the multi-pulses, then differ from the interpolated LPC coefficients used on the synthesis side, so the synthesized speech inevitably differs from the input speech.

An object of the present invention is to eliminate this drawback by providing a multi-pulse speech coding/decoding apparatus of simple configuration in which the analysis side calculates the same interpolated LPC coefficients as those used on the synthesis side and includes means for determining the multi-pulses from those interpolated LPC coefficients, thereby greatly reducing the degradation of the S/N between the input speech signal and the synthesized speech signal.

The multi-pulse speech coding/decoding apparatus of the present invention analyzes and synthesizes an input speech signal by using, as spectral envelope information, LPC coefficients extracted by LPC analysis of the input speech signal in each analysis frame, and by expressing the excitation information, which together with the spectral envelope information makes up the speech information of the input speech signal, as a plurality of impulses (multi-pulses) in each analysis frame whose time positions and amplitudes correspond to the features of that excitation information. The apparatus comprises means for interpolating, on the analysis side, the LPC coefficients extracted in each analysis frame of the input speech signal to calculate interpolated LPC coefficients, and means for determining the multi-pulses using the interpolated LPC coefficients.

The present invention will now be described in detail with reference to the drawings. FIG. 2 is a block diagram showing an embodiment of the analysis side of the multi-pulse speech coding/decoding apparatus according to the present invention, and FIG. 3 is a block diagram showing an embodiment of its synthesis side.

The analysis side of the multi-pulse speech coding/decoding apparatus according to the present invention, shown in FIG. 2, comprises an LPC analyzer 7, a cross-correlation coefficient calculator 8, a standard digital filter 9, an encoder (1) 10, an autocorrelation coefficient calculator 11, an excitation pulse generator 12, an encoder (2) 13, an interpolator 14, a multiplexer 15, and an impulse response calculator 16.

The speech signal received via the input terminal 7001 is supplied to both the LPC analyzer 7 and the standard digital filter 9.

The LPC analyzer 7 samples the input speech signal at 8 kHz, quantizes it to a digital value with a preset number of bits, and performs LPC analysis on the quantized speech signal in each analysis frame to extract, as the LPC coefficients, κ parameters (partial autocorrelation coefficients) of order p, which it supplies to the encoder (1) 10 via the output line 701. In this embodiment the analysis frame is set to 20 ms.

The encoder (1) 10 quantizes and encodes the received LPC coefficients and then sends them to the multiplexer 15 via the output line 1001 and to the interpolator 14 via the output line 1002.

The interpolator 14 linearly interpolates the quantized LPC coefficients at intervals of, for example, 5 ms (equivalent to 4-point interpolation in this embodiment) or 125 µs (equivalent to 160-point interpolation in this embodiment) to calculate the interpolated LPC coefficients. The interpolator 14 then sends the interpolated LPC coefficients to the impulse response calculator 16 via the output line 1401 and to the standard digital filter 9 via the output line 1402.
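The numbers fit together as follows: at 8 kHz a 20 ms frame holds 160 samples, so a 5 ms step gives 20 / 5 = 4 coefficient sets per frame and a 125 µs step (one sample) gives 160. The sketch below shows one possible linear interpolation rule. The description only says "linear", so interpolating from the previous frame's quantized set toward the current one, and the name `interpolate_coefficients`, are assumptions.

```python
def interpolate_coefficients(prev, curr, num_points):
    """Return `num_points` coefficient sets moving linearly from prev to curr.

    prev, curr: quantized coefficient vectors (for example kappa parameters)
    of the previous and current frame. The last returned set equals curr.
    """
    sets = []
    for j in range(1, num_points + 1):
        t = j / num_points  # interpolation weight in (0, 1]
        sets.append([(1.0 - t) * p + t * c for p, c in zip(prev, curr)])
    return sets
```

Whatever rule is chosen, the point of the invention is that the analysis side and the synthesis side apply the same rule to the same quantized coefficients. When the interpolated quantities are κ parameters of magnitude below 1, every interpolated value also stays below 1 in magnitude, since it is a weighted average of the two endpoints.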

The impulse response calculator 16 calculates the impulse response h(n) (0 ≤ n ≤ M − 1) from the interpolated LPC coefficients and supplies it to the cross-correlation coefficient calculator 8 and the autocorrelation coefficient calculator 11 via the output lines 1602 and 1601, respectively. In principle, the calculated impulse response consists of as many impulse response waveforms as there are samples in the frame period (160 in this embodiment).
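A sketch of how an impulse response can be obtained from one set of predictor coefficients is given below. With `gamma=1.0` it returns h(n) of the synthesis filter. With `gamma < 1` it returns the response of a filter whose coefficients are $a_k \gamma^k$, which is the cascade of the synthesis filter and W(Z) of equation (1) and therefore corresponds to $h_w(n)$ in equation (7). Whether the calculator 16 folds the weighting in like this is not stated in the description, so the `gamma` parameter and the function name are assumptions.

```python
def impulse_response(a, length, gamma=1.0):
    """Impulse response of 1 / (1 - sum_k a_k * gamma**k * Z^-k).

    a: predictor coefficients a_1..a_p (alpha parameters).
    length: number of response samples to compute.
    """
    weighted = [a[k] * gamma ** (k + 1) for k in range(len(a))]
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0  # unit impulse input
        for k, coeff in enumerate(weighted, start=1):
            if n - k >= 0:
                acc += coeff * h[n - k]
        h.append(acc)
    return h
```

Scaling the coefficients by $\gamma^k$ multiplies the response by $\gamma^n$, so the weighted response decays faster than the unweighted one. Calling the function once per interpolated coefficient set yields the family of responses mentioned above.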

The standard digital filter 9 samples the input speech signal at 8 kHz, quantizes it to a digital value with a preset number of bits, and applies perceptual weighting to the quantized speech signal. The perceptual weighting is performed by a standard digital filter that realizes the transfer function W(Z) of equation (1). The standard digital filter 9 converts the interpolated LPC coefficients supplied via the output line 1402 (in this embodiment, the interpolated κ parameters) into α parameters and uses them as $a_k$ in equation (1). γ is set to 0.8, for example. The standard digital filter 9 outputs the perceptually weighted input speech signal to the cross-correlation coefficient calculator 8 via the output line 901.
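The conversion from κ parameters to α parameters is usually done with the step-up recursion of the Levinson-Durbin algorithm. The sketch below assumes the convention in which the predictor is $\sum_k a_k x(n-k)$ and the order-i update is $a_i^{(i)} = \kappa_i$, $a_j^{(i)} = a_j^{(i-1)} - \kappa_i a_{i-j}^{(i-1)}$. Sign conventions for PARCOR coefficients differ between texts and the patent does not state which one it uses, so both the convention and the function name are assumptions.

```python
def reflection_to_predictor(kappa):
    """Step-up recursion: reflection (kappa) -> predictor (alpha) coefficients.

    kappa: [kappa_1, ..., kappa_p]. Returns [a_1, ..., a_p] for the
    inverse filter A(Z) = 1 - sum_k a_k Z^-k.
    """
    a = []
    for i, k in enumerate(kappa, start=1):
        # Update the order-(i-1) coefficients, then append a_i = kappa_i.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
    return a
```

For `kappa = [0.5, 0.2]` the first stage gives `[0.5]` and the second gives `a_1 = 0.5 - 0.2 * 0.5 = 0.4`, `a_2 = 0.2`.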

The cross-correlation coefficient calculator 8 calculates the cross-correlation coefficients $\phi_{hx}$ from the perceptually weighted input speech signal and the impulse responses h(n) (160 of them in this embodiment) and sends them to the excitation pulse generator 12 via the output line 801.

The autocorrelation coefficient calculator 11 calculates the autocorrelation coefficients $R_{hh}$ corresponding to each of the impulse responses h(n) it receives and sends them to the excitation pulse generator 12 via the output line 1101.

The excitation pulse generator 12 uses the cross-correlation coefficients $\phi_{hx}$ and the autocorrelation coefficients $R_{hh}$ received for each analysis frame to evaluate equation (8) and obtain a predetermined number of excitation pulses. It sends the amplitudes and positions of these pulses via the output line 1201 to the encoder (2) 13, which quantizes and encodes them and sends them to the multiplexer 15 via the output line 1301.

The LPC coefficients and multi-pulse data thus quantized, encoded, and sent to the multiplexer 15 represent the spectral envelope and the excitation information of the input speech signal. The multiplexer 15 time-division multiplexes them in a predetermined format, and they are transmitted over the transmission path 1501 from the analysis side shown in FIG. 2 to the synthesis side shown in FIG. 3.

The synthesis side shown in FIG. 3 synthesizes the input speech signal from the data transmitted from the analysis side over the transmission path 1501. It comprises a demultiplexer 20, a decoder (1) 17, a decoder (2) 18, an interpolator 19, an LPC synthesizer 21, an LPF (low-pass filter) 22, and so on.

The demultiplexer 20 restores the data received over the transmission path 1501 to the form they had before the multiplexer 15 converted them to the time-division transmission format. The LPC coefficient data are supplied to the decoder (1) 17 via the output line 201 and the multi-pulse data to the decoder (2) 18 via the output line 202. The decoders decode the data and send them out on the output lines 171 and 181, respectively.

The interpolator 19 interpolates the LPC coefficients by the same rule as the interpolator 14 shown in FIG. 2 to calculate the interpolated LPC coefficients, and outputs them to the LPC synthesizer 21 via the output line 191.

The LPC synthesizer 21 uses the multi-pulses thus received as the driving excitation of an all-pole digital filter of order p, and uses the interpolated LPC coefficient data of order p received via the output line 191 as the coefficients of that filter. Controlling this LPC synthesis filter in this way, it synthesizes the input speech signal and sends it via the output line 211 to the LPF 22, which applies the predetermined low-pass filtering and sends the result to the output line 221 as analog synthesized speech.
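The difference from a conventional frame-by-frame synthesizer is that the filter coefficients change several times inside a frame while the filter memory carries over. The sketch below illustrates this for one frame; the function name, the equal-length sub-intervals, and the way the filter state is passed between frames are assumptions made for illustration.

```python
def synthesize_with_interpolation(excitation, coeff_sets, history=None):
    """All-pole synthesis with coefficients switched inside the frame.

    excitation: multi-pulse excitation d(n) for one frame.
    coeff_sets: interpolated predictor coefficient sets, applied in order
                over equal-length sub-intervals of the frame.
    history:    past outputs [y(n-1), y(n-2), ...] from the previous frame.
    Returns (output samples, updated history).
    """
    order = len(coeff_sets[0])
    past = list(history) if history is not None else [0.0] * order
    step = max(1, len(excitation) // len(coeff_sets))
    out = []
    for n, v in enumerate(excitation):
        # Pick the coefficient set that covers sample n.
        a = coeff_sets[min(n // step, len(coeff_sets) - 1)]
        y = v + sum(a[k] * past[k] for k in range(order))
        past = [y] + past[:-1]  # newest output first
        out.append(y)
    return out, past
```

Passing the returned history into the call for the next frame keeps the filter memory continuous across frame boundaries.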

In the embodiment of the present invention shown in FIGS. 2 and 3, κ parameters are used as the LPC coefficients before interpolation, but other LPC coefficients, such as LSP parameters, may be used either before or after interpolation. It is also clear that the encoders and the multiplexer, and likewise the decoders and the demultiplexer, may each be combined into a single unit, and that the LPC synthesis filter may be replaced by a digital filter other than the all-pole type with substantially the same result.

It is also evident that, when perceptual weighting is not performed, the standard digital filter 9 shown in FIG. 2 is unnecessary apart from its A/D conversion function.

Furthermore, it is evident that the autocorrelation coefficient calculator 11 does not necessarily need the impulse response waveform itself as input: the autocorrelation coefficients can also be calculated directly from the interpolated LPC coefficients by using the Levinson method, which is well known as an efficient solution method for LPC analysis.

As described above, according to the present invention, the analysis side of the multi-pulse speech coding/decoding apparatus calculates the same interpolated LPC coefficients as those used on the synthesis side and includes means for determining the multi-pulses from them, which has the effect of improving the S/N between the input speech signal and the synthesized speech signal.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the basic configuration of a conventional multi-pulse speech coding/decoding apparatus, FIG. 2 is a block diagram showing an embodiment of the analysis side of the multi-pulse speech coding/decoding apparatus according to the present invention, and FIG. 3 is a block diagram showing an embodiment of its synthesis side.

1: LPC synthesizer; 2: LPC analyzer; 3: excitation pulse generator; 4: subtracter; 5: perceptual weighter; 6: squared-error minimizer; 7: LPC analyzer; 8: cross-correlation coefficient calculator; 9: standard digital filter; 10: encoder (1); 11: autocorrelation coefficient calculator; 12: excitation pulse generator; 13: encoder (2); 14: interpolator; 15: multiplexer; 16: impulse response calculator; 17: decoder (1); 18: decoder (2); 19: interpolator; 20: demultiplexer; 21: LPC synthesizer; 22: LPF.

Claims

1. A multi-pulse speech coding/decoding apparatus which analyzes and synthesizes an input speech signal by using, as spectral envelope information, LPC coefficients extracted by LPC analysis in each analysis frame of the input speech signal, and by expressing the excitation information, which together with the spectral envelope information makes up the speech information of the input speech signal, as a predetermined plurality of impulses (multi-pulses) in each analysis frame having time positions and amplitudes corresponding to the features of the excitation information, characterized in that the analysis side comprises: first means for interpolating the LPC coefficients extracted by LPC analysis in each analysis frame to obtain interpolated LPC coefficients; second means for obtaining a cross-correlation coefficient sequence between the impulse response waveform obtained from the interpolated LPC coefficients and the input speech signal; third means for obtaining an autocorrelation coefficient sequence of the impulse response waveform; and fourth means for determining the multi-pulses on the analysis side.

2. The multi-pulse speech coding/decoding apparatus according to claim 1, wherein the third means is means for calculating the autocorrelation coefficient sequence directly from the interpolated LPC coefficients.

3. The multi-pulse speech coding/decoding apparatus according to claim 1, wherein the input speech signal is applied to a standard digital filter whose transfer function is controlled by the interpolated LPC coefficients, and the output signal of the filter is used as the new input speech signal.