JPH043876B2


Info

Publication number: JPH043876B2
Application number: JP59096038A
Authority: JP
Priority date: 1984-05-14
Filing date: 1984-05-14
Publication date: 1992-01-24
Also published as: JPS60239799A

Description

[Detailed Description of the Invention]
(Technical Field) The present invention relates to a multi-pulse vocoder, and more particularly to a method for determining the amplitudes and positions of the multi-pulses on the analysis side.

(Prior Art) Vocoders of the following kind are well known. The vocoder analyzes an input speech signal and, on the analysis side, extracts the spectral envelope information and the excitation (sound source) information that make up the speech information of that signal. It then sends this information over a transmission path to the synthesis side, where the input speech signal is reproduced.

The spectral envelope information represents the spectral distribution of the vocal tract system that produces the input speech signal. It is usually expressed by LPC coefficients obtained by LPC analysis, such as α parameters or K parameters, and the number of these coefficients equals the analysis order. The excitation information represents the fine structure of the spectral envelope. It is the so-called residual signal, obtained by removing the spectral distribution information from the input speech signal, and it carries information on the excitation strength, the pitch period, and the voiced/unvoiced decision. It is also well known that this information is usually extracted from the autocorrelation computed for each analysis frame of the input speech signal.

When the vocoder synthesizes the input speech signal on its synthesis side, the spectral envelope information is used as the coefficients of an LPC synthesizer. This synthesizer usually models the vocal tract approximately with an all-pole digital filter. The excitation information is used as the driving source of this digital filter, which then synthesizes the input speech signal.
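As a rough illustration of this synthesis path, the Python sketch below implements an all-pole synthesis filter H(z) = 1/(1 − Σ a_k z^-k). It uses the same sign convention for the coefficients a_k as equation (1). The function name `lpc_synthesize`, the plain-list representation, and the zero initial state are assumptions made for illustration. The patent does not specify an implementation.

```python
def lpc_synthesize(a, excitation):
    """All-pole LPC synthesis filter H(z) = 1 / (1 - sum_{k=1}^{p} a_k z^-k).

    a          -- predictor coefficients [a_1, ..., a_p] (assumed sign convention)
    excitation -- driving excitation samples (pulses, noise, ...)
    Returns y(n) = e(n) + sum_k a_k * y(n-k), starting from zero state.
    """
    p = len(a)
    y = []
    for n, e in enumerate(excitation):
        acc = float(e)
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += a[k - 1] * y[n - k]  # feedback through the poles
        y.append(acc)
    return y


# A single unit pulse yields the filter's impulse response.
print(lpc_synthesize([0.5], [1, 0, 0, 0]))  # [1.0, 0.5, 0.25, 0.125]
```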

The conventional LPC vocoder obtained in this way can synthesize speech even at low bit rates of about 4 kb/s or less, and it is widely used. Its drawback is that high-quality speech synthesis is difficult even at high bit rates. The cause is the simple excitation model it assumes:

- a voiced sound is approximated by a single impulse train at the extracted pitch period;
- an unvoiced sound, which has no regular period, is approximated by white noise.

As a result, the excitation information of the input speech signal is not extracted faithfully. The waveform information it contains is therefore neither analyzed nor synthesized.

The multi-pulse vocoder has recently become well known as a vocoder that transmits waveform information when it synthesizes the input speech signal. This overcomes the problems caused by not transmitting the waveform.

FIG. 1 is a block diagram showing the basic configuration of the analysis side of a conventional multi-pulse vocoder.

The LPC synthesizer 1 has an all-pole digital filter that simulates the vocal tract. Its coefficients are LPC coefficients supplied by the LPC analyzer 2. The analyzer obtains them by analyzing, for each analysis frame, the input speech signal x(n) (n = 1, 2, 3, ...) applied to input terminal 2001. The excitation pulse generator 3 derives a driving excitation sequence V(n) from the excitation information of the input speech signal. This sequence consists of a plurality of impulses, that is, multi-pulses, and it is supplied to the LPC synthesizer 1 as its driving source.

The LPC synthesizer 1 uses the supplied LPC coefficients as the coefficients of its synthesis filter, which is usually an all-pole digital filter. Driven by the multi-pulses, it outputs the synthesized signal x̂(n). Because the multi-pulses contain waveform information of the input speech signal, the synthesized signal also includes that waveform information.

The subtractor 4 subtracts the synthesized signal x̂(n) output from the LPC synthesizer 1 from the input speech signal x(n). The resulting error e(n) = x(n) − x̂(n) is sent to the perceptual weighting unit 5.

The perceptual weighting unit 5 weights the error e(n) with a filter whose characteristic w(z) is given by equation (1). It then sends the weighted error to the squared-error minimizer 6.

$$ w(z) = \frac{1 - \sum_{k=1}^{p} a_k z^{-k}}{1 - \sum_{k=1}^{p} a_k r^k z^{-k}} \qquad (1) $$

The symbols in equation (1) are:

- a_k: the LPC coefficients used as the coefficients of the all-pole digital filter of the LPC synthesizer 1;
- p: the filter order, and hence the LPC analysis order;
- r: the weighting coefficient;
- z: stands for z = exp(jλ) in the z-domain transfer function H(z^-1) of the all-pole digital filter, where λ = 2πfΔT, ΔT is the sampling period, and f is the frequency.

In equation (1), the weighting coefficient r is set in the range 0 < r < 1.

The weighting w(z) of equation (1) ranges between two limits:

- r = 1 gives w(z) = 1;
- r = 0 gives w(z) = 1 − P(z), where P(z) = Σ_{k=1}^{p} a_k z^-k.

Within this range, r is chosen according to how strongly to suppress the excessive levels that appear in the formant regions of the error spectrum e(n). It thus sets the perceptual weighting of the signal to be synthesized. Its optimum value is usually selected in advance by listening tests.
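A minimal sketch of the weighting filter of equation (1), realized as a pole-zero difference equation, is shown below. The function name `perceptual_weight` and the list-based implementation are illustrative assumptions. The two limiting cases r = 1 and r = 0 described above can be checked directly.

```python
def perceptual_weight(a, r, x):
    """Filter x through w(z) of equation (1):

        w(z) = (1 - sum_k a_k z^-k) / (1 - sum_k a_k r^k z^-k)

    realized (zero initial state) as the difference equation
        y(n) = x(n) - sum_k a_k x(n-k) + sum_k a_k r^k y(n-k)
    """
    p = len(a)
    y = []
    for n in range(len(x)):
        acc = float(x[n])
        for k in range(1, p + 1):
            if n - k >= 0:
                acc -= a[k - 1] * x[n - k]             # numerator (zeros)
                acc += a[k - 1] * r**k * y[n - k]      # denominator (poles)
        y.append(acc)
    return y


a = [0.5]
print(perceptual_weight(a, 1.0, [1, 0, 0]))  # r = 1: w(z) = 1, output equals input
print(perceptual_weight(a, 0.0, [1, 0, 0]))  # r = 0: w(z) = 1 - 0.5 z^-1
```

In practice r lies strictly between these limits. The poles of w(z) are then the LPC poles pulled toward the origin, which flattens the formant peaks in the weighted error.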

The weighted error is sent to the squared-error minimizer 6. The minimizer determines the driving excitation sequence V(n) output from the excitation pulse generator 3, that is, the optimum time positions and amplitudes of the multi-pulses. To do so, it computes the squared error ε of equation (2), and V(n) is chosen to minimize ε.

$$ \varepsilon = \sum_{n=1}^{N} \left[ e(n) * w(n) \right]^2 \qquad (2) $$

In equation (2), * denotes convolution with the weighting filter of the perceptual weighting unit 5. N is the length of the interval over which the multi-pulses are computed.

This process is repeated for each pulse, so analysis-by-synthesis is performed pulse by pulse. This is the so-called Analysis-by-Synthesis method (hereinafter the A-b-S method). For every single pulse, the method runs a loop of pulse generation, squared-error computation, and pulse position/amplitude adjustment. It is effective at low bit rates, but it requires an extremely large amount of computation.
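The brute-force structure of one A-b-S step for a single pulse can be sketched as follows. Every candidate position is tried, the pulse is synthesized, and the squared error is evaluated. To keep the sketch short, it makes these assumptions:

- the target x_w(n) and the weighted impulse response h_w(n) are already given, so the plain squared difference plays the role of ε in equation (2);
- at each position the amplitude is the closed-form least-squares value.

The function name `abs_single_pulse` is hypothetical. The nested loops cost roughly N × N operations per pulse, which illustrates the computational burden noted above.

```python
def abs_single_pulse(target, h):
    """Exhaustive analysis-by-synthesis search for one pulse (sketch).

    target -- weighted target signal x_w(n), n = 0..N-1 (0-based)
    h      -- weighted impulse response h_w(n)
    Returns (position, amplitude, squared_error) minimizing
    sum_n (target(n) - g * h(n - m))^2 over all positions m.
    """
    N = len(target)
    best = None
    for m in range(N):
        # Synthesize a unit pulse at position m, truncated to the frame.
        y = [h[n - m] if 0 <= n - m < len(h) else 0.0 for n in range(N)]
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        # Least-squares optimal amplitude for this position.
        g = sum(t * v for t, v in zip(target, y)) / energy
        # Squared error of the candidate, evaluated by full re-synthesis.
        err = sum((t - g * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (m, g, err)
    return best


h = [1.0, 0.6, 0.3]
target = [0.0, 2.0, 1.2, 0.6, 0.0]  # 2 * h placed at position 1
print(abs_single_pulse(target, h))  # position 1, amplitude close to 2.0, error near 0
```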

The A-b-S method is described in detail in, among others, B. S. Atal et al., "A New Model of LPC Excitation for Producing Natural-Sounding Speech at Low Bit Rates," Proc. ICASSP 82, pp. 614-617 (1982).

To overcome this drawback of the conventional A-b-S method, the following algorithm has recently been introduced. It calculates the optimum multi-pulses efficiently from correlation computations.

The input speech signal x(n) is divided into processing frames of N samples, and the multi-pulses are computed collectively for each frame.

Assume that one analysis frame contains k excitation pulses. The i-th pulse lies at time position m_i from the frame boundary and has amplitude g_i. The driving excitation d(n) of the LPC synthesis filter is then given by equation (3).

$$ d(n) = \sum_{i=1}^{k} g_i \, \delta_{n,m_i} \qquad (3) $$

In equation (3), δ_{n,m_i} is the Kronecker delta: δ_{n,m_i} = 1 for n = m_i, and δ_{n,m_i} = 0 for n ≠ m_i.

The LPC synthesis filter is driven by this excitation d(n) and outputs the synthesized signal x̂(n).
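Equation (3) and the filtering step can be sketched in a few lines. The filtering is written here as the convolution with the impulse response h(n) that equation (4) formalizes. The function names are hypothetical, indices are 0-based, and the output is truncated to the frame length, all of which are choices made for this illustration.

```python
def build_excitation(length, positions, amplitudes):
    """Driving excitation d(n) of equation (3): sum_i g_i * delta(n, m_i)."""
    d = [0.0] * length
    for m, g in zip(positions, amplitudes):
        d[m] += g  # the Kronecker delta contributes only at n == m_i
    return d


def synthesize(d, h):
    """x_hat(n) = sum_l d(l) * h(n - l), with h(n) = 0 outside 0..len(h)-1."""
    return [
        sum(d[l] * h[n - l] for l in range(n + 1) if n - l < len(h))
        for n in range(len(d))
    ]


d = build_excitation(6, positions=[1, 4], amplitudes=[2.0, -1.0])
print(synthesize(d, [1.0, 0.5, 0.25]))  # [0.0, 2.0, 1.0, 0.5, -1.0, -0.5]
```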

Take, for example, an all-pole digital filter as the LPC synthesis filter, and represent its transfer function by its impulse response h(n) (0 ≤ n ≤ M − 1). The synthesized signal x̂(n) is then given by equation (4).

$$ \hat{x}(n) = \sum_{l=0}^{M-1} d(l)\, h(n-l) \qquad (4) $$

In equation (4), d(l) is the driving excitation. Next, let e_w(n) denote the weighted error, obtained by applying perceptual correction to the difference between the input speech signal x(n) and the synthesized signal x̂(n). This error is given by equation (5).

$$ e_w(n) = \{ x(n) - \hat{x}(n) \} * w(n) \qquad (5) $$

From equation (5), the squared error can be written as equation (6).

$$ \sum_{n=1}^{M} e_w^2(n) = \sum_{n=1}^{M} \left[ \{ x(n) - \hat{x}(n) \} * w(n) \right]^2 \qquad (6) $$

In equation (6), M is the number of samples in the interval over which the error is minimized, for example one analysis frame. The optimum excitation pulse train is obtained by finding the g_i that minimize equation (6). From equations (3), (4), and (6), g_i is derived as equation (7).

$$ g_i(m_i) = \frac{\sum_{n=1}^{M} x_w(n)\, h_w(n-m_i) - \sum_{l=1}^{i-1} g_l \sum_{n=1}^{M} h_w(n-m_l)\, h_w(n-m_i)}{\sum_{n=1}^{M} h_w(n-m_i)\, h_w(n-m_i)} \qquad (7) $$

In equation (7), x_w(n) = x(n) * w(n) and h_w(n) = h(n) * w(n). The terms on the right-hand side are:

- the first term of the numerator is the cross-correlation function φ_hx(m_i) of x_w(n) and h_w(n) at time lag m_i;
- the inner sum Σ_{n=1}^{M} h_w(n − m_l) h_w(n − m_i) in the second term is the covariance function φ_hh(m_l, m_i) of h_w(n) (1 ≤ m_l, m_i ≤ M).

The covariance function φ_hh(m_l, m_i) is replaced by the autocorrelation function R_hh(|m_l − m_i|). This approximation ignores the truncation of h_w(n) at the end of the frame. Since the denominator is φ_hh(m_i, m_i) = R_hh(0), equation (7) can be written as equation (8).

$$ g_i(m_i) = \frac{\phi_{hx}(m_i) - \sum_{l=1}^{i-1} g_l\, R_{hh}(|m_l - m_i|)}{R_{hh}(0)} \qquad (8) $$

By equation (8), once a pulse is placed at time position m_i, its optimum amplitude g_i(m_i) can be determined directly. In equation (8), 1 ≤ m_i ≤ M.

In other words, for a given excitation pulse, the amplitude is computed by equation (8) at many time positions. The position giving the largest absolute amplitude yields the pulse that minimizes the squared error of equation (6). Placing a pulse of optimum amplitude g_i(m_i) at m_i lowers the squared error by g_i²(m_i)·R_hh(0), so the largest |g_i| gives the largest reduction. Repeating this procedure yields the desired number of excitation pulses.
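The procedure can be sketched directly from equation (8). At each step, every position is scored with its optimum amplitude, and the position with the largest |g_i| is kept. The function name, the 0-based indexing, and the toy inputs below are assumptions for illustration. The toy inputs come from an impulse response h = [1, 0.5] and a target made of a pulse of +2 at position 1 and a pulse of −1 at position 4.

```python
def search_pulses_eq8(phi_hx, r_hh, num_pulses):
    """Sequential pulse search using equation (8) (sketch, 0-based indices).

    phi_hx -- cross-correlation phi_hx(m), m = 0..M-1
    r_hh   -- autocorrelation R_hh(k), k = 0..M-1
    For pulse i, each position m gets the amplitude
        g_i(m) = [phi_hx(m) - sum_{l<i} g_l R_hh(|m_l - m|)] / R_hh(0)
    and the position with the largest |g_i(m)| is kept.
    """
    M = len(phi_hx)
    positions, amplitudes = [], []
    for _ in range(num_pulses):
        best_m, best_g = None, 0.0
        for m in range(M):
            num = phi_hx[m] - sum(
                g * r_hh[abs(ml - m)] for ml, g in zip(positions, amplitudes)
            )
            g = num / r_hh[0]
            if best_m is None or abs(g) > abs(best_g):
                best_m, best_g = m, g
        positions.append(best_m)
        amplitudes.append(best_g)
    return positions, amplitudes


phi = [1.0, 2.5, 1.0, -0.5, -1.0]
rhh = [1.25, 0.5, 0.0, 0.0, 0.0]
print(search_pulses_eq8(phi, rhh, 2))  # ([1, 4], [2.0, -0.8])
```

The second amplitude comes out as −0.8 rather than −1. Near the frame end, R_hh(0) = 1.25 overstates the energy of the truncated impulse response, which is 1.0. This is the approximation of replacing φ_hh by R_hh.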

This algorithm is described in detail in Ozawa, Araseki, and Ono, "A Study of a Multi-Pulse Driven Speech Coding Method," Technical Group on Communication Systems, Institute of Electronics and Communication Engineers of Japan, March 1983.

With this algorithm, the optimum multi-pulses can be computed using only a cross-correlation function, an autocorrelation function, and a maximum search. The configuration is therefore greatly simplified, and the resulting multi-pulse vocoder needs substantially less computation.

However, even a multi-pulse vocoder improved in this way has the following drawback.

When the synthesis filter on the synthesis side has a finite word length, that finite word length causes limit cycles, which generate noise power in the filter. The filter is driven by the multi-pulses obtained with the above algorithm. When the speech power is low, as in silent intervals or at the endings of voiced sounds, this limit-cycle noise disturbs the filter output, and the result is markedly unpleasant to the ear.
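A limit cycle can be reproduced with a deliberately tiny example: a first-order recursive filter whose state is held in an integer register. The coefficient −29/32, the round-half-up rule, and the function name are illustrative assumptions. They do not describe the patent's p-th order synthesis filter. With zero input, the ideal output decays to zero, but the rounded recursion settles into a sustained oscillation of ±5 quantization steps. When the speech itself is only a few steps in amplitude, such an oscillation is comparable to the signal.

```python
def zero_input_response(y0, coeff_q=-29, frac_bits=5, steps=10):
    """Zero-input response of y(n) = Q[a * y(n-1)], a = coeff_q / 2**frac_bits.

    The state y is an integer register (finite word length). Q[] rounds the
    product to the nearest integer, ties upward: Python's >> floors, so
    (v + half) >> frac_bits == floor(v / 2**frac_bits + 0.5).
    """
    half = 1 << (frac_bits - 1)
    y, out = y0, []
    for _ in range(steps):
        y = (coeff_q * y + half) >> frac_bits  # multiply, round, store
        out.append(y)
    return out


# The ideal output 10 * (-29/32)**n decays toward 0; the rounded one does not.
print(zero_input_response(10))  # [-9, 8, -7, 6, -5, 5, -5, 5, -5, 5]
```

With a positive coefficient (coeff_q=29), the same recursion gets stuck at a constant 5 instead of oscillating. This is the zero-frequency form of the same dead-band effect.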

(Object of the Invention) An object of the present invention is to eliminate the above drawback. The invention provides a multi-pulse vocoder that produces good synthesized speech even when the speech power is low, as in silent intervals or at the endings of voiced sounds. To this end, the analysis side adds a DC signal to the input speech signal before it determines the amplitudes and positions of the multi-pulses. The multi-pulse excitation supplied to the synthesis side can then produce a synthesized power sufficiently larger than the noise power of the limit cycles of the synthesis filter.

(Constitution of the Invention) The multi-pulse vocoder of the present invention analyzes and synthesizes the input speech signal as follows. It performs LPC analysis of the input speech signal for each analysis frame and uses the extracted LPC coefficients as spectral envelope information. Together with the spectral envelope information, the excitation information makes up the speech information of the input speech signal. The vocoder expresses this excitation information, for each analysis frame, as a plurality of impulses (multi-pulses) whose time positions and amplitudes correspond to the features of the excitation.

The analysis side comprises:

- means for adding a DC signal to the input speech signal;
- means for calculating a cross-correlation coefficient sequence between the DC-added input speech signal and the impulse response of the speech synthesis filter;
- means for calculating an autocorrelation coefficient sequence of the impulse response;
- means for calculating the amplitudes and positions of the impulse sequence (multi-pulses) in a forward manner, based on the relationship between the cross-correlation coefficient sequence and the autocorrelation coefficient sequence.

In the present invention, applying a DC component prevents limit cycles, so they cause no noise even when the power of the speech signal is small. Because the added signal is a DC component, the listener on the receiving side does not perceive it. An alternative would be to amplify the speech signal when its power is small. For faithful transmission, however, information indicating the amplification would then have to be sent to the receiving side, which would use it to attenuate the synthesized signal. This increases the amount of information to be transmitted. It contradicts the basic aim of transmitting the speech signal with as little information as possible.

(Embodiment) The present invention will now be described in detail with reference to the drawings. FIG. 2 is a block diagram showing an embodiment of the analysis side of a multi-pulse vocoder according to the present invention. FIG. 3 is a block diagram showing an embodiment of its synthesis side.
FIG. 2 is a block diagram showing an embodiment of the analysis side of a multi-pulse type vocoder according to the present invention, and FIG. 3 is a block diagram showing an embodiment of the synthesis side of the multi-pulse type vocoder according to the present invention.

The analysis side of the multi-pulse vocoder shown in FIG. 2 comprises:

- an LPC analyzer 7;
- a cross-correlation function calculator 8;
- an encoder (1) 9;
- an autocorrelation function calculator 10;
- a multi-pulse calculator 11;
- an encoder (2) 12;
- a DC signal generator 19;
- a DC adder 20;
- a multiplexer 13.

The input speech signal applied via input terminal 7001 is supplied to the LPC analyzer 7 and to the DC adder 20.

For each analysis frame, the LPC analyzer 7 performs the following steps:

1. It quantizes the input speech signal into a digital value with a preset number of bits.
2. It performs LPC analysis of the quantized speech signal and extracts the p-th order K parameters (partial autocorrelation coefficients) as LPC coefficients.
3. It supplies these coefficients to the encoder (1) 9 via output line 701.

In this embodiment the analysis frame is set to 20 ms.

The encoder (1) 9 quantizes and encodes the LPC coefficients and sends them to the multiplexer 13 via output line 901.

The LPC analyzer 7 also calculates the impulse response h(n) (0 ≤ n ≤ M − 1) from the LPC coefficients. The impulse response passes through output line 702, the encoder (1) 9, and output line 902 to the cross-correlation function calculator 8 and the autocorrelation function calculator 10.
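One common way to get from K parameters to h(n) is sketched below. The K parameters are first converted to α parameters with the step-up recursion, and the all-pole impulse response is then run for M samples. The patent does not give this procedure. Sign conventions for K (PARCOR) coefficients differ between authors, and the convention in the comments, as well as the function names, are assumptions.

```python
def k_to_alpha(k):
    """Step-up recursion: reflection (K) coefficients -> predictor alphas.

    Assumed convention: A(z) = 1 - sum_j alpha_j z^-j, and
        alpha_i^(i) = k_i,
        alpha_j^(i) = alpha_j^(i-1) - k_i * alpha_{i-j}^(i-1),  j < i.
    """
    alpha = []
    for i, ki in enumerate(k, start=1):
        prev = alpha[:]
        alpha = [prev[j - 1] - ki * prev[i - j - 1] for j in range(1, i)]
        alpha.append(ki)
    return alpha


def impulse_response(alpha, length):
    """h(n) of 1 / A(z): h(0) = 1, h(n) = sum_j alpha_j h(n - j)."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for j, aj in enumerate(alpha, start=1):
            if n - j >= 0:
                acc += aj * h[n - j]
        h.append(acc)
    return h


alpha = k_to_alpha([0.5, 0.2])     # approximately [0.4, 0.2]
print(impulse_response(alpha, 4))  # approximately [1.0, 0.4, 0.36, 0.224]
```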

The DC signal generator 19 generates a DC signal. Its amplitude is determined empirically in advance from the maximum amplitude of the steady voiced portions of the input speech signal. In this embodiment, the generated DC signal is 30 dB lower in amplitude than the maximum amplitude of the input speech signal supplied via input terminal 7001. The DC signal generator 19 outputs it to the DC adder 20. The DC adder 20 adds the DC signal to the input speech signal supplied via input terminal 7001 and outputs the sum to the cross-correlation function calculator 8.
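The arithmetic of the 30 dB offset is sketched below, treating the dB figure as an amplitude ratio (20·log10). `peak_amplitude` is a hypothetical design constant standing in for the maximum input amplitude, and 2048 is an arbitrary example value. The 30 dB figure is the embodiment's.

```python
def add_dc_offset(x, peak_amplitude, offset_db=30.0):
    """Return (x + dc, dc), where dc lies offset_db below peak_amplitude.

    peak_amplitude is a hypothetical design constant, fixed in advance
    (not measured per frame). Amplitude decibels:
        dc = peak_amplitude * 10**(-offset_db / 20)
    """
    dc = peak_amplitude * 10.0 ** (-offset_db / 20.0)
    return [v + dc for v in x], dc


y, dc = add_dc_offset([0.0, 3.0, -2.0], peak_amplitude=2048.0)
print(round(dc, 2))  # 64.76
```

A 30 dB amplitude ratio is a factor of about 31.6, so the added level is roughly 3 % of the peak.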

The cross-correlation function calculator 8 calculates the cross-correlation function φ_hx from the DC-added input speech signal and the impulse response h(n). It sends the result to the multi-pulse calculator 11 via output line 801.

The autocorrelation function calculator 10 calculates the autocorrelation function R_hh of the impulse response h(n). It sends the result to the multi-pulse calculator 11 via output line 1001.
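The two correlation calculators can be sketched as follows. Here x stands for the DC-added input, and h(n) is treated as zero outside the samples given. The 0-based frame indexing and the function names are assumptions. The example reuses a frame built from h = [1, 0.5] with pulses of +2 at position 1 and −1 at position 4.

```python
def cross_correlation(x, h):
    """phi_hx(m) = sum_n x(n) * h(n - m) over one frame of length M (0-based).

    h(n) is taken as zero outside 0 <= n < len(h).
    """
    M = len(x)
    return [
        sum(x[n] * h[n - m] for n in range(m, M) if n - m < len(h))
        for m in range(M)
    ]


def autocorrelation(h, max_lag):
    """R_hh(k) = sum_n h(n) * h(n + k), for k = 0..max_lag-1."""
    return [
        sum(h[n] * h[n + k] for n in range(len(h) - k))
        for k in range(max_lag)
    ]


h = [1.0, 0.5]
x = [0.0, 2.0, 1.0, 0.0, -1.0]
print(cross_correlation(x, h))  # [1.0, 2.5, 1.0, -0.5, -1.0]
print(autocorrelation(h, 5))    # [1.25, 0.5, 0, 0, 0]
```

Only φ_hx depends on the input, so the DC offset enters the search solely through φ_hx. Adding a constant c to x raises each φ_hx(m) by c times the sum of the overlapping h(n) samples.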

The multi-pulse calculator 11 uses the cross-correlation function φ_hx and the autocorrelation function R_hh of each analysis frame. With a method described later, it obtains a predetermined number of excitation pulses. It sends their amplitudes and positions via output line 1101 to the encoder (2) 12, which quantizes and encodes them. The encoder then sends them to the multiplexer 13 via output line 1201.

The quantized and encoded LPC coefficients and multi-pulse data represent the spectral envelope and the excitation information of the input speech signal. The multiplexer 13 time-division multiplexes them in a predetermined format. They are then transmitted via transmission path 1301 from the analysis side of FIG. 2 to the synthesis side of FIG. 3.

The synthesis side shown in FIG. 3 synthesizes the input speech signal from the data transmitted from the analysis side via transmission path 1301. It comprises a demultiplexer 14, a decoder (1) 15, a decoder (2) 16, an LPC synthesizer 17, an LPF (low-pass filter) 18, and so on.

The demultiplexer 14 restores the data received via transmission path 1301 to their form before time-division multiplexing by the multiplexer 13. It routes the data to the decoders as follows:

- the LPC coefficient data go via output line 141 to the decoder (1) 15;
- the multi-pulse data go via output line 142 to the decoder (2) 16.

The decoders decode the data and output them on output lines 151 and 161, respectively.

The LPC synthesizer 17 uses the decoded multi-pulses as excitation information to drive a p-th order all-pole digital filter. It uses the p-th order LPC coefficient data received via output line 151 as the coefficients of that filter. Controlled in this way, the LPC synthesis filter synthesizes the input speech signal and sends it via output line 211 to the LPF 18. The LPF 18 applies the prescribed low-pass filtering and outputs analog synthesized speech on output line 181.

The power of the excitation expressed by the multi-pulses input to the LPC synthesizer 17 never falls below a level about 30 dB under that of the steady voiced portions. This level is sufficiently larger than the power of the limit cycles of the LPC synthesizer 17. The DC component synthesized by the LPC synthesizer 17 is not perceptible to the ear at all.
This is sufficiently larger than the power generated by the limit cycle of the LPC synthesizer 17. Note that the DC component synthesized by the LPC synthesizer 17 is not perceptible audibly.

Next, the method used in the multi-pulse calculator 11 is described. The calculator may use any method that can calculate a predetermined number of excitation pulses from the cross-correlation coefficients φ_hx and the autocorrelation coefficients R_hh of each analysis frame. As an example, the following describes the case in which the algorithm of Ozawa et al. cited above is used.

The algorithm proceeds as follows:

1. Search for the cross-correlation coefficient φ_hx with the largest absolute value.
2. Determine a first excitation pulse that has the position and the signed amplitude of the φ_hx found.
3. Remove this pulse's component by correcting φ_hx. The correction subtracts R_hh, normalized by the lag-0 autocorrelation coefficient R_hh(0) and weighted by the signed amplitude of the first pulse, with R_hh(0) aligned to the position found.
4. Search for the largest absolute value of the corrected φ_hx, determine the second excitation pulse, and correct φ_hx again.

These operations are repeated as necessary.
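A sketch of this search with in-place correction of φ_hx follows. Here the pulse amplitude is taken as φ_hx(m)/R_hh(0), which agrees with equation (8). The text above can also be read as using the peak value φ_hx(m) itself, which differs only by the constant factor R_hh(0). The function name and the 0-based indexing are assumptions. Because the correction keeps the running φ_hx equal to the numerator of equation (8), this search selects the same pulses as evaluating that equation directly.

```python
def search_pulses_ozawa(phi_hx, r_hh, num_pulses):
    """Pulse search by successive correction of phi_hx (sketch).

    1. Find the position m of max |phi(m)|.
    2. Pulse amplitude g = phi(m) / R_hh(0), as in equation (8).
    3. Remove the pulse's contribution:
           phi(n) -= phi(m) * R_hh(|n - m|) / R_hh(0)
       i.e. R_hh normalized by R_hh(0), weighted by the signed peak,
       with R_hh(0) aligned to position m.
    4. Repeat for the next pulse.
    """
    phi = list(phi_hx)  # work on a copy; the caller's list is untouched
    positions, amplitudes = [], []
    for _ in range(num_pulses):
        m = max(range(len(phi)), key=lambda n: abs(phi[n]))
        peak = phi[m]
        positions.append(m)
        amplitudes.append(peak / r_hh[0])
        for n in range(len(phi)):
            phi[n] -= peak * r_hh[abs(n - m)] / r_hh[0]
    return positions, amplitudes


phi = [1.0, 2.5, 1.0, -0.5, -1.0]  # from h = [1, 0.5] and pulses +2 @1, -1 @4
rhh = [1.25, 0.5, 0.0, 0.0, 0.0]
print(search_pulses_ozawa(phi, rhh, 2))  # ([1, 4], [2.0, -0.8])
```

Each pulse costs one pass for the maximum search and one for the correction. This avoids the per-position re-synthesis of the A-b-S loop.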

A multi-pulse calculation method that does not rely on the algorithm of Ozawa et al. is the similarity-based method. Instead of searching for the largest absolute value of φ_hx, it searches for the maximum similarity between φ_hx and R_hh, measured for example by their cross-correlation coefficient. This method is described in Japanese Patent Application No. 58-149007.

The above description assumes that the perceptual weighting of equation (1) is not applied, but perceptual weighting can also be applied. In that case, two changes are made:

- A filter with the transfer function of equation (1) is configured, with the weighting coefficient (r in equation (1)) set to, for example, γ = 0.8. This filter is inserted between the DC adder 20 and the cross-correlation function calculator 8.
- The impulse response calculated by the LPC analyzer 7 is replaced by an impulse response h(n) calculated from LPC coefficients to which the attenuation coefficient γ (γ = 0.8) has been applied, that is, a_k γ^k.

The second change works because cascading the synthesis filter 1/(1 − Σ a_k z^-k) with w(z) gives exactly 1/(1 − Σ a_k γ^k z^-k).
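The equivalence just stated can be checked numerically. The sketch computes the impulse response of the synthesis filter, passes it through w(z), and compares the result with the impulse response of the all-pole filter with coefficients a_k γ^k. The coefficient values, the frame length of 16, and the function names are arbitrary assumptions. The pair a = [1.2, −0.5] was chosen because it gives a stable filter.

```python
def allpole_impulse(a, length):
    """Impulse response of 1 / (1 - sum_k a_k z^-k), zero initial state."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * h[n - k]
        h.append(acc)
    return h


def weight(a, r, x):
    """Filter x through w(z) = (1 - sum a_k z^-k) / (1 - sum a_k r^k z^-k)."""
    y = []
    for n in range(len(x)):
        acc = float(x[n])
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += -ak * x[n - k] + ak * r**k * y[n - k]
        y.append(acc)
    return y


a, gamma, L = [1.2, -0.5], 0.8, 16
h_w = weight(a, gamma, allpole_impulse(a, L))                            # h(n) * w(n)
h_bw = allpole_impulse([ak * gamma**k for k, ak in enumerate(a, 1)], L)  # a_k gamma^k
print(max(abs(u - v) for u, v in zip(h_w, h_bw)))  # tiny: only rounding differences
```

Both filters are causal, so weighting the first L samples of h(n) gives exactly the first L samples of the cascade's impulse response. No truncation error enters the comparison.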

In the embodiment shown in FIGS. 2 and 3, K parameters are used as the LPC coefficients, but other LPC coefficients, such as α parameters, may be used instead. The encoders and the multiplexer, and likewise the decoders and the demultiplexer, may each be implemented as an integrated unit. The LPC synthesis filter may also be replaced by a digital filter of a type other than all-pole with substantially the same result.
It is clear that the encoder and the multiplexer, and the decoder and the multiplexer can be similarly implemented as integrated configurations, and the LPC synthesis filter It is also clear that it can be implemented in almost the same way even if it is replaced with a non-polar type digital filter or the like other than the all-polar type.

(Effects of the Invention) As described above, the present invention provides the analysis side of a multi-pulse vocoder with the following means:

- means for adding a DC signal to the input speech signal;
- means for calculating the cross-correlation coefficients between the DC-added input speech signal and the impulse response of the speech synthesis filter;
- means for calculating the autocorrelation coefficient sequence of the impulse response;
- means for calculating the amplitudes and positions of the impulse sequence (multi-pulses) in a forward manner, based on the relationship between the cross-correlation coefficient sequence and the autocorrelation coefficient sequence.

As a result, the excitation level input to the LPC synthesizer can be made sufficiently larger than the synthesizer's limit cycles, even for low-power speech segments such as silent intervals and the endings of voiced sounds. Good synthesized speech can therefore be produced.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the basic configuration of a conventional multi-pulse vocoder. FIG. 2 is a block diagram showing an embodiment of the analysis side of a multi-pulse vocoder according to the present invention. FIG. 3 is a block diagram showing an embodiment of the synthesis side of a multi-pulse vocoder according to the present invention.

1: LPC synthesizer; 2: LPC analyzer; 3: excitation (sound source) pulse generator; 4: subtractor; 5: perceptual weighting unit; 6: squared-error minimizer; 7: LPC analyzer; 8: cross-correlation function calculator; 9: encoder (1); 10: autocorrelation function calculator; 11: multi-pulse calculator; 12: encoder (2); 13: multiplexer; 14: demultiplexer; 15: decoder (1); 16: decoder (2); 17: LPC synthesizer; 18: LPF; 19: DC signal generator; 20: DC adder.

Claims


1. A multi-pulse vocoder that analyzes and synthesizes an input speech signal as follows. It performs LPC (Linear Prediction Coefficient) analysis of the input speech signal for each frame and uses the extracted LPC coefficients as spectral envelope information. Together with this spectral envelope information, the excitation information makes up the speech information of the input speech signal. The vocoder expresses this excitation information, for each analysis frame, as a plurality of impulse sequences (multi-pulses) whose generation time positions and amplitudes correspond to its features. The multi-pulse vocoder is characterized in that it comprises, on the analysis side, means for adding a DC signal to the input speech signal and a function of determining the amplitudes and positions of the multi-pulses.