JPH07177031A - Voice coding control system - Google Patents

Voice coding control system

Info

Publication number
JPH07177031A
JPH07177031A (application JP5318862A)
Authority
JP
Japan
Prior art keywords
pitch waveform
residual pitch
representative
unit
waveform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
JP5318862A
Other languages
Japanese (ja)
Inventor
Yoshiaki Tanaka
良紀 田中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to JP5318862A priority Critical patent/JPH07177031A/en
Publication of JPH07177031A publication Critical patent/JPH07177031A/en
Withdrawn legal-status Critical Current

Landscapes

  • Analogue/Digital Conversion (AREA)

Abstract

PURPOSE: To improve reproduced speech quality by suppressing the variation of the residual pitch waveforms within a frame and extracting the representative residual pitch waveform from the smoothed residual pitch waveforms, thereby preventing aliasing distortion.

CONSTITUTION: A shift unit 16 aligns the phases of the residual pitch waveforms, and a Fourier transform unit 17 applies a Fourier transform to the residual pitch waveforms. The transform unit 17 and a Fourier transform unit 19 at the next stage together perform a two-dimensional Fourier transform of the residual pitch waveforms. Since the high-order components of the two-dimensional Fourier transform represent large changes between the residual pitch waveforms, smoothing is performed by suppressing these strongly varying components in a coefficient processing unit 20. A two-dimensional inverse Fourier transform unit 21 then applies the inverse transform to obtain the smoothed residual pitch waveforms. A representative residual pitch waveform extraction unit 22 extracts the latest residual pitch waveform as the representative residual pitch waveform of the frame, and a quantization unit 23 quantizes it before it is sent to the receiving side.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech coding control system for high-efficiency coding of speech signals. In recent years, enterprise communication systems, digital mobile radio communication systems, voice storage systems, and the like have encoded speech signals with high efficiency to improve line utilization and reduce storage capacity. In such high-efficiency coding of speech signals, improvement of the reproduced speech quality is desired.

[0002]

[Prior Art] Various systems have already been proposed for high-efficiency coding of speech signals. For example, systems that transmit linear prediction coefficients obtained by linear prediction analysis together with parameters describing the excitation source are widely adopted.

[0003]

FIG. 5 is an explanatory diagram of a linear predictive coding unit, in which 51 is a linear prediction analysis unit, 52 a prediction coefficient quantization unit, 53 an inverse filter unit, 54 a residual signal quantization unit, and 55 a multiplexing unit.

[0004]

The input speech signal is applied to the linear prediction analysis unit 51 and the inverse filter unit 53. The linear prediction analysis unit 51 obtains prediction coefficients by linear prediction analysis for each analysis frame, and the prediction coefficient quantization unit 52 quantizes them. Using these prediction coefficients as filter coefficients, the inverse filter unit 53 outputs a prediction residual signal from the input speech signal, which the residual signal quantization unit 54 quantizes. The multiplexing unit 55 multiplexes the quantized prediction residual signal and the prediction coefficients and sends them to the receiving side over the transmission path.
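The analysis-plus-inverse-filter front end can be sketched in a few lines. This is a hypothetical illustration, not the patent's implementation: the function names, the plain autocorrelation method, and the toy AR(2) "speech" frame are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def lpc_coefficients(frame, order):
    # Autocorrelation method: solve R a = r for the predictor taps a[k]
    # in x[n] ~ a[0] x[n-1] + a[1] x[n-2] + ... (real coders typically
    # use Levinson-Durbin; a plain solve() is enough for a sketch).
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def inverse_filter(frame, a):
    # Prediction residual e[n] = x[n] - sum_k a[k] x[n-1-k], with zero
    # history assumed before the start of the frame.
    order = len(a)
    padded = np.concatenate([np.zeros(order), frame])
    pred = np.array([padded[n:n + order][::-1] @ a for n in range(len(frame))])
    return frame - pred

# Toy "speech" frame: an AR(2) process driven by weak noise, so an
# order-2 predictor removes most of the signal energy.
e_true = 0.05 * rng.standard_normal(400)
x = np.zeros(400)
for n in range(400):
    x[n] = e_true[n]
    if n >= 1:
        x[n] += 1.8 * x[n - 1]
    if n >= 2:
        x[n] -= 0.9 * x[n - 2]

a = lpc_coefficients(x, order=2)
res = inverse_filter(x, a)
print(np.sum(res ** 2) < 0.2 * np.sum(x ** 2))  # True: residual energy is small
```

The residual carries far less energy than the signal itself, which is what makes quantizing the residual (rather than the waveform) attractive.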

[0005]

The decoding unit on the receiving side separates the multiplexed prediction residual signal and prediction coefficients, and applies the prediction residual signal to a synthesis filter whose coefficients are the prediction coefficients, thereby reproducing the speech signal.

[0006]

For efficient transmission of the prediction residual signal, various schemes are known, such as code-excited linear prediction (CELP), which vector-quantizes a fixed-length prediction residual signal and transmits its index, and multi-pulse coding (MPC), which models a fixed-length prediction residual signal with a finite number of pulses and transmits the optimum pulse positions and amplitudes.

[0007]

The prototype waveform interpolation (PWI) method is known as a scheme whose reproduced speech quality degrades little even at low bit rates of 4 kbps and below. In this method, one representative pitch waveform is extracted from the residual signal in an analysis frame of the input speech signal and quantized. On the decoding side, the residual signal for the pitch waveforms in a frame other than the representative residual pitch waveform is obtained by interpolating between the representative residual pitch waveform of the previous frame and that of the current frame.

[0008]

FIG. 6 is an explanatory diagram of the PWI method. The representative residual pitch waveform of the current frame of L samples is denoted Z_n with pitch period N_n, and that of the previous frame Z_{n-1} with pitch period N_{n-1}. One of the plural residual pitch waveforms in a frame is quantized as the representative. On the decoding side, each remaining residual pitch waveform P_n of the current frame, other than the representative residual pitch waveform Z_n, is obtained by cyclically shifting the previous frame's representative residual pitch waveform Z_{n-1} into phase with the current frame's representative residual pitch waveform Z_n and then interpolating. That is, it can be obtained by the linear interpolation

P_n(k) = (1 − α(k)) Z_{n-1}(k) + α(k) Z_n(k)   …(1)

where 0 ≤ α(k) ≤ 1, and

P_n = (P_n(0), P_n(1), …, P_n(N−1))
Z_n = (Z_n(0), Z_n(1), …, Z_n(N−1))
Z_{n-1} = (Z_{n-1}(0), Z_{n-1}(1), …, Z_{n-1}(N−1))
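Equation (1) amounts to a per-sample cross-fade between the two prototype waveforms. The sketch below is a simplified illustration under the assumptions of equal pitch periods and a weight α held constant within each interpolated waveform; the function name is invented for the example.

```python
import numpy as np

def interpolate_pitch_waveforms(z_prev, z_curr, m):
    # Linearly cross-fade from the previous prototype to the current one:
    # waveform i gets weight alpha = i/m on z_curr (eq. (1) with alpha
    # held constant within each waveform).
    return [(1.0 - i / m) * z_prev + (i / m) * z_curr for i in range(1, m + 1)]

z_prev = np.array([0.0, 1.0, 0.0, -1.0])   # Z_{n-1}, already phase-aligned
z_curr = np.array([0.0, 2.0, 0.0, -2.0])   # Z_n
frames = interpolate_pitch_waveforms(z_prev, z_curr, m=4)
print(np.allclose(frames[-1], z_curr))                # True: last waveform is Z_n
print(np.allclose(frames[1], (z_prev + z_curr) / 2))  # True: halfway is the average
```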

[0009]

[Problems to Be Solved by the Invention] In the conventional PWI method described above, the residual signal is obtained by interpolating the representative residual pitch waveforms. However, if the successive residual pitch waveforms within a frame contain variation components at or above the frame frequency, aliasing distortion arises in the interpolation that generates the residual pitch waveforms other than the representative, and the error with respect to the actual residual signal becomes large. The reproduced speech quality therefore deteriorates. It is an object of the present invention to prevent aliasing distortion and thereby improve reproduced speech quality.

[0010]

[Means for Solving the Problems] The speech coding control system of the present invention is described with reference to FIG. 1. An analysis unit 1 obtains a linear prediction residual signal from the input speech signal for each frame; a representative residual pitch waveform extraction unit 2 extracts the representative residual pitch waveform of the frame and sends it as one item of coding information; and an interpolation decoding unit 3 that receives this representative residual pitch waveform obtains the residual pitch waveforms within the current frame by interpolation using the representative residual pitch waveforms of the current and previous frames. In this speech coding control system, a smoothing unit 4 smooths the variation of the residual pitch waveforms within the frame before the representative residual pitch waveform extraction unit 2 extracts the representative residual pitch waveform.

[0011]

The smoothing unit 4 may be configured to apply a two-dimensional Fourier transform to the residual pitch waveforms, smooth the transform coefficients obtained by the two-dimensional Fourier transform, apply a two-dimensional inverse Fourier transform, and supply the result to the representative residual pitch waveform extraction unit 2.

[0012]

The smoothing unit 4 may also be configured to normalize the transform coefficients in the course of the two-dimensional Fourier transform.

[0013]

[Operation] The analysis unit 1 performs linear prediction analysis on the input speech signal for each predetermined frame and outputs residual pitch waveforms, and the smoothing unit 4 smooths them so that the variation between the residual pitch waveforms within a frame does not become large. The representative residual pitch waveform extraction unit 2 then selects one of the residual pitch waveforms in the frame as the representative residual pitch waveform and sends it as one item of the coding information of the input speech signal. Consequently, when the interpolation decoding unit 3 interpolates between the representative residual pitch waveforms of the current and previous frames to obtain the other residual pitch waveforms of the current frame and then reproduces the speech signal, aliasing distortion is suppressed and the quality of the reproduced speech is improved.

[0014]

The smoothing unit 4 applies a two-dimensional Fourier transform to the residual pitch waveforms. Since the high-order components of the transform coefficients correspond to the parts where the residual pitch waveforms change most, these high-order components are set to zero, for example, and a two-dimensional inverse Fourier transform restores the residual pitch waveforms. The residual pitch waveforms are thereby smoothed.
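A minimal sketch of this smooth-in-the-transform-domain idea, assuming numpy and treating the aligned residual pitch waveforms as the columns of a matrix; the function name and the `keep` parameter (standing in for the threshold described later) are assumptions of the example.

```python
import numpy as np

def smooth_residual_matrix(P, keep):
    # 2-D DFT of the aligned residual pitch waveforms (columns = successive
    # waveforms), zero every spectral column whose across-waveform frequency
    # index |j| exceeds `keep`, then inverse-transform back.
    C = np.fft.fft2(P)
    M = P.shape[1]
    j = np.minimum(np.arange(M), M - np.arange(M))  # two-sided index |j|
    C[:, j > keep] = 0.0
    return np.real(np.fft.ifft2(C))

# Columns that alternate in sign are the fastest possible variation across
# waveforms; with keep = 0 only the across-waveform mean survives, which
# here is zero.
P = np.array([[1.0, -1.0, 1.0, -1.0],
              [2.0, -2.0, 2.0, -2.0]])
S = smooth_residual_matrix(P, keep=0)
print(np.allclose(S, 0.0))  # True
```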

[0015]

In the transform process for the residual pitch waveforms, after the Fourier transform in the column direction, the amplitudes of the transform coefficients are normalized, for example to 1, and the Fourier transform in the row direction is applied to the normalized transform coefficients. That is, the transform coefficients can be normalized within the two-dimensional Fourier transform process.

[0016]

[Embodiment] FIG. 2 is an explanatory diagram of an embodiment of the present invention, in which 10 is an encoding unit and 30 a decoding unit; 11 is a voiced/unvoiced decision unit (V/UV), 12 an inverse filter unit, 13 a linear prediction analysis unit, 14 a pitch analysis unit, 15 a residual pitch waveform forming unit, 16 a shift unit, 17 a Fourier transform unit (DFT), 18 an amplitude normalization unit (NOM), 19 a Fourier transform unit (DFT), 20 a coefficient processing unit (MOD), 21 a two-dimensional inverse Fourier transform unit (2D-IDFT), 22 a representative residual pitch waveform extraction unit, 23 a quantization unit, 31 a pulse generation unit, 32 a waveform shaping unit, 33 a voiced/unvoiced switching unit, 34 a noise generation unit, 35 an amplifier, 36 a prediction synthesis filter unit, and 37 an interpolation processing unit.

[0017]

In the encoding unit 10 on the transmitting side, the voiced/unvoiced decision unit 11 decides for each analysis frame whether the input speech signal is voiced or unvoiced, and the decision signal is applied to a multiplexing unit (not shown) and sent to the receiving side as one item of the coding information of the speech signal.

[0018]

The linear prediction analysis unit 13 obtains prediction coefficients from the input speech signal, and the pitch analysis unit 14 obtains the pitch period N. The inverse filter unit 12 applies inverse filtering to the input speech signal using the prediction coefficients from the linear prediction analysis unit 13 to obtain the prediction residual signal R, and the residual pitch waveform forming unit 15 cuts out residual pitch waveforms based on the pitch period N obtained by the pitch analysis unit 14. This processing can be realized with the same configuration as that used to obtain residual pitch waveforms in the known PWI method.

[0019]

The shift unit 16 cyclically shifts the residual pitch waveforms to align their phases, and the Fourier transform unit 17 applies a Fourier transform to the residual pitch waveforms. This Fourier transform unit 17 and the following Fourier transform unit 19 together perform the two-dimensional Fourier transform of the residual pitch waveforms, and in the course of this process the amplitude normalization unit 18 performs normalization.

[0020]

Since the high-order components resulting from the two-dimensional Fourier transform represent components that change greatly between the residual pitch waveforms, smoothing is performed by suppressing these strongly varying components; the coefficient processing unit 20 carries out this processing. The two-dimensional inverse Fourier transform unit 21 then applies the inverse Fourier transform to obtain the smoothed residual pitch waveforms. The representative residual pitch waveform extraction unit 22 extracts the temporally latest residual pitch waveform as the representative residual pitch waveform of the frame, and the quantization unit 23 quantizes it and sends it to the receiving side. The representative residual pitch waveform, the pitch period N, and the voiced/unvoiced decision signal are multiplexed and transmitted to the receiving side.

[0021]

In the decoding unit 30 on the receiving side, the pulse generation unit 31 generates pulses according to the pitch period N, and the interpolation processing unit 37 performs interpolation using the representative residual pitch waveforms. That is, using the representative residual pitch waveform of the previous frame and that of the current frame, the other residual pitch waveforms of the current frame are obtained by interpolation and applied to the waveform shaping unit 32 to generate the voiced-sound waveform.

[0022]

The voiced/unvoiced switching unit 33 is switched by the voiced/unvoiced decision signal supplied over a path not shown: for unvoiced sound it selects the noise generation unit 34, which generates white noise, and for voiced sound it selects the waveform shaping unit 32. The gain of the amplifier 35 is set according to frame power information supplied over a path not shown. The amplified signal is applied to the prediction synthesis filter unit 36 to reproduce the speech signal.

[0023]

Since the prediction residual signal vector R from the inverse filter unit 12 can be regarded as a succession of the residual pitch waveforms P_i, it can be expressed as

R = (P_0, P_1, P_2, …, P_{M−1})   …(2)
P_i = (p(i,0), p(i,1), p(i,2), …, p(i,N_i)),  0 ≤ i ≤ (M−1)

where N is the pitch period and M is the number of residual pitch waveforms contained in the frame.

[0024]

FIG. 3 illustrates the extraction of the residual pitch waveforms. The pitch analysis unit 14 obtains the pitch period N of the current frame from the input speech signal, and residual pitch waveforms P_i are cut out of the prediction residual signal vector R at intervals of the pitch period N, working backwards from the latest time of the current frame. The figure shows the case where, relative to the temporally newest residual pitch waveform P_2, the oldest residual pitch waveform P_0 includes part of the waveform of the previous frame.
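The backward cut-out of FIG. 3 can be illustrated as follows. The function name and the fixed integer pitch period are assumptions of this sketch; since P_0 may reach into the previous frame, the buffer passed in must include those older samples.

```python
import numpy as np

def cut_pitch_waveforms(residual, pitch_n, m):
    # Slice M pitch-period-long segments, working backwards from the
    # newest sample; P_0 may therefore reach into the previous frame,
    # so `residual` should include those older samples.
    waveforms = []
    end = len(residual)
    for _ in range(m):
        waveforms.append(residual[end - pitch_n:end])
        end -= pitch_n
    return waveforms[::-1]  # oldest first: P_0, ..., P_{M-1}

r = np.arange(10.0)  # toy residual buffer, newest sample last
ws = cut_pitch_waveforms(r, pitch_n=3, m=3)
print([w.tolist() for w in ws])  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
```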

[0025]

Next, the shift unit 16 cyclically shifts each residual pitch waveform P_i so that p(i,0) becomes the maximum value, aligning the phases of the residual pitch waveforms P_i. Denoting the shifted waveforms again by P_i, the prediction residual signal vector R can be expressed in the form of an N × M matrix, shown as R = [ ] in equation (3) below.
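The phase alignment by cyclic shift can be sketched with `numpy.roll`; the function name is invented, and the peak-at-index-0 convention follows the description above.

```python
import numpy as np

def align_by_cyclic_shift(waveforms):
    # Cyclically shift each waveform so that its peak sample lands at
    # index 0, i.e. p(i, 0) becomes the maximum value.
    return [np.roll(w, -int(np.argmax(w))) for w in waveforms]

ws = [np.array([0.0, 3.0, 1.0]), np.array([2.0, 0.0, 5.0])]
aligned = align_by_cyclic_shift(ws)
print([w.tolist() for w in aligned])  # [[3.0, 1.0, 0.0], [5.0, 2.0, 0.0]]
```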

[0026]

Equation (3) is subjected to a two-dimensional Fourier transform: the Fourier transform units 17 and 19 apply the Fourier transform in the column direction and then in the row direction. In this case, after the Fourier transform unit 17 transforms in the column direction, the amplitude normalization unit 18 normalizes the amplitude of each transform coefficient to 1, and the next Fourier transform unit 19 applies the Fourier transform in the row direction to the normalized transform coefficients. This normalization improves the quantization efficiency of the representative residual pitch waveform.
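The column-transform, normalize, row-transform order can be sketched as below, assuming numpy and invented function names, and glossing over the fact that for real signals only N/2 rows of coefficients need to be kept.

```python
import numpy as np

def two_stage_dft(P):
    # Stage 1: DFT along each column (within a pitch waveform).
    C1 = np.fft.fft(P, axis=0)
    # Normalise every coefficient to unit magnitude (the role attributed
    # to the amplitude normalisation unit 18); guard against zeros.
    mag = np.abs(C1)
    C1n = np.where(mag > 0, C1, 0.0) / np.where(mag > 0, mag, 1.0)
    # Stage 2: DFT along each row (across successive waveforms).
    return C1n, np.fft.fft(C1n, axis=1)

P = np.array([[1.0, 2.0],
              [3.0, 4.0]])
C1n, C = two_stage_dft(P)
print(np.allclose(np.abs(C1n), 1.0))  # True: column-stage coefficients normalised
```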

[0027]

The transform coefficients c(i,j) (where i = 0, 1, 2, …, N/2 and j = 0, 1, 2, …, M−1) obtained by the two-dimensional Fourier transform of the Fourier transform units 17 and 19 can be expressed in the form of an (N/2) × M matrix, shown as C = [ ] in equation (4) below.

[0028]

In voiced sections there is high correlation between successive residual pitch waveforms. For example, applying the two-dimensional Fourier transform of the residual pitch waveforms to the speech signal waveform of the frame shown in FIG. 4(a) gave the result shown in FIG. 4(b): the smaller the value of j in the transform coefficient c(i,j), the larger the coefficient. The components of low order in j are those that change slowly between the residual pitch waveforms, and the high-order components are those that change quickly.

[0029]

Accordingly, smoothing is performed by setting to zero those two-dimensional Fourier transform coefficients c(i,j) whose value of j exceeds the threshold K(M, N, f_cut) given by

K(M, N, f_cut) = M N f_cut / f_s   …(5)
f_cut = f_F / 2

where f_cut is the cutoff frequency, f_s the sampling frequency, and f_F the frame frequency. That is, the coefficient processing unit 20 performs this processing on the transform coefficients c(i,j).
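Equation (5) itself is simple arithmetic; a sketch with hypothetical parameter values (8 kHz sampling, 20 ms frames), which are assumptions of this example and not taken from the patent:

```python
def smoothing_threshold(m, n, f_s, f_frame):
    # K(M, N, f_cut) = M * N * f_cut / f_s, with f_cut = f_F / 2:
    # coefficients c(i, j) with j > K are set to zero.
    f_cut = f_frame / 2.0
    return m * n * f_cut / f_s

# Hypothetical numbers: 8 kHz sampling, 20 ms frames (f_F = 50 Hz),
# M = 4 waveforms of N = 40 samples each.
K = smoothing_threshold(m=4, n=40, f_s=8000.0, f_frame=50.0)
print(K)  # 0.5 -> only the j = 0 column of coefficients survives here
```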

[0030]

The cutoff frequency f_cut is set to half the frame frequency f_F because, in the interpolation processing, variation components above f_F/2 produce aliasing distortion and therefore degrade the reproduced speech quality. Smoothing as described above avoids aliasing distortion in the interpolation processing, so the reproduced speech quality can be improved.

[0031]

The two-dimensional inverse Fourier transform unit 21 applies the inverse Fourier transform to the two-dimensional Fourier transform coefficients smoothed by the coefficient processing unit 20, yielding the smoothed residual pitch waveforms. The representative residual pitch waveform extraction unit 22 extracts the temporally newest residual pitch waveform P_{M−1} as the representative residual pitch waveform of the frame, and the quantization unit 23 quantizes it. As described above, the pitch period N, the voiced/unvoiced decision signal, and so on are each quantized, multiplexed, and sent to the receiving side.

[0032]

In the decoding unit 30 on the receiving side described above, the interpolation processing unit 37 interpolates using the representative residual pitch waveforms, reproducing the residual pitch waveforms within the frame and thus the voiced sound.

[0033]

[Effects of the Invention] As described above, in a speech coding control system in which a linear prediction residual signal is obtained for each analysis frame of the input speech signal, a representative residual pitch waveform is extracted and transmitted, and on reception the residual pitch waveforms within the current frame are reproduced by interpolation between the representative residual pitch waveforms of the previous and current frames, the present invention smooths the variation of the residual pitch waveforms within the frame in the smoothing unit 4. This has the advantage of preventing aliasing distortion in the interpolation processing and improving the reproduced speech quality.

[Brief Description of the Drawings]

FIG. 1 is a diagram illustrating the principle of the present invention.

FIG. 2 is an explanatory diagram of an embodiment of the present invention.

FIG. 3 is an explanatory diagram of the extraction of residual pitch waveforms.

FIG. 4 is an explanatory diagram of two-dimensional Fourier transform coefficients.

FIG. 5 is an explanatory diagram of a linear predictive coding unit.

FIG. 6 is an explanatory diagram of the PWI method.

[Explanation of Symbols]

1 analysis unit
2 representative residual pitch waveform extraction unit
3 interpolation processing decoding unit
4 smoothing unit

Claims (3)

[Claims]

1. A speech coding control system in which an analysis unit (1) obtains a linear prediction residual signal from an input speech signal for each frame, a representative residual pitch waveform extraction unit (2) extracts the representative residual pitch waveform within the frame and sends it as one item of coding information, and an interpolation decoding unit (3) that receives the representative residual pitch waveform obtains the residual pitch waveforms within the current frame by interpolation using the representative residual pitch waveforms of the current and previous frames, the system being characterized in that a smoothing unit (4) smooths the variation of the residual pitch waveforms within the frame before the representative residual pitch waveform extraction unit (2) extracts the representative residual pitch waveform.
2. The speech coding control system according to claim 1, characterized in that the smoothing unit (4) is configured to apply a two-dimensional Fourier transform to the residual pitch waveforms, smooth the transform coefficients obtained by the two-dimensional Fourier transform, apply a two-dimensional inverse Fourier transform, and supply the result to the representative residual pitch waveform extraction unit (2).
3. The speech coding control system according to claim 2, characterized in that the smoothing unit (4) has a configuration for normalizing the transform coefficients in the course of the two-dimensional Fourier transform.
JP5318862A 1993-12-20 1993-12-20 Voice coding control system Withdrawn JPH07177031A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5318862A JPH07177031A (en) 1993-12-20 1993-12-20 Voice coding control system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5318862A JPH07177031A (en) 1993-12-20 1993-12-20 Voice coding control system

Publications (1)

Publication Number Publication Date
JPH07177031A true JPH07177031A (en) 1995-07-14

Family

ID=18103792

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5318862A Withdrawn JPH07177031A (en) 1993-12-20 1993-12-20 Voice coding control system

Country Status (1)

Country Link
JP (1) JPH07177031A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6553343B1 (en) 1995-12-04 2003-04-22 Kabushiki Kaisha Toshiba Speech synthesis method
US7184958B2 (en) 1995-12-04 2007-02-27 Kabushiki Kaisha Toshiba Speech synthesis method
USRE39336E1 (en) 1998-11-25 2006-10-10 Matsushita Electric Industrial Co., Ltd. Formant-based speech synthesizer employing demi-syllable concatenation with independent cross fade in the filter parameter and source domains
JP2003522965A (en) * 1998-12-21 2003-07-29 クゥアルコム・インコーポレイテッド Periodic speech coding
JP4824167B2 (en) * 1998-12-21 2011-11-30 クゥアルコム・インコーポレイテッド Periodic speech coding

Similar Documents

Publication Publication Date Title
KR100427753B1 (en) Method and apparatus for reproducing voice signal, method and apparatus for voice decoding, method and apparatus for voice synthesis and portable wireless terminal apparatus
KR101000345B1 (en) Audio encoding device, audio decoding device, audio encoding method, and audio decoding method
US6721700B1 (en) Audio coding method and apparatus
CN101518083B (en) Method, medium, and system encoding and/or decoding audio signals by using bandwidth extension and stereo coding
EP0939394A1 (en) Apparatus for encoding and apparatus for decoding speech and musical signals
US20090192789A1 (en) Method and apparatus for encoding/decoding audio signals
US4945565A (en) Low bit-rate pattern encoding and decoding with a reduced number of excitation pulses
JP2002372996A (en) Method and device for encoding acoustic signal, and method and device for decoding acoustic signal, and recording medium
JP2007504503A (en) Low bit rate audio encoding
JP3087814B2 (en) Acoustic signal conversion encoding device and decoding device
JP3472279B2 (en) Speech coding parameter coding method and apparatus
JP3888097B2 (en) Pitch cycle search range setting device, pitch cycle search device, decoding adaptive excitation vector generation device, speech coding device, speech decoding device, speech signal transmission device, speech signal reception device, mobile station device, and base station device
JP3531780B2 (en) Voice encoding method and decoding method
JPH07177031A (en) Voice coding control system
US6535847B1 (en) Audio signal processing
US6208962B1 (en) Signal coding system
JP4699117B2 (en) A signal encoding device, a signal decoding device, a signal encoding method, and a signal decoding method.
US5588089A (en) Bark amplitude component coder for a sampled analog signal and decoder for the coded signal
EP3248190B1 (en) Method of encoding, method of decoding, encoder, and decoder of an audio signal
JP2958726B2 (en) Apparatus for coding and decoding a sampled analog signal with repeatability
JPH08129400A (en) Voice coding system
JP2002049397A (en) Digital signal processing method, learning method, and their apparatus, and program storage media therefor
JP3417362B2 (en) Audio signal decoding method and audio signal encoding / decoding method
JP3453116B2 (en) Audio encoding method and apparatus
JP2004348120A (en) Voice encoding device and voice decoding device, and method thereof

Legal Events

Date Code Title Description
A300 Application deemed to be withdrawn because no request for examination was validly filed

Free format text: JAPANESE INTERMEDIATE CODE: A300

Effective date: 20010306