JP2001147700A

JP2001147700A - Method and device for sound signal postprocessing and recording medium with program recorded

Info

Publication number: JP2001147700A
Application number: JP33138899A
Authority: JP
Inventors: Naka Omuro; 仲大室; Kazunori Mano; 一則間野
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-11-22
Filing date: 1999-11-22
Publication date: 2001-05-29
Anticipated expiration: 2019-11-22
Also published as: JP3559485B2

Abstract

PROBLEM TO BE SOLVED: To reproduce a high-quality speech signal from an encoded bit sequence. SOLUTION: The speech signal post-processing method includes: a stage in which a speech signal is passed through a linear filter 32 having the inverse characteristic of the spectral envelope; a stage in which a peak position detection part 33 detects a peak position in the signal that has passed through the linear filter and a signal waveform segmenting part 34 extracts a waveform of one pitch length; and a stage in which a pitch reference position search part 35 successively searches for peaks of the cross-correlation, a boundary determination part 36 divides the linear-filtered signal into regions each corresponding to a one-pitch-length waveform, a pitch correlation value calculation part 37 calculates a comb filter coefficient, and the linear-filtered signal in each region is passed through a comb filter 38 using this coefficient to obtain the speech signal output.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech signal post-processing method that aims to improve quality by enhancing the pitch components of a speech signal input frame by frame. In particular, it relates to a speech signal post-processing method and apparatus, and a recording medium on which a program is recorded, that are effective when applied to speech decoding that reproduces a high-quality speech signal from an encoded bit sequence in predictive coding and decoding, in which speech is synthesized by driving a filter representing the spectral envelope characteristics of the speech signal with an excitation vector.

[0002]

[Prior Art] High-efficiency speech coding methods are used in digital mobile communication to make efficient use of radio waves, and in speech or music storage services and the like to make efficient use of communication lines and storage media. At present, a proposed technique for coding speech efficiently divides the original speech into fixed-length intervals of about 5 to 50 msec called frames or subframes (hereinafter collectively called frames), separates the speech of each frame into two kinds of information, namely the characteristics of a linear filter representing the envelope of the frequency spectrum and an excitation signal for driving that filter, and encodes each of them. In this technique, a known way of encoding the excitation signal is to separate it into a periodic component considered to correspond to the pitch period (fundamental frequency) of the speech and the remaining component, and to encode them separately. Code-Excited Linear Prediction (CELP) is one example of such an excitation coding method. Details are given in M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates", IEEE Proc. ICASSP-85, pp. 937-940, 1985.

[0003] FIG. 1 shows a configuration example of the encoding unit 1. For the speech X input to the input terminal, the linear prediction analysis unit 2 calculates linear prediction parameters representing the frequency spectral envelope characteristics of the input speech. The linear prediction parameters obtained are quantized and encoded in the linear prediction parameter encoding unit 3, and the quantized parameters are sent to the synthesis filter 12 as synthesis filter coefficients k.

[0004] Details of linear prediction analysis and examples of coding the linear prediction parameters are described in, for example, Sadaoki Furui, "Digital Speech Processing" (Tokai University Press). Here, the linear prediction analysis unit 2, the linear prediction parameter encoding unit 3, and the synthesis filter 12 may be replaced with nonlinear ones. The excitation vector generation unit 5 generates excitation vector candidates one frame long and sends them to the synthesis filter 12. The excitation vector generation unit 5 generally consists of an adaptive codebook 6 and a fixed codebook 7. The adaptive codebook 6 takes the immediately preceding excitation vectors stored in a buffer (the already-quantized excitation vectors of the preceding one to several frames), cuts out a segment whose length corresponds to a certain period, and repeats that segment until it reaches the frame length, thereby outputting a candidate time-series vector corresponding to the periodic component of the speech. The "certain period" is chosen so that the distortion d in the distortion calculation unit 13 becomes small, and the chosen period corresponds in most cases to the pitch period of the speech. The fixed codebook 7 outputs candidate time-series code vectors one frame long corresponding to the non-periodic component of the speech. These candidates are stored in advance, independently of the input speech, as a number of candidate vectors determined by the number of bits allotted for coding. The candidate time-series vectors output from the adaptive codebook 6 and the fixed codebook 7 are multiplied in the multiplication units 8 and 9 by the weights generated in the weight generation unit 10, and are added in the addition unit 11 to give the excitation vector candidate c. The excitation vector generation unit 5 may also consist of the fixed codebook 7 alone, without the adaptive codebook 6; a configuration without the adaptive codebook 6 is often used when coding signals with little pitch periodicity, such as consonants or background noise.
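The adaptive codebook operation described above (cut out one period of past excitation, repeat it to frame length, then mix it with a fixed codebook vector) can be sketched as follows. This is a minimal illustration, not the coder's actual implementation: the function names, the use of plain Python lists, and the integer lag are all assumptions made for the example.

```python
def adaptive_codebook_vector(past_excitation, lag, frame_len):
    """Repeat the last `lag` samples of the past excitation up to frame_len."""
    if lag <= 0 or lag > len(past_excitation):
        raise ValueError("lag must be between 1 and len(past_excitation)")
    segment = past_excitation[-lag:]          # one period of past excitation
    return [segment[n % lag] for n in range(frame_len)]


def excitation_candidate(adaptive_vec, fixed_vec, gain_adaptive, gain_fixed):
    """Weighted sum of the two codebook vectors (the candidate c in the text)."""
    return [gain_adaptive * a + gain_fixed * f
            for a, f in zip(adaptive_vec, fixed_vec)]


# Hypothetical usage: a lag of 3 samples repeated over a 7-sample frame.
periodic = adaptive_codebook_vector([0, 1, 2, 3], lag=3, frame_len=7)
```

The two gains stand in for the weights produced by the weight generation unit 10; how those weights are chosen is not covered by this sketch.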

[0005] The synthesis filter 12 is a linear filter whose coefficients are the quantized values of the linear prediction parameters; it takes the excitation vector candidate c as input and outputs the reproduced speech candidate y. The order of the synthesis filter, that is, the order of the linear prediction analysis, is generally about 10 to 16. As already stated, the synthesis filter 12 may be a nonlinear filter. The distortion calculation unit 13 calculates the distortion d between the reproduced speech candidate y output by the synthesis filter and the input speech X. This distortion calculation may take into account the synthesis filter coefficients or the unquantized linear prediction coefficients, for example through perceptual weighting.
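As a rough illustration of the synthesis filter and the distortion calculation, the sketch below runs an excitation through an all-pole filter and measures a plain squared error. It assumes the sign convention A(z) = 1 - sum_k a_k z^-k, a zero initial filter state, and an unweighted error; perceptual weighting, which the text mentions as an option, is left out. The function names are hypothetical.

```python
def synthesis_filter(excitation, lpc):
    """All-pole filter 1/A(z) with A(z) = 1 - sum_k lpc[k-1] * z^-k.

    Assumes zero initial state (no memory carried over from earlier frames).
    """
    out = []
    for n, c in enumerate(excitation):
        acc = c
        for k, a_k in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a_k * out[n - k]   # feedback from past outputs
        out.append(acc)
    return out


def squared_error(x, y):
    """Unweighted distortion d between input speech x and candidate y."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))
```

With a single coefficient of 0.5, a unit impulse decays as 1, 0.5, 0.25, and so on, which is the expected impulse response of a first-order all-pole filter.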

[0006] The codebook search control unit 14 selects the excitation code that minimizes the distortion d between each reproduced speech candidate y and the input speech X, and thereby determines the excitation vector for that frame. The excitation code n2 determined by the codebook search control unit 14 and the linear prediction parameter code n1 output by the linear prediction parameter encoding unit 3 are sent to the code sending unit 4 and, depending on the form of use, are either stored in a storage device or sent to the receiving side over a communication channel.
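The search performed by the codebook search control unit 14 amounts to trying each candidate and keeping the one with the smallest distortion. The sketch below shows that loop in its simplest exhaustive form; the `synthesize` callback, the squared-error distortion, and the function name are assumptions for illustration, and practical coders use faster structured searches.

```python
def search_codebook(target, candidates, synthesize):
    """Return (index, distortion) of the candidate closest to the target.

    `synthesize` maps an excitation candidate to reproduced speech; it is a
    hypothetical stand-in for the synthesis filter.
    """
    best_index, best_dist = None, float("inf")
    for index, cand in enumerate(candidates):
        y = synthesize(cand)
        d = sum((t - v) ** 2 for t, v in zip(target, y))
        if d < best_dist:
            best_index, best_dist = index, d
    return best_index, best_dist
```

The returned index plays the role of the excitation code n2 that is sent to the decoder.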

[0007] FIG. 2 shows a configuration example of a CELP decoding unit 20 corresponding to the above coding method. Of the codes received from the transmission channel or storage medium, the linear prediction parameter code n1 is decoded into synthesis filter coefficients in the linear prediction parameter decoding unit 21 and sent to the synthesis filter 22 and, as needed, to the post-processing unit 30. The excitation code n2 is sent to the excitation vector generation unit 25, which generates the excitation vector corresponding to the code. The configuration of the excitation vector generation unit 25 corresponds to that of the excitation vector generation unit 5 of the encoding unit 1 shown in FIG. 1. The synthesis filter 22 takes the excitation vector as input and reproduces the speech s'. The post-processing unit 30, also called a postfilter, performs processing that perceptually reduces the noisiness of the reproduced speech.

[0008] FIG. 3 shows a configuration example of the postfilter. A postfilter generally performs spectral envelope enhancement and pitch enhancement with a comb filter. In FIG. 3, the decoded speech signal is passed through an MA (moving average) filter 32 having the inverse characteristic of the spectral envelope to extract the excitation waveform, then through a comb filter 38 with a tap at the pitch-length position to enhance the pitch periodicity, and finally through an AR (autoregressive) filter 39 that enhances the spectral envelope characteristic, yielding a perceptually improved speech signal. The comb filter for enhancing pitch periodicity may be implemented in MA form or in AR form. When the pitch period is an integer, the filter characteristics are as follows, where a and b_i are constants and t is the pitch period.

MA type: $H(z) = 1 + a z^{-t}$
AR type: $H(z) = 1 / (1 - a z^{-t})$

In practice the pitch period is often not an integer, so upsampling techniques are commonly used to obtain the forms

MA type: $H(z) = 1 + a \sum_i b_i z^{-t+i}$
AR type: $H(z) = 1 / (1 - a \sum_i b_i z^{-t+i})$
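For an integer pitch period, the two comb filter forms correspond to the difference equations y[n] = x[n] + a x[n-t] (MA type) and y[n] = x[n] + a y[n-t] (AR type). The sketch below implements both directly; it assumes zero initial state and does not cover the fractional-pitch forms with the b_i interpolation taps. The function names are illustrative only.

```python
def comb_ma(signal, a, t):
    """MA comb filter H(z) = 1 + a z^-t: y[n] = x[n] + a * x[n - t]."""
    return [x + (a * signal[n - t] if n >= t else 0.0)
            for n, x in enumerate(signal)]


def comb_ar(signal, a, t):
    """AR comb filter H(z) = 1 / (1 - a z^-t): y[n] = x[n] + a * y[n - t]."""
    out = []
    for n, x in enumerate(signal):
        out.append(x + (a * out[n - t] if n >= t else 0.0))
    return out
```

Feeding a unit impulse through each filter shows the difference: the MA form adds a single echo one pitch period later, while the AR form produces a decaying train of echoes at every multiple of the pitch period.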

[0009] In the above equations, the constants a and b_i express the pitch periodicity.

[0010]

[Problem to Be Solved by the Invention] The problem with the postfilter in the CELP scheme is that pitch enhancement, like encoding and decoding, is performed frame by frame (in the field of speech coding it has to be performed frame by frame). That is, because processing assumes that the acoustic characteristics of the signal are constant within a frame, when the frame is fairly long (for example, 10 msec or more) and the signal is in a transient portion where the pitch period or the pitch periodicity changes within the frame, processing that assumes a constant pitch period and degree of pitch periodicity within the frame cannot achieve sufficient quality.

[0011] An object of the present invention is to provide higher-quality reproduced speech in the field of speech encoding/decoding without breaking the framework of frame-by-frame processing.

[0012]

[Means for Solving the Problem] To solve the above problem, the invention according to claim 1 is a speech signal post-processing method comprising: a step of storing a speech signal input frame by frame in storage means; a step of passing the speech signals of the current frame and past frames held in the storage means through a linear filter that uses linear prediction coefficients and exhibits the inverse characteristic of the spectral envelope of the speech signal, to obtain a linear-filtered signal; a step of detecting a waveform peak position in the linear-filtered signal and extracting a waveform of average one-pitch length with the peak position as reference; a step of calculating the cross-correlation between the average one-pitch-length waveform and the linear-filtered signal and successively searching for peaks of the cross-correlation as pitch reference positions; and a step of dividing the linear-filtered signal into regions each corresponding to an exact one-pitch-length waveform on the basis of the pitch reference positions, calculating comb filter coefficients from the pitch period and the strength of pitch periodicity of the linear-filtered signal, and passing the linear-filtered signal, region by region, through a comb filter using the comb filter coefficients to obtain the speech signal output.

[0013] The invention according to claim 2 is the speech signal post-processing method according to claim 1, further comprising a step of multiplying the speech signal obtained for each region by a window function and overlapping it with the speech signal obtained by multiplying the immediately preceding region by its window function. The invention according to claim 3 is the speech signal post-processing method according to claim 1, wherein the comb filter coefficients are calculated for each region. The invention according to claim 4 is, in a speech signal post-processing method, characterized by comprising: storage means for storing a speech signal input frame by frame; a filter that passes the speech signals of the current frame and past frames held in the storage means through a linear filter using linear prediction coefficients and exhibiting the inverse characteristic of the spectral envelope of the speech signal, to obtain a linear-filtered signal; a peak position detection and signal waveform extraction unit that detects a waveform peak position in the linear-filtered signal and extracts a waveform of average one-pitch length with the peak position as reference; a pitch reference position search unit that calculates the cross-correlation between the average one-pitch-length waveform and the linear-filtered signal and successively searches for peaks of the cross-correlation as pitch reference positions; a region boundary determination unit that divides the linear-filtered signal into regions each corresponding to an exact one-pitch-length waveform on the basis of the pitch reference positions; and a pitch correlation value calculation unit that calculates comb filter coefficients from the pitch period and the strength of pitch periodicity of the linear-filtered signal, wherein the linear-filtered signal is passed, region by region, through a comb filter using the comb filter coefficients to obtain the speech signal output.

[0014] The invention according to claim 5 is the speech signal post-processing apparatus according to claim 4, further comprising a multiplication unit that multiplies the speech signal obtained for each region by a window function, and an overlap unit that overlaps it with the speech signal obtained by multiplying the immediately preceding region by its window function. The invention according to claim 6 is the speech signal post-processing apparatus according to claim 5, wherein the pitch correlation value calculation unit calculates the comb filter coefficients for each region.

[0015] The invention according to claim 7 is characterized by recording a program that causes execution of: a procedure of storing a speech signal input frame by frame in storage means; a procedure of passing the speech signals of the current frame and past frames held in the storage means through a linear filter that uses linear prediction coefficients and exhibits the inverse characteristic of the spectral envelope of the speech signal, to obtain a linear-filtered signal; a procedure of detecting a waveform peak position in the linear-filtered signal and extracting a waveform of average one-pitch length with the peak position as reference; a procedure of calculating the cross-correlation between the average one-pitch-length waveform and the linear-filtered signal and successively searching for peaks of the cross-correlation as pitch reference positions; and a procedure of dividing the linear-filtered signal into regions each corresponding to an exact one-pitch-length waveform on the basis of the pitch reference positions, calculating comb filter coefficients from the pitch period and the strength of pitch periodicity of the linear-filtered signal, and passing the linear-filtered signal, region by region, through a comb filter using the comb filter coefficients to obtain the speech signal output.

[0016] The invention according to claim 8 is the recording medium storing the program according to claim 7, further having a procedure of multiplying the speech signal obtained for each region by a window function and overlapping it with the speech signal obtained by multiplying the immediately preceding region by its window function. The invention according to claim 9 is the recording medium storing the program according to claim 7, further having a procedure of calculating the comb filter coefficients for each region.

[0017] With the above configuration, the present invention allows a fixed delay inside the postfilter and realizes pitch-synchronous postfiltering that, over the interval of frame length plus delay time, detects pitch positions and performs enhancement one pitch at a time. As a result, even when the pitch period varies within a frame, pitch enhancement can be performed while detecting and following small pitch variations. Moreover, compared with the conventional method, only a slight additional delay is incurred, so the invention can be applied within the frame-processing framework of the encoding and decoding parts.

[0018]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 4 shows a configuration example of a postfilter according to the present invention. The MA filter section and the AR filter section may be the same as in the conventional configuration; the processing between them is the feature of the present invention. (As described later, it is even better to modify the MA filter and the AR filter to suit the processing of this invention.) In addition, because the present invention introduces a fixed delay, a delay buffer 31 is provided.

[0019] FIGS. 5 to 7 schematically show the processing of each processing unit in FIG. 4. For example, in FIG. 5, the input frame position is the one-frame signal s' output from the synthesis filter 22. The waveforms in FIGS. 5 to 7 are taken to be the waveform e after passing through the MA filter 32. The frame that is actually postfiltered is the interval labeled "processing frame", and the time difference between the input frame and the processing frame is the delay. It is assumed that the average pitch length or a provisional pitch length (hereinafter "average pitch length") within the processing frame has already been obtained. This average pitch length may be taken from the period code of the adaptive codebook, or it may be calculated by recomputing the autocorrelation function of the synthesized signal. If this average pitch length is longer than the frame length, the invention has almost no effect, so the conventional postfilter processing may be used as is. The invention is effective when the pitch period is shorter than the frame length.
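One of the two options mentioned above, estimating the average pitch length from the autocorrelation of the signal, might look like the following. This is only a sketch under simple assumptions: it uses an unnormalized autocorrelation and returns the integer lag with the largest value inside a caller-supplied range. The function name and the lag range are hypothetical, and a practical estimator would probably normalize the correlation and guard against picking multiples of the true pitch.

```python
def average_pitch_length(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    best_lag, best_score = min_lag, -float("inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(signal[n] * signal[n - lag]
                    for n in range(lag, len(signal)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical usage: a pulse train with one pulse every 20 samples.
pulse_train = [1.0 if n % 20 == 0 else 0.0 for n in range(100)]
estimated = average_pitch_length(pulse_train, min_lag=10, max_lag=50)
```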

[0020] The following description assumes that the pitch period is shorter than the frame length. First, the in-frame peak position detection unit 33 searches for and detects the peak position of the signal (the point of maximum amplitude) within the processing frame. Let this be P₀ in FIG. 5. (In the example of FIG. 5 it lies near the right boundary of the processing frame. Because the relative position of the frame and the signal waveform is random, where the peak falls within the processing frame is arbitrary each time.) Next, with the peak position P₀ as reference, the signal waveform extraction unit 34 cuts a waveform of average pitch length out of the signal waveform. This is shown in FIG. 6. The segment is best cut from a region extending half a pitch length before and after the peak position. As in FIG. 6, the segment may cross the boundary of the processing frame, but it cannot extend past the right end of the input frame position; if it would, the right end of the segment is set to the right end of the input frame position. Next, the pitch reference position search unit 35 calculates the cross-correlation between the extracted waveform and the signal while shifting the extracted waveform left and right, searches near the position about one average pitch away from P₀ for the position where the cross-correlation is maximal, and determines the next pitch reference position. By repeating this pitch reference position search, pitch reference positions are determined over the combined region of the processing frame and the input frame. In the example of FIG. 6, three pitch reference positions are determined inside the processing frame and two outside it. Next, as shown in FIG. 7, the region boundary determination unit 36 determines the region boundaries on the basis of the pitch reference positions so that each region of the signal corresponds to an exact one-pitch waveform. The boundary may be the midpoint between reference positions, for example (P₁+P₂)/2 and (P₀+P₁)/2, or it may be a point slightly to the right of the midpoint, for example the 2:1 internal division point P₂+(P₁−P₂)*2/3. A point slightly right of the midpoint is preferable because a one-pitch waveform is commonly observed to rise sharply and then decay slowly.
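The chain of steps in this paragraph (peak detection, template extraction, cross-correlation search for pitch reference positions, and boundary placement) can be sketched as below. All function names, the search half-width of 3 samples, and the use of a plain unnormalized inner product as the cross-correlation are assumptions for illustration; the sketch also assumes `right_limit` does not exceed the signal length and simply stops when the template no longer fits inside the signal.

```python
def find_peak(signal, start, end):
    """Index of the largest absolute amplitude in signal[start:end]."""
    return max(range(start, end), key=lambda n: abs(signal[n]))


def cut_template(signal, peak, pitch, right_limit):
    """Cut about one pitch around the peak, clipped to [0, right_limit)."""
    lo = max(0, peak - pitch // 2)
    hi = min(right_limit, peak - pitch // 2 + pitch)
    return lo, signal[lo:hi]


def cross_correlation(signal, template, offset):
    """Inner product of the template with the signal starting at offset."""
    return sum(t * signal[offset + k] for k, t in enumerate(template))


def pitch_reference_positions(signal, p0, pitch, right_limit, search=3):
    """Walk left and right from p0 in steps of about one pitch."""
    if not 0 <= search < pitch:
        raise ValueError("search must be smaller than pitch")
    lo, template = cut_template(signal, p0, pitch, right_limit)
    peak_offset = p0 - lo  # where the peak sits inside the template
    refs = [p0]
    for direction in (-1, 1):
        ref = p0
        while True:
            center = ref + direction * pitch
            candidates = [
                p for p in range(center - search, center + search + 1)
                if p - peak_offset >= 0
                and p - peak_offset + len(template) <= right_limit
            ]
            if not candidates:
                break  # template no longer fits inside the signal
            ref = max(candidates, key=lambda p: cross_correlation(
                signal, template, p - peak_offset))
            refs.append(ref)
    return sorted(refs)


def region_boundaries(refs, ratio=0.5):
    """Boundary between neighbouring reference positions.

    ratio=0.5 gives the midpoint; ratio=2/3 gives the point right of the
    midpoint that the text prefers.
    """
    return [int(left + (right - left) * ratio)
            for left, right in zip(refs, refs[1:])]
```

On a toy signal with pulses spaced 19 to 22 samples apart and an average pitch of 20, the search follows the varying spacing rather than assuming a constant period, which is the point of the pitch-synchronous approach.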

[0021] Pitch enhancement is performed by analyzing the pitch periodicity region by region. First, the pitch correlation value calculation unit 37 uses the cross-correlation between region 1 and the past signal waveform to obtain the exact pitch period t1 and the pitch correlation value a1. Next, the exact pitch period t2 and the pitch correlation value a2 are obtained for region 2. Unlike the conventional postfilter method, the exact pitch period and pitch correlation values obtained in this way follow the changes even when the signal changes transiently within the frame.
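A per-region analysis of this kind could be sketched as follows: for one region, try lags near the average pitch and keep the one whose past segment correlates best with the region. The normalized correlation used here as the "pitch correlation value", the integer lags, and the search half-width are assumptions made for the example; the text does not specify how the value is normalized.

```python
import math


def region_pitch(signal, start, end, avg_pitch, search=3):
    """Return (lag, correlation) for the region signal[start:end].

    Correlation is the normalized cross-correlation between the region and
    the segment `lag` samples earlier (an assumed definition).
    """
    best_lag, best_corr = None, 0.0
    cur = signal[start:end]
    for lag in range(max(1, avg_pitch - search), avg_pitch + search + 1):
        if start - lag < 0:
            continue  # not enough past signal for this lag
        past = signal[start - lag:end - lag]
        num = sum(c * p for c, p in zip(cur, past))
        den = math.sqrt(sum(c * c for c in cur) * sum(p * p for p in past))
        corr = num / den if den > 0 else 0.0
        if best_lag is None or corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```

The returned pair would then serve as the tap position t and, possibly after scaling, the gain a of the comb filter applied to that region.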

[0022] Finally, using the pitch period and pitch correlation value obtained for each region, the comb filter 38 described above is applied region by region to perform pitch enhancement. With this kind of processing, the pitch enhancement filter coefficients change from one small region to the next, so discontinuities may become audible and degrade quality instead. To prevent this, the regions determined by the region boundary determination unit 36 should be made to overlap slightly (for example, by 10 samples, about 1.25 msec), and the overlapping parts should each be multiplied by a triangular window and summed so that the characteristics change gradually.
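The overlap-and-add step can be illustrated with a simple linear crossfade: in the overlapping samples the previous region fades out while the next region fades in, and the two ramps sum to one. The sketch assumes each segment after the first starts `overlap` samples before the previous one ends; the function name and the exact ramp shape are illustrative choices, not taken from the patent. (The quoted figures of 10 samples and about 1.25 msec would be consistent with 8 kHz sampling.)

```python
def crossfade_regions(segments, overlap):
    """Join filtered regions, blending `overlap` samples at each junction."""
    out = list(segments[0])
    for seg in segments[1:]:
        if overlap > len(seg) or overlap > len(out):
            raise ValueError("overlap is longer than a segment")
        for k in range(overlap):
            w = (k + 1) / (overlap + 1)       # rising ramp for the new region
            idx = len(out) - overlap + k
            out[idx] = (1.0 - w) * out[idx] + w * seg[k]
        out.extend(seg[overlap:])
    return out
```

For two constant segments of value 1 and 3 with a 3-sample overlap, the junction steps through 1.5, 2.0, 2.5 instead of jumping directly from 1 to 3.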

[0023] The examples of FIGS. 5 to 7 assume that the MA filter and the AR filter are unchanged from the conventional ones, but it is even better, after determining the regions as in FIG. 7, to re-analyze the MA and AR filter coefficients for each region and filter so that the spectral envelope is enhanced region by region. Although the above embodiment describes application to a speech decoding method that reproduces a high-quality speech signal from an encoded bit sequence in code-excited linear prediction coding and decoding, the input to be processed is not limited to decoded speech signals; the invention can also be applied to a speech signal input frame by frame for which it is unknown whether it is a decoded speech signal.

[0024] Furthermore, a system to which the present invention is applied may consist of a computer having a CPU, memory and the like, a terminal device, and a machine-readable recording medium such as a CD-ROM, magnetic disk device, or semiconductor memory. The program stored on the recording medium, which executes the steps of the speech signal post-processing method, is read into the computer and controls its operation, thereby realizing each element of the embodiment described above.

【００２５】[0025]

[Effects of the Invention] The post filter according to the present invention was applied to the decoded signal of a 4 kbit/s speech coding scheme, and a subjective quality evaluation was carried out. The frame length, which is the processing unit of the post filter, was 10 msec, and the delay time was set to 5 msec. As a result, compared with the conventional post-filter method, a marked improvement was observed, particularly in the quality of female speech. Specifically, in portions where vowels follow one another and change within a short time, such as "minami ni wa" ("in the south"), the noisiness was reduced and the speech became clearly audible. The reason a large improvement was seen for female speech while male speech changed little relative to the conventional method is that the invention is effective when the pitch period is sufficiently shorter than the frame length. A frame length of 10 msec corresponds to 100 Hz, whereas the pitch frequency of male speech is in many cases 100 Hz or lower (that is, the pitch length is longer than 10 msec). The pitch frequency of female speech, on the other hand, is about 200 Hz to 400 Hz, so two to three pitch periods fit in one frame, as in the examples of FIGS. 5 to 7. In other words, unlike male speech, the pitch length of female speech is sufficiently shorter than the frame length, and a large improvement in subjective quality was therefore obtained.
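The relationship between pitch frequency and the 10 msec frame length used in the evaluation can be checked with a few lines of Python. This is only a worked arithmetic example; the function name `pitch_periods_per_frame` and the sample frequencies are chosen for illustration.

```python
def pitch_periods_per_frame(pitch_hz, frame_ms=10.0):
    """Number of pitch periods that fit in one frame of frame_ms milliseconds."""
    return frame_ms * pitch_hz / 1000.0


for pitch_hz in (80, 200, 300):
    period_ms = 1000.0 / pitch_hz  # pitch length in milliseconds
    print(pitch_hz, period_ms, pitch_periods_per_frame(pitch_hz))
```

A pitch of 80 Hz gives a pitch length of 12.5 msec, longer than the frame, so less than one period fits in a frame. Pitches of 200 Hz and 300 Hz give two and three periods per frame respectively, which is the situation in which the frame can be divided into several one-pitch regions.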

[Brief description of the drawings]

FIG. 1 is a block diagram showing a configuration example of a CELP encoding unit.

FIG. 2 is a block diagram showing a configuration example of a CELP decoding unit.

FIG. 3 is a block diagram showing a configuration example of a post filter.

FIG. 4 is a block diagram of the post-filter processing unit according to the present invention.

FIG. 5 is a diagram explaining detection of peak positions within a frame.

FIG. 6 is a diagram explaining waveform extraction and the search for pitch reference positions.

FIG. 7 is a diagram explaining determination of region boundaries.

[Explanation of symbols]

1 CELP encoding unit
2 Linear prediction analysis unit
3 Linear prediction parameter encoding unit
4 Code transmission unit
5 Excitation vector generation unit
6 Adaptive codebook
7 Fixed codebook
8, 9 Multiplication units
10 Weight creation unit
11 Addition unit
12, 22 Synthesis filters
13 Distortion calculation unit
14 Codebook search control unit
21 Linear prediction parameter decoding unit
25 Excitation vector generation unit
30 Post-processing unit
31 Buffer
32 MA filter
33 In-frame peak position detection unit
34 Signal waveform extraction unit
35 Pitch reference position search unit
36 Area boundary determination unit
37 Pitch correlation value calculation unit
38 Comb filter
39 AR filter

Claims

[Claims]

1. A post-processing method for an audio signal, comprising: a step of storing, in a storage means, an audio signal input for each frame; a step of obtaining a linear-filter-passed signal by passing the audio signals of the current frame and past frames stored in the storage means through a linear filter that uses linear prediction coefficients representing the inverse characteristic of the spectral envelope of the audio signal; a step of detecting a peak position of the waveform from the linear-filter-passed signal and extracting, based on the peak position, a waveform of an average one-pitch length or of a provisionally determined one-pitch length (hereinafter, "average one-pitch length"); a step of calculating the cross-correlation between the waveform of the average one-pitch length and the linear-filter-passed signal and sequentially searching for peaks of the cross-correlation as pitch reference positions; a step of dividing, based on the pitch reference positions, the linear-filter-passed signal into regions each corresponding to a waveform of an accurate one-pitch length; a step of calculating a comb filter coefficient based on the pitch period and the pitch periodicity of the linear-filter-passed signal; and a step of obtaining an audio signal output by passing the linear-filter-passed signal of each region through a comb filter that uses the comb filter coefficient.

2. The post-processing method for an audio signal according to claim 1, wherein the audio signal obtained for each region is multiplied by a window function and superimposed on the audio signal of the immediately preceding region that has been multiplied by a window function.

3. The post-processing method for an audio signal according to claim 1, wherein said comb filter coefficient is calculated for each of said regions.

4. A post-processing device for an audio signal, comprising: a storage means that stores an audio signal input for each frame; a filter that obtains a linear-filter-passed signal by passing the audio signals of the current frame and past frames stored in the storage means through a linear filter that uses linear prediction coefficients representing the inverse characteristic of the spectral envelope of the audio signal; a peak position detection unit that detects a peak position of the waveform from the linear-filter-passed signal; a signal waveform extraction unit that extracts, based on the peak position, a waveform of an average one-pitch length or of a provisionally determined one-pitch length (hereinafter, "average one-pitch length"); a pitch reference position search unit that calculates the cross-correlation between the waveform of the average one-pitch length and the linear-filter-passed signal and sequentially searches for peaks of the cross-correlation as pitch reference positions; an area boundary determination unit that divides, based on the pitch reference positions, the linear-filter-passed signal into regions each corresponding to a waveform of an accurate one-pitch length; a pitch correlation value calculation unit that calculates a comb filter coefficient based on the pitch period and the pitch periodicity of the linear-filter-passed signal; and a comb filter that uses the comb filter coefficient and through which the linear-filter-passed signal of each region is passed to obtain an audio signal output.

5. The post-processing device for an audio signal according to claim 4, further comprising: a multiplication unit that multiplies the audio signal obtained for each of the regions by a window function; and a superimposing unit that superimposes that audio signal on the audio signal of the immediately preceding region that has been multiplied by a window function.

6. The post-processing device for an audio signal according to claim 4, wherein the pitch correlation value calculation unit calculates the comb filter coefficient for each of the regions.

7. A recording medium on which is recorded a program for executing: a first procedure of storing, in a storage means, an audio signal input for each frame; a second procedure of obtaining a linear-filter-passed signal by passing the audio signals of the current frame and past frames stored in the storage means through a linear filter that uses linear prediction coefficients representing the inverse characteristic of the spectral envelope of the audio signal; a third procedure of detecting a peak position of the waveform from the linear-filter-passed signal, extracting, based on the peak position, a waveform of an average one-pitch length or of a provisionally determined one-pitch length (hereinafter, "average one-pitch length"), calculating the cross-correlation between the waveform of the average one-pitch length and the linear-filter-passed signal, sequentially searching for peaks of the cross-correlation as pitch reference positions, dividing, based on the pitch reference positions, the linear-filter-passed signal into regions each corresponding to a waveform of an accurate one-pitch length, and calculating a comb filter coefficient based on the pitch period and the pitch periodicity of the linear-filter-passed signal; and a fourth procedure of obtaining an audio signal output by passing the linear-filter-passed signal of each region through a comb filter that uses the comb filter coefficient.

8. The recording medium on which the program according to claim 7 is recorded, wherein the program has a procedure of multiplying the audio signal obtained for each region of the third procedure by a window function and superimposing it on the audio signal of the immediately preceding region that has been multiplied by a window function.

9. The recording medium on which the program according to claim 7 is recorded, wherein the program has a procedure of calculating the comb filter coefficient of the third procedure for each of the regions.
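For illustration only, the steps recited in claim 1 from peak detection through comb filtering can be sketched in Python as follows. The sketch assumes its input is already the linear-filter-passed signal. All function names, the search width, the `strength` scaling, and the backward-only comb filter form y[n] = (x[n] + g x[n - T]) / (1 + g) are assumptions made for this example and are not taken from the claims, which do not fix these details.

```python
def find_peak(x, start, end):
    """Index of the largest-magnitude sample in x[start:end]."""
    return max(range(start, end), key=lambda n: abs(x[n]))


def cross_correlation(template, x, pos):
    """Correlation of the one-pitch template with x starting at pos."""
    return sum(t * x[pos + i] for i, t in enumerate(template))


def search_pitch_references(x, template, first, pitch, search=3):
    """Step forward about one pitch at a time and keep the local
    cross-correlation maximum as the next pitch reference position."""
    if search >= pitch:
        raise ValueError("search width must be smaller than the pitch")
    refs = [first]
    while True:
        guess = refs[-1] + pitch
        if guess + search + len(template) > len(x):
            break
        candidates = range(guess - search, guess + search + 1)
        refs.append(max(candidates,
                        key=lambda p: cross_correlation(template, x, p)))
    return refs


def pitch_gain(x, start, end, lag):
    """Normalized correlation with the signal one lag earlier, in [0, 1]."""
    num = sum(x[n] * x[n - lag] for n in range(start, end))
    den = sum(x[n - lag] ** 2 for n in range(start, end))
    if den <= 0.0:
        return 0.0
    return min(1.0, max(0.0, num / den))


def enhance_pitch(x, refs, strength=0.5):
    """Comb-filter each one-pitch region with its own coefficient."""
    y = list(x)
    for start, end in zip(refs[:-1], refs[1:]):
        lag = end - start  # region length taken as the local pitch period
        if start - lag < 0:
            continue  # no earlier pitch period available for this region
        g = strength * pitch_gain(x, start, end, lag)
        for n in range(start, end):
            y[n] = (x[n] + g * x[n - lag]) / (1.0 + g)
    return y


# Synthetic pulse train with a true period of 20 samples.
signal = ([1.0, 0.5, 0.25] + [0.0] * 17) * 6
peak = find_peak(signal, 0, 20)
refs = search_pitch_references(signal, signal[peak:peak + 20], peak, pitch=19)
print(refs)
```

In this example the provisional pitch of 19 samples is deliberately slightly wrong, and the cross-correlation search is what recovers the actual spacing of the pulses. Each region between consecutive reference positions then receives its own comb filter coefficient, which corresponds to the per-region calculation of claims 3, 6, and 9.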