JP3148920B2

JP3148920B2 - Audio encoding / decoding device

Info

Publication number: JP3148920B2
Application number: JP07467295A
Authority: JP
Inventors: 真哉高橋; 陽二前田
Original assignee: 移動通信システム開発株式会社
Priority date: 1995-03-08
Filing date: 1995-03-08
Publication date: 2001-03-26
Anticipated expiration: 2016-03-26
Also published as: JPH08248999A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech encoding/decoding device used when a speech signal is converted into a digital signal for transmission or storage.

[0002]

[Prior Art] A known speech encoding/decoding device that analyzes a speech signal frame by frame (each frame having a predetermined length), separates it into an excitation signal and spectral envelope information, encodes them, and decodes the encoded data to generate a speech signal is the one described in Reference 1 (Takahashi and Iimori, "Vocoder using vector quantization of a phase-equalized excitation," Joint Convention of the Hokuriku Chapters of the Electrical Societies, B145, p. 291, October 1992).

[0003] This conventional example exploits the fact that, when the speech signal is voiced, similar waveforms repeat at the pitch period. It improves the coding performance for voiced segments by representing the excitation signal of a frame with a single segment, one pitch period long, taken from that frame. Applying short-time phase equalization to the excitation signal has the advantage of stabilizing the extraction of the representative excitation and making its vector quantization more efficient.

[0004] FIG. 4 is a block diagram showing the configuration of the conventional speech encoding/decoding device described above. As shown in the figure, the speech encoding/decoding device 200 comprises an encoding unit 210 and a decoding unit 220. The encoding unit 210 has a spectral envelope analysis unit 2, an inverse filter unit 4, a phase equalization unit 7, a representative excitation extraction unit 9, a pitch period extraction unit 11, a voiced/unvoiced determination unit 13, and a representative excitation encoding unit 17. The decoding unit 220 has a representative excitation decoding unit 19, a representative excitation repetition unit 21, and a synthesis filter unit 23.

[0005] Next, the operation of the conventional speech encoding/decoding device 200 is described, beginning with the encoding unit 210. The spectral envelope analysis unit 2 analyzes the input speech signal S1 frame by frame, obtains the spectral envelope of the speech signal in the frame by, for example, linear prediction analysis, and outputs it to the inverse filter unit 4 as spectral envelope information S3. At the same time, the spectral envelope analysis unit 2 encodes the spectral envelope information S3 and outputs it externally as a spectral envelope code S5.
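Paragraph [0005] names linear prediction analysis only as one example of how the spectral envelope may be obtained. As a minimal sketch of that example, and not the patent's own implementation, the Python below estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion. The function names, the sign convention, and the absence of windowing are assumptions of this sketch.

```python
def autocorrelation(frame, max_lag):
    """Return r[k] = sum_n frame[n] * frame[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    Assumed convention: the prediction is s_hat[n] = sum_k a[k] * s[n - k],
    so the inverse filter is A(z) = 1 - sum_k a[k] z^-k.
    Returns (a, residual_energy); a[0] is unused and left at 0.0.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: stop early
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err  # reflection coefficient of stage m
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= (1.0 - k_m * k_m)
    return a, err
```

A caller would compute `autocorrelation(frame, order)` once per frame and pass the result to `levinson_durbin`; how the resulting coefficients are quantized into the code S5 is not specified in the text and is left out here.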

[0006] The voiced/unvoiced determination unit 13 analyzes, for example, the power of the input speech signal S1, determines whether S1 is voiced or unvoiced, and outputs the result to the pitch period extraction unit 11 as voiced/unvoiced determination information S12. At the same time, it encodes S12 and outputs it externally as a voiced/unvoiced determination code S16. When the voiced/unvoiced determination information S12 indicates a voiced sound, the pitch period extraction unit 11 performs pitch period analysis on the input speech signal S1 and outputs the resulting pitch period S10 to the representative excitation extraction unit 9. At the same time, it encodes the pitch period S10 and outputs it externally as a pitch period code S15.
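The text of paragraph [0006] gives power as one example feature for the voiced/unvoiced decision and does not say how the pitch period is analyzed. The sketch below is therefore only an illustration: it thresholds the mean frame power and picks the lag that maximizes a normalized autocorrelation. The threshold `power_threshold`, the lag range, and all function names are hypothetical.

```python
import math


def frame_power(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def is_voiced(frame, power_threshold):
    """Assumed rule: a frame is voiced when its power exceeds a threshold."""
    return frame_power(frame) > power_threshold


def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest normalized
    autocorrelation (one common method, assumed here, not taken from the text)."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        head = frame[:len(frame) - lag]
        tail = frame[lag:]
        num = sum(x * y for x, y in zip(head, tail))
        den = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
        score = num / den if den > 0.0 else 0.0
        if score > best_score:  # strict '>' keeps the smallest lag on ties
            best_lag, best_score = lag, score
    return best_lag
```

Keeping the smallest lag on ties is a deliberate choice in this sketch, since a perfectly periodic frame scores equally well at multiples of the true period.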

[0007] The inverse filter unit 4 uses the spectral envelope information S3 to apply, for example, linear-prediction inverse filtering to the input speech signal S1, obtains the excitation signal S6, and outputs it to the phase equalization unit 7. The phase equalization unit 7 performs short-time phase equalization so that the phase of the excitation signal S6 becomes zero at the maximum-amplitude positions (peak positions) that appear on the waveform at pitch-period intervals, and outputs the phase-equalized excitation signal S8 to the representative excitation extraction unit 9. When the pitch period S10 is input, that is, when the voiced/unvoiced determination information S12 indicates a voiced sound, the representative excitation extraction unit 9 cuts out a segment one pitch period S10 long from the phase-equalized excitation signal S8, using the maximum-amplitude position of S8 in the frame as the reference, and outputs it to the representative excitation encoding unit 17 as the representative excitation S14. The representative excitation encoding unit 17 encodes the representative excitation S14 by, for example, vector quantization and outputs the resulting representative excitation code S18 externally.
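Two of the steps in paragraph [0007], inverse filtering and cutting out one pitch period around the amplitude peak, can be sketched as follows. This is an illustrative Python sketch only: phase equalization is omitted, the predictor convention (prediction = sum of `a[k] * s[n - k]`, with `a[0]` unused) is an assumption, and placing the peak at the center of the cut segment is also an assumption, since the text says only that the peak position is used as the reference.

```python
def inverse_filter(signal, a):
    """Linear-prediction residual e[n] = s[n] - sum_k a[k] * s[n - k].

    a[0] is unused; samples before the start of the signal count as zero.
    """
    order = len(a) - 1
    residual = []
    for n, s in enumerate(signal):
        pred = sum(a[k] * signal[n - k]
                   for k in range(1, order + 1) if n - k >= 0)
        residual.append(s - pred)
    return residual


def cut_representative(excitation, pitch_period):
    """Cut a segment one pitch period long, referenced to the largest |sample|.

    Assumption: the peak is placed near the middle of the segment, and the
    segment is shifted if it would run past either end of the frame.
    """
    if not 0 < pitch_period <= len(excitation):
        raise ValueError("pitch_period must fit inside the frame")
    peak = max(range(len(excitation)), key=lambda i: abs(excitation[i]))
    start = peak - pitch_period // 2
    start = max(0, min(start, len(excitation) - pitch_period))
    return excitation[start:start + pitch_period]
```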

[0008] Next, the operation of the decoding unit 220 is described. The representative excitation decoding unit 19 decodes the representative excitation code S18 and outputs the decoded representative excitation S20 to the representative excitation repetition unit 21. The representative excitation repetition unit 21 decodes the voiced/unvoiced determination code S16 and the pitch period code S15 into voiced/unvoiced determination information and a pitch period, respectively. When the determination information indicates a voiced sound, it repeats the decoded representative excitation S20 at pitch-period intervals to generate the decoded excitation signal S27 and outputs it to the synthesis filter unit 23. When the determination information indicates an unvoiced sound, it generates white noise and outputs it to the synthesis filter unit 23 as the decoded excitation signal S27. The synthesis filter unit 23 decodes the spectral envelope from the spectral envelope code S5 and synthesizes the decoded speech signal S28 from the decoded spectral envelope and the decoded excitation signal S27 by, for example, linear-prediction synthesis filtering.
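The conventional decoding path of paragraph [0008] can be illustrated with a short Python sketch: tile the representative excitation at the pitch period for voiced frames, substitute white noise for unvoiced frames, and run the result through an all-pole synthesis filter. The function names, the noise gain, the choice to start each frame at the first sample of the representative excitation, and the decision not to carry filter state across frames are all assumptions of the sketch rather than details from the text.

```python
import random


def repeat_excitation(representative, pitch_period, frame_length):
    """Tile a one-pitch-period segment to fill a frame."""
    return [representative[n % pitch_period] for n in range(frame_length)]


def white_noise(frame_length, gain, rng):
    """Gaussian white noise used as the unvoiced excitation (assumed form)."""
    return [gain * rng.gauss(0.0, 1.0) for _ in range(frame_length)]


def synthesis_filter(excitation, a):
    """All-pole synthesis s[n] = e[n] + sum_k a[k] * s[n - k]; a[0] unused."""
    order = len(a) - 1
    out = []
    for n, e in enumerate(excitation):
        fed_back = sum(a[k] * out[n - k]
                       for k in range(1, order + 1) if n - k >= 0)
        out.append(e + fed_back)
    return out


def decode_frame(voiced, representative, pitch_period, a, frame_length,
                 rng=None, noise_gain=1.0):
    """Decode one frame in the conventional way (no excitation shaping)."""
    if voiced:
        excitation = repeat_excitation(representative, pitch_period,
                                       frame_length)
    else:
        rng = rng or random.Random(0)
        excitation = white_noise(frame_length, noise_gain, rng)
    return synthesis_filter(excitation, a)
```

Because `repeat_excitation` copies the same segment unchanged, two adjacent frames with differently shaped representatives produce a discontinuity at the frame boundary.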

[0009]

[Problems to Be Solved by the Invention] In the conventional speech encoding/decoding device 200 shown in FIG. 4, however, the representative excitation repetition unit 21 of the decoding unit 220 generates the decoded excitation signal S27 of a frame by simply repeating the decoded representative excitation S20 at pitch-period intervals. Consequently, when the shape of the decoded representative excitation of a frame differs beyond a certain degree from that of the immediately preceding or following frame, the decoded excitation signal changes abruptly at the frame boundary, and the decoded speech signal suffers quality degradation such as audible artifacts caused by the abrupt change.

[0010] This is explained with reference to FIG. 5. FIG. 5 illustrates how the conventional speech encoding/decoding device generates the decoded excitation signal by simple repetition of the decoded representative excitation; FIG. 5(A) shows the representative excitations and FIG. 5(B) shows the decoded excitation signal. As shown in the figure, when the shape of the decoded representative excitation of frame M differs from those of the immediately preceding and following frames (M-1) and (M+1), the decoded excitation signal changes abruptly at the frame boundaries B3 and B4, and as a result the decoded speech signal suffers quality degradation (audible artifacts caused by the abrupt change).

[0011] The present invention has been made to solve this problem. Its object is to provide a speech encoding/decoding device that generates a decoded excitation signal with good continuity, and thus a high-quality decoded speech signal, even when the decoded representative excitation of a frame differs from those of the immediately preceding and following frames.

[0012]

[Means for Solving the Problems] To solve the above problem, the invention according to claim 1 is a speech encoding/decoding device comprising: an encoding unit that analyzes an input speech signal frame by frame, each frame having a predetermined length, to obtain spectral envelope information, an excitation signal, and a pitch period of the input speech signal, that, when the input speech signal is voiced, performs short-time phase equalization so that the phase of the excitation signal becomes zero at the maximum-amplitude positions appearing on the signal waveform at intervals of the pitch period and then extracts a representative excitation one pitch period long, and that encodes the representative excitation, the pitch period, and the spectral envelope information and outputs the result as a spectral envelope code; and a decoding unit that decodes the encoded parameters output from the encoding unit, that, when the input speech signal is voiced, generates a decoded excitation signal by arranging the decoded one-pitch-period-long representative excitation at intervals of the pitch period, and that decodes a decoded speech signal using the decoded excitation signal and the decoded spectral envelope; wherein the decoding unit is provided with excitation shaping means for performing a weighted addition of the decoded representative excitation of the current frame and an impulse signal, and representative excitation repetition means for generating the decoded excitation signal by arranging the shaped representative excitation output from the excitation shaping means at intervals of the pitch period.

[0013] The invention according to claim 2 is the speech encoding/decoding device according to claim 1, wherein the excitation shaping means is configured to vary the value of the weighting coefficient used in the weighted addition of the representative excitation and the impulse signal according to the magnitude of the change in the speech signal between the current frame and a frame in its vicinity.

[0014]

[Operation] According to the invention of claim 1 configured as above, the excitation shaping means performs a weighted addition of the decoded representative excitation of the current frame and an impulse signal, and the representative excitation repetition means generates the decoded excitation signal by arranging the shaped representative excitation output from the excitation shaping means at intervals of the pitch period.

[0015] According to the invention of claim 2 configured as above, the excitation shaping means varies the value of the weighting coefficient used in the weighted addition of the representative excitation and the impulse signal according to the magnitude of the change in the speech signal between the current frame and a frame in its vicinity.

[0016]

[Embodiments]

(First Embodiment) Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a speech encoding/decoding device according to the first embodiment of the present invention. As shown in the figure, the speech encoding/decoding device 101 comprises an encoding unit 110 and a decoding unit 120. The encoding unit 110 has a spectral envelope analysis unit 2, an inverse filter unit 4, a phase equalization unit 7, a representative excitation extraction unit 9, a pitch period extraction unit 11, a voiced/unvoiced determination unit 13, and a representative excitation encoding unit 17. The decoding unit 120 has a representative excitation decoding unit 19, a representative excitation repetition unit 21, an excitation shaping unit 25, and a synthesis filter unit 23. The speech encoding/decoding device 101 of the first embodiment differs in configuration from the conventional speech encoding/decoding device 200 in that the decoding unit 120 includes the excitation shaping unit 25, which serves as the excitation shaping means. The excitation shaping unit 25 is placed on the output side of the representative excitation decoding unit 19 and on the input side of the representative excitation repetition unit 21, which serves as the representative excitation repetition means. The components other than the excitation shaping unit 25 have the same configuration, operation, and effect as those shown in FIG. 4; they are therefore given the same reference numerals as in FIG. 4, and their description is omitted.

[0017] The operation of the speech encoding/decoding device 101 is described below. The excitation shaping unit 25 shapes the representative excitation S20 by adding an impulse signal to it with weighting. Because this shaping always brings the waveform of the shaped representative excitation S26 closer to an impulse, repeating it in the representative excitation repetition unit 21 yields a decoded excitation signal S22 that changes smoothly. Moreover, when the input speech signal changes greatly between the current frame and a neighboring frame, the weighting coefficient is adjusted to bring the representative excitation even closer to an impulse, which smooths the change in the excitation signal.

[0018] The operation of the excitation shaping unit 25 is now described in more detail. Let $RC_i$ ($i$ is a positive integer from 1 to $P$) denote the representative excitation S20, $RI_i$ an impulse signal having its pulse at the same position as the maximum-amplitude position of the representative excitation S20, $P$ the pitch period, and $\alpha$ the weighting coefficient (a real number with $0 \le \alpha \le 1$). The excitation shaping unit 25 then forms the weighted sum of the two according to, for example,

$$RM_i = (1 - \alpha) \times RC_i + \alpha \times RI_i \qquad (1)$$

and obtains the signal $RM_i$, which is the shaped representative excitation S26.

[0019] In equation (1) above, the following relation holds:

[Equation 1] (equation image not available in this text)
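Equation (1) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patent's implementation: the function name is hypothetical, and the impulse amplitude is left as an explicit parameter `impulse_amplitude` because the relation referred to as [Equation 1] is not reproduced in this text and no value is assumed for it here.

```python
def shape_representative(rc, alpha, impulse_amplitude):
    """Weighted sum RM_i = (1 - alpha) * RC_i + alpha * RI_i  (equation (1)).

    rc                -- decoded representative excitation, one pitch period long
    alpha             -- weighting coefficient, 0 <= alpha <= 1
    impulse_amplitude -- amplitude of the single pulse in RI (caller-supplied;
                         its value is an assumption of this sketch)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # The pulse sits at the maximum-amplitude position of the representative.
    peak = max(range(len(rc)), key=lambda i: abs(rc[i]))
    ri = [0.0] * len(rc)
    ri[peak] = impulse_amplitude
    return [(1.0 - alpha) * c + alpha * r for c, r in zip(rc, ri)]
```

With `alpha = 0` the representative excitation passes through unchanged, and with `alpha = 1` the output is the impulse alone; intermediate values attenuate every sample away from the peak by the factor `1 - alpha`.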

[0020] FIG. 2 illustrates the weighted addition of the representative excitation according to equation (1) in the excitation shaping unit 25. As the figure shows, the result is a shaped representative excitation whose shape is closer to an impulse waveform.

[0021] The excitation shaping unit 25 also decodes the spectral envelope code S5 to obtain the spectral envelope, computes the distance D between the spectral envelope of the current frame and that of, for example, the immediately preceding frame, and, when D is large, sets the weighting coefficient α to a large value so that the shaped representative excitation comes closer to an impulse.
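Paragraph [0021] states only that a larger distance D leads to a larger α; it does not name the distance measure or the mapping. The Python sketch below fills those gaps with explicitly hypothetical choices: a Euclidean distance between two envelope parameter vectors, and a clamped linear mapping whose thresholds `d_low`, `d_high` and limits `alpha_min`, `alpha_max` are illustrative parameters, not values from the patent.

```python
import math


def envelope_distance(env_current, env_previous):
    """Euclidean distance between two spectral-envelope parameter vectors
    (the distance measure is an assumption of this sketch)."""
    return math.sqrt(sum((x - y) ** 2
                         for x, y in zip(env_current, env_previous)))


def alpha_from_distance(d, d_low, d_high, alpha_min, alpha_max):
    """Map distance D to a weighting coefficient that grows with D.

    Below d_low the result is alpha_min, above d_high it is alpha_max,
    and in between it rises linearly (hypothetical mapping).
    """
    if d <= d_low:
        return alpha_min
    if d >= d_high:
        return alpha_max
    t = (d - d_low) / (d_high - d_low)
    return alpha_min + t * (alpha_max - alpha_min)
```

Any monotonically non-decreasing mapping would satisfy the description equally well; the linear ramp is used here only because it is easy to inspect.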

[0022] The representative excitation repetition unit 21 repeats the shaped representative excitation S26 obtained by the excitation shaping unit 25 at pitch-period intervals to obtain the decoded excitation signal S22.

[0023] FIG. 3 shows an example of the decoded excitation signal S22 generated in this embodiment; FIG. 3(A) shows the shaped representative excitations and FIG. 3(B) shows the decoded excitation signal. As shown in the figure, the processing of the excitation shaping unit 25 yields a decoded excitation signal that changes smoothly from frame to frame.

[0024] (Second Embodiment) In the first embodiment described above, the excitation shaping unit 25 uses the distance D between the spectral envelopes of the current frame and the immediately preceding frame to set the weighting coefficient α. The present invention is not limited to this; it may instead be configured, as a second embodiment, to use the cross-correlation value of the representative excitations of the two frames.
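The second embodiment says only that the cross-correlation value of the two frames' representative excitations is used to set α. One way this could look is sketched below in Python. The zero-lag normalized correlation, the truncation to the shorter of the two segments when the pitch periods differ, and the rule that a lower correlation yields a larger α are all assumptions made for illustration; the parameters `alpha_min` and `alpha_max` are hypothetical.

```python
import math


def normalized_cross_correlation(rep_a, rep_b):
    """Zero-lag normalized correlation over the common length of two segments."""
    n = min(len(rep_a), len(rep_b))
    num = sum(rep_a[i] * rep_b[i] for i in range(n))
    den = math.sqrt(sum(x * x for x in rep_a[:n]) *
                    sum(y * y for y in rep_b[:n]))
    return num / den if den > 0.0 else 0.0


def alpha_from_correlation(corr, alpha_min, alpha_max):
    """Assumed rule: dissimilar representatives (low correlation) get a
    larger alpha, similar ones (correlation near 1) a smaller alpha."""
    similarity = max(0.0, min(1.0, corr))
    return alpha_max - similarity * (alpha_max - alpha_min)
```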

[0025] The present invention is not limited to the embodiments described above. The embodiments are examples; anything that has substantially the same configuration as the technical idea set forth in the claims of the present invention and provides the same operation and effect falls within the technical scope of the present invention.

[0026]

[Effects of the Invention] As described above, according to the present invention the excitation shaping means shapes the representative excitation so that it approaches an impulse, so a decoded excitation signal that changes smoothly between frames is obtained. In addition, when the change in the input speech signal is large, the representative excitation is shaped to come even closer to an impulse, so a decoded excitation signal that can follow large changes is obtained. The invention therefore has the advantage that a high-quality decoded speech signal can be obtained.

[Brief description of the drawings]

[FIG. 1] A block diagram showing the configuration of a speech encoding/decoding device according to the first embodiment of the present invention.

[FIG. 2] A diagram illustrating the operation of the excitation shaping unit in the speech encoding/decoding device shown in FIG. 1.

[FIG. 3] A diagram illustrating the generation of the decoded excitation signal in the speech encoding/decoding device shown in FIG. 1.

[FIG. 4] A block diagram showing the configuration of a conventional speech encoding/decoding device.

[FIG. 5] A diagram illustrating the generation of the decoded excitation signal in the speech encoding/decoding device shown in FIG. 4.

[Explanation of symbols]

2: Spectral envelope analysis unit
4: Inverse filter unit
7: Phase equalization unit
9: Representative excitation extraction unit
11: Pitch period extraction unit
13: Voiced/unvoiced determination unit
17: Representative excitation encoding unit
19: Representative excitation decoding unit
21: Representative excitation repetition unit
23: Synthesis filter unit
25: Excitation shaping unit
101: Speech encoding/decoding device
110: Encoding unit
120: Decoding unit
200: Speech encoding/decoding device
210: Encoding unit
220: Decoding unit
S: Signals, etc.

Continuation of the front page

(56) References cited: JP-A-6-250694 (JP, A); JP-A-61-51200 (JP, A); JP-A-60-260098 (JP, A)

Claims

(57) [Claims]

1. A speech encoding/decoding device comprising: an encoding unit that analyzes an input speech signal frame by frame, each frame having a predetermined length, to obtain spectral envelope information, an excitation signal, and a pitch period of the input speech signal, that, when the input speech signal is voiced, performs short-time phase equalization so that the phase of the excitation signal becomes zero at the maximum-amplitude positions appearing on the signal waveform at intervals of the pitch period and then extracts a representative excitation one pitch period long, and that encodes the representative excitation, the pitch period, and the spectral envelope information and outputs the result as a spectral envelope code; and a decoding unit that decodes the encoded parameters output from the encoding unit, that, when the input speech signal is voiced, generates a decoded excitation signal by arranging the decoded one-pitch-period-long representative excitation at intervals of the pitch period, and that decodes a decoded speech signal using the decoded excitation signal and the decoded spectral envelope; wherein the decoding unit is provided with excitation shaping means for performing a weighted addition of the decoded representative excitation of the current frame and an impulse signal, and representative excitation repetition means for generating the decoded excitation signal by arranging the shaped representative excitation output from the excitation shaping means at intervals of the pitch period.

2. The speech encoding/decoding device according to claim 1, wherein the excitation shaping means varies the value of the weighting coefficient used in the weighted addition of the representative excitation and the impulse signal according to the magnitude of the change in the speech signal between the current frame and a frame in its vicinity.