JP3563400B2 - Audio decoding device and audio decoding method

Info

Publication number: JP3563400B2
Application number: JP2003312063A
Authority: JP
Inventors: 正山浦
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1997-12-24
Filing date: 2003-09-04
Publication date: 2004-09-08
Anticipated expiration: 2018-12-07
Also published as: JP2004046232A

Description

The present invention relates to speech encoding/decoding methods and apparatuses used to compress and encode a speech signal into a digital signal and to decode it, and in particular to a speech encoding method, a speech decoding method, a speech encoding apparatus, and a speech decoding apparatus for reproducing high-quality speech at a low bit rate.

Code-excited linear prediction (CELP) coding is a representative conventional high-efficiency speech coding method. The technique is described in "Code-excited linear prediction (CELP): High-quality speech at very low bit rates" (M. R. Schroeder and B. S. Atal, ICASSP '85, pp. 937-940, 1985).

FIG. 6 shows an example of the overall configuration of the CELP speech encoding/decoding method. In the figure, reference numeral 101 denotes an encoding unit, 102 a decoding unit, 103 a multiplexing unit, and 104 a separating unit. The encoding unit 101 comprises a linear prediction parameter analysis unit 105, a linear prediction parameter encoding unit 106, a synthesis filter 107, an adaptive codebook 108, a driving codebook 109, a gain encoding unit 110, a distance calculation unit 111, and a weighting addition unit 138. The decoding unit 102 comprises a linear prediction parameter decoding unit 112, a synthesis filter 113, an adaptive codebook 114, a driving codebook 115, a gain decoding unit 116, and a weighting addition unit 139.

In CELP speech coding, a span of about 5 to 50 ms is treated as one frame, and the speech in that frame is encoded separately as spectrum information and excitation (sound source) information. The operation of the CELP speech encoding method is described first. In the encoding unit 101, the linear prediction parameter analysis unit 105 analyzes the input speech S101 and extracts linear prediction parameters, which are the spectrum information of the speech. The linear prediction parameter encoding unit 106 encodes these linear prediction parameters and sets the encoded parameters as the coefficients of the synthesis filter 107.

Next, encoding of the excitation information is described. The adaptive codebook 108 stores past driving excitation signals and, for each adaptive code input from the distance calculation unit 111, outputs a time-series vector obtained by periodically repeating the past driving excitation signal. The driving codebook 109 stores a plurality of time-series vectors constructed, for example, by training so that the distortion between training speech and its encoded speech becomes small, and outputs the time-series vector corresponding to the driving code input from the distance calculation unit 111.
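The periodic repetition performed by the adaptive codebook can be illustrated with a minimal Python sketch. It assumes the adaptive code maps directly to an integer pitch lag; the function name `adaptive_codebook_vector` and its arguments are hypothetical and not taken from the patent.

```python
def adaptive_codebook_vector(past_excitation, lag, frame_len):
    """Build one frame by periodically repeating the last `lag` samples
    of the past driving excitation signal (integer lag assumed)."""
    if not 0 < lag <= len(past_excitation):
        raise ValueError("lag must be between 1 and the history length")
    segment = past_excitation[-lag:]
    # Repeat the segment until the frame is filled.
    return [segment[n % lag] for n in range(frame_len)]


history = [1, 2, 3, 4, 5]
print(adaptive_codebook_vector(history, lag=2, frame_len=5))
```

Practical codecs often use fractional lags and interpolation; this sketch leaves that out.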

The time-series vectors from the adaptive codebook 108 and the driving codebook 109 are weighted by the weighting addition unit 138 according to the respective gains supplied from the gain encoding unit 110 and added together. The sum is supplied to the synthesis filter 107 as the driving excitation signal, yielding the encoded speech. The distance calculation unit 111 computes the distance between the encoded speech and the input speech S101 and searches for the adaptive code, driving code, and gains that minimize that distance. When this encoding is finished, the code of the linear prediction parameters and the adaptive code, driving code, and gain code that minimize the distortion between the input speech and the encoded speech are output as the encoding result.
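The search loop described here (weighted addition, synthesis filtering, distance minimization) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patent's implementation: the adaptive vector is fixed in advance, the gains come from a small hypothetical table `gain_pairs`, the distance is plain squared error without perceptual weighting, and the filter starts from zero state. The predictor convention `s[n] = e[n] + sum_k lpc[k] * s[n-1-k]` is also an assumption.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter: s[n] = e[n] + sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out


def search_excitation(target, lpc, adaptive_vec, codebook, gain_pairs):
    """Exhaustively pick the driving code and gain code that minimize the
    squared error between the target and the synthesized speech."""
    best = None
    for code, vec in enumerate(codebook):
        for gain_code, (g_a, g_f) in enumerate(gain_pairs):
            # Weighted addition of adaptive and driving codebook vectors.
            excitation = [g_a * a + g_f * f for a, f in zip(adaptive_vec, vec)]
            synth = synthesize(excitation, lpc)
            dist = sum((t - s) ** 2 for t, s in zip(target, synth))
            if best is None or dist < best[0]:
                best = (dist, code, gain_code)
    return best  # (distance, driving code, gain code)


lpc = [0.5]
codebook = [[1, 0, 0, 0], [0, 1, 0, 0]]
gain_pairs = [(0.0, 1.0), (0.0, 2.0)]
target = synthesize([0, 2, 0, 0], lpc)
print(search_excitation(target, lpc, [0, 0, 0, 0], codebook, gain_pairs))
```

Real encoders usually search the adaptive code, driving code, and gains sequentially rather than jointly, to keep the search cost manageable.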

Next, the operation of the CELP speech decoding method is described.

In the decoding unit 102, the linear prediction parameter decoding unit 112 decodes the linear prediction parameters from their code and sets them as the coefficients of the synthesis filter 113. The adaptive codebook 114 then outputs, for the received adaptive code, a time-series vector obtained by periodically repeating the past driving excitation signal, and the driving codebook 115 outputs the time-series vector corresponding to the driving code. These time-series vectors are weighted by the weighting addition unit 139 according to the respective gains that the gain decoding unit 116 decodes from the gain code and are added together. The sum is supplied to the synthesis filter 113 as the driving excitation signal, and the output speech S103 is obtained.

A conventional speech encoding/decoding method that improves on CELP in order to raise the quality of the reproduced speech is described in "Phonetically-based vector excitation coding of speech at 3.6 kbps" (S. Wang and A. Gersho, ICASSP '89, pp. 49-52, 1989). FIG. 7, in which parts corresponding to those in FIG. 6 carry the same reference numerals, shows an example of the overall configuration of this method. In the encoding unit 101, reference numeral 117 denotes a speech state determination unit, 118 a driving codebook switching unit, 119 a first driving codebook, and 120 a second driving codebook. In the decoding unit 102, 121 denotes a driving codebook switching unit, 122 a first driving codebook, and 123 a second driving codebook. The method operates as follows. In the encoding unit 101, the speech state determination unit 117 analyzes the input speech S101 and determines which of, for example, two states (voiced or unvoiced) the speech is in. According to that result, the driving codebook switching unit 118 switches the driving codebook used for encoding, for example using the first driving codebook 119 for voiced speech and the second driving codebook 120 for unvoiced speech, and also encodes which driving codebook was used.

In the decoding unit 102, the driving codebook switching unit 121 switches between the first driving codebook 122 and the second driving codebook 123 according to the code indicating which driving codebook the encoding unit 101 used, so that the same driving codebook is used on both sides. With this configuration, a driving codebook suited to encoding is prepared for each speech state and the codebooks are switched according to the state of the input speech, which improves the quality of the reproduced speech.

A conventional speech encoding/decoding method that switches between a plurality of driving codebooks without increasing the number of transmitted bits is disclosed in Japanese Patent Application Laid-Open No. 8-185198. In that method, a plurality of driving codebooks are switched according to the pitch period selected in the adaptive codebook. This makes it possible to use a driving codebook adapted to the characteristics of the input speech without increasing the transmitted information.

"Code-excited linear prediction (CELP): High-quality speech at very low bit rates" (M. R. Schroeder and B. S. Atal, ICASSP '85, pp. 937-940, 1985)
"Phonetically-based vector excitation coding of speech at 3.6 kbps" (S. Wang and A. Gersho, ICASSP '89, pp. 49-52, 1989)
Japanese Patent Application Laid-Open No. 8-185198

As described above, the conventional speech encoding/decoding method shown in FIG. 6 generates synthesized speech using a single driving codebook. To obtain high-quality encoded speech even at a low bit rate, the time-series vectors stored in the driving codebook end up being non-noise-like vectors containing many pulses. Consequently, when noise-like sounds such as background noise or fricative consonants are encoded and synthesized, the encoded speech produces unnatural crackling, sizzling sounds. Building the driving codebook only from noise-like time-series vectors would solve this problem, but the quality of the encoded speech as a whole would deteriorate.

The improved conventional method shown in FIG. 7 generates encoded speech by switching between a plurality of driving codebooks according to the state of the input speech. For example, a driving codebook built from noise-like time-series vectors can be used for noise-like unvoiced portions of the input speech, and a driving codebook built from non-noise-like time-series vectors for the remaining voiced portions, so that encoding and synthesizing noise-like speech no longer produces the unnatural crackling sound. However, because the decoding side must use the same driving codebook as the encoding side, information indicating which driving codebook was used has to be encoded and transmitted as well, and this hinders lowering the bit rate.

The conventional method that switches between a plurality of driving codebooks without increasing the number of transmitted bits switches the driving codebook according to the pitch period selected in the adaptive codebook. However, the pitch period selected in the adaptive codebook differs from the actual pitch period of the speech, and its value alone cannot determine whether the input speech is noise-like or non-noise-like. The problem that the encoded speech sounds unnatural in noise-like portions therefore remains unsolved.

The present invention has been made to solve these problems, and provides a speech encoding/decoding method and apparatus that reproduce high-quality speech even at a low bit rate.

To solve the problems described above, the speech encoding method of the present invention evaluates the degree of noise of the speech in the encoding interval using the code or encoding result of at least one of spectrum information, power information, and pitch information, and selects one of a plurality of driving codebooks according to the evaluation result.

The speech encoding method of a further invention comprises a plurality of driving codebooks whose stored time-series vectors differ in degree of noise, and switches between the driving codebooks according to the evaluation result of the degree of noise of the speech.

The speech encoding method of a further invention changes the degree of noise of the time-series vectors stored in the driving codebook according to the evaluation result of the degree of noise of the speech.

The speech encoding method of a further invention comprises a driving codebook storing noise-like time-series vectors, and generates a time-series vector with a lower degree of noise by thinning out signal samples of the driving excitation according to the evaluation result of the degree of noise of the speech.
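One way to picture sample thinning is the sketch below. The thinning rule is an assumption made for illustration: samples whose amplitude falls below a threshold are set to zero, and the threshold rises as the evaluated degree of noise falls. The text above does not specify the rule, and the names `thin_samples`, `noise_degree` (taken to lie in [0, 1]), and `max_threshold` are hypothetical.

```python
def thin_samples(vector, noise_degree, max_threshold=1.0):
    """Zero out low-amplitude samples of a noise-like vector.
    A lower noise degree gives a higher threshold, so more samples are
    removed and the result becomes more pulse-like (assumed rule)."""
    if not 0.0 <= noise_degree <= 1.0:
        raise ValueError("noise_degree must lie in [0, 1]")
    threshold = (1.0 - noise_degree) * max_threshold
    return [x if abs(x) >= threshold else 0.0 for x in vector]


noisy_vector = [0.2, -0.9, 0.5, -0.1]
print(thin_samples(noisy_vector, noise_degree=1.0))  # nothing removed
print(thin_samples(noisy_vector, noise_degree=0.5))  # small samples removed
```

Because the thinning depends only on the evaluated noise degree, a decoder that derives the same value can reproduce the same vector without extra side information.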

The speech encoding method of a further invention comprises a first driving codebook storing noise-like time-series vectors and a second driving codebook storing non-noise-like time-series vectors, and generates a time-series vector by weighting and adding the time-series vector of the first driving codebook and the time-series vector of the second driving codebook according to the evaluation result of the degree of noise of the speech.
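The weighted addition of the two codebook vectors might look like the following sketch. The linear weighting (`noise_degree` for the noise-like vector, `1 - noise_degree` for the non-noise-like one) is an assumption chosen for illustration; the text only states that the two vectors are weighted according to the evaluation result.

```python
def mix_codebook_vectors(noisy_vec, non_noisy_vec, noise_degree):
    """Blend a noise-like and a non-noise-like time-series vector.
    noise_degree in [0, 1]; 1.0 returns the noise-like vector unchanged."""
    if not 0.0 <= noise_degree <= 1.0:
        raise ValueError("noise_degree must lie in [0, 1]")
    w = noise_degree
    return [w * a + (1.0 - w) * b for a, b in zip(noisy_vec, non_noisy_vec)]


print(mix_codebook_vectors([1.0, -1.0], [0.0, 2.0], noise_degree=0.25))
```

Compared with hard switching, a blend of this kind lets the excitation change gradually as the evaluated degree of noise changes.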

The speech decoding method of a further invention evaluates the degree of noise of the speech in the decoding interval using the code or decoding result of at least one of spectrum information, power information, and pitch information, and selects one of a plurality of driving codebooks according to the evaluation result.

The speech decoding method of a further invention comprises a plurality of driving codebooks whose stored time-series vectors differ in degree of noise, and switches between the driving codebooks according to the evaluation result of the degree of noise of the speech.

The speech decoding method of a further invention changes the degree of noise of the time-series vectors stored in the driving codebook according to the evaluation result of the degree of noise of the speech.

The speech decoding method of a further invention comprises a driving codebook storing noise-like time-series vectors, and generates a time-series vector with a lower degree of noise by thinning out signal samples of the driving excitation according to the evaluation result of the degree of noise of the speech.

The speech decoding method of a further invention comprises a first driving codebook storing noise-like time-series vectors and a second driving codebook storing non-noise-like time-series vectors, and generates a time-series vector by weighting and adding the time-series vector of the first driving codebook and the time-series vector of the second driving codebook according to the evaluation result of the degree of noise of the speech.

The speech encoding apparatus of a further invention comprises: a spectrum information encoding unit that encodes the spectrum information of the input speech and outputs it as one element of the encoding result; a noise degree evaluation unit that evaluates the degree of noise of the speech in the encoding interval using the code or encoding result of at least one of the spectrum information and power information obtained from the encoded spectrum information supplied by the spectrum information encoding unit, and outputs the evaluation result; a first driving codebook storing a plurality of non-noise-like time-series vectors; a second driving codebook storing a plurality of noise-like time-series vectors; a driving codebook switching unit that switches between the first driving codebook and the second driving codebook according to the evaluation result of the noise degree evaluation unit; a weighting addition unit that weights the time-series vectors from the first driving codebook or the second driving codebook according to their respective gains and adds them; a synthesis filter that takes the weighted time-series vector as the driving excitation signal and obtains encoded speech from this driving excitation signal and the encoded spectrum information from the spectrum information encoding unit; and a distance calculation unit that computes the distance between the encoded speech and the input speech, searches for the driving code and gain that minimize the distance, and outputs the resulting driving code and gain code as the encoding result.

The speech decoding apparatus of a further invention comprises: a spectrum information decoding unit that decodes spectrum information from the code of the spectrum information; a noise degree evaluation unit that evaluates the degree of noise of the speech in the decoding interval using the decoding result of at least one of the spectrum information and power information obtained from the decoded spectrum information supplied by the spectrum information decoding unit, or using the code of the spectrum information, and outputs the evaluation result; a first driving codebook storing a plurality of non-noise-like time-series vectors; a second driving codebook storing a plurality of noise-like time-series vectors; a driving codebook switching unit that switches between the first driving codebook and the second driving codebook according to the evaluation result of the noise degree evaluation unit; a weighting addition unit that weights the time-series vectors from the first driving codebook or the second driving codebook according to their respective gains and adds them; and a synthesis filter that takes the weighted time-series vector as the driving excitation signal and obtains decoded speech from this driving excitation signal and the decoded spectrum information from the spectrum information decoding unit.

The speech encoding apparatus according to the present invention is a code-excited linear prediction (CELP) speech encoding apparatus characterized by comprising a noise degree evaluation unit that evaluates the degree of noise of the speech in the encoding interval using the code or encoding result of at least one of spectrum information, power information, and pitch information, and a driving codebook switching unit that switches between a plurality of driving codebooks according to the evaluation result of the noise degree evaluation unit.

The speech decoding apparatus according to the present invention is a code-excited linear prediction (CELP) speech decoding apparatus characterized by comprising a noise degree evaluation unit that evaluates the degree of noise of the speech in the decoding interval using the code or decoding result of at least one of spectrum information, power information, and pitch information, and a driving codebook switching unit that switches between a plurality of driving codebooks according to the evaluation result of the noise degree evaluation unit.

According to the speech encoding method, speech decoding method, speech encoding apparatus, and speech decoding apparatus of the present invention, the degree of noise of the speech in the encoding interval is evaluated using the code or encoding result of at least one of spectrum information, power information, and pitch information, and a different driving codebook is used according to the evaluation result, so high-quality speech can be reproduced with a small amount of information.

Further, according to the present invention, the speech encoding method and speech decoding method comprise a plurality of driving codebooks whose stored driving excitations differ in degree of noise and switch between them according to the evaluation result of the degree of noise of the speech, so high-quality speech can be reproduced with a small amount of information.

Further, according to the present invention, the speech encoding method and speech decoding method change the degree of noise of the time-series vectors stored in the driving codebook according to the evaluation result of the degree of noise of the speech, so high-quality speech can be reproduced with a small amount of information.

Further, according to the present invention, the speech encoding method and speech decoding method comprise a driving codebook storing noise-like time-series vectors and generate a time-series vector with a lower degree of noise by thinning out signal samples of the time-series vector according to the evaluation result of the degree of noise of the speech, so high-quality speech can be reproduced with a small amount of information.

Further, according to the present invention, the speech encoding method and speech decoding method comprise a first driving codebook storing noise-like time-series vectors and a second driving codebook storing non-noise-like time-series vectors, and generate a time-series vector by weighting and adding the time-series vectors of the first and second driving codebooks according to the evaluation result of the degree of noise of the speech, so high-quality speech can be reproduced with a small amount of information.

Embodiments of the present invention are described below with reference to the drawings.

Embodiment 1.
FIG. 1 shows the overall configuration of Embodiment 1 of the speech encoding method and speech decoding method according to the present invention. In the figure, reference numeral 1 denotes an encoding unit, 2 a decoding unit, 3 a multiplexing unit, and 4 a separating unit. The encoding unit 1 comprises a linear prediction parameter analysis unit 5, a linear prediction parameter encoding unit 6, a synthesis filter 7, an adaptive codebook 8, a gain encoding unit 10, a distance calculation unit 11, a first driving codebook 19, a second driving codebook 20, a noise degree evaluation unit 24, a driving codebook switching unit 25, and a weighting addition unit 38. The decoding unit 2 comprises a linear prediction parameter decoding unit 12, a synthesis filter 13, an adaptive codebook 14, a first driving codebook 22, a second driving codebook 23, a noise degree evaluation unit 26, a driving codebook switching unit 27, a gain decoding unit 16, and a weighting addition unit 39. In FIG. 1, 5 is the linear prediction parameter analysis unit, serving as a spectrum information analysis unit, which analyzes the input speech S1 and extracts linear prediction parameters, the spectrum information of the speech; 6 is the linear prediction parameter encoding unit, serving as a spectrum information encoding unit, which encodes those linear prediction parameters and sets the encoded parameters as the coefficients of the synthesis filter 7; 19 and 22 are first driving codebooks storing a plurality of non-noise-like time-series vectors; 20 and 23 are second driving codebooks storing a plurality of noise-like time-series vectors; 24 and 26 are noise degree evaluation units that evaluate the degree of noise; and 25 and 27 are driving codebook switching units that switch the driving codebook according to the degree of noise.

The operation is described below. In the encoding unit 1, the linear prediction parameter analysis unit 5 analyzes the input speech S1 and extracts linear prediction parameters, the spectrum information of the speech. The linear prediction parameter encoding unit 6 encodes these parameters, sets the encoded linear prediction parameters as the coefficients of the synthesis filter 7, and also outputs them to the noise degree evaluation unit 24. Next, encoding of the excitation information is described. The adaptive codebook 8 stores past driving excitation signals and, for each adaptive code input from the distance calculation unit 11, outputs a time-series vector obtained by periodically repeating the past driving excitation signal. The noise degree evaluation unit 24 evaluates the degree of noise in the encoding interval from the encoded linear prediction parameters input from the linear prediction parameter encoding unit 6 and from the adaptive code, for example on the basis of the spectral tilt, the short-term prediction gain, and the pitch fluctuation as shown in FIG. 2, and outputs the evaluation result to the driving codebook switching unit 25. According to this evaluation result, the driving codebook switching unit 25 switches the driving codebook used for encoding, for example using the first driving codebook 19 when the degree of noise is low and the second driving codebook 20 when it is high.
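The evaluation and switching step can be sketched as follows. The three features are the ones named in the text; everything else is assumed for illustration: the feature definitions (normalized lag-1 autocorrelation for tilt, prediction gain computed from reflection coefficients, relative spread of recent pitch lags), the thresholds, and the two-of-three vote are hypothetical and are not taken from FIG. 2.

```python
import math


def prediction_gain_db(reflection_coeffs):
    """Short-term prediction gain in dB from reflection coefficients k_i,
    using residual energy ratio = prod(1 - k_i^2). Assumes |k_i| < 1."""
    residual = 1.0
    for k in reflection_coeffs:
        residual *= 1.0 - k * k
    return -10.0 * math.log10(residual)


def is_noise_like(tilt, gain_db, pitch_lags,
                  tilt_thr=0.3, gain_thr=6.0, pitch_thr=0.15):
    """Two-of-three vote over the features (thresholds are hypothetical)."""
    mean_lag = sum(pitch_lags) / len(pitch_lags)
    fluctuation = (max(pitch_lags) - min(pitch_lags)) / mean_lag
    votes = 0
    votes += tilt < tilt_thr            # flat spectrum suggests noise
    votes += gain_db < gain_thr         # poorly predictable signal
    votes += fluctuation > pitch_thr    # unstable pitch lag
    return votes >= 2


def select_codebook(noise_like, first_codebook, second_codebook):
    """First codebook: non-noise-like vectors; second: noise-like vectors."""
    return second_codebook if noise_like else first_codebook


print(is_noise_like(0.8, 12.0, [40, 40, 41]))   # steady, voiced-like frame
print(is_noise_like(0.1, 2.0, [30, 70, 45]))    # erratic, noise-like frame
```

All inputs here are quantities the decoder can also compute from the received codes, which is why the switching needs no additional transmitted bits.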

The first driving codebook 19 stores a plurality of non-noise-like time-series vectors, for example vectors constructed by training so that the distortion between training speech and its encoded speech becomes small. The second driving codebook 20 stores a plurality of noise-like time-series vectors, for example vectors generated from random noise. Each codebook outputs the time-series vector corresponding to the driving code input from the distance calculation unit 11. The time-series vectors from the adaptive codebook 8 and from the first driving codebook 19 or the second driving codebook 20 are weighted by the weighting addition unit 38 according to the respective gains supplied from the gain encoding unit 10 and added together, and the sum is supplied to the synthesis filter 7 as the driving excitation signal to obtain the encoded speech. The distance calculation unit 11 computes the distance between the encoded speech and the input speech S1 and searches for the adaptive code, driving code, and gains that minimize that distance. When this encoding is finished, the code of the linear prediction parameters and the adaptive code, driving code, and gain code that minimize the distortion between the input speech and the encoded speech are output as the encoding result S2. These are the operations characteristic of the speech encoding method of Embodiment 1.

次に復号化部２について説明する。復号化部２では、線形予測パラメータ復号化部１２は線形予測パラメータの符号から線形予測パラメータを復号化し、合成フィルタ１３の係数として設定するとともに、雑音度評価部２６へ出力する。次に、音源情報の復号化について説明する。適応符号帳１４は、適応符号に対応して、過去の駆動音源信号を周期的に繰り返した時系列ベクトルを出力する。雑音度評価部２６は、前記線形予測パラメータ復号化部１２から入力された復号化した線形予測パラメータと適応符号とから符号化部１の雑音度評価部２４と同様の方法で雑音の度合いを評価し、評価結果を駆動符号帳切替部２７に出力する。駆動符号帳切替部２７は前記雑音度の評価結果に応じて、符号化部１の駆動符号帳切替部２５と同様に第１の駆動符号帳２２と第２の駆動符号帳２３とを切り替える。 Next, the decoding unit 2 will be described. In the decoding unit 2, the linear prediction parameter decoding unit 12 decodes the linear prediction parameter from the code of the linear prediction parameter, sets the decoded parameter as a coefficient of the synthesis filter 13, and outputs the coefficient to the noise evaluation unit 26. Next, decoding of the sound source information will be described. The adaptive codebook 14 outputs a time-series vector obtained by periodically repeating a past excitation signal in accordance with the adaptive code. The noise degree evaluation unit 26 evaluates the degree of noise from the decoded linear prediction parameters input from the linear prediction parameter decoding unit 12 and the adaptive code in the same manner as the noise degree evaluation unit 24 of the encoding unit 1. Then, the evaluation result is output to the driving codebook switching unit 27. The driving codebook switching unit 27 switches between the first driving codebook 22 and the second driving codebook 23 in the same manner as the driving codebook switching unit 25 of the encoding unit 1 in accordance with the noise level evaluation result.

第１の駆動符号帳２２には非雑音的な複数の時系列ベクトル、例えば学習用音声とその符号化音声との歪みが小さくなるように学習して構成された複数の時系列ベクトルが、第２の駆動符号帳２３には雑音的な複数の時系列ベクトル、例えばランダム雑音から生成した複数の時系列ベクトルが記憶されており、それぞれ駆動符号に対応した時系列ベクトルを出力する。適応符号帳１４と第１の駆動符号帳２２または第２の駆動符号帳２３からの時系列ベクトルは、ゲイン復号化部１６でゲインの符号から復号化したそれぞれのゲインに応じて重み付け加算部３９で重み付けして加算され、その加算結果を駆動音源信号として合成フィルタ１３へ供給され出力音声Ｓ３が得られる。以上がこの実施の形態１の音声復号化方法に特徴的な動作である。 The first driving codebook 22 stores a plurality of non-noise-like time-series vectors, for example a plurality of time-series vectors trained so that the distortion between training speech and its encoded speech becomes small, and the second driving codebook 23 stores a plurality of noise-like time-series vectors, for example a plurality of time-series vectors generated from random noise. Each codebook outputs the time-series vector corresponding to the driving code. The time-series vectors from the adaptive codebook 14 and from the first driving codebook 22 or the second driving codebook 23 are weighted according to the respective gains decoded from the gain code by the gain decoding unit 16 and added in the weighting addition unit 39, and the sum is supplied to the synthesis filter 13 as the driving excitation signal to obtain the output speech S3. The above is the operation characteristic of the speech decoding method of Embodiment 1.
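The decoder-side excitation construction can be sketched as below. It assumes the adaptive code is an integer pitch lag no longer than the stored history and that the adaptive codebook is simply the list of past excitation samples; both are simplifying assumptions, and the function names are hypothetical.

```python
def adaptive_vector(past_excitation, lag, length):
    """Repeat the last `lag` samples of the past excitation periodically."""
    segment = past_excitation[-lag:]
    return [segment[n % lag] for n in range(length)]


def decode_excitation(past_excitation, lag, driving_vec, gain_adaptive, gain_driving):
    """Weighted addition of the adaptive and driving vectors.

    Returns the new excitation and the updated adaptive codebook contents.
    """
    adaptive = adaptive_vector(past_excitation, lag, len(driving_vec))
    excitation = [gain_adaptive * a + gain_driving * d
                  for a, d in zip(adaptive, driving_vec)]
    # The new excitation is appended so later sections can repeat it.
    return excitation, past_excitation + excitation
```

The resulting excitation would then be passed through the synthesis filter, whose coefficients come from the decoded linear prediction parameters, to produce the output speech.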

この実施の形態１によれば、入力音声の雑音の度合いを符号および符号化結果から評価し、評価結果に応じて異なる駆動符号帳を用いることにより、少ない情報量で、品質の高い音声を再生することができる。 According to Embodiment 1, the degree of noise of the input speech is evaluated from the codes and the encoding result, and a different driving codebook is used according to the evaluation result, so that high-quality speech can be reproduced with a small amount of information.

また、上記実施の形態では、駆動符号帳１９，２０，２２，２３には、複数の時系列ベクトルが記憶されている場合を説明したが、少なくとも１つの時系列ベクトルが記憶されていれば、実施可能である。 Although the above embodiment has been described for the case where the driving codebooks 19, 20, 22, and 23 each store a plurality of time-series vectors, the invention can be implemented as long as at least one time-series vector is stored.

実施の形態２． Embodiment 2.
上述の実施の形態１では、２つの駆動符号帳を切り替えて用いているが、これに代え、３つ以上の駆動符号帳を備え、雑音の度合いに応じて切り替えて用いるとしても良い。この実施の形態２によれば、音声を雑音／非雑音の２通りだけでなく、やや雑音的であるなどの中間的な音声に対してもそれに適した駆動符号帳を用いることができるので、品質の高い音声を再生することができる。 In Embodiment 1 described above, two driving codebooks are switched, but three or more driving codebooks may instead be provided and switched according to the degree of noise. According to Embodiment 2, a suitable driving codebook can be used not only for the two cases of noise and non-noise speech but also for intermediate speech, such as slightly noise-like speech, so that high-quality speech can be reproduced.
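As a rough illustration of switching among three or more codebooks, the sketch below maps a noise degree to a codebook index. The normalization of the noise degree to the range 0 to 1 and the uniform bin boundaries are assumptions made for the example; the source does not specify how the degree is quantized.

```python
def select_codebook_index(noise_degree, num_codebooks):
    """Map a noise degree in [0, 1] to an index in 0..num_codebooks-1.

    Index 0 is taken to be the least noise-like codebook (an assumption).
    """
    clipped = min(max(noise_degree, 0.0), 1.0)
    return min(int(clipped * num_codebooks), num_codebooks - 1)
```

With three codebooks, a degree of 0.5 would fall in the middle bin and select the codebook intended for slightly noise-like speech.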

実施の形態３． Embodiment 3.
図１との対応部分に同一符号を付けた図３は、この発明の音声符号化方法及び音声復号化方法の実施の形態３の全体構成を示し、図中２８、３０は雑音的な時系列ベクトルを格納した駆動符号帳、２９、３１は時系列ベクトルの低振幅なサンプルの振幅値を零にするサンプル間引き部である。 FIG. 3, in which parts corresponding to those in FIG. 1 are given the same reference numerals, shows the overall configuration of Embodiment 3 of the speech encoding method and speech decoding method of the present invention. In the figure, 28 and 30 denote driving codebooks storing noise-like time-series vectors, and 29 and 31 denote sample thinning units that set the amplitude values of low-amplitude samples of a time-series vector to zero.

以下、動作を説明する。まず、符号化部１において、線形予測パラメータ分析部５は入力音声Ｓ１を分析し、音声のスペクトル情報である線形予測パラメータを抽出する。線形予測パラメータ符号化部６はその線形予測パラメータを符号化し、符号化した線形予測パラメータを合成フィルタ７の係数として設定するとともに、雑音度評価部２４へ出力する。次に、音源情報の符号化について説明する。適応符号帳８には、過去の駆動音源信号が記憶されており、距離計算部１１から入力される適応符号に対応して過去の駆動音源信号を周期的に繰り返した時系列ベクトルを出力する。雑音度評価部２４は、前記線形予測パラメータ符号化部６から入力された符号化した線形予測パラメータと適応符号とから、例えばスペクトルの傾斜、短期予測利得、ピッチ変動から該符号化区間の雑音の度合いを評価し、評価結果をサンプル間引き部２９に出力する。 Hereinafter, the operation will be described. First, in the encoding unit 1, the linear prediction parameter analysis unit 5 analyzes the input speech S1 and extracts a linear prediction parameter that is speech spectrum information. The linear prediction parameter coding unit 6 codes the linear prediction parameter, sets the coded linear prediction parameter as a coefficient of the synthesis filter 7, and outputs the coefficient to the noise evaluation unit 24. Next, encoding of the sound source information will be described. The adaptive codebook 8 stores past driving excitation signals, and outputs a time-series vector obtained by periodically repeating the past driving excitation signals corresponding to the adaptive code input from the distance calculation unit 11. The noise degree evaluator 24 uses the coded linear prediction parameter input from the linear prediction parameter encoder 6 and the adaptive code to calculate, for example, the noise of the coding section from the spectrum slope, short-term prediction gain, and pitch fluctuation. The degree is evaluated, and the evaluation result is output to the sample thinning unit 29.

駆動符号帳２８には、例えばランダム雑音から生成した複数の時系列ベクトルが記憶されており、距離計算部１１から入力される駆動符号に対応した時系列ベクトルを出力する。サンプル間引き部２９は、前記雑音度の評価結果に応じて、雑音度が低ければ前記駆動符号帳２８から入力された時系列ベクトルに対して、例えば所定の振幅値に満たないサンプルの振幅値を零にした時系列ベクトルを出力し、また、雑音度が高ければ前記駆動符号帳２８から入力された時系列ベクトルをそのまま出力する。適応符号帳８、サンプル間引き部２９からの各時系列ベクトルは、ゲイン符号化部１０から与えられるそれぞれのゲインに応じて重み付け加算部３８で重み付けして加算され、その加算結果を駆動音源信号として合成フィルタ７へ供給され符号化音声を得る。距離計算部１１は符号化音声と入力音声Ｓ１との距離を求め、距離が最小となる適応符号、駆動符号、ゲインを探索する。以上符号化が終了した後、線形予測パラメータの符号、入力音声と符号化音声との歪みを最小にする適応符号、駆動符号，ゲインの符号を符号化結果Ｓ２として出力する。以上がこの実施の形態３の音声符号化方法に特徴的な動作である。 The driving codebook 28 stores a plurality of time-series vectors generated, for example, from random noise, and outputs the time-series vector corresponding to the driving code input from the distance calculation unit 11. According to the noise degree evaluation result, if the degree of noise is low, the sample thinning unit 29 outputs the time-series vector input from the driving codebook 28 with, for example, the amplitude of every sample below a predetermined amplitude value set to zero; if the degree of noise is high, it outputs the time-series vector input from the driving codebook 28 unchanged. The time-series vectors from the adaptive codebook 8 and the sample thinning unit 29 are weighted according to the respective gains given by the gain encoding unit 10 and added in the weighting addition unit 38, and the sum is supplied to the synthesis filter 7 as the driving excitation signal to obtain encoded speech. The distance calculation unit 11 computes the distance between the encoded speech and the input speech S1 and searches for the adaptive code, driving code, and gains that minimize the distance. After this encoding is completed, the code of the linear prediction parameters and the adaptive code, driving code, and gain code that minimize the distortion between the input speech and the encoded speech are output as the encoding result S2. The above is the operation characteristic of the speech encoding method of Embodiment 3.
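The sample thinning operation is simple enough to show directly. In this sketch the noise level is represented by the strings "low" and "high" and the threshold is passed in by the caller; both representations are assumptions chosen for the example.

```python
def thin_samples(vector, noise_level, amplitude_threshold):
    """Zero out low-amplitude samples when the noise level is low.

    When the noise level is high, the noise-like vector is passed through
    unchanged, as the text describes.
    """
    if noise_level == "high":
        return list(vector)
    return [s if abs(s) >= amplitude_threshold else 0.0 for s in vector]
```

Zeroing the small samples leaves a sparser, more pulse-like vector, which is how a single noise-like codebook can serve both noise-like and non-noise-like sections.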

次に復号化部２について説明する。復号化部２では、線形予測パラメータ復号化部１２は線形予測パラメータの符号から線形予測パラメータを復号化し、合成フィルタ１３の係数として設定するとともに、雑音度評価部２６へ出力する。次に、音源情報の復号化について説明する。適応符号帳１４は、適応符号に対応して、過去の駆動音源信号を周期的に繰り返した時系列ベクトルを出力する。雑音度評価部２６は、前記線形予測パラメータ復号化部１２から入力された復号化した線形予測パラメータと適応符号とから符号化部１の雑音度評価部２４と同様の方法で雑音の度合いを評価し、評価結果をサンプル間引き部３１に出力する。 Next, the decoding unit 2 will be described. In the decoding unit 2, the linear prediction parameter decoding unit 12 decodes the linear prediction parameter from the code of the linear prediction parameter, sets the decoded parameter as a coefficient of the synthesis filter 13, and outputs the coefficient to the noise evaluation unit 26. Next, decoding of the sound source information will be described. The adaptive codebook 14 outputs a time-series vector obtained by periodically repeating a past excitation signal in accordance with the adaptive code. The noise degree evaluation unit 26 evaluates the degree of noise from the decoded linear prediction parameters input from the linear prediction parameter decoding unit 12 and the adaptive code in the same manner as the noise degree evaluation unit 24 of the encoding unit 1. Then, the evaluation result is output to the sample thinning unit 31.

駆動符号帳３０は駆動符号に対応した時系列ベクトルを出力する。サンプル間引き部３１は、前記雑音度評価結果に応じて、前記符号化部１のサンプル間引き部２９と同様の処理により時系列ベクトルを出力する。適応符号帳１４、サンプル間引き部３１からの各時系列ベクトルは、ゲイン復号化部１６から与えられるそれぞれのゲインに応じて重み付け加算部３９で重み付けして加算され、その加算結果を駆動音源信号として合成フィルタ１３へ供給され出力音声Ｓ３が得られる。 The driving codebook 30 outputs the time-series vector corresponding to the driving code. According to the noise degree evaluation result, the sample thinning unit 31 outputs a time-series vector by the same processing as the sample thinning unit 29 of the encoding unit 1. The time-series vectors from the adaptive codebook 14 and the sample thinning unit 31 are weighted according to the respective gains given by the gain decoding unit 16 and added in the weighting addition unit 39, and the sum is supplied to the synthesis filter 13 as the driving excitation signal to obtain the output speech S3.

この実施の形態３によれば、雑音的な時系列ベクトルを格納している駆動符号帳を備え、音声の雑音性の度合いの評価結果に応じて、駆動音源の信号サンプルを間引くことにより雑音性の度合いが低い駆動音源を生成することにより、少ない情報量で、品質の高い音声を再生することができる。また、複数の駆動符号帳を備える必要がないので、駆動符号帳の記憶に要するメモリ量を少なくする効果もある。 According to the third embodiment, a driving codebook that stores a noise-like time-series vector is provided, and a signal sample of a driving sound source is thinned out according to the evaluation result of the degree of noise of the speech. By generating a driving sound source with a low degree of sound, high-quality sound can be reproduced with a small amount of information. Further, since there is no need to provide a plurality of driving codebooks, there is also an effect of reducing the amount of memory required for storing the driving codebooks.

実施の形態４． Embodiment 4.
上述の実施の形態３では、時系列ベクトルのサンプルを間引く／間引かないの２通りとしているが、これに代え、雑音の度合いに応じてサンプルを間引く際の振幅閾値を変更するとしても良い。この実施の形態４によれば、音声を雑音／非雑音の２通りだけでなく、やや雑音的であるなどの中間的な音声に対してもそれに適した時系列ベクトルを生成し、用いることができるので、品質の高い音声を再生することができる。 In Embodiment 3 described above, there are two cases, thinning or not thinning the samples of the time-series vector, but the amplitude threshold used for thinning may instead be changed according to the degree of noise. According to Embodiment 4, a suitable time-series vector can be generated and used not only for the two cases of noise and non-noise speech but also for intermediate speech, such as slightly noise-like speech, so that high-quality speech can be reproduced.
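A threshold that varies with the degree of noise might look like the following. The linear mapping, the normalization of the noise degree to the range 0 to 1, and the `max_threshold` parameter are all assumptions; the source only says the threshold changes with the degree of noise.

```python
def thinning_threshold(noise_degree, max_threshold):
    """Linear map: degree 0 gives max_threshold, degree 1 gives 0 (assumed)."""
    clipped = min(max(noise_degree, 0.0), 1.0)
    return (1.0 - clipped) * max_threshold


def thin_by_degree(vector, noise_degree, max_threshold):
    """Zero samples whose amplitude is below the degree-dependent threshold."""
    threshold = thinning_threshold(noise_degree, max_threshold)
    return [s if abs(s) >= threshold else 0.0 for s in vector]
```

At a noise degree of 1 the threshold is zero and the vector passes through unchanged, which reproduces the high-noise case of Embodiment 3 as one end of a continuous range.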

実施の形態５． Embodiment 5.
図１との対応部分に同一符号を付けた図４は、この発明の音声符号化方法及び音声復号化方法の実施の形態５の全体構成を示し、図中３２、３５は雑音的な時系列ベクトルを記憶している第１の駆動符号帳、３３、３６は非雑音的な時系列ベクトルを記憶している第２の駆動符号帳、３４、３７は重み決定部である。 FIG. 4, in which parts corresponding to those in FIG. 1 are given the same reference numerals, shows the overall configuration of Embodiment 5 of the speech encoding method and speech decoding method of the present invention. In the figure, 32 and 35 denote first driving codebooks storing noise-like time-series vectors, 33 and 36 denote second driving codebooks storing non-noise-like time-series vectors, and 34 and 37 denote weight determination units.

以下、動作を説明する。まず、符号化部１において、線形予測パラメータ分析部５は入力音声Ｓ１を分析し、音声のスペクトル情報である線形予測パラメータを抽出する。線形予測パラメータ符号化部６はその線形予測パラメータを符号化し、符号化した線形予測パラメータを合成フィルタ７の係数として設定するとともに、雑音度評価部２４へ出力する。次に、音源情報の符号化について説明する。適応符号帳８には、過去の駆動音源信号が記憶されており、距離計算部１１から入力される適応符号に対応して過去の駆動音源信号を周期的に繰り返した時系列ベクトルを出力する。雑音度評価部２４は、前記線形予測パラメータ符号化部６から入力された符号化した線形予測パラメータと適応符号とから、例えばスペクトルの傾斜、短期予測利得、ピッチ変動から該符号化区間の雑音の度合いを評価し、評価結果を重み決定部３４に出力する。 Hereinafter, the operation will be described. First, in the encoding unit 1, the linear prediction parameter analysis unit 5 analyzes the input speech S1 and extracts a linear prediction parameter that is speech spectrum information. The linear prediction parameter coding unit 6 codes the linear prediction parameter, sets the coded linear prediction parameter as a coefficient of the synthesis filter 7, and outputs the coefficient to the noise evaluation unit 24. Next, encoding of the sound source information will be described. The adaptive codebook 8 stores past driving excitation signals, and outputs a time-series vector obtained by periodically repeating the past driving excitation signals corresponding to the adaptive code input from the distance calculation unit 11. The noise degree evaluator 24 uses the coded linear prediction parameter input from the linear prediction parameter encoder 6 and the adaptive code to calculate, for example, the noise of the coding section from the spectrum slope, short-term prediction gain, and pitch fluctuation. The degree is evaluated, and the evaluation result is output to the weight determining unit 34.

第１の駆動符号帳３２には、例えばランダム雑音から生成した複数の雑音的な時系列ベクトルが記憶されており、駆動符号に対応した時系列ベクトルを出力する。第２の駆動符号帳３３には、例えば学習用音声とその符号化音声との歪みが小さくなるように学習して構成された複数の時系列ベクトルが記憶されており、距離計算部１１から入力される駆動符号に対応した時系列ベクトルを出力する。重み決定部３４は前記雑音度評価部２４から入力された雑音度の評価結果に応じて、例えば図５に従って、第１の駆動符号帳３２からの時系列ベクトルと第２の駆動符号帳３３からの時系列ベクトルに与える重みを決定する。第１の駆動符号帳３２、第２の駆動符号帳３３からの各時系列ベクトルは上記重み決定部３４から与えられる重みに応じて重み付けして加算される。適応符号帳８から出力された時系列ベクトルと、前記重み付け加算して生成された時系列ベクトルはゲイン符号化部１０から与えられるそれぞれのゲインに応じて重み付け加算部３８で重み付けして加算され、その加算結果を駆動音源信号として合成フィルタ７へ供給し符号化音声を得る。距離計算部１１は符号化音声と入力音声Ｓ１との距離を求め、距離が最小となる適応符号、駆動符号、ゲインを探索する。この符号化が終了した後、線形予測パラメータの符号、入力音声と符号化音声との歪みを最小にする適応符号、駆動符号、ゲインの符号を符号化結果として出力する。 The first driving codebook 32 stores a plurality of noise-like time-series vectors generated, for example, from random noise, and outputs the time-series vector corresponding to the driving code. The second driving codebook 33 stores a plurality of time-series vectors trained, for example, so that the distortion between training speech and its encoded speech becomes small, and outputs the time-series vector corresponding to the driving code input from the distance calculation unit 11. According to the noise degree evaluation result input from the noise degree evaluation unit 24, the weight determination unit 34 determines, for example in accordance with FIG. 5, the weights to be given to the time-series vector from the first driving codebook 32 and to the time-series vector from the second driving codebook 33. The time-series vectors from the first driving codebook 32 and the second driving codebook 33 are weighted according to the weights given by the weight determination unit 34 and added. The time-series vector output from the adaptive codebook 8 and the time-series vector generated by this weighted addition are weighted according to the respective gains given by the gain encoding unit 10 and added in the weighting addition unit 38, and the sum is supplied to the synthesis filter 7 as the driving excitation signal to obtain encoded speech. The distance calculation unit 11 computes the distance between the encoded speech and the input speech S1 and searches for the adaptive code, driving code, and gains that minimize the distance. After this encoding is completed, the code of the linear prediction parameters and the adaptive code, driving code, and gain code that minimize the distortion between the input speech and the encoded speech are output as the encoding result.
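The weighted mixing of the two driving codebook outputs can be sketched as below. The actual weight curve is given by FIG. 5, which is not reproduced in the text, so the linear crossfade used here is an assumption, as are the function names and the normalization of the noise degree to the range 0 to 1.

```python
def determine_weights(noise_degree):
    """Return (weight for the noise-like vector, weight for the non-noise vector).

    A linear crossfade is assumed in place of the curve in FIG. 5.
    """
    clipped = min(max(noise_degree, 0.0), 1.0)
    return clipped, 1.0 - clipped


def mix_driving_vectors(noise_vec, non_noise_vec, noise_degree):
    """Weighted addition of the two driving codebook outputs."""
    w_noise, w_non_noise = determine_weights(noise_degree)
    return [w_noise * a + w_non_noise * b
            for a, b in zip(noise_vec, non_noise_vec)]
```

The mixed vector then takes the place of the single driving codebook output in the weighted addition with the adaptive codebook vector.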

次に復号化部２について説明する。復号化部２では、線形予測パラメータ復号化部１２は線形予測パラメータの符号から線形予測パラメータを復号化し、合成フィルタ１３の係数として設定するとともに、雑音度評価部２６へ出力する。次に、音源情報の復号化について説明する。適応符号帳１４は、適応符号に対応して、過去の駆動音源信号を周期的に繰り返した時系列ベクトルを出力する。雑音度評価部２６は、前記線形予測パラメータ復号化部１２から入力された復号化した線形予測パラメータと適応符号とから符号化部１の雑音度評価部２４と同様の方法で雑音の度合いを評価し、評価結果を重み決定部３７に出力する。 Next, the decoding unit 2 will be described. In the decoding unit 2, the linear prediction parameter decoding unit 12 decodes the linear prediction parameter from the code of the linear prediction parameter, sets the decoded parameter as a coefficient of the synthesis filter 13, and outputs the coefficient to the noise evaluation unit 26. Next, decoding of the sound source information will be described. The adaptive codebook 14 outputs a time-series vector obtained by periodically repeating a past excitation signal in accordance with the adaptive code. The noise degree evaluation unit 26 evaluates the degree of noise from the decoded linear prediction parameters input from the linear prediction parameter decoding unit 12 and the adaptive code in the same manner as the noise degree evaluation unit 24 of the encoding unit 1. Then, the evaluation result is output to the weight determination unit 37.

第１の駆動符号帳３５および第２の駆動符号帳３６は駆動符号に対応した時系列ベクトルを出力する。重み決定部３７は前記雑音度評価部２６から入力された雑音度評価結果に応じて、符号化部１の重み決定部３４と同様に重みを与えるとする。第１の駆動符号帳３５、第２の駆動符号帳３６からの各時系列ベクトルは上記重み決定部３７から与えれるそれぞれの重みに応じて重み付けして加算される。適応符号帳１４から出力された時系列ベクトルと、前記重み付け加算して生成された時系列ベクトルは、ゲイン復号化部１６でゲインの符号から復号化したそれぞれのゲインに応じて重み付け加算部３９で重み付けして加算され、その加算結果が駆動音源信号として合成フィルタ１３へ供給され出力音声Ｓ３が得られる。 The first drive codebook 35 and the second drive codebook 36 output a time-series vector corresponding to the drive code. It is assumed that the weight determining unit 37 gives weights in the same manner as the weight determining unit 34 of the encoding unit 1 in accordance with the noise evaluation result input from the noise evaluation unit 26. The respective time-series vectors from the first driving codebook 35 and the second driving codebook 36 are weighted and added according to the respective weights given from the weight determining unit 37. The time series vector output from the adaptive codebook 14 and the time series vector generated by the weighted addition are combined by the weight addition section 39 in accordance with the respective gains decoded from the gain codes by the gain decoding section 16. The weighted addition is performed, and the result of the addition is supplied to the synthesis filter 13 as a driving sound source signal to obtain an output sound S3.

この実施の形態５によれば、音声の雑音の度合いを符号および符号化結果から評価し、評価結果に応じて雑音的な時系列ベクトルと非雑音的な時系列ベクトルを重み付き加算して用いることにより、少ない情報量で、品質の高い音声を再生することができる。 According to the fifth embodiment, the degree of speech noise is evaluated from the code and the coding result, and a noise-like time-series vector and a non-noise time-series vector are weighted and used in accordance with the evaluation result. Thus, high-quality sound can be reproduced with a small amount of information.

実施の形態６． Embodiment 6.
上述の実施の形態１〜５でさらに、雑音の度合いの評価結果に応じてゲインの符号帳を変更するとしても良い。この実施の形態６によれば、駆動符号帳に応じて最適なゲインの符号帳を用いることができるので、品質の高い音声を再生することができる。 In Embodiments 1 to 5 described above, the gain codebook may additionally be changed according to the evaluation result of the degree of noise. According to Embodiment 6, the gain codebook best suited to the driving codebook in use can be selected, so that high-quality speech can be reproduced.

実施の形態７． Embodiment 7.
上述の実施の形態１〜６では、音声の雑音の度合いを評価し、その評価結果に応じて駆動符号帳を切り替えているが、有声の立ち上がりや破裂性の子音などをそれぞれ判定、評価し、その評価結果に応じて駆動符号帳を切り替えても良い。この実施の形態７によれば、音声の雑音的な状態だけでなく、有声の立ち上がりや破裂性子音などさらに細かく分類し、それぞれに適した駆動符号帳を用いることができるので、品質の高い音声を再生することができる。 In Embodiments 1 to 6 described above, the degree of noise of the speech is evaluated and the driving codebook is switched according to the evaluation result, but voiced onsets, plosive consonants, and the like may each be detected and evaluated, and the driving codebook switched according to that result. According to Embodiment 7, speech is classified more finely, not only by how noise-like it is but also into categories such as voiced onsets and plosive consonants, and a driving codebook suited to each category can be used, so that high-quality speech can be reproduced.

実施の形態８． Embodiment 8.
上述の実施の形態１〜６では、図２に示すスペクトル傾斜、短期予測利得、ピッチ変動から、符号化区間の雑音の度合いを評価しているが、適応符号帳出力に対するゲイン値の大小を用いて評価しても良い。 In Embodiments 1 to 6 described above, the degree of noise in the coding section is evaluated from the spectral tilt, short-term prediction gain, and pitch variation shown in FIG. 2, but it may instead be evaluated using the magnitude of the gain value applied to the adaptive codebook output.
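One way to picture an evaluation based on the adaptive codebook gain is sketched below. The source states only that the magnitude of the gain is used. The direction of the mapping (a small gain is treated as noise-like, on the reasoning that weak periodicity suggests a noise-like section), the two breakpoints, and the linear interpolation between them are all assumptions.

```python
def noise_degree_from_adaptive_gain(adaptive_gain, low=0.3, high=0.8):
    """Map the adaptive codebook gain to a noise degree in [0, 1].

    Assumed rule: gain at or below `low` is fully noise-like (1.0), gain at or
    above `high` is fully non-noise-like (0.0), linear in between.
    """
    g = abs(adaptive_gain)
    if g <= low:
        return 1.0
    if g >= high:
        return 0.0
    return (high - g) / (high - low)
```

Because the gain is decoded from the gain code, a decoder can compute the same value without needing the spectral features of FIG. 2.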

この発明による音声符号化及び音声復号化装置の実施の形態１の全体構成を示すブロック図である。 FIG. 1 is a block diagram showing the overall configuration of Embodiment 1 of the speech encoding and speech decoding apparatus according to the present invention.
図１の実施の形態１における雑音の度合い評価の説明に供する表である。 FIG. 2 is a table used to explain the noise degree evaluation in Embodiment 1 of FIG. 1.
この発明による音声符号化及び音声復号化装置の実施の形態３の全体構成を示すブロック図である。 FIG. 3 is a block diagram showing the overall configuration of Embodiment 3 of the speech encoding and speech decoding apparatus according to the present invention.
この発明による音声符号化及び音声復号化装置の実施の形態５の全体構成を示すブロック図である。 FIG. 4 is a block diagram showing the overall configuration of Embodiment 5 of the speech encoding and speech decoding apparatus according to the present invention.
図４の実施の形態５における重み付け決定処理の説明に供する略線図である。 FIG. 5 is a schematic diagram used to explain the weight determination process in Embodiment 5 of FIG. 4.
従来のＣＥＬＰ音声符号化復号化装置の全体構成を示すブロック図である。 FIG. 6 is a block diagram showing the overall configuration of a conventional CELP speech encoding/decoding apparatus.
従来の改良されたＣＥＬＰ音声符号化復号化装置の全体構成を示すブロック図である。 FIG. 7 is a block diagram showing the overall configuration of a conventional improved CELP speech encoding/decoding apparatus.

Claims

In a code-driven linear prediction (CELP) speech decoding device that receives speech code including gain code and synthesizes speech,
A gain decoding unit that inputs the gain code and decodes the audio gain in the decoding section according to the input gain code,
A noise degree evaluation unit that evaluates the degree of noise of the speech in the decoding section using the magnitude of the value of the gain decoded by the gain decoding unit,
A drive codebook that stores a plurality of time-series vectors corresponding to a plurality of predetermined drive excitation signals,
A noise degree control unit that receives a time-series vector from the driving codebook, changes the degree of noise of the time-series vector output from the driving codebook according to the evaluation result of the degree of noise of the speech evaluated by the noise degree evaluation unit, and outputs the time-series vector whose degree of noise has been changed,
A weighting addition unit that receives the time-series vector output from the driving codebook and changed in degree of noise by the noise degree control unit, together with the gain decoded by the gain decoding unit, and weights that time-series vector using the gain.

In a code-driven linear prediction (CELP) speech decoding device that inputs a speech code including a code of a linear prediction parameter, an adaptive code, a drive code, and a gain code and synthesizes speech,
An adaptive codebook that stores a past driving excitation signal, inputs an adaptive code, and outputs a time-series vector corresponding to the past driving excitation signal according to the input adaptive code,
A drive codebook that stores a plurality of time-series vectors corresponding to a plurality of predetermined drive excitation signals, inputs a drive code, and outputs a time-series vector corresponding to the drive excitation signal according to the input drive code. ,
A gain decoding unit that inputs a gain code and decodes a gain of a voice in a decoding section according to the input gain code,
A noise degree evaluation unit that receives the gain decoded by the gain decoding unit and evaluates the degree of noise of the speech in the decoding section using the magnitude of the received gain value,
A noise degree control unit that receives the evaluation result of the degree of noise of the speech evaluated by the noise degree evaluation unit and the time-series vector output from the driving codebook, changes the degree of noise of the time-series vector output from the driving codebook according to the evaluation result, and outputs the time-series vector whose degree of noise has been changed,
A weighting addition unit that receives the time-series vector output from the adaptive codebook, the time-series vector output from the driving codebook and changed in degree of noise by the noise degree control unit, and the gain decoded by the gain decoding unit, weights the time-series vector output from the adaptive codebook and the time-series vector changed in degree of noise using the gain, adds the two weighted time-series vectors, and outputs the addition result;
A linear prediction parameter decoding unit that receives the sign of the linear prediction parameter, decodes the linear prediction parameter from the input sign of the linear prediction parameter, and outputs the linear prediction parameter;
A synthesis filter that inputs the linear prediction parameter output from the linear prediction parameter decoding unit and the addition result output from the weighting addition unit, and synthesizes a voice using the linear prediction parameter and the addition result. An audio decoding device, comprising:

In a code-driven linear prediction (CELP) speech decoding method for inputting a speech code including a gain code and synthesizing speech,
The above-mentioned gain code is input, and the voice gain in the decoding section is decoded according to the input gain code,
Using the magnitude of the decoded gain value, the magnitude of the degree of noise of the speech in the decoding section is evaluated,
A plurality of time-series vectors corresponding to a plurality of predetermined drive excitation signals are stored in the drive codebook,
A time-series vector is input from the driving codebook, the degree of noise of the time-series vector output from the driving codebook is changed according to the evaluation result of the degree of noise of the speech, and the time-series vector whose degree of noise has been changed is output,
The time-series vector output from the driving codebook and changed in degree of noise and the gain are input, and that time-series vector is weighted using the gain. A speech decoding method characterized by the above.

In a code-driven linear prediction (CELP) speech decoding method for inputting a speech code including a code of a linear prediction parameter, an adaptive code, a drive code, and a gain code and synthesizing speech,
Input an adaptive code to the adaptive codebook that stores the past drive excitation signal, and output a time-series vector corresponding to the past drive excitation signal from the adaptive codebook according to the input adaptive code,
A plurality of time-series vectors corresponding to a plurality of predetermined drive excitation signals are stored in the drive codebook,
A drive code is input to the drive codebook, and a time-series vector corresponding to the drive excitation signal is output from the drive codebook according to the input drive code,
A gain code is input, and the gain of the audio in the decoding section is decoded using the input gain code,
The decoded gain is input, and the magnitude of the degree of noise of the speech in the decoding section is evaluated using the magnitude of the input gain value,
The evaluation result of the degree of noise of speech and the time-series vector output from the driving codebook are input, and are output from the driving codebook according to the evaluation result of the degree of noise of the speech. Change the degree of noise of the time series vector, and output a time series vector with the degree of noise changed,
The time series vector output from the adaptive codebook, the time series vector output from the driving codebook and changed in the degree of noise, and the decoded gain are input and output from the adaptive codebook. The time series vector and the time series vector output from the driving codebook and having a changed degree of noise are weighted using the gain, and the two time series vectors weighted using the gain are added and added. Output the result,
Input the sign of the linear prediction parameter, decode and output the linear prediction parameter from the input sign of the linear prediction parameter,
A speech decoding method comprising: inputting the output linear prediction parameter and the output addition result, and synthesizing speech using the linear prediction parameter and the addition result.