JP2797348B2

JP2797348B2 - Audio encoding / decoding device

Info

Publication number: JP2797348B2
Application number: JP63299822A
Authority: JP
Inventors: 茂細井; 好男佐藤; 光一本間
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1988-11-28
Filing date: 1988-11-28
Publication date: 1998-09-17
Anticipated expiration: 2013-09-17
Also published as: JPH02146100A

Description

[Detailed Description of the Invention]
Field of Industrial Application
The present invention relates to a speech encoding/decoding device used for digital communication, voice mail, and the like.

Prior Art
FIG. 6(a) shows a conventional speech encoding device, and FIG. 6(b) shows a conventional speech decoding device.

In FIG. 6(a), 14 is a predictor that obtains a prediction error and short-term/long-term prediction filter coefficients from a speech signal that has been A/D-converted by linear PCM. 15 is a quantizer which, as shown in FIG. 7, comprises:
- a codebook 15a, in which a plurality of fixed-length signal sequences (representative vectors) are stored in advance; and
- a vector quantizer 15b, which vector-quantizes the prediction error from the predictor 14 and outputs the index of a representative vector.
16 is a multiplexer that multiplexes the quantized values of the short-term/long-term prediction filter coefficients from the predictor 14 with the representative-vector index from the quantizer 15.

In FIG. 6(b), 17 is a separator that separates the data from the encoder into the short-term prediction filter coefficients, the long-term prediction filter coefficients, and the representative-vector index. 18 is a dequantizer that has a codebook (not shown) identical to the codebook 15a of the encoder. It outputs, as the prediction error, the representative vector corresponding to the index from the separator 17. 19 is a synthesizer that synthesizes a speech signal from the short-term and long-term prediction filter coefficients and the representative vector from the dequantizer 18.

Next, the operation of the above conventional example will be described.

In FIG. 6(a), a speech signal A/D-converted by linear PCM is input. The predictor 14 obtains short-term prediction filter coefficients that remove the correlation between adjacent sample values of the speech signal. It then uses these coefficients to obtain a short-term prediction error.
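The short-term analysis just described can be sketched as follows. This is a minimal sketch that assumes the autocorrelation method with the Levinson-Durbin recursion, a standard way of computing short-term (LPC) predictor coefficients. The patent does not say which method predictor 14 uses, and the function names are hypothetical.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for a[1..order] so that x[n] is predicted as sum_k a[k] * x[n - k]
    (autocorrelation method). Returns the list [a1, ..., a_order]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # degenerate frame (e.g. all zeros): stop the recursion
            break
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err  # reflection coefficient of stage m
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m  # remaining prediction-error energy
    return a[1:]


def short_term_residual(x, coeffs):
    """Short-term prediction error e[n] = x[n] - sum_k a[k] * x[n - k];
    samples before the start of the frame are treated as zero."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
            for n in range(len(x))]
```

For a frame `x`, `short_term_residual(x, levinson_durbin(autocorr(x, 10), 10))` would give a 10th-order residual. The order of 10 is a common choice for 8 kHz speech, not a value stated in the patent.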

Next, the predictor removes the periodic correlation caused by the pitch of the excitation source. It obtains long-term prediction filter coefficients from the short-term prediction error, and then uses these coefficients to obtain the prediction error.
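A one-tap pitch predictor is one common way to realize the long-term prediction described above. The patent does not specify the filter structure, so the following is only a sketch. The single tap, the lag search range, and the function names are assumptions.

```python
def pitch_predictor(e, min_lag, max_lag):
    """Return (T, b): the lag T in [min_lag, max_lag] and gain b that minimize
    sum over n >= T of (e[n] - b * e[n - T])**2 for the short-term error e."""
    best_lag, best_gain, best_score = min_lag, 0.0, -1.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(e[n] * e[n - lag] for n in range(lag, len(e)))
        den = sum(e[n - lag] ** 2 for n in range(lag, len(e)))
        if den <= 0.0:
            continue
        score = num * num / den  # error-energy reduction achieved by the optimal gain
        if score > best_score:
            best_lag, best_gain, best_score = lag, num / den, score
    return best_lag, best_gain


def long_term_residual(e, lag, gain):
    """Prediction error r[n] = e[n] - gain * e[n - lag] (no prediction before lag)."""
    return [e[n] - gain * e[n - lag] if n >= lag else e[n] for n in range(len(e))]
```

At 8 kHz, a search range of roughly 20 to 147 samples covers typical pitch periods. That range is an assumption, not a figure from the patent.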

The quantizer 15 calculates the squared distance between this prediction-error signal sequence and each representative vector of the codebook 15a. It outputs, as the quantized value, the index of the representative vector with the smallest distance.
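The nearest-neighbour search above, and the matching codebook lookup at the decoder, can be sketched in a few lines. The function names are hypothetical, and the codebook is assumed to be a list of equal-length vectors.

```python
def vq_encode(target, codebook):
    """Return the index of the representative vector with the smallest
    squared distance to the target vector."""
    def sq_dist(vector):
        return sum((t - c) ** 2 for t, c in zip(target, vector))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def vq_decode(index, codebook):
    """Dequantizer side: look up the representative vector by its index."""
    return list(codebook[index])
```

For example, with `codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, -1.0]]`, `vq_encode([0.9, 1.2], codebook)` returns `1`. Only that index needs to be transmitted.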

The multiplexer 16 therefore sends the speech signal to the decoder as compressed data. This data consists of the short-term prediction filter coefficients, the long-term prediction filter coefficients, and the representative-vector index.

In FIG. 6(b), the synthesizer 19 has filters whose characteristics are the inverse of those of the filters in the encoder's predictor 14. These filters are set according to the short-term/long-term prediction filter coefficients received from the encoder, so the synthesizer can decode the representative vector into a speech signal.
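The encoder filters can be assumed to take the usual forms A(z) = 1 − Σ a_k z^(−k) for the short-term filter and P(z) = 1 − b z^(−T) for the pitch filter. The patent does not give their form. Under that assumption, the decoder's inverse filters are the recursive filters sketched below, and the function names are hypothetical. The long-term filter is undone first, then the short-term filter.

```python
def long_term_synthesis(r, lag, gain):
    """Inverse pitch filter 1/P(z): e[n] = r[n] + gain * e[n - lag]."""
    e = []
    for n, value in enumerate(r):
        e.append(value + (gain * e[n - lag] if n >= lag else 0.0))
    return e


def short_term_synthesis(e, coeffs):
    """All-pole synthesis filter 1/A(z): x[n] = e[n] + sum_k a[k] * x[n - k]."""
    x = []
    for n, value in enumerate(e):
        pred = sum(a * x[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        x.append(value + pred)
    return x
```

Here, `short_term_synthesis(long_term_synthesis(r, T, b), coeffs)` reconstructs the speech from the residual `r`. Any mismatch in the output comes only from quantization of `r` and of the coefficients.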

Problem to Be Solved by the Invention
The conventional speech encoding and decoding devices described above apply the same processing to all speech, whether it is voiced sound, unvoiced sound, or silence. At a low bit rate, speech is therefore not encoded according to its characteristics. As a result, the quality of the decoded speech is poor, and the coding efficiency cannot be improved.

The present invention solves this problem. Its object is to provide a speech encoding/decoding device that improves coding efficiency.

Means for Solving the Problem
To achieve the above object, the speech encoding device of the present invention does the following:
- classifies a speech signal into a voiced stationary part, a voiced transient part, unvoiced sound, and the like;
- compresses the voiced stationary part on the time axis;
- outputs the prediction errors of the voiced transient part and of the compressed voiced stationary part; and
- quantizes the unvoiced sound and the like, together with the prediction errors of the voiced stationary and transient parts.

To achieve the same object, the speech decoding device of the present invention does the following:
- dequantizes the quantized values of the unvoiced sound and the like, and of the voiced stationary and transient parts;
- synthesizes the dequantized voiced stationary and transient parts into a speech signal; and
- expands the synthesized voiced stationary part on the time axis.

Operation
With the above configuration, the present invention quantizes speech according to its characteristics, such as voiced sound, unvoiced sound, and silence. The decoded speech therefore has good quality, and coding efficiency improves.

Embodiment
An embodiment of the present invention is described below with reference to the drawings:
- FIG. 1(a) is a block diagram of an embodiment of the speech encoding device according to the present invention.
- FIG. 1(b) is a block diagram of an embodiment of the speech decoding device according to the present invention.
- FIG. 2 is a detailed block diagram of the predictor of FIG. 1(a).
- FIG. 3 is a detailed block diagram of the quantizer of FIG. 1(a).
- FIG. 4 is a waveform diagram of a typical speech signal.
- FIG. 5 is a flowchart explaining the operation of the classifier of FIG. 1(a).

In FIG. 1(a), 1 is a classifier that analyzes the characteristics of the input speech signal, as described later. It classifies the signal into a voiced stationary part, a voiced transient part, unvoiced sound, and silence. 2 is a time axis compressor that uses the decision data from the classifier 1 to thin out similar waveforms in the voiced stationary part of the speech signal. This compresses the voiced stationary part on the time axis.
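The patent states only that the compressor thins out similar waveforms in the voiced stationary part. One simple way to picture this is to cut the segment into pitch-length cycles and keep only every second cycle. This sketch assumes the pitch period is already known and that a fixed thinning ratio is acceptable. The function name and the fixed ratio are hypothetical.

```python
def compress_time_axis(segment, pitch, keep_every=2):
    """Hypothetical pitch-cycle thinning for a voiced stationary segment.

    Splits `segment` into consecutive cycles of `pitch` samples (the last may be
    shorter) and keeps one cycle out of every `keep_every`.
    Returns (kept samples, number of cycles in the original segment).
    """
    cycles = [segment[i:i + pitch] for i in range(0, len(segment), pitch)]
    kept = [cycle for idx, cycle in enumerate(cycles) if idx % keep_every == 0]
    return [s for cycle in kept for s in cycle], len(cycles)
```

Because neighbouring cycles in a stationary voiced segment are similar, dropping some of them loses little information. It also roughly halves the number of samples that reach the predictor and quantizer.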

3 is a predictor that obtains the prediction error and the short-term/long-term prediction filter coefficients for the voiced sound of the speech signal. As shown in FIG. 2, the predictor 3 comprises the following:
- a short-term prediction filter 21, which obtains a short-term prediction error from the voiced sound;
- a short-term prediction analyzer 22, which obtains the coefficients of the filter 21 so as to remove the correlation between adjacent sample values of the voiced sound;
- a long-term prediction filter 23, which obtains the prediction error from the short-term prediction error output by the filter 21;
- a long-term prediction analyzer (pitch predictor) 24, which obtains the coefficients of the filter 23 from the short-term prediction error so as to remove the periodic correlation due to the excitation pitch of the voiced sound; and
- a quantizer 25, which quantizes the short-term and long-term prediction filter coefficients.

4 is a quantizer that quantizes the prediction error from the predictor 3, for example by vector quantization. As shown in FIG. 3, the quantizer 4 comprises:
- codebooks 31 to 34, which store representative vectors for the voiced stationary part, the voiced transient part, unvoiced sound, and silence, respectively; and
- a vector quantizer 35, which selects one of the codebooks 31 to 34 according to the decision data from the classifier 1. It calculates the squared distance between the prediction-error signal sequence from the predictor 3 and each representative vector of the selected codebook, and outputs the index of the representative vector with the smallest distance as the quantized value.
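The sketch below illustrates how the decision data selects among codebooks 31 to 34. It keys the codebooks by frame class and then runs a nearest-neighbour search by squared distance. The class constants, their 2-bit values, and the function name are assumptions made for illustration.

```python
# Hypothetical 2-bit class codes; the patent does not assign values.
SILENCE, UNVOICED, VOICED_TRANSIENT, VOICED_STATIONARY = range(4)


def quantize_by_class(frame_class, target, codebooks):
    """Select the codebook for this frame's class, then return the index of
    the representative vector nearest to `target` in squared distance.

    `codebooks` maps each class constant to a list of equal-length vectors
    (stand-ins for codebooks 31 to 34).
    """
    codebook = codebooks[frame_class]
    distances = [sum((t - c) ** 2 for t, c in zip(target, v)) for v in codebook]
    return distances.index(min(distances))
```

Separate codebooks let each class use representative vectors trained on signals with matching statistics. For example, low-amplitude vectors can serve silence and noise-like vectors can serve unvoiced sound.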

5 is a multiplexer that multiplexes three items:
- the decision data from the classifier 1;
- the quantized values of the short-term/long-term prediction filter coefficients from the predictor 3; and
- the representative-vector index from the quantizer 4.
10 and 11 are switches that change over according to the speech characteristics determined by the classifier 1.
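The multiplexer's role can be pictured as packing fixed-width fields into one frame word, which separator 6 then unpacks at the decoder. The patent gives only the 2-bit width of the decision data, and only as an example. The other field widths and the helper names below are hypothetical.

```python
def pack_fields(fields):
    """Concatenate (value, width) pairs, most significant field first,
    into a single integer frame word."""
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_fields(word, widths):
    """Inverse of pack_fields: split the frame word back into field values."""
    values = []
    for width in reversed(widths):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return list(reversed(values))
```

For instance, `pack_fields([(3, 2), (5, 4), (200, 8)])` places the 2-bit class code in the top bits. Calling `unpack_fields(word, [2, 4, 8])` then returns `[3, 5, 200]`.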

FIG. 1(b) shows the decoding device, which comprises the following parts:
- 6 is a separator that separates the data transmitted from the encoding device into the decision data for the speech signal, the quantized values of the short-term/long-term prediction filter coefficients, and the representative-vector index.
- 7 is a dequantizer that has codebooks (not shown) identical to the encoder's codebooks 31 to 34. It selects the appropriate codebook according to the decision data from the separator 6, and outputs, as the prediction error, the representative vector corresponding to the index from the separator 6.
- 8 is a synthesizer that synthesizes the voiced sound of the speech signal from the short-term and long-term prediction filter coefficients and the representative vector from the dequantizer 7.
- 9 is a time axis expander that expands the voiced stationary part of the speech signal on the time axis.
- 12 and 13 are switches that change over according to the decision data from the separator 6.
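If the encoder thins out pitch cycles, the time axis expander can be pictured as repeating each received cycle. This sketch assumes the decoder knows the pitch period, the repetition factor, and the original cycle count. The patent does not specify how such side information is conveyed, and the function is hypothetical.

```python
def expand_time_axis(samples, pitch, repeat=2, n_cycles=None):
    """Hypothetical pitch-cycle expansion for a voiced stationary segment.

    Splits `samples` into cycles of `pitch` samples, repeats each cycle
    `repeat` times, and optionally trims the result to `n_cycles` cycles.
    """
    cycles = [samples[i:i + pitch] for i in range(0, len(samples), pitch)]
    out = [cycle for cycle in cycles for _ in range(repeat)]
    if n_cycles is not None:
        out = out[:n_cycles]
    return [s for cycle in out for s in cycle]
```

Simple repetition can produce discontinuities at cycle boundaries. A practical expander would likely smooth or interpolate between cycles, but the patent does not describe this.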

Next, the operation of the above embodiment will be described.

In FIG. 1(a), a speech signal A/D-converted by linear PCM is input to the encoding device in units of a fixed time length (frames).

As shown in FIG. 4, a speech signal (analog signal) consists of the following segments, in order:
- silence;
- a relatively long unvoiced sound at the onset of speech;
- a relatively short voiced transient part between the unvoiced sound and the voiced stationary part; and
- the voiced stationary part.

These segments have the following characteristics:
(1) In the voiced stationary part, similar waveforms follow one another and show periodicity. The correlation is strong both between samples and between pitch periods.
(2) In the voiced transient part, similar waveforms do not follow one another, but the correlation is still strong between samples and between pitch periods.
(3) In unvoiced sound, the correlation between samples and between pitch periods is weak, and the signal changes sharply.
(4) In silence, the signal amplitude is small.

When such a speech signal is input, the classifier 1 proceeds as shown in FIG. 5. It calculates the power Pw of the waveform within one frame (step 51) and determines whether Pw is equal to or greater than a threshold (step 52). If Pw is less than the threshold, the frame is judged to be silence (step 53). Otherwise, the frame is judged to contain sound, and processing proceeds to step 54 onward.

In step 55, the autocorrelation coefficients γ(i) of the waveform within one frame are calculated. From them, the maximum value γmax over the designated lags i is found (step 56), and it is determined whether γmax is equal to or greater than a threshold (step 57). If γmax is less than the threshold, the frame is judged to be unvoiced (step 58). Otherwise, the frame is judged to be voiced, and processing proceeds to step 59 onward.

In step 60, it is determined whether the previous frame was voiced. If NO, the frame is judged to be a voiced transient part (step 61). If YES, processing proceeds to step 62 onward.

In step 62, the pitch period Pn is calculated from the autocorrelation coefficients γ(i). In step 63, the rate of change ρ relative to the pitch period P(n−1) of the previous frame is calculated. In step 64, it is determined whether ρ is equal to or greater than a threshold. If ρ is less than the threshold, the frame is judged to be a voiced stationary part; otherwise, it is judged to be a voiced transient part.
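The flowchart of FIG. 5 (steps 51 to 64) can be sketched as a single function. Several details are assumptions because the text does not fix them:
- the power is taken as the mean squared amplitude;
- γ(i) is the autocorrelation normalized by its zero-lag value;
- the pitch period Pn is the lag at which γ(i) peaks; and
- the rate of change is ρ = |Pn − P(n−1)| / P(n−1).
Steps 54 and 59 are not described in the text and are omitted. All thresholds and the lag range are supplied by the caller, and the function name and class codes are hypothetical.

```python
# Hypothetical 2-bit class codes.
SILENCE, UNVOICED, VOICED_TRANSIENT, VOICED_STATIONARY = range(4)


def classify_frame(frame, prev_voiced, prev_pitch,
                   power_thr, corr_thr, rate_thr, min_lag, max_lag):
    """Classify one frame; returns (class code, voiced flag, pitch or None).

    Assumes power_thr > 0 so that non-silent frames have nonzero energy.
    """
    n = len(frame)
    energy = sum(s * s for s in frame)
    power = energy / n                                   # step 51 (mean power, assumed)
    if power < power_thr:                                # step 52
        return SILENCE, False, None                      # step 53
    # Step 55: normalized autocorrelation gamma(i) over the designated lags.
    gamma = {i: sum(frame[k] * frame[k - i] for k in range(i, n)) / energy
             for i in range(min_lag, max_lag + 1)}
    pitch = max(gamma, key=gamma.get)                    # steps 56/62: lag of gamma_max (assumed)
    if gamma[pitch] < corr_thr:                          # step 57
        return UNVOICED, False, None                     # step 58
    if not prev_voiced or prev_pitch is None:            # step 60
        return VOICED_TRANSIENT, True, pitch             # step 61
    rate = abs(pitch - prev_pitch) / prev_pitch          # step 63 (assumed definition)
    if rate < rate_thr:                                  # step 64
        return VOICED_STATIONARY, True, pitch
    return VOICED_TRANSIENT, True, pitch
```

The function also returns the voiced flag and the pitch. The caller can feed them back as `prev_voiced` and `prev_pitch` for the next frame, which steps 60 and 63 require.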

According to these four types of speech characteristics, the classifier 1 outputs decision data (for example, 2 bits) to the switches 10 and 11, the quantizer 4, and the multiplexer 5.

For a frame judged to be a voiced stationary part, the switch 10 changes over to the time axis compressor 2 side. For a frame judged to be a voiced transient part, the switch 11 changes over to the predictor 3 side.

Each class therefore reaches the quantizer 4 by a different route:
- Silence and unvoiced sound are input directly to the quantizer 4 and quantized into representative-vector indices in the silence codebook 34 and the unvoiced codebook 33, respectively.
- The voiced transient part is input to the quantizer 4 via the predictor 3 and quantized into a representative-vector index in the voiced transient codebook 32.
- The voiced stationary part is first compressed by the time axis compressor 2, then input to the quantizer 4 via the predictor 3. It is quantized into a representative-vector index in the voiced stationary codebook 31.
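The switch routing just described can be sketched as a dispatch function. The compressor, predictor, and quantizer are passed in as callables so the routing logic stands alone. The class codes, the function name, and the callable signatures are hypothetical.

```python
# Hypothetical 2-bit class codes.
SILENCE, UNVOICED, VOICED_TRANSIENT, VOICED_STATIONARY = range(4)


def encode_frame(frame_class, frame, compress, predict, quantize):
    """Route one classified frame as switches 10 and 11 do in FIG. 1(a).

    compress(frame) -> shorter frame        (stands in for time axis compressor 2)
    predict(frame) -> (coeffs, residual)    (stands in for predictor 3)
    quantize(frame_class, signal) -> index  (stands in for quantizer 4)
    Returns (coeffs or None, index).
    """
    if frame_class in (SILENCE, UNVOICED):
        # Silence and unvoiced sound bypass the predictor entirely.
        return None, quantize(frame_class, frame)
    if frame_class == VOICED_STATIONARY:
        frame = compress(frame)  # only the stationary part is time-compressed
    coeffs, residual = predict(frame)
    return coeffs, quantize(frame_class, residual)
```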

When silence or unvoiced sound is encoded, the predictor 3 is not used. The bits otherwise allocated to the prediction filter coefficients and the like are therefore reassigned to the quantizer 4, which keeps the number of transmitted bits per frame constant.
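The constant-bit-rate rule above amounts to a small allocation table. The frame budget and coefficient field size below are hypothetical: 160 bits corresponds to 8 kbit/s with 20 ms frames. The patent gives no actual bit counts, and the function name is also hypothetical.

```python
# Hypothetical 2-bit class codes.
SILENCE, UNVOICED, VOICED_TRANSIENT, VOICED_STATIONARY = range(4)

FRAME_BITS = 160  # hypothetical frame budget (8 kbit/s x 20 ms)
CLASS_BITS = 2    # decision data
COEFF_BITS = 40   # hypothetical bits for the quantized prediction filter coefficients


def bit_allocation(frame_class):
    """Return (coefficient bits, quantizer bits); every frame totals FRAME_BITS."""
    if frame_class in (SILENCE, UNVOICED):
        # Predictor unused: its bits are handed to quantizer 4.
        return 0, FRAME_BITS - CLASS_BITS
    return COEFF_BITS, FRAME_BITS - CLASS_BITS - COEFF_BITS
```

With these numbers, silence and unvoiced frames get 158 quantizer bits and voiced frames get 118. Each frame occupies the same 160 bits on the channel.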

The decision data for the speech signal is transmitted to the decoder together with the prediction filter coefficients and the representative-vector index. The decoding device can therefore decode the original speech signal. Because it decodes values that were quantized according to characteristics such as voiced sound, unvoiced sound, and silence, the decoded speech is of good quality.

Effect of the Invention
As described above, the speech encoding/decoding device of the present invention is provided with classification means that works as follows:
- It receives a speech signal and calculates the power within one frame. If the power is less than a threshold, the frame is judged to be silence; otherwise, it is judged to contain sound.
- It calculates the autocorrelation coefficient within the frame. If its value is less than a threshold, the frame is judged to be unvoiced; otherwise, it is judged to be voiced.
- It determines whether the previous frame was voiced. If it was not, the frame is judged to be a voiced transient part.
- If it was, it calculates the pitch period from the autocorrelation coefficients and the rate of change relative to the previous frame's pitch period. The frame is judged to be a voiced stationary part if the rate of change is less than a threshold, and a voiced transient part otherwise.
The speech signal can therefore be accurately classified into a voiced stationary part, a voiced transient part, unvoiced sound, and silence. In addition, because vector quantization with codebooks is used, encoding can be performed efficiently with little transmitted information.

[Brief description of the drawings]

FIG. 1(a) is a block diagram of an embodiment of the speech encoding device according to the present invention.
FIG. 1(b) is a block diagram of an embodiment of the speech decoding device according to the present invention.
FIG. 2 is a detailed block diagram of the predictor of FIG. 1(a).
FIG. 3 is a detailed block diagram of the quantizer of FIG. 1(a).
FIG. 4 is a waveform diagram of a typical speech signal.
FIG. 5 is a flowchart explaining the operation of the classifier of FIG. 1(a).
FIG. 6(a) is a block diagram of a conventional speech encoding device.
FIG. 6(b) is a block diagram of a conventional speech decoding device.
FIG. 7 is a detailed block diagram of the quantizer of FIG. 6(a).

1: classifier; 2: time axis compressor; 3: predictor; 4: quantizer; 5: multiplexer; 6: separator; 7: dequantizer; 8: synthesizer; 9: time axis expander; 31: codebook for the voiced stationary part; 32: codebook for the voiced transient part; 33: codebook for unvoiced sound; 34: codebook for silence.

Continuation of the front page
(56) References cited:
- JP-A-55-11616 (JP, A)
- JP-A-63-59127 (JP, A)
- JP-A-62-194299 (JP, A)
- JP-A-59-12499 (JP, A)
- JP-A-59-82608 (JP, A)
- JP-A-63-204300 (JP, A)
(58) Fields searched (Int.Cl.⁶, DB name): G10L 3/00 - 9/18; H03M 7/30; H04B 14/04; JICST

Claims

(57) [Claims]

1. A speech encoding/decoding device comprising a speech encoding device and a speech decoding device.
The speech encoding device has:
- classification means that receives a speech signal and calculates the power within one frame, judging the frame to be silence if the power is less than a threshold and to contain sound if the power is equal to or greater than the threshold; that calculates the autocorrelation coefficient within the frame, judging the frame to be unvoiced if its value is less than a threshold and voiced if it is equal to or greater than the threshold; that determines whether the previous frame was voiced, judging the frame to be a voiced transient part if the previous frame is determined not to be voiced; and that, if the previous frame is determined to be voiced, calculates a pitch period from the autocorrelation coefficients and the rate of change relative to the pitch period of the previous frame, judging the frame to be a voiced stationary part if the rate of change is less than a threshold and a voiced transient part if it is equal to or greater than the threshold;
- time axis compression means for compressing, on the time axis, the voiced stationary part of the speech signal classified by the classification means;
- prediction means for outputting the prediction errors of the voiced transient part classified by the classification means and of the voiced stationary part compressed by the compression means; and
- quantization means for quantizing the unvoiced sound and silence classified by the classification means and the prediction errors output by the prediction means.
The speech decoding device has:
- dequantization means for dequantizing the quantized values of the voiced stationary part, the voiced transient part, the unvoiced sound, and the silence from the speech encoding device;
- synthesis means for synthesizing the voiced stationary part and the voiced transient part dequantized by the dequantization means into a speech signal; and
- time axis expansion means for expanding, on the time axis, the voiced stationary part synthesized by the synthesis means.

2. The speech encoding/decoding device according to claim 1, wherein the quantization means comprises codebooks storing representative vectors for the voiced stationary part, the voiced transient part, unvoiced sound, and silence, respectively. The quantization means selects a codebook according to the decision data for the speech signal classified by the classification means, calculates the squared distance between the prediction-error signal sequence from the prediction means and each representative vector of the selected codebook, and outputs, as the quantized value, the index of the representative vector with the smallest distance.