JP3489704B2

JP3489704B2 - Method and decoder for decoding encoded audio signal, and method and encoder for encoding audio signal

Info

Publication number: JP3489704B2
Application number: JP33436795A
Authority: JP
Inventors: Jesper Haagen; Willem Bastiaan Kleijn
Original assignee: AT&T Corp
Current assignee: AT&T Corp
Priority date: 1994-11-30
Filing date: 1995-11-30
Publication date: 2004-01-26
Anticipated expiration: 2015-11-30
Also published as: DE69521272D1; US5839102A; DE69521272T2; EP0715297B1; EP0715297A2; EP0715297A3; ES2158052T3; CA2156558C; CA2156558A1; KR960020012A; JPH08254994A; TW260846B

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to speech coding systems and, more particularly, to the quantization of parameters in speech coding systems.

[Prior art]

[0002] A speech coding system provides codeword representations of a speech signal to system receivers for communication over a channel or network. Each system receiver reconstructs the speech signal from the received codewords. The amount of codeword information communicated by the system in a given time interval defines the system bandwidth and affects the quality of the speech received by the system receivers.

[0003] The objective of a speech coding system is to provide the best trade-off between speech quality and bandwidth, given side conditions such as input signal quality, channel quality, bandwidth limitations, and cost. The speech signal is represented by a set of parameters that are quantized for transmission. Perhaps the most important issue in the design of a speech coder is the search for a good set of parameters to describe the speech signal. A good set of parameters requires only a low system bandwidth for a perceptually accurate reconstruction of the speech signal. A further desirable property of a parameter set is that the parameters are independent. When the parameters are independent, their quantizers can be designed independently, and incorrectly received information has less effect on the quality of the reconstructed speech signal. The bandwidth required for each parameter is a function of the rate at which the parameter changes and of the accuracy with which its trajectory must be described to obtain reconstructed speech of the required quality.

[0004] The power of the speech signal is a desirable member of a set of coding parameters. The other parameters are easily made independent of the signal power. Moreover, the signal power represents a physical property of the speech signal, which makes it easier to define design criteria for its quantizer. The signal power can be defined as the per-sample signal energy, averaged over one pitch period for quasi-periodic speech segments and over some predetermined time interval for aperiodic speech segments. The time interval for aperiodic segments should be short enough to be perceptually meaningful (5 ms or less is advantageous). With this definition, the speech signal power is a smooth function during sustained vowels and clearly shows onsets and plosives.
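The definition above can be sketched in a few lines. This is only an illustration: the function name `per_sample_power` and the toy windows are assumptions made for this example, not part of the patent.

```python
def per_sample_power(window):
    """Mean energy per sample over one analysis window.

    For quasi-periodic speech the window would span one pitch period;
    for aperiodic speech it would be a short fixed interval (5 ms or less).
    """
    if len(window) == 0:
        raise ValueError("window must contain at least one sample")
    return sum(s * s for s in window) / len(window)


# Two toy windows with the same amplitude but different lengths,
# standing in for two different pitch periods.
short_period = [0.5, -0.5] * 20   # 40 samples
long_period = [0.5, -0.5] * 40    # 80 samples
```

Because the energy is divided by the window length, both windows give the same power, so the parameter does not depend on the pitch period.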

[0005] A high-resolution estimate of the signal power cannot be obtained with a fixed or large window size. A large estimation window leads to low time resolution in the estimated signal power. As a result, speech reconstructed by low-rate coders that use this approach generally lacks crispness. A short, fixed window, on the other hand, leads to an unstable signal power estimate. For this reason, coders that use short fixed windows, such as CELP (Code-Excited Linear Predictive) coders, generally do not use the signal power as an explicit parameter. (See, for example, B.S. Atal, "High-Quality Speech at Low Bit Rates: Multi-Pulse and Stochastically Excited Linear Predictive Coders," Proc. Int. Conf. Acoust. Speech Sign. Process., Tokyo, pp. 1681-1684, 1986.)

[0006] With the ever-increasing demand for coding efficiency, it is expected that more coders will use the signal power as an explicit parameter that is encoded independently. Recently, coding procedures have been introduced that describe the speech signal in terms of characteristic waveforms sampled at a high rate (approximately 500 Hz). (See, for example, W.B. Kleijn and J. Haagen, "Transformation and Decomposition of the Speech Signal for Coding," IEEE Signal Processing Letters, Vol. 1, September 1994, pp. 136-138.) In these so-called waveform-interpolation coders, the estimation window for the signal power is one pitch period (for voiced speech). These new waveform-interpolation coders use an analysis that yields a very accurate estimate of the signal power with high time resolution. The signal power is encoded independently.

[Problems to be Solved by the Invention]

[0007] In conventional coding techniques that use the signal power as an explicit parameter, the signal power is transmitted at a fairly low rate. Linear interpolation over the long update intervals is then used to reconstruct the signal power contour. (Such interpolation is often applied to the logarithm of the power.) (See, for example, T.E. Tremain, "The Government Standard Linear Predictive Coding Algorithm," Speech Technology, pp. 40-49, April 1982.) A more detailed description of the power contour would improve the quality of the reconstructed signal. The challenge, however, is to transmit only the perceptually relevant details of the signal power contour, so that a low bit rate can still be used.

[Means for Solving the Problems]

[0008] The present invention provides a method and apparatus that allow the perceptually important features of speech coding parameters to be transmitted at a low bit rate. The speech coding parameters may include, for example, the signal power of the speech. The parameters are processed on a block-by-block basis. The parameter values at the block boundaries are transmitted by conventional methods, such as differential quantization. In the present invention, the shape of the reconstructed parameter contour within the block boundaries is based on a classification. The classification depends on the perceptually important features of the parameter contour within the block. The classification can be performed either at the transmitting end of the coder (for example, using the original parameter contour with high time resolution and possibly other speech parameters) or at the receiving end of the coder (for example, using the transmitted parameter values and possibly other transmitted speech parameters). Based on the result of the classification and on the parameter values at the block boundaries, the parameter contour within the block is selected from an inventory of possible parameter contours.

[0009]

DETAILED DESCRIPTION OF THE INVENTION

[Introduction] The objective of speech coding is to obtain a desirable trade-off between the quality of the reconstructed speech and the required bandwidth, subject to constraints on channel quality, hardware, and delay. In general, a model of the speech signal is used, and the trajectory of the model parameters (which may be vectors) as a function of time is transmitted with a certain accuracy. (In the simplest model, the model parameter is the speech signal itself.) In a digital speech coder, the trajectory of a model parameter is described as a sequence of scalar or vector samples. These parameters may be transmitted at a low rate, and the trajectory is reconstructed by interpolation between the update points. Alternatively, a predictor (which may be a linear predictor) is used to predict the parameter from previously reconstructed samples, and only the difference (residual) between the actual and predicted values is transmitted. In yet another procedure, a high-time-resolution description of the parameter trajectory is split into successive blocks, and these blocks are vectors that are quantized for transmission. Some coders combine vector quantization with prediction.

[0010] In an illustrative embodiment of the present invention, a parameter trajectory (which may be a vector) is transmitted in a manner that augments the interpolation, prediction, and vector quantization methods described above. The parameter is transmitted on a block-by-block basis, with each block containing a plurality of parameter samples at the analysis side. The parameter signal is low-pass filtered and downsampled. This downsampled parameter sequence is transmitted by conventional means. (For example, in the illustrative embodiment described in the next section, this conventional transmission uses a differential quantizer.) At the receiver, the parameter sequence must be upsampled to the rate required for reconstruction by the speech model. When band-limited or linear interpolation is used for upsampling, features of the signal are clearly lost. In the illustrative embodiment of the present invention, classification is used to identify perceptually important features of the parameter trajectory that would otherwise be absent from a parameter sequence reconstructed by interpolation alone. Depending on the result of this classification, one trajectory is selected from an inventory of trajectories to construct the parameter trajectory between the samples at the block boundaries. The inventory is furthermore adapted to the parameter values at the block boundaries. The illustrative method described here does not necessarily require the transmission of additional information: the classification can be performed at the receiving end of the coder using only the transmitted, downsampled parameter sequence.

[0011] [Illustrative Embodiment] In the illustrative embodiment presented here, the procedure described above is applied specifically to the speech power. A speech signal with a step-shaped power contour sounds significantly different from one with a smooth power contour. Smooth contours are typical of sustained speech sounds, whereas step-shaped contours are common at speech onsets. A simple classification scheme that uses the transmitted, downsampled speech power sequence can identify step-shaped power contours with high reliability. A step-shaped contour is then used for the reconstructed signal power sequence. Experiments have shown that the exact location of the step in the speech power signal is of only minor importance to the perceived speech quality.

[0012] Classification performed at the transmitting end of the coder can be used to identify features of the energy contour between samples, such as plosives. The exact location of a reconstructed plosive is likewise of only minor perceptual importance. Thus, whenever a plosive is identified at the transmitting end, a simple bump in the speech power signal is added at the center of the block.

[0013] Figure 1 shows the transmitter portion of an illustrative embodiment of the present invention, which performs signal power extraction in a waveform-interpolation coder. The original speech signal is first processed in encoding unit 101. In a waveform-interpolation coder, this encoding unit extracts the characteristic waveforms. During voiced speech, these characteristic waveforms correspond to one pitch cycle. In accordance with known methods, the speech signal is represented by a sequence of characteristic waveforms (defined in the linear-prediction residual domain), a pitch-period trajectory, and time-varying linear-prediction coefficients. Such techniques are described, for example, in commonly assigned, co-pending U.S. Patent Application Ser. No. 08/179,831 by W.B. Kleijn, "Method and Apparatus For Prototype Waveform Speech Coding," which is incorporated by reference as if fully set forth herein. (See also, for example, W.B. Kleijn, "Encoding Speech Using Prototype Waveforms," IEEE Trans. Speech and Audio Processing, Vol. 1, No. 4, pp. 386-399, 1993, and W.B. Kleijn and J. Haagen, "Transformation and Decomposition of the Speech Signal for Coding," IEEE Signal Processing Letters, Vol. 1, September 1994, pp. 136-138.)

[0014] The description of a characteristic waveform usually takes the form of a finite Fourier series. The characteristic waveforms are described in the residual domain because this facilitates their extraction and quantization. The sampling (extraction) rate of the characteristic waveforms is advantageously set to approximately 500 Hz. In this figure, as in the figures that follow, the pitch-period trajectory and the linear-prediction coefficients are assumed to be available to any processing unit that requires them. Both the pitch-period trajectory and the linear-prediction coefficients are defined and interpolated according to conventional methods.

[0015] The unquantized characteristic waveforms (labeled in Figure 1 as the unquantized intermediate signal) are supplied to power extractor 102. In power extractor 102, the characteristic waveform in the residual domain is first converted to a characteristic waveform in the speech domain by circular convolution with the linear-prediction synthesis filter. (Such a convolution can be carried out directly on the Fourier series, for example by means of equation (19) of W.B. Kleijn, "Encoding Speech Using Prototype Waveforms," IEEE Trans. Speech and Audio Processing, Vol. 1, No. 4, pp. 386-399, 1993.) The signal power in the speech domain is used because this prevents transmission errors in the linear-prediction coefficients from affecting the power of the speech signal.

[0016] Power extractor 102 then computes the power of the characteristic waveform. The power is normalized on a per-sample basis so that the signal power does not depend on the pitch period. This facilitates quantization and makes the power insensitive to channel errors that affect the pitch period. Finally, power extractor 102 converts the resulting speech-domain power to its logarithm. For example, the well-known decibel (dB) log scale can be used for this purpose. (Using the logarithm of the signal power rather than the linear signal power is motivated by human perception: the human ear can deal with signal powers that vary over many orders of magnitude.) This signal, sampled at the same rate as the characteristic waveforms, is supplied to plosive detector 105, low-pass filter 106, and normalizer 103. Normalizer 103 uses the extracted signal power to produce normalized characteristic waveforms. These normalized characteristic waveforms are then encoded in encoding unit 104, which may also use the signal power as side information.
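The log conversion performed by the power extractor can be sketched as follows. The function name `log_power_db` and the `floor` guard against taking the logarithm of zero are assumptions made for this illustration; the text only states that a decibel scale can be used.

```python
import math


def log_power_db(power, floor=1e-10):
    """Convert a linear per-sample power to decibels.

    `floor` is a hypothetical guard so that silent frames
    (power == 0) do not cause a math domain error.
    """
    return 10.0 * math.log10(max(power, floor))
```

A power ratio of 100 maps to 20 dB, which is why a signal that varies over many orders of magnitude stays within a compact numeric range.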

[0017] To prevent aliasing, low-pass filter 106 removes frequencies above half the sampling frequency of the output signal of downsampler 107. For a 2.4 kb/s coder, the sampling frequency after downsampling is advantageously set to 100 Hz (which, in the embodiment given here, corresponds to downsampling by a factor of 5).
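A minimal sketch of the filter-then-decimate step follows. The patent does not specify the low-pass filter, so the moving average used here is only a stand-in chosen for brevity (it is a weak anti-aliasing filter), and the function name is hypothetical.

```python
def lowpass_and_downsample(x, factor=5):
    """Smooth x with a moving average of length `factor`, then keep every
    `factor`-th value (500 Hz -> 100 Hz for factor=5).

    Assumption: the moving average merely stands in for low-pass filter 106.
    Near the edges the averaging window shrinks to the available samples.
    """
    half = factor // 2
    smoothed = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        smoothed.append(sum(x[lo:hi]) / (hi - lo))
    return smoothed[::factor]
```

In practice a filter with a sharper cutoff at half the new sampling frequency (50 Hz here) would be preferable; the sketch only shows the order of operations.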

[0018] Power encoder 108 encodes the downsampled log-power sequence. This is advantageously done with a differential quantizer. Let x(n) be the log power at sampling time n. A simple scalar quantizer is then used to quantize the difference signal e(n):

$$ e(n) = x(n) - \alpha\,\hat{x}(n-1) \qquad (1) $$

Let Q(e(n)) denote the quantized value of e(n). The reconstructed log power is then

$$ \hat{x}(n) = Q(e(n)) + \alpha\,\hat{x}(n-1) \qquad (2) $$

where \hat{x}(n-1) is the previously reconstructed log power. For α less than 1, equation (2) represents the well-known leaky integrator. The function of the leaky integrator is to reduce the sensitivity to channel errors. A value of α = 0.8 is advantageously used.
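The differential quantizer with a leaky integrator can be sketched as below. The leak factor 0.8 comes from the text; the uniform quantizer with a 2 dB step and the zero initial state are assumptions made only for this illustration, as are the function names.

```python
ALPHA = 0.8   # leak factor given in the text
STEP = 2.0    # hypothetical uniform quantizer step, in dB


def quantize(e, step=STEP):
    """Uniform scalar quantizer (an assumed stand-in for Q)."""
    return step * round(e / step)


def encode(x, alpha=ALPHA):
    """Return the quantized differences Q(e(n)) for log-power samples x."""
    codes, x_hat = [], 0.0          # assumed initial state of 0 dB
    for xn in x:
        q = quantize(xn - alpha * x_hat)   # difference against the leaked reconstruction
        x_hat = q + alpha * x_hat          # track what the decoder will reconstruct
        codes.append(q)
    return codes


def decode(codes, alpha=ALPHA):
    """Leaky integrator: rebuild the log-power sequence from the codes."""
    out, x_hat = [], 0.0
    for q in codes:
        x_hat = q + alpha * x_hat
        out.append(x_hat)
    return out
```

Because the encoder predicts from its own reconstruction, the decoded values stay within half a quantizer step of the input. A channel error in one code is multiplied by α at every later sample, so with α = 0.8 its effect fades instead of persisting.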

[0019] Plosive detector 105 uses both the unprocessed log-power sequence and the low-pass filtered log-power sequence. For each time interval between samples of the downsampled log-power sequence (for example, 10 ms for a downsampled sampling rate of 100 Hz), the output of the plosive detector is a binary decision: 1 means that a plosive was detected, and 0 means that no plosive was detected.

[0020] The operation of plosive detector 105 is shown in Figure 3. Peak clearance detector 304 determines whether the log-power sample minus the low-pass filtered log power at the same sample exceeds a given threshold. (For the log of the signal power, this threshold is advantageously set to 16 dB.) If so, the output of peak clearance detector 304 is 1; otherwise its output is 0.
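The peak clearance test reduces to a per-sample comparison. In this sketch the function name and the list-based interface are assumptions; the 16 dB threshold is the value given in the text.

```python
def peak_clearance(log_power, smoothed, threshold_db=16.0):
    """Return 1 for each sample whose raw log power exceeds the
    low-pass filtered log power by more than threshold_db, else 0."""
    return [1 if raw - low > threshold_db else 0
            for raw, low in zip(log_power, smoothed)]
```

For example, a raw value of 50 dB against a smoothed value of 33 dB clears the threshold, while samples that sit on the smoothed contour do not.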

[0021] The operation of hat hanger 301 is illustrated in Figures 5 and 6. Conceptually, a hat-shaped curve is hung from the current power signal sample: the top of the hat is set at a level equal to the current sample. The output of hat clearance detector 303 is 1 if the samples covered by the hat shape fall below the hat and its brim. For example, Figure 5 shows a situation in which the hat does not clear the neighboring samples, so the output of hat clearance detector 303 is 0. Figure 6, in contrast, shows a situation in which the hat clears the neighboring samples, so the output of hat clearance detector 303 is 1. The characteristics of the hat are stored in hat keeper 302. The shape of the hat can be varied within the detection interval, and the height of the brim can differ between the left and right sides. For example, for a symmetric hat, the width of the top of the hat and the width of the brim can each advantageously be set to 5 ms, and the distance from the brim to the top can advantageously be set to 12 dB for a contour describing the log of the signal power. Those skilled in the art will recognize that hat clearance detector 303 can be implemented, for example, with a sample memory and a processor that test the sample levels and compare them with the given thresholds.
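One possible reading of the hat test is sketched below for a symmetric hat. The mapping from milliseconds to sample offsets is an assumption: at a 500 Hz log-power rate (2 ms spacing), a 5 ms crown covers one neighbor on each side and a 5 ms brim covers the next two. The function name, the parameters, and the handling of sequence edges are likewise hypothetical.

```python
def hat_clearance(x, i, top_half=1, brim=2, drop_db=12.0):
    """Return 1 if the hat hung from sample i clears its neighbors, else 0.

    Offsets 1..top_half lie under the crown and must not exceed x[i].
    The next `brim` offsets on each side lie under the brim and must not
    exceed x[i] - drop_db. Positions outside the sequence are ignored
    (an assumption about edge handling).
    """
    for k in range(1, top_half + brim + 1):
        limit = x[i] if k <= top_half else x[i] - drop_db
        for j in (i - k, i + k):
            if 0 <= j < len(x) and x[j] > limit:
                return 0
    return 1
```

An isolated burst such as `[30, 30, 31, 50, 32, 30, 30]` passes at index 3, while a sustained rise such as `[30, 30, 45, 50, 46, 44, 30]` fails because the sample under the right brim is less than 12 dB below the peak.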

[0022] Logical AND operator 305 combines the output of peak clearance detector 304 with the output of hat clearance detector 303. If either of these two outputs is 0, the output of logical AND operator 305 is 0. Logical-OR downsampler 306 produces one output for each time interval of the downsampled log-power sequence (that is, the output of downsampler 107). In the example case described earlier, this is one output every 10 ms. If the input to logical-OR downsampler 306 is nonzero at any time within such an interval, its output is set to 1, indicating that a plosive was detected. If the input is 0 at all times within the interval, the output of logical-OR downsampler 306 is set to 0, indicating that no plosive was detected.
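The AND stage and the logical-OR downsampler can be sketched together. The function name and the assumption that both detector outputs arrive as equal-length lists of 0/1 values are made for this illustration; the factor of 5 matches the 500 Hz to 100 Hz example.

```python
def plosive_flags(peak, hat, factor=5):
    """AND the two detector outputs sample by sample, then OR the result
    over each block of `factor` samples (one flag per 10 ms block)."""
    both = [p & h for p, h in zip(peak, hat)]
    return [1 if any(both[k:k + factor]) else 0
            for k in range(0, len(both), factor)]
```

A block is flagged only when some single sample passes both tests; a peak-clearance hit without a matching hat-clearance hit leaves the block at 0.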

[0023] Figure 2 shows the receiver portion of the illustrative embodiment of the present invention, corresponding to the transmitter portion shown in Figure 1. Decoder unit 201 reconstructs the characteristic waveforms. Some of the operations performed within decoder unit 201 have no counterpart in the transmitter. For example, spectral pre-shaping may be applied to the characteristic waveforms to enhance the spectral shape of the output signal. This means that the characteristic waveforms forming the output of decoder unit 201 are, in general, not guaranteed to have normalized power. Their power must therefore be evaluated before the quantized characteristic waveforms are scaled. This is done by power extractor 202, which functions in a manner similar to power extractor 102. Again, the power is evaluated in the speech domain.

[0024] Scale factor processor 206 determines the appropriate scale factor to be applied to the characteristic waveform generated by decoder unit 201. For each characteristic waveform, the inputs to scale factor processor 206 are the log power value reconstructed from the transmitted information and the quantized characteristic waveform prior to scaling. The log power value is converted to a linear power value and divided by the power of the unscaled quantized characteristic waveform. This division yields the appropriate scale factor for the unscaled quantized characteristic waveform. The resulting scale factor is used in amplifier 207, whose output is the appropriately scaled quantized characteristic waveform. This characteristic waveform is the input to decoder unit 203, which converts the description of the sequence of characteristic waveforms (together with side information consisting of the pitch-period track and the linear prediction coefficients) into the reconstructed speech signal. Well-known methods used in decoder unit 203 are described, for example, in U.S. Patent Application Ser. No. 08/179,831.
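The scale factor computation described above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the log power is assumed to be a base-10 logarithm, the power of a waveform is assumed to be its mean squared amplitude, and the amplifier is assumed to apply the square root of the power ratio as an amplitude gain. The text itself specifies only the conversion to linear power and the division.

```python
import math


def waveform_power(waveform):
    """Mean squared amplitude of one characteristic waveform (assumed power measure)."""
    return sum(x * x for x in waveform) / len(waveform)


def scale_factor(log_power, waveform):
    """Decoded linear power divided by the power of the unscaled waveform."""
    target_power = 10.0 ** log_power  # assumption: base-10 log power
    return target_power / waveform_power(waveform)


def apply_scale(waveform, factor):
    """Amplifier sketch: scale amplitudes so that the power is multiplied by `factor`."""
    gain = math.sqrt(factor)  # assumption: power ratio converted to amplitude gain
    return [gain * x for x in waveform]


unscaled = [0.5, -0.5, 0.5, -0.5]        # power 0.25, i.e. not normalized
factor = scale_factor(0.0, unscaled)     # target linear power is 10**0 = 1.0
scaled = apply_scale(unscaled, factor)   # scaled waveform now has power 1.0
```

After scaling, `waveform_power(scaled)` matches the linear power decoded from the transmitted log power value, which is the property the scale factor is meant to restore.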

[0025] The reconstruction of the log power sequence will now be described. Power decoder 204 reconstructs the downsampled, quantized log power sequence based on equation (2) above. Power envelope processor 205 converts this downsampled sequence into an upsampled log power sequence. The operation of power envelope processor 205 is illustrated in detail in FIG. 4. Consider first the case where the plosive information is 0 (indicating that no plosive is present). Power step evaluator 401 determines the difference obtained by subtracting the previous log power value of the downsampled sequence from the current log power value of the downsampled sequence. Upsampler 402 upsamples the log power sequence according to an upsampling procedure. Specifically, the upsampling procedure performed by upsampler 402 is selected by comparing the difference between consecutive samples (as determined by power step evaluator 401) with a threshold. For example, the threshold may advantageously be chosen as 12 dB in log speech power for a sampling rate of 100 Hz. If the difference between consecutive samples is less than the threshold, upsampler 402 performs linear interpolation between the update points. This is the case for most time intervals and is illustrated in FIG. 7, which shows two sample values of the downsampled log power sequence as thick lines. The samples between these two sample values are obtained by linear interpolation.
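The linear interpolation case can be illustrated with a short sketch. The function names and the upsampling factor are hypothetical, and the convention that the last interpolated value coincides with the current sample is an assumption made for the example rather than something stated in the text.

```python
def power_step(prev_log_power, curr_log_power):
    """Difference evaluated by the power step evaluator: current minus previous."""
    return curr_log_power - prev_log_power


def linear_upsample(prev_log_power, curr_log_power, factor):
    """Return `factor` log power values on a straight line from the previous
    sample (exclusive) to the current sample (inclusive)."""
    step = power_step(prev_log_power, curr_log_power)
    return [prev_log_power + step * k / factor for k in range(1, factor + 1)]


# Two consecutive downsampled values and an assumed upsampling factor of 4.
contour = linear_upsample(1.0, 2.0, 4)
```

With these inputs the contour rises in equal increments from just above the previous value to the current value, corresponding to the straight segment between the two thick sample lines in FIG. 7.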

[0026] Larger increases in signal power, where the difference between consecutive samples exceeds the threshold, occur mainly at sharp onsets. Linear interpolation of the log power is not a good model for such onsets. In this case, therefore, upsampler 402 uses a step-shaped contour. Specifically, whenever the difference between consecutive samples exceeds the threshold, the left log power value (i.e., the previous sample) is used up to the midpoint of the time interval, and the right log power value (i.e., the current sample) is used for the remainder of the interval. This case is illustrated in FIG. 9. In general, the step will not be located at the same time instant as the onset of the original signal. For human perception, however, the exact location of the step in the power contour is less important than the fact that the interval contains a step rather than a smooth contour.
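The choice between the two contours can be sketched as below. The function names are hypothetical, the log power is assumed to be expressed in dB so that the 12 dB example threshold can be used directly, the difference is taken as signed (current minus previous), and a difference exactly equal to the threshold is treated as a step; the text does not settle the last two points.

```python
def step_contour(prev_log_power, curr_log_power, factor):
    """Hold the previous value up to the midpoint of the interval,
    then the current value for the remainder."""
    half = factor // 2
    return [prev_log_power] * half + [curr_log_power] * (factor - half)


def upsample_interval(prev_log_power, curr_log_power, factor, threshold):
    """Select linear interpolation or a step contour from the power step."""
    step = curr_log_power - prev_log_power
    if step < threshold:
        # Small step: straight line ending at the current sample.
        return [prev_log_power + step * k / factor for k in range(1, factor + 1)]
    # Large increase (sharp onset): step-shaped contour.
    return step_contour(prev_log_power, curr_log_power, factor)


onset = upsample_interval(-30.0, -10.0, 4, 12.0)   # 20 dB rise, step contour
smooth = upsample_interval(-30.0, -26.0, 4, 12.0)  # 4 dB rise, linear contour
```

Placing the step at the midpoint keeps the decoder free of any onset timing information, which is consistent with the observation that the exact step position matters little perceptually.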

[0027] The perceptual effect of the step-shaped power contour is to make the reconstructed speech signal noticeably crisper. However, indiscriminate use of step-shaped power contours results in a significant degradation of output signal quality. Restricting the stepped contour to cases where the signal power is changing rapidly yields improved speech quality compared with consistent use of the linearly interpolated contour. Moreover, using the stepped contour in cases where the signal power changes rapidly but smoothly does not significantly affect the reconstructed speech.

[0028] Consider next the case where the plosive information is 1 (indicating that a plosive is present). This is again described with reference to FIG. 4. When a plosive is present, plosive adder 403 adds a fixed value to one or more specific samples of the upsampled log power sequence within the time interval in which the plosive is known to be present. For example, a fixed value of 1.2 may advantageously be used for the log signal power, and this value may advantageously be added to the log power signal over a time segment of 5 ms. FIG. 8 illustrates the addition of a plosive in the case of an otherwise linearly interpolated contour. FIG. 10 illustrates the addition of a plosive in the case of a stepped contour. In the latter case, the plosive is advantageously added after the step; otherwise it would not be audible.
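A minimal sketch of the plosive addition follows. The function name and the `start` and `length` parameters are hypothetical: the text gives only the fixed value of 1.2 and a 5 ms segment, so the number of upsampled samples covered by the bump, and its position within the interval, are assumptions supplied by the caller.

```python
def add_plosive(log_power, start, length, bump=1.2):
    """Return a copy of the upsampled log power contour with a fixed value
    added to `length` samples beginning at index `start`."""
    out = list(log_power)  # leave the caller's contour unchanged
    for i in range(start, min(start + length, len(out))):
        out[i] += bump
    return out


# Step-shaped contour; the plosive is placed after the step (index 2 onward)
# so that it is not masked by the low-power first half of the interval.
stepped = [1.0, 1.0, 3.0, 3.0]
with_plosive = add_plosive(stepped, start=2, length=1)
```

For a linearly interpolated contour the same function applies; only the choice of `start` changes.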

[0029] The illustrative embodiment of the invention described above includes two related but distinct classification procedures. For example, as shown in FIG. 4, power step evaluator 401 determines whether the log power contour between two consecutive samples is to be linearly interpolated or whether a step-shaped contour is to be provided. In addition, plosive adder 403 determines whether a plosive is to be added to the log power contour between two consecutive samples. In other illustrative embodiments of the invention, either of these procedures may be performed independently of the other.

[0030] For clarity of explanation, the illustrative embodiments of the present invention are presented as comprising individual functional blocks or processors. The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software. For example, the functions of the processors shown in FIGS. 1 to 4 may be provided by a single shared processor. (Use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software.)

[0031] Illustrative embodiments may comprise a digital signal processor (DSP), such as the AT&T DSP16 or DSP32C, read-only memory (ROM) for storing software that performs the operations discussed above, and random access memory (RAM) for storing DSP results. Very large scale integration (VLSI) hardware embodiments, as well as custom VLSI circuitry in combination with a general-purpose DSP circuit, may also be provided.

[0032] Although a number of specific embodiments of the present invention have been shown and described herein, it is to be understood that these embodiments are merely illustrative of the many possible specific arrangements that can be devised in application of the principles of the invention. Numerous and varied other arrangements can be devised in accordance with these principles by those skilled in the art without departing from the spirit and scope of the invention.

[Effect of the invention] The present invention makes it possible to transmit perceptually important features of a parameter in speech coding, so that a more detailed description of the parameter contour is obtained. As a result, the quality of the encoded and reconstructed signal is improved even when the signal power is transmitted at a relatively low bit rate.

[Brief description of drawings]

FIG. 1 shows an overview of the transmitter portion of an illustrative coding system that has signal power as an explicit parameter and performs coding according to an illustrative embodiment of the present invention.

FIG. 2 shows an overview of the receiver portion of an illustrative coding system that has signal power as an explicit parameter and performs coding according to an illustrative embodiment of the present invention.

FIG. 3 shows an illustrative plosive detector for use in the illustrative transmitter of FIG. 1.

FIG. 4 shows an illustrative power envelope processor for use in the illustrative receiver of FIG. 2.

FIG. 5 shows the so-called hat-hanging mechanism of the illustrative plosive detector of FIG. 3 operating when no plosive is present.

FIG. 6 shows the so-called hat-hanging mechanism of the illustrative plosive detector of FIG. 3 operating when a plosive is present.

FIG. 7 shows a log signal power contour obtained by linear interpolation according to an illustrative embodiment of the present invention.

FIG. 8 shows a log signal power contour obtained by linear interpolation, with an added plosive, according to an illustrative embodiment of the present invention.

FIG. 9 shows a log signal power contour obtained by stepwise interpolation according to an illustrative embodiment of the present invention.

FIG. 10 shows a log signal power contour obtained by stepwise interpolation, with an added plosive, according to an illustrative embodiment of the present invention.

[Explanation of symbols]

101 Encoding unit
102 Power extractor
103 Normalizer
104 Encoding unit
105 Plosive detector
106 Low-pass filter
107 Downsampler
108 Power encoder
201 Decoder unit
202 Power extractor
203 Decoder unit
204 Power decoder
205 Power envelope processor
206 Scale factor processor
207 Amplifier
301 Hat hanger
302 Hat keeper
303 Hat clearance detector
304 Peak clearance detector
305 Logical AND operator
306 Downsampler with logical OR function
401 Power step evaluator
402 Upsampler
403 Plosive adder

Continuation of front page
(72) Inventor: Willem Bastiaan Kleijn, 87 Village Drive, Basking Ridge, New Jersey 07920, United States
(56) References cited: JP-A-60-195600 (JP, A); JP-A-51-149706 (JP, A); JP-A-56-80099 (JP, A); JP-A-59-102297 (JP, A); JP-A-1-219895 (JP, A); JP-A-51-149706 (JP, A); JP-B-62-39758 (JP, B2)

Claims

(57) [Claims]

1. A method of decoding an encoded signal, the encoded signal comprising a sequence of consecutive encoded parameter value signals representing values of a predetermined parameter, the method comprising the steps of: classifying the predetermined parameter into one of a plurality of categories based on two consecutive encoded parameter value signals of the sequence; and generating, based on the classified category, one or more intermediate parameter value signals representing values of the predetermined parameter at one or more times between the two consecutive encoded parameter value signals, wherein the categories include a linear interpolation category and a step function category; wherein, when the predetermined parameter is classified into the linear interpolation category, the step of generating the intermediate parameter value signals comprises generating an intermediate parameter value signal representing a value numerically less than the larger and greater than the smaller of the values of the predetermined parameter represented by the two consecutive encoded parameter value signals; and wherein, when the predetermined parameter is classified into the step function category, the step of generating the intermediate parameter value signals comprises generating an intermediate parameter value signal representing a value numerically equal to one of the values of the predetermined parameter represented by the two consecutive encoded parameter value signals.

2. The method of claim 1, wherein the encoded signal comprises an encoded speech signal.

3. The method of claim 2, wherein the predetermined parameter represents the power of the speech signal.

4. The method of claim 3, wherein the predetermined parameter represents the power of a characteristic waveform.

5. The method of claim 1, wherein the step of classifying the predetermined parameter comprises classifying the predetermined parameter based on the numerical difference between the values represented by the two consecutive encoded parameter value signals.

6. The method of claim 1, wherein, when the predetermined parameter is classified into the step function category, the step of generating the intermediate parameter value signals comprises generating at least two intermediate parameter value signals, including a first intermediate parameter value signal and a second intermediate parameter value signal, the first intermediate parameter value signal and the second intermediate parameter value signal representing different numerical values of the predetermined parameter.

7. The method of claim 6, wherein the encoded signal comprises an encoded speech signal and the predetermined parameter represents the power of a characteristic waveform.

8. The method of claim 1, wherein the encoded signal further comprises an encoded parameter feature signal representing one or more values of the predetermined parameter at one or more times between the two consecutive encoded parameter value signals, and wherein the step of classifying comprises classifying the predetermined parameter based on the encoded parameter feature signal.

9. The method of claim 8, wherein the encoded signal comprises an encoded speech signal.

10. The method of claim 9, wherein the predetermined parameter represents the power of the speech signal.

11. The method of claim 10, wherein the plurality of categories comprises a category representing the presence of a plosive in the power of the speech signal and a category representing the absence of a plosive in the power of the speech signal.

12. A method of encoding a signal, the method comprising the steps of: generating a sequence of consecutive encoded parameter value signals representing values of a predetermined parameter; classifying the predetermined parameter into one of a plurality of categories based on one or more values of the predetermined parameter at one or more times between two consecutive encoded parameter value signals of the sequence; and generating an encoded parameter feature signal based on the classified category, wherein the plurality of categories comprises a category representing the presence of a plosive in the power of a speech signal and a category representing the absence of a plosive in the power of the speech signal.

13. The method of claim 12, wherein the signal comprises a speech signal.

14. The method of claim 13, wherein the predetermined parameter represents the power of the speech signal.

15. A decoder for decoding an encoded signal, the encoded signal comprising a sequence of consecutive encoded parameter value signals representing values of a predetermined parameter, the decoder comprising: means for classifying the predetermined parameter into one of a plurality of categories based on two consecutive encoded parameter value signals of the sequence; and means for generating, based on the classified category, one or more intermediate parameter value signals representing values of the predetermined parameter at one or more times between the two consecutive encoded parameter value signals, wherein the categories include a linear interpolation category and a step function category; wherein the means for generating the intermediate parameter value signals when the predetermined parameter is classified into the linear interpolation category comprises means for generating an intermediate parameter value signal representing a value numerically less than the larger and greater than the smaller of the values of the predetermined parameter represented by the two consecutive encoded parameter value signals; and wherein the means for generating the intermediate parameter value signals when the predetermined parameter is classified into the step function category comprises means for generating an intermediate parameter value signal representing a value numerically equal to one of the values of the predetermined parameter represented by the two consecutive encoded parameter value signals.

16. The decoder of claim 15, wherein the encoded signal comprises an encoded speech signal.

17. The decoder of claim 16, wherein the predetermined parameter represents the power of the speech signal.

18. The decoder of claim 17, wherein the predetermined parameter represents the power of a characteristic waveform.

19. The decoder of claim 15, wherein the means for classifying the predetermined parameter comprises means for classifying the predetermined parameter based on the numerical difference between the values represented by the two consecutive encoded parameter value signals.

20. The decoder of claim 15, wherein the means for generating the intermediate parameter value signals when the predetermined parameter is classified into the step function category comprises means for generating at least two intermediate parameter value signals, including a first intermediate parameter value signal and a second intermediate parameter value signal, the first intermediate parameter value signal and the second intermediate parameter value signal representing different numerical values of the predetermined parameter.

21. The decoder of claim 20, wherein the encoded signal comprises an encoded speech signal and the predetermined parameter represents the power of a characteristic waveform.

22. The decoder of claim 15, wherein the encoded signal further comprises an encoded parameter feature signal representing one or more values of the predetermined parameter at one or more times between the two consecutive encoded parameter value signals, and wherein the means for classifying the predetermined parameter comprises means for classifying the predetermined parameter based on the encoded parameter feature signal.

23. The decoder of claim 22, wherein the encoded signal comprises an encoded speech signal.

24. The decoder of claim 23, wherein the predetermined parameter represents the power of the speech signal.

25. The decoder of claim 24, wherein the plurality of categories comprises a category representing the presence of a plosive in the power of the speech signal and a category representing the absence of a plosive in the power of the speech signal.

26. An encoder for encoding a signal, the encoder comprising: means for generating a sequence of consecutive encoded parameter value signals representing values of a predetermined parameter; means for classifying the predetermined parameter into one of a plurality of categories based on one or more values of the predetermined parameter at one or more times between two consecutive encoded parameter value signals of the sequence; and means for generating an encoded parameter feature signal based on the classified category, wherein the plurality of categories comprises a category representing the presence of a plosive in the power of a speech signal and a category representing the absence of a plosive in the power of the speech signal.

27. The encoder of claim 26, wherein the signal comprises a speech signal.

28. The encoder of claim 27, wherein the predetermined parameter represents the power of the speech signal.