JPH11119798A - Method of encoding speech and device therefor, and method of decoding speech and device therefor

Info

Publication number: JPH11119798A
Application number: JP9285458A
Authority: JP
Inventors: Kazuyuki Iijima; 和幸飯島; Masayuki Nishiguchi; 正之西口
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1997-10-17
Filing date: 1997-10-17
Publication date: 1999-04-30
Anticipated expiration: 2017-10-17
Also published as: JP4230550B2

Abstract

PROBLEM TO BE SOLVED: To generate background noise with reduced unnaturalness from encoded data, by having the speech encoding process output a plurality of types of parameters after an interval of a predetermined time when the determination result indicates a background noise section. SOLUTION: An input signal determination unit 21 determines whether an input signal, supplied from an input terminal 1 and converted into a digital signal by an A/D converter 10, is voiced sound (V), unvoiced sound (UV), or background noise in each predetermined time section. It then outputs an idVUV parameter indicating the result of the V/UV determination: '0' for unvoiced sound, '1' for background noise, '2' for first voiced sound, or '3' for second voiced sound. When the speech encoding apparatus 20 learns from the determination result of the input signal determination unit 21, that is, from the idVUV parameter, that the input signal is in a background noise section, it outputs a plurality of types of UV parameters after an interval of a predetermined time.

Description

DETAILED DESCRIPTION OF THE INVENTION

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The present invention relates to a speech encoding method and apparatus that encode, at a variable rate based on the determination result for each section, an input signal consisting of speech signal sections, which are divided into voiced and unvoiced sections, and background noise sections.

[0002] The present invention also relates to a speech decoding method and apparatus for decoding data encoded by the above speech encoding method and apparatus.

2. Description of the Related Art

[0003] In recent years, in communication fields that require a transmission line, it has been proposed, in order to use the transmission band effectively, to vary the coding rate before transmission according to the type of input signal to be transmitted, for example a speech signal section (divided into voiced and unvoiced sections) or a background noise section.

[0004] For example, it has been proposed that, when a section is determined to be a background noise section, no coding parameters are sent at all, and the decoding apparatus simply mutes its output without generating any background noise.

[0005] With this approach, however, background noise is superimposed on the other party's voice while that party is speaking, but the output suddenly becomes silent when the party stops speaking, so the call sounds unnatural.

[0006] For this reason, in variable rate codecs, when a section is determined to be a background noise section, some of the coding parameters are not sent, and the decoding apparatus generates background noise by repeatedly using past parameters.

PROBLEMS TO BE SOLVED BY THE INVENTION

[0007] As described above, however, when past parameters are repeatedly used as they are, the noise itself gives the impression of having a pitch and often sounds unnatural. This happens as long as the line spectrum pair (LSP) parameters remain the same, even if the level or the like is changed.

[0008] Even if other parameters are varied, for example with random numbers, the noise still sounds unnatural if the LSP parameters are identical.

[0009] The present invention has been made in view of the above circumstances, and its object is to provide a speech encoding method and apparatus that efficiently implement variable rate encoding.

[0010] Another object of the present invention, also made in view of the above circumstances, is to provide a speech decoding method and apparatus that can generate background noise with reduced unnaturalness, using data encoded by a speech encoding method and apparatus that implement variable rate encoding.

MEANS FOR SOLVING THE PROBLEMS

[0011] To solve the above problems, a speech encoding method according to the present invention encodes, at a variable rate based on the determination result for each section, an input signal consisting of speech signal sections, which are divided into voiced and unvoiced sections, and background noise sections. The method comprises a speech encoding step of dividing the input signal on the time axis into predetermined coding units, encoding each coding unit, and outputting a plurality of types of speech coding parameters. When the determination result indicates a background noise section, the speech encoding step outputs the plurality of types of parameters after an interval of a predetermined time.

[0012] Here, the speech encoding step always outputs the determination result for each section as a basic parameter, even during the predetermined time.

[0013] The speech encoding step comprises a short-term prediction residual calculation step of obtaining the short-term prediction residual of the input signal, and either a sine wave analysis encoding step of applying sine wave analysis encoding to the obtained short-term prediction residual or a waveform encoding step of encoding the input signal by waveform encoding.

[0014] When the speech signal section is voiced, the input signal is encoded by the sine wave analysis encoding step; when it is unvoiced, the input signal is encoded by the waveform encoding step.

[0015] Further, during a background noise section, or when the preceding frame was a background noise section, the speech encoding step does not perform differential quantization in the short-term prediction residual calculation step.

[0016] To solve the above problems, a speech encoding apparatus according to the present invention encodes, at a variable rate based on the determination result for each section, an input signal consisting of speech signal sections, which are divided into voiced and unvoiced sections, and background noise sections. The apparatus comprises speech encoding means for dividing the input signal on the time axis into predetermined coding units, encoding each coding unit, and outputting a plurality of types of speech coding parameters. When the determination result indicates a background noise section, the speech encoding means outputs the plurality of types of parameters after an interval of a predetermined time.

[0017] To solve the above problems, a speech decoding method according to the present invention decodes encoded speech data that has been transmitted after an input signal, consisting of speech signal sections (divided into voiced and unvoiced sections) and background noise sections, was encoded at a variable rate based on the determination result for each section. During a background noise section, the method generates the background noise using the plurality of types of parameters transmitted at intervals of a predetermined time, and during the predetermined time it generates the background noise using parameters sent in the past.

[0018] As a result, past parameters, for example linear predictive coding parameters, are not repeatedly used as they are; instead they are used while being interpolated with linear predictive coding parameters sent even earlier, so the unnaturalness of the background noise can be reduced.

[0019] Here, the encoded speech data is generated by a speech encoding step comprising a short-term prediction residual calculation step of obtaining the short-term prediction residual of the input speech signal, and either a sine wave analysis encoding step of applying sine wave analysis encoding to the obtained short-term prediction residual or a waveform encoding step of encoding the input speech signal by waveform encoding.

[0020] The encoded speech data is encoded by the sine wave analysis encoding step when the speech signal section is voiced, and by the waveform encoding step when it is unvoiced.

[0021] The past parameters used to generate background noise during the predetermined time may be at least the short-term prediction coding coefficients calculated in the short-term prediction residual calculation step.

[0022] The background noise is also generated according to the difference between the encoded output of the waveform encoding step, among the plurality of types of parameters transmitted at intervals of the predetermined time, and its previous value. Here, the encoded output of the waveform encoding step is a gain index based on the short-term prediction coding coefficients.

[0023] To solve the above problems, a speech decoding apparatus according to the present invention decodes encoded speech data that has been transmitted after an input signal, consisting of speech signal sections (divided into voiced and unvoiced sections) and background noise sections, was encoded at a variable rate based on the determination result for each section. The apparatus comprises speech decoding means that, during a background noise section, generates the background noise using the plurality of types of parameters transmitted at intervals of a predetermined time and, during the predetermined time, generates the background noise using parameters sent in the past.

[0024] As a result, past parameters, for example linear predictive coding parameters, are not repeatedly used as they are; instead they are used while being interpolated with linear predictive coding parameters sent even earlier, so the unnaturalness of the background noise can be reduced.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0025] Embodiments of the speech encoding method and apparatus and the speech decoding method and apparatus according to the present invention are described below.

[0026] This embodiment is a mobile telephone apparatus in which, as shown in FIG. 1, the speech encoding method and apparatus and the speech decoding method and apparatus according to the present invention are used as a speech encoding apparatus 20 and a speech decoding apparatus 31.

[0027] In this mobile telephone apparatus, the speech encoding apparatus 20, to which the speech encoding method according to the present invention is applied, encodes, at a variable rate based on the determination result for each section, an input signal consisting of speech signal sections, which are divided into voiced (V) and unvoiced (UV) sections, and background noise (BGN) sections. It divides the input signal on the time axis into predetermined coding units, encodes each coding unit, and outputs a plurality of types of speech coding parameters.

[0028] When the determination result indicates a background noise section, the speech encoding apparatus 20 outputs the plurality of types of parameters after an interval of a predetermined time. The parameter indicating the determination result, however, is always output as a basic parameter, or mode bits, even during the predetermined time.

[0029] Here, the input signal determination unit 21 determines whether the input signal is in a voiced (V) or unvoiced (UV) section or in a background noise (BGN) section.

[0030] That is, the input signal determination unit 21 determines whether the input signal, supplied from the input terminal 1 and converted into a digital signal by the A/D converter 10, is voiced sound (V), unvoiced sound (UV), or background noise (BGN) in each predetermined time section. It then outputs an idVUV parameter indicating the result of the V/UV determination: "0" for unvoiced sound, "1" for background noise, "2" for first voiced sound, or "3" for second voiced sound. As described above, this idVUV parameter is treated as a basic parameter, or mode bits, and is always output even during the predetermined time.
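The four idVUV values listed above can be written down as a small enumeration. This is only an illustrative sketch: the names `IdVUV`, `is_voiced`, and `to_mode_bits` are hypothetical and do not appear in the patent, and packing the value as a 2-bit string is an assumption made for illustration (the text says only that the V/UV determination output occupies 2 bits).

```python
from enum import IntEnum


class IdVUV(IntEnum):
    """idVUV values as listed in paragraph [0030] (names are illustrative)."""
    UNVOICED = 0          # unvoiced sound (UV)
    BACKGROUND_NOISE = 1  # background noise (BGN)
    VOICED_1 = 2          # first voiced sound
    VOICED_2 = 3          # second voiced sound


def is_voiced(mode: IdVUV) -> bool:
    """True for either of the two voiced classes."""
    return mode in (IdVUV.VOICED_1, IdVUV.VOICED_2)


def to_mode_bits(mode: IdVUV) -> str:
    """Render the mode as a 2-bit string (assumed packing, MSB first)."""
    return format(int(mode), "02b")
```

For example, `to_mode_bits(IdVUV.BACKGROUND_NOISE)` gives the string `"01"` under this assumed packing.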

[0031] When the speech encoding apparatus 20 learns from the determination result of the input signal determination unit 21, that is, from the idVUV parameter, that the input signal is in a background noise section, it outputs a plurality of types of UV parameters, described later, after an interval of a predetermined time.

[0032] In this mobile telephone apparatus, the speech decoding apparatus 31, to which the speech decoding method according to the present invention is applied, decodes encoded speech data that has been transmitted after an input signal, consisting of speech signal sections (divided into voiced and unvoiced sections) and background noise sections, was encoded at a variable rate based on the determination result for each section. During a background noise section, it generates the background noise using the plurality of types of parameters transmitted at intervals of a predetermined time, and during the predetermined time it generates the background noise using parameters sent in the past.

[0033] When transmitting, this mobile telephone apparatus converts the speech signal input from the microphone 1 into a digital signal with the A/D converter 10, applies variable rate encoding based on the idVUV parameter with the speech encoding apparatus 20, and encodes the result with the transmission path encoder 22 so that the speech quality is less affected by the quality of the transmission path. The signal is then modulated by the modulator 23, the output bits are subjected to transmission processing by the transmitter 24, and the signal is transmitted from the antenna 26 through the antenna duplexer 25.

[0034] When receiving, the radio wave captured by the antenna 26 is received by the receiver 27 through the antenna duplexer 25, demodulated by the demodulator 29, corrected for transmission path errors by the transmission path decoder 30, decoded by the speech decoding apparatus 31, converted back into an analog speech signal by the D/A converter 32, and output from the speaker 33.

[0035] The control unit 34 controls each of the above units, and the synthesizer 28 supplies the transmission and reception frequencies to the transmitter 24 and the receiver 27. The keypad 35 and the LCD display 36 serve as the man-machine interface.

[0036] Next, the speech encoding apparatus 20 is described. It encodes an input signal consisting of speech signal sections (divided into voiced and unvoiced sections) and background noise sections at a variable rate, based on the idVUV parameter that is the determination result of the input signal determination unit 21.

[0037] First, variable rate encoding is described. As shown in Table 1 below, the LSP quantization index and the excitation parameter index are reduced to 0 bits per 20 msec frame when the idVUV determination parameter is "1" (background noise). When the idVUV determination parameter is "0" (unvoiced) or "2" or "3" (voiced), they remain at 18 bits and 20 bits per 20 msec, respectively. Variable rate encoding is realized in this way.

[0038]

[Table 1]

[0039] Here, the 2 bits of the idVUV determination parameter that constitute the V/UV determination output are always encoded. The excitation parameter index for unvoiced sound includes the shape index and the gain index of the noise codebook described later.
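The bit counts given in paragraphs [0037] and [0039] can be tallied per frame as in the sketch below. It assumes that the 18 bits belong to the LSP quantization index and the 20 bits to the excitation parameter index, and it counts only the fields named in the text; Table 1 itself is not reproduced here, so any further fields it lists are not modeled. The periodic parameter update sent during background noise is also left out. All names are hypothetical.

```python
FRAME_MS = 20          # frame length in milliseconds
MODE_BITS = 2          # V/UV determination output, always encoded
LSP_BITS = 18          # LSP quantization index (assumed assignment)
EXCITATION_BITS = 20   # excitation parameter index (assumed assignment)
BACKGROUND_NOISE = 1   # idVUV value for background noise


def bits_per_frame(id_vuv: int) -> int:
    """Bits for one 20 ms frame, counting only the fields named in the text."""
    if id_vuv not in (0, 1, 2, 3):
        raise ValueError("idVUV must be 0, 1, 2 or 3")
    if id_vuv == BACKGROUND_NOISE:
        # LSP and excitation indices drop to 0 bits; only the mode bits remain.
        return MODE_BITS
    return MODE_BITS + LSP_BITS + EXCITATION_BITS


def bit_rate_bps(id_vuv: int) -> int:
    """Bit rate in bits per second implied by bits_per_frame."""
    return bits_per_frame(id_vuv) * 1000 // FRAME_MS
```

Under these assumptions a speech frame carries 40 bits (2000 bit/s) and a background noise frame that carries only the mode bits costs 2 bits (100 bit/s).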

[0040] The configuration of this speech encoding apparatus is shown in FIGS. 2 and 3. The basic idea of the speech encoding apparatus 20 in FIG. 2 is to provide a first encoding unit 110, which obtains the short-term prediction residual of the input speech signal, for example the LPC (linear predictive coding) residual, and performs sine wave analysis (sinusoidal analysis) encoding, for example harmonic coding, and a second encoding unit 120, which encodes the input speech signal by waveform encoding with phase transmission. The first encoding unit 110 is used to encode the voiced (V) portions of the input signal, and the second encoding unit 120 is used to encode the unvoiced (UV) portions.

[0041] The first encoding unit 110 uses, for example, a configuration that applies sine wave analysis encoding, such as harmonic coding or multiband excitation (MBE) coding, to the LPC residual. The second encoding unit 120 uses, for example, a code-excited linear prediction (CELP) coding configuration that performs vector quantization by a closed-loop search for the optimum vector using the analysis-by-synthesis method.

[0042] In the example of FIG. 2, the speech signal supplied to the input terminal 101 is sent to the LPC inverse filter 111 and the LPC analysis/quantization unit 113 of the first encoding unit 110. The LPC coefficients, or so-called α parameters, obtained by the LPC analysis/quantization unit 113 are sent to the LPC inverse filter 111, which extracts the linear prediction residual (LPC residual) of the input speech signal. The LPC analysis/quantization unit 113 also produces a quantized LSP (line spectrum pair) output, as described later, which is sent to the output terminal 102. The LPC residual from the LPC inverse filter 111 is sent to the sine wave analysis encoding unit 114. The sine wave analysis encoding unit 114 performs pitch detection and spectral envelope amplitude calculation, and the input signal determination unit 115, which has the same configuration as the input signal determination unit 21, obtains the idVUV parameter of the input signal. The spectral envelope amplitude data from the sine wave analysis encoding unit 114 is sent to the vector quantization unit 116. The codebook index from the vector quantization unit 116, which is the vector quantization output of the spectral envelope, is sent to the output terminal 103 via the switch 117, and the pitch output from the sine wave analysis encoding unit 114 is sent to the output terminal 104 via the switch 118. The idVUV determination parameter output from the input signal determination unit 115 is sent to the output terminal 105 and is also used as the control signal for the switches 117 and 118 and for the switch 119 shown in FIG. 3. Under this control signal, the switches 117 and 118 select the index and the pitch for voiced sound (V) and output them from the output terminals 103 and 104, respectively.

[0043] For vector quantization in the vector quantization unit 116, the amplitude data for one block of the effective band on the frequency axis is first extended to N_F data items by appending a suitable number of dummy data items after the last item and before the first item of the block; the dummy data either interpolates between the last and the first data of the block or extends the last and the first data. Band-limited O_S-fold (for example, 8-fold) oversampling is then applied to obtain O_S times as many amplitude data items. These (m_MX + 1) × O_S amplitude data items are linearly interpolated to expand them to a still larger number N_M (for example, 2048), and the N_M data items are decimated to the fixed number M (for example, 44) of data items, which are then vector quantized.

[0044] The second encoding unit 120 in FIG. 2 has, in this example, a CELP (code-excited linear prediction) encoding configuration. The output of the noise codebook 121 is synthesized by a weighted synthesis filter 122, and the resulting weighted speech is sent to a subtractor 123, which takes the error between it and the speech obtained by passing the speech signal supplied to the input terminal 101 through a perceptual weighting filter 125. This error is sent to a distance calculation circuit 124 for distance calculation, and the noise codebook 121 is searched for the vector that minimizes the error. In this way, the time-domain waveform is vector quantized by a closed-loop search using the analysis-by-synthesis method. As described above, this CELP encoding is used to encode the unvoiced portions; the codebook index, serving as UV data from the noise codebook 121, is taken out from the output terminal 107 via a switch 127 that is turned on when the idVUV determination parameter from the input signal determination unit 115 indicates unvoiced sound (UV).
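The closed-loop (analysis-by-synthesis) search described above can be sketched as follows. This is a simplified illustration, not the circuit of FIG. 2: the target is assumed to be already perceptually weighted, a plain all-pole filter stands in for the weighted synthesis filter 122, the gain is left unquantized, and the squared error plays the role of the distance computed by circuit 124. The function names and the sign convention A(z) = 1 + Σ a_k z^-k are assumptions.

```python
def synthesize(excitation, a):
    """All-pole filter 1/A(z), A(z) = 1 + sum_k a[k-1] z^-k, zero initial state."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y -= ak * out[n - k]
        out.append(y)
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the shape vector with minimum squared error."""
    best = None
    for index, shape in enumerate(codebook):
        synth = synthesize(shape, a)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        # Optimal gain for this candidate in the least-squares sense.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

Each candidate is synthesized before it is compared, which is what makes the search closed-loop: the error is measured on the synthesized signal, not on the excitation itself.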

[0045] When the idVUV determination parameter, which serves as the control signal for the switch 127, is "1" and the input signal is thus determined to be a background noise signal, a plurality of the parameters used for unvoiced sound, for example the shape index and the gain index serving as UV data from the noise codebook 121, are sent after an interval of a predetermined time, for example a time corresponding to 8 frames.
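One way to picture this transmission schedule is a frame counter that runs during background noise. The sketch below is an assumption-laden illustration: it sends the full UV parameter set on every 8th consecutive background noise frame and resets the counter whenever a speech frame occurs. The passage does not say whether the update falls on the 8th or the 9th frame, nor how the counter behaves at section boundaries, and the name `bgn_transmission_schedule` is hypothetical.

```python
BGN = 1                  # idVUV value for background noise
BGN_INTERVAL_FRAMES = 8  # example interval given in the text


def bgn_transmission_schedule(id_vuv_sequence, interval=BGN_INTERVAL_FRAMES):
    """Return a list of (id_vuv, send_full_parameters) pairs, one per frame.

    The idVUV mode bits are assumed to be sent in every frame; the flag says
    whether the remaining parameters are sent as well.
    """
    schedule = []
    bgn_count = 0
    for id_vuv in id_vuv_sequence:
        if id_vuv == BGN:
            bgn_count += 1
            send_full = bgn_count == interval
            if send_full:
                bgn_count = 0  # start waiting for the next update
        else:
            bgn_count = 0      # speech frames always carry their parameters
            send_full = True
        schedule.append((id_vuv, send_full))
    return schedule
```

With one voiced frame followed by 17 background noise frames, this schedule sends full parameters in frames 0, 8, and 16 and only the mode bits in the others.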

[0046] Next, a more specific configuration of the speech signal encoding apparatus shown in FIG. 2 is described with reference to FIG. 3. In FIG. 3, parts corresponding to those in FIG. 2 are given the same reference numerals.

[0047] In the speech signal encoding apparatus shown in FIG. 3, the speech signal supplied to the input terminal 101 is filtered by a high-pass filter (HPF) 109 to remove signals in unneeded bands, and is then sent to the LPC analysis circuit 132 of the LPC (linear predictive coding) analysis/quantization unit 113 and to the LPC inverse filter circuit 111.

[0048] The LPC analysis circuit 132 of the LPC analysis/quantization unit 113 takes a length of about 256 samples of the input signal waveform as one block, applies a Hamming window, and obtains the linear prediction coefficients, the so-called α parameters, by the autocorrelation method. The framing interval, which is the unit of data output, is about 160 samples. When the sampling frequency fs is, for example, 8 kHz, one frame interval of 160 samples corresponds to 20 msec.
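The analysis described above (Hamming window over a 256-sample block, then the autocorrelation method) can be sketched in plain Python. The Levinson-Durbin recursion used here is the usual way to solve the autocorrelation equations, but the text does not name a solver, so treat it as an assumption. The order of 10 is taken from the later paragraphs that mention ten LSP parameters, and the sign convention A(z) = 1 + Σ a_k z^-k and all function names are illustrative.

```python
import math

BLOCK_SIZE = 256       # analysis block length in samples
FRAME_SHIFT = 160      # framing interval in samples
SAMPLE_RATE_HZ = 8000  # example sampling frequency
LPC_ORDER = 10         # assumed order


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite block."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for a[1..order] in A(z) = 1 + sum_k a[k] z^-k; also return the error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_analyze(block, order=LPC_ORDER):
    """Window one block and return (alpha parameters, prediction error energy)."""
    windowed = [s * w for s, w in zip(block, hamming(len(block)))]
    r = autocorrelation(windowed, order)
    if r[0] <= 0.0:
        return [0.0] * order, 0.0  # silent block: nothing to predict
    return levinson_durbin(r, order)
```

Because the block (256 samples) is longer than the frame shift (160 samples), consecutive analysis blocks overlap by 96 samples.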

[0049] The α parameters from the LPC analysis circuit 132 are sent to the α→LSP conversion circuit 133 and converted into line spectrum pair (LSP) parameters. That is, the α parameters, obtained as direct-form filter coefficients, are converted into, for example, ten LSP parameters, that is, five pairs. The conversion is performed using, for example, the Newton-Raphson method. The conversion to LSP parameters is made because they have better interpolation characteristics than the α parameters.

[0050] The LSP parameters from the α→LSP conversion circuit 133 are matrix quantized or vector quantized by the LSP quantizer 134. At this point, the LSP quantizer 134 may perform differential vector quantization (differential VQ) with a leaking factor. The quantization efficiency of differential VQ is better than that of matrix quantization (MQ), and, unlike MQ, the quantization error is not concentrated in one of the frames, so smooth speech with few artifacts is obtained. With differential VQ, however, once an error occurs its effect persists for a while, so the amount of leakage in the difference is set rather large. Furthermore, when the input signal determination unit 115 determines that the input is background noise, no LSP is sent, so the LSP quantizer 134 does not perform this differential VQ, which is a form of differential quantization.
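The role of the leaking factor can be illustrated with a toy differential quantizer. In the sketch below a uniform scalar quantizer per dimension stands in for the vector quantization codebook, and the prediction is the previous reconstruction scaled by `leak`, with 0 <= leak < 1; both choices are assumptions, since the text does not define the factor. In this parameterization a larger amount of leakage corresponds to a smaller `leak` value.

```python
def encode_frame(vector, prev_recon, leak, step):
    """Quantize the difference from the leaked previous reconstruction.

    Returns one integer code per dimension (stand-in for a VQ index).
    """
    return [round((v - leak * p) / step) for v, p in zip(vector, prev_recon)]


def decode_frame(codes, prev_recon, leak, step):
    """Rebuild the vector from the codes and the leaked previous reconstruction."""
    return [leak * p + c * step for c, p in zip(codes, prev_recon)]


def run_with_decoder_error(vectors, leak, step, initial, decoder_offset):
    """Run encoder and decoder with a state mismatch; return the final mismatch.

    The decoder starts from a corrupted state (initial + decoder_offset), which
    mimics the after-effect of a transmission error.
    """
    enc_state = list(initial)
    dec_state = [x + decoder_offset for x in initial]
    for vector in vectors:
        codes = encode_frame(vector, enc_state, leak, step)
        enc_state = decode_frame(codes, enc_state, leak, step)
        dec_state = decode_frame(codes, dec_state, leak, step)
    return [d - e for d, e in zip(dec_state, enc_state)]
```

Since both sides add the same decoded difference, the mismatch between decoder and encoder shrinks by the factor `leak` every frame. With `leak = 1` (no leakage) it would never decay, which is the persistence problem the paragraph refers to.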

[0051] The quantized output of the LSP quantizer 134, that is, the LSP quantization index, is switched by the switch 119 and then taken out via the terminal 102, and the quantized LSP vector is sent to the LSP interpolation circuit 136. Here, the switch 119 is controlled by the idVUV determination flag from the input signal determination unit 115 and is turned on, for example, for voiced sound (V).

[0052] The LSP interpolation circuit 136 interpolates the LSP vectors, which are quantized every 20 msec, to raise the rate by a factor of eight, so that the LSP vector is updated every 2.5 msec. The reason is that when the residual waveform is analyzed and synthesized by the harmonic encoding/decoding method, the envelope of the synthesized waveform is very smooth, so an abrupt change of the LPC coefficients every 20 msec can produce audible artifacts. Letting the LPC coefficients change gradually every 2.5 msec prevents such artifacts.
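
The eightfold rate increase described above can be sketched as follows. This is a minimal illustration, not the circuit's actual method: the text does not state the interpolation law, so plain linear interpolation between two consecutive quantized LSP vectors is assumed, and the function name `interpolate_lsp` is hypothetical.

```python
def interpolate_lsp(prev_lsp, curr_lsp, steps=8):
    """Return `steps` LSP vectors moving linearly from prev_lsp to curr_lsp.

    Assumption: linear interpolation; with 20 msec frames and steps=8,
    each returned vector covers one 2.5 msec sub-interval.
    """
    if len(prev_lsp) != len(curr_lsp):
        raise ValueError("LSP vectors must have the same order")
    out = []
    for k in range(1, steps + 1):
        w = k / steps  # weight of the current frame's vector
        out.append([(1.0 - w) * p + w * c for p, c in zip(prev_lsp, curr_lsp)])
    return out
```

The last of the eight vectors equals the current frame's quantized LSP vector, so successive frames join without a jump.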

[0053] To perform inverse filtering of the input speech using the interpolated LSP vectors at 2.5 msec intervals, the LSP→α conversion circuit 137 converts the LSP parameters into α parameters, which are the coefficients of a direct-form filter of, for example, about tenth order. The output of the LSP→α conversion circuit 137 is sent to the LPC inverse filter circuit 111, which performs inverse filtering with the α parameters updated every 2.5 msec to obtain a smooth output. The output of the LPC inverse filter 111 is sent to the orthogonal transform circuit 145, for example a DFT (discrete Fourier transform) circuit, of the sine wave analysis encoding unit 114, which is specifically a harmonic encoding circuit, for example.

[0054] The α parameters from the LPC analysis circuit 132 of the LPC analysis/quantization unit 113 are sent to the perceptual weighting filter calculation circuit 139, which derives data for perceptual weighting. This weighting data is sent to the perceptually weighted vector quantizer 116 described later, and to the perceptual weighting filter 125 and the perceptually weighted synthesis filter 122 of the second encoding unit 120.

[0055] The sine wave analysis encoding unit 114, such as a harmonic encoding circuit, analyzes the output of the LPC inverse filter 111 by a harmonic encoding method. That is, it detects the pitch, calculates the amplitude Am of each harmonic, discriminates voiced (V) from unvoiced (UV) sound, and converts the number of harmonic envelope values or amplitudes Am, which varies with the pitch, to a fixed number by dimension conversion.

[0056] The specific example of the sine wave analysis encoding unit 114 shown in FIG. 3 assumes general harmonic encoding. In the particular case of MBE (Multiband Excitation) encoding, the model assumes that a voiced portion and an unvoiced portion exist for each frequency-domain region, or band, at the same time instant (within the same block or frame). In other forms of harmonic encoding, a single either-or decision is made as to whether the speech in one block or frame is voiced or unvoiced. In the following description, the per-frame V/UV decision, when applied to MBE encoding, treats a frame as UV when all of its bands are UV. Detailed specific examples of the MBE analysis/synthesis technique are disclosed in the specification and drawings of Japanese Patent Application No. 4-91422, previously filed by the present applicant.

[0057] In the sine wave analysis encoding unit 114 of FIG. 3, the input speech signal from the input terminal 101 is supplied to the open-loop pitch search unit 141, and the signal from the HPF (high-pass filter) 109 is supplied to the zero-cross counter 142. The LPC residual, or linear prediction residual, from the LPC inverse filter 111 is supplied to the orthogonal transform circuit 145. The open-loop pitch search unit 141 takes the LPC residual of the input signal and performs a relatively rough open-loop pitch search. The extracted coarse pitch data is sent to the high-precision pitch search unit 146, where a high-precision closed-loop pitch search (fine pitch search), described later, is performed. Together with the coarse pitch data, the open-loop pitch search unit 141 also outputs the normalized autocorrelation maximum value r(p), obtained by normalizing the maximum of the autocorrelation of the LPC residual by the power, and this value is sent to the input signal determination unit 115.

[0058] The orthogonal transform circuit 145 applies an orthogonal transform such as the DFT (discrete Fourier transform), converting the LPC residual on the time axis into spectral amplitude data on the frequency axis. Its output is sent to the high-precision pitch search unit 146 and to the spectrum evaluation unit 148, which evaluates the spectral amplitude or envelope.

[0059] The high-precision (fine) pitch search unit 146 is supplied with the relatively rough coarse pitch data extracted by the open-loop pitch search unit 141 and with the frequency-domain data produced, for example by DFT, by the orthogonal transform unit 145. Centered on the coarse pitch value, the unit varies the pitch by ± several samples in steps of 0.2 to 0.5 and converges on the optimum fine pitch value with a fractional part (floating point). The fine search uses the so-called analysis-by-synthesis method, selecting the pitch so that the synthesized power spectrum is closest to the power spectrum of the original sound. The pitch data from this closed-loop high-precision pitch search unit 146 is sent to the output terminal 104 via the switch 118.

[0060] The spectrum evaluation unit 148 evaluates the magnitude of each harmonic, and the spectral envelope formed by the set of harmonics, on the basis of the pitch and the spectral amplitudes obtained as the orthogonal transform output of the LPC residual. The result is sent to the high-precision pitch search unit 146 and to the perceptually weighted vector quantizer 116.

[0061] The input signal determination unit 115 determines whether the frame is voiced sound, unvoiced sound, or background noise on the basis of the normalized autocorrelation maximum value r(p) from the open-loop pitch search unit 141 and the zero-cross count from the zero-cross counter 142, and outputs the idVUV determination parameter. The idVUV determination parameter is taken out via the output terminal 105 and, as described above, is also used as the switching control signal for the switch 119 and for the switches 117, 118 and 127.

[0062] Incidentally, a data number conversion unit (performing a kind of sampling rate conversion) is provided at the output of the spectrum evaluation unit 148 or at the input of the vector quantizer 116. Because the number of bands into which the frequency axis is divided, and hence the number of data values, varies with the pitch, this unit converts the envelope amplitude data |A_m| to a fixed number of values. For example, if the effective band extends up to 3400 Hz, it is divided into 8 to 63 bands depending on the pitch, so the number m_MX + 1 of amplitude values |A_m| obtained for the bands also varies from 8 to 63. The data number conversion unit therefore converts this variable number m_MX + 1 of amplitude values into a fixed number M, for example 44.
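
As a rough illustration of converting a pitch-dependent number of amplitude values to a fixed count, the sketch below resamples the envelope by linear interpolation. This is an assumption made for illustration only: the text says the unit performs "a kind of sampling rate conversion" but does not give the method here, and `convert_data_count` is a hypothetical name.

```python
def convert_data_count(amplitudes, target=44):
    """Resample a variable-length amplitude list to `target` values.

    Assumption: simple linear interpolation over the index axis; the
    actual data number conversion unit may use a different scheme.
    """
    n = len(amplitudes)
    if n < 2 or target < 2:
        raise ValueError("need at least two input and two output values")
    out = []
    for i in range(target):
        pos = i * (n - 1) / (target - 1)  # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * amplitudes[lo] + frac * amplitudes[hi])
    return out
```

Whether the frame yields 8 or 63 amplitude values, the vector quantizer then always receives a vector of the same dimension.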

[0063] The fixed number M (for example 44) of amplitude or envelope data values from the data number conversion unit provided at the output of the spectrum evaluation unit 148 or at the input of the vector quantizer 116 are grouped by the vector quantizer 116 into vectors of a predetermined number of values, for example 44, and subjected to weighted vector quantization. The weights are given by the output of the perceptual weighting filter calculation circuit 139. The envelope index from the vector quantizer 116 is taken out at the output terminal 103 via the switch 117. Prior to the weighted vector quantization, an inter-frame difference using a suitable leak coefficient may be taken for the vector of the predetermined number of values.

[0064] Next, the second encoding unit 120 is described. It has a so-called CELP (code excited linear prediction) encoding configuration and is used in particular for encoding the unvoiced portion of the input speech signal. In this CELP configuration for unvoiced sound, a noise output corresponding to the LPC residual of unvoiced sound, which is the representative-value output of the noise codebook, or so-called stochastic codebook, 121, is sent via the gain circuit 126 to the perceptually weighted synthesis filter 122. The weighted synthesis filter 122 performs LPC synthesis on the input noise and sends the resulting weighted unvoiced signal to the subtractor 123. The subtractor 123 also receives the speech signal supplied from the input terminal 101 via the HPF (high-pass filter) 109 and perceptually weighted by the perceptual weighting filter 125, and takes the difference, or error, between this signal and the signal from the synthesis filter 122. The zero-input response of the perceptually weighted synthesis filter is assumed to be subtracted from the output of the perceptual weighting filter 125 beforehand. The error is sent to the distance calculation circuit 124, which computes a distance, and the noise codebook 121 is searched for the representative-value vector that minimizes the error. In this way the time-domain waveform is vector-quantized using a closed-loop search based on the analysis-by-synthesis method.
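
The closed-loop search just described can be sketched in simplified form. The following is an illustrative analysis-by-synthesis loop under several assumptions: perceptual weighting and gain quantization are omitted, the synthesis filter is a plain all-pole filter started from zero state, the gain is chosen as the least-squares optimum for each shape, and the names `synthesize` and `search_codebook` as well as the sign convention of the coefficients `a` are hypothetical.

```python
def synthesize(excitation, a):
    """All-pole synthesis: y[n] = x[n] - sum_k a[k] * y[n-1-k] (zero initial state)."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the shape whose synthesis best matches target."""
    best = None
    for index, shape in enumerate(codebook):
        synth = synthesize(shape, a)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero shape cannot match anything
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best
```

Every candidate is passed through the synthesis filter before the error is measured, which is what makes the search closed-loop rather than a direct match against the residual.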

[0065] As data for the UV (unvoiced) portion, the second encoding unit 120 using this CELP configuration outputs the shape index of the codebook from the noise codebook 121 and the gain index of the codebook from the gain circuit 126. The shape index, which is UV data from the noise codebook 121, is sent to the output terminal 107s via the switch 127s, and the gain index, which is UV data from the gain circuit 126, is sent to the output terminal 107g via the switch 127g.

[0066] The switches 127s and 127g and the switches 117 and 118 are turned on and off by the idVUV determination parameter from the input signal determination unit 115. The switches 117 and 118 are on when the idVUV determination parameter of the speech signal of the frame currently to be transmitted indicates voiced sound (V), and the switches 127s and 127g are on when the speech signal of the frame currently to be transmitted is unvoiced (UV). When the idVUV determination parameter indicates background noise, the switches 127s and 127g are turned on once every eight frame periods and output the shape index and the gain index. The switch 119 is likewise turned on once every eight frame periods and outputs the LSP index for UV. These are the plural kinds of parameters for UV mentioned above.
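
The switch control above amounts to a per-frame choice of which parameters leave the encoder. The sketch below expresses that choice as a function; it is an illustration under assumptions, not the circuit itself. The parameter names, the function `parameters_to_send`, the 1-based counter of consecutive background noise frames, and the choice to send the LSP index in ordinary UV frames are all assumptions. The idVUV values 0 to 3 follow the description of the idVUV parameter.

```python
UNVOICED, BACKGROUND_NOISE, VOICED_1, VOICED_2 = 0, 1, 2, 3


def parameters_to_send(id_vuv, noise_frame_count=0, interval=8):
    """List the parameters transmitted for one frame.

    noise_frame_count: assumed 1-based count of consecutive background
    noise frames; UV parameters are sent when it reaches a multiple of
    `interval` (eight frames in the text).
    """
    sent = ["idVUV"]  # the determination parameter is sent every frame
    if id_vuv in (VOICED_1, VOICED_2):
        sent += ["lsp_index", "pitch", "envelope_index"]      # switches 119, 118, 117
    elif id_vuv == UNVOICED:
        sent += ["lsp_index", "shape_index", "gain_index"]    # switches 119, 127s, 127g
    elif id_vuv == BACKGROUND_NOISE:
        if noise_frame_count > 0 and noise_frame_count % interval == 0:
            sent += ["lsp_index", "shape_index", "gain_index"]
    else:
        raise ValueError("unknown idVUV value")
    return sent
```

During a long noise section, seven of every eight frames carry only idVUV, which is where the variable-rate saving comes from.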

[0067] FIG. 4 shows the detailed configuration of the input signal determination unit 115 (labeled "input signal determination device" in the figure). The input signal determination unit 115 comprises a minimum level calculation unit 4, which detects the minimum level of the input signal (its r.m.s. value) supplied from the input terminal 1 via the root-mean-square (r.m.s.) value calculation unit 2 over a predetermined time interval, for example 20 msec, and holds that minimum level, and a reference level calculation unit 5, which calculates a reference level from the input signal r.m.s. value supplied by the r.m.s. value calculation unit 2. On the basis of the minimum level from the minimum level calculation unit 4 and the reference level from the reference level calculation unit 5, it determines whether the input signal in the predetermined time interval is voiced sound (Voice), unvoiced sound (UnVoice), or background noise. It then outputs the idVUV parameter indicating the result of the V/UV determination: "0" for unvoiced sound, "1" for background noise, "2" for first voiced sound, or "3" for second voiced sound.

[0068] The input signal determination device 21 also includes a V/UV determination unit 3, which makes a provisional voiced (V)/unvoiced (UV) decision on the input signal r.m.s. values from the r.m.s. value calculation unit 2 for the predetermined time interval and supplies the V/UV decision to the minimum level calculation unit 4. The minimum level calculation unit 4 calculates the minimum level on the basis of this V/UV decision.

[0069] The V/UV decision from the V/UV determination unit 3 is also supplied to the parameter generation unit 8, which outputs the idVUV parameter from the output terminal 105.

[0070] The parameter generation unit 8 outputs the idVUV parameter on the basis of two inputs: the comparison result from the comparison unit 7, which compares the minimum level from the minimum level calculation unit 4 with the reference level from the reference level calculation unit 5, and the V/UV decision from the V/UV determination unit 3.

[0071] The operation of the input signal determination device 21 is described below. In a moving train, for example, the background noise level is high. It is therefore desirable to set the threshold according to the ambient noise.

[0072] The minimum level calculation unit 4 therefore takes the lowest level within a suitable predetermined time interval as the minimum level and updates that minimum level every predetermined time.

[0073] FIG. 5 is a flowchart of the algorithm of the minimum level calculation unit 4. In this flowchart, updating (tracking) of the minimum level is broadly divided into setting and clearing the minimum level candidate value cdLev, and setting and clearing the minimum level gml.

[0074] First, in step S1, it is determined on the basis of the V/UV decision from the V/UV determination unit 3 whether the number vCont of consecutive voiced frames is greater than 4, that is, whether frames judged to be voiced (V) have continued for five frames or more. If five or more voiced frames have continued, the signal is judged to have entered a speech section, and the process proceeds to step S2, where the minimum level candidate value cdLev is cleared. In step S2, gmlSetState, the number of times the candidate value has continued to be set, is 0. If it is determined in step S1 that vCont is 4 or less, the process proceeds to step S3.

[0075] In step S3, it is determined whether the input level lev of the current input signal, obtained via the r.m.s. value calculation unit 2, is smaller than MIN_GML, the lowest allowed value of the minimum level. MIN_GML is chosen so that the minimum level gml never becomes 0. If lev is smaller than MIN_GML, MIN_GML is set as the minimum level gml in step S4. In step S4, both gmlSetState, the number of times the candidate value has continued to be set, and gmlResetState, the number of times no candidate value has been set since the minimum level was set, are 0. If it is determined in step S3 that lev is equal to or greater than MIN_GML, the process proceeds to step S5.

[0076] In step S5, it is determined whether the current input level lev is smaller than the minimum level gml. If YES, the process proceeds to step S6. That is, step S6 is reached when lev was found in step S3 to be equal to or greater than MIN_GML and in step S5 to be smaller than gml, and it sets lev as the minimum level gml. If it is determined in step S5 that lev is equal to or greater than gml, the process proceeds to step S7.

[0077] In step S7, it is determined whether the current input level is sufficiently small or its deviation from the candidate value cdLev is small. This is decided by whether the state is status0. status0 is the state in which the input level lev is sufficiently small, namely 100.0 or less, or in which its deviation from cdLev is small, namely lev is 500.0 or less, greater than cdLev*0.70, and smaller than cdLev*1.30. If YES, that is, if the current input level is judged to be sufficiently small or to deviate little from cdLev, the candidate value cdLev is updated. If NO, the process proceeds to step S11.

[0078] In step S8, it is determined whether gmlSetState, the number of times the candidate value cdLev has continued to be updated, is 7 or more, that is, whether cdLev has been updated for seven consecutive frames. If the update has continued for seven frames or more, the process proceeds to step S9 and the input level lev at that time becomes the minimum level gml. If the update has continued for six frames or fewer, the result is NO, and in step S10 the input level lev becomes the candidate value cdLev.

[0079] In step S11, which is reached when the decision in step S7 is NO, it is determined whether the change between the input level prevLev of one frame earlier and the current input level lev is small. This is decided by whether the state is status1. status1 is the state in which the current input level lev is sufficiently small, namely 100.0 or less, or in which its deviation from prevLev is small, namely lev is 500.0 or less, greater than prevLev*0.70, and smaller than prevLev*1.30. If YES, that is, if the current input level is sufficiently small or the change between the input level of one frame earlier and the current input level is small, the process proceeds to step S12, where the current input level lev is set as the minimum level candidate value cdLev. If neither condition holds in step S11, the process proceeds to step S13.
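
The status0 test of step S7 and the status1 test of step S11 share one form and differ only in the level they compare against (cdLev or prevLev). A minimal sketch of that shared predicate, using the thresholds given in the text, is shown below; the function name `is_small_or_stable` and its parameter names are hypothetical.

```python
def is_small_or_stable(lev, ref, small=100.0, ceiling=500.0, lo=0.70, hi=1.30):
    """True if lev is small enough, or is moderate and within -30%/+30% of ref.

    status0: ref is the candidate value cdLev.
    status1: ref is the previous frame's input level prevLev.
    """
    return lev <= small or (lev <= ceiling and ref * lo < lev < ref * hi)
```

For example, `is_small_or_stable(300.0, 280.0)` holds because 300 lies between 196 and 364, whereas a level of 600 fails regardless of the reference because it exceeds 500.0.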

[0080] In step S13, it is determined whether gmlResetState, the number of times no candidate value has been set since the minimum level was set, is greater than 40. If NO, that is, if gmlResetState is 40 or less, the minimum level candidate value cdLev is cleared in step S14 and set to a predetermined minimum value. If YES, that is, if gmlResetState exceeds 40, the process proceeds to step S15, where the minimum level gml is set to MIN_GML.

[0081] In this way the minimum level is held for a certain time and updated successively.

[0082] Next, the operation of the reference level calculation unit 5 is described with reference to FIG. 6. The reference level calculation unit 5 calculates the reference level refLev by the following equation (1).

[0083]

refLev = A × max(lev, refLev) + (1.0 − A) × min(lev, refLev)   ... (1)

FIG. 6 shows the relationship between the input level lev and the reference level refLev given by equation (1) when A = 0.75 is supplied from the input terminal 6. The reference level refLev rises much as the input level lev does, but decays gradually when lev falls. Using this reference level therefore prevents a moment in which the level happens to dip during a speech section from being judged as a background noise section. The reference level calculation unit 5 thus calculates a smooth level that leaves some margin against momentary level fluctuations.
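
Equation (1) is a one-line recursive update, sketched below with A = 0.75 as in the text. The function name `update_ref_level` is hypothetical, and starting refLev at 0 in the worked numbers is an assumption made only for the example.

```python
def update_ref_level(lev, ref_lev, a=0.75):
    """One update of equation (1): weight the larger of lev and refLev by A."""
    return a * max(lev, ref_lev) + (1.0 - a) * min(lev, ref_lev)
```

Starting from refLev = 0, two frames at lev = 1000 give 750 and then 937.5, so the reference climbs quickly toward the input. If the input then drops to 0 for one frame, refLev only falls to 703.125, which is the slow decay that keeps a momentary dip from looking like background noise.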

[0084] The comparison unit 7 compares B × gml, the minimum level from the minimum level calculation unit 4 multiplied by a predetermined constant B, with the reference level refLev from the reference level calculation unit 5. The comparison result is sent to the parameter generation unit 8.

[0085] For a frame judged V by the V/UV determination unit 3, the parameter generation unit 8 checks whether the reference level refLev of equation (1) is smaller than B times the minimum level gml and, if it is, judges the frame to be a background noise section. However, it also checks the past V/UV decisions, and if two or more V frames have occurred consecutively, it assumes that a speech section has begun and does not judge the frame to be background noise. In other words, when the current frame is judged V, the unit checks whether past V frames are consecutive and, if so, does not enter the background noise mode. This is because entering the background noise mode while V frames are continuing would produce a sense of discontinuity.

[0086] For a frame judged UV by the V/UV determination unit 3, the parameter generation unit 8 likewise checks whether the reference level refLev is smaller than B times the minimum level gml, and judges the frame to be a background noise section when this condition has been satisfied four times. That is, for UV frames, the background noise decision is made after the condition has been satisfied for four consecutive frames.

[0087] B is a suitable constant, set here to 2.0. Instead of a constant, B may also be made a quantity proportional to the variance of the input level lev.
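
The decision rules for V and UV frames can be sketched as a small state machine. This is an illustrative reading of the text, not a definitive implementation: the class name `BackgroundNoiseDetector`, the choice to count the current frame in the run of V frames, and the resetting of the UV counter whenever the condition fails or a V frame arrives are assumptions.

```python
class BackgroundNoiseDetector:
    """Decide per frame whether the input is background noise (B = 2.0 by default)."""

    def __init__(self, b=2.0, uv_frames_needed=4, v_run_limit=2):
        self.b = b
        self.uv_frames_needed = uv_frames_needed
        self.v_run_limit = v_run_limit
        self.v_run = 0      # consecutive V frames, including the current one (assumed)
        self.uv_hits = 0    # consecutive UV frames with refLev < B * gml (assumed reset rule)

    def update(self, is_voiced, ref_lev, gml):
        below = ref_lev < self.b * gml
        if is_voiced:
            self.v_run += 1
            self.uv_hits = 0
            # A run of two or more V frames means speech has started.
            return below and self.v_run < self.v_run_limit
        self.v_run = 0
        self.uv_hits = self.uv_hits + 1 if below else 0
        return self.uv_hits >= self.uv_frames_needed
```

With gml = 100 and refLev = 150, four consecutive UV frames are needed before the detector reports background noise, while a second consecutive V frame cancels the noise decision even though refLev is still below the threshold of 200.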

[0088] The parameter generation unit 8 then outputs the idVUV parameter from the output terminal 105.

[0089] In this way, the input signal determination device 21 can vary the threshold gml × B between speech sections and background noise sections on the basis of the continually updated minimum level gml, and by comparing this threshold gml × B with the reference level refLev it can distinguish speech sections from background noise sections with high accuracy.

【００９０】ここで、図１に戻る。音声復号化装置３１
は、他の携帯電話装置の上記音声符号化装置２０により
可変レート符号化された上記符号化データをアンテナ２
６、アンテナ共用器２５、受信機２７、復調器２９及び
伝送路復号化器３０を介して受け取り、復号化する。Returning now to FIG. 1, the speech decoding device 31 receives, via the antenna 26, the antenna duplexer 25, the receiver 27, the demodulator 29, and the transmission path decoder 30, the encoded data that was variable-rate encoded by the speech encoding device 20 of another mobile phone device, and decodes it.

【００９１】この音声復号化装置３１は、上述したよう
に、音声符号化装置２０から上記所定時間中にも常に伝
送されてくるidVUV判定パラメータに基づいて、上記符
号化データを復号化する。特に、idVUV判定パラメータ
が背景雑音区間を示す“１”であれば、８フレーム分を
おいて伝送されてきた上記複数種類のパラメータ、例え
ば雑音符号帳のシェイプインデクスや、ゲインインデク
ス、又はＬＳＰパラメータを用いて、背景雑音を生成す
ると共に、８フレーム中では過去に送られてきた線スペ
クトル対（ＬＳＰ）パラメータを補間して上記背景雑音
を生成する。As described above, the speech decoding device 31 decodes the encoded data based on the idVUV determination parameter, which is always transmitted from the speech encoding device 20 even during the predetermined time. In particular, if the idVUV determination parameter is "1", indicating a background noise section, the device generates background noise using the plural types of parameters transmitted at intervals of eight frames, for example the shape index and gain index of the noise codebook or the LSP parameters, and during the eight frames it generates the background noise by interpolating line spectrum pair (LSP) parameters sent in the past.

【００９２】実際に、音声復号化装置３１では、常に、
前回送られたＬＳＰ（prevLsp1）と前々回送られたＬＳ
Ｐ（prevLsp2）を、例えばＲＡＭ内に保持している。In practice, the speech decoding device 31 always holds the LSP sent last time (prevLsp1) and the LSP sent the time before last (prevLsp2), for example in a RAM.

【００９３】そして、上記idVUV判定パラメータが背景
雑音モードに入ると、新たなＬＳＰは送られてこないの
で、prevLsp1、prevLsp2の更新を行ず、この二つのＬＳ
Ｐを線形補間することにより、現在のフレームのＬＳＰ
とし、背景雑音を形成する。When the idVUV determination parameter indicates the background noise mode, no new LSP is sent, so prevLsp1 and prevLsp2 are not updated; the two LSPs are linearly interpolated to obtain the LSP of the current frame, and the background noise is formed.
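
The interpolation between the two held LSP vectors can be sketched as below. The text states only that the interpolation is linear; the weighting schedule (moving from prevLsp2 toward prevLsp1 in steps of 1/8 per frame) and the function name `interpolate_lsp` are assumptions made for illustration.

```python
def interpolate_lsp(prev_lsp2, prev_lsp1, frame_idx, interval=8):
    """Linearly interpolate from prev_lsp2 (older) toward prev_lsp1 (newer).

    frame_idx counts frames since prev_lsp2 and must lie in [0, interval].
    The per-frame weight frame_idx / interval is an assumed schedule.
    """
    if not 0 <= frame_idx <= interval:
        raise ValueError("frame_idx must lie between 0 and interval")
    w = frame_idx / interval
    return [(1.0 - w) * a + w * b for a, b in zip(prev_lsp2, prev_lsp1)]
```

For example, halfway through the interval, `interpolate_lsp([0.0, 1.0], [1.0, 3.0], 4)` yields the midpoint of the two vectors.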

【００９４】背景雑音モード中、８フレーム目に通常の
ＵＶとして音声符号化装置側からＵＶの全パラメータが
送られてくるが、このときゲイン回路１２６からのゲイ
ンインデクスを調べ、インデックスが前回送られたイン
デックス＋２より小さければ、そのフレームの合成に用
いるＬＳＰを前回送られたパラメータに置き換える。こ
の動作については後述する。ただし、ゲインインデクス
は小さい順にソートされているものとする。During the background noise mode, all UV parameters are sent from the speech encoding device as an ordinary UV frame at the eighth frame. At that point the gain index from the gain circuit 126 is examined, and if the index is smaller than the previously sent index + 2, the LSP used to synthesize that frame is replaced with the previously sent parameter. This operation is described later. The gain indexes are assumed to be sorted in ascending order.

【００９５】このような音声復号化装置３１の構成を図
７及び図８に示す。図７は、音声復号化装置３１の基本
構成を示すブロック図である。FIG. 7 and FIG. 8 show the configuration of this speech decoding device 31. FIG. 7 is a block diagram showing the basic configuration of the speech decoding device 31.

【００９６】この図７において、入力端子２０２には上
記図２の出力端子１０２からの上記ＬＳＰ（線スペクト
ル対）の量子化出力としてのコードブックインデクスが
入力される。入力端子２０３、２０４、及び２０５に
は、上記図２の各出力端子１０３、１０４、及び１０５
からの各出力、すなわちエンベロープ量子化出力として
のインデクス、ピッチ、及びＶ／ＵＶ判定出力がそれぞ
れ入力される。また、入力端子２０７には、上記図２の
出力端子１０７からのＵＶ（無声音）用のデータとして
のインデクスが入力される。In FIG. 7, the codebook index that is the quantized output of the LSP (line spectrum pair) from the output terminal 102 of FIG. 2 is input to the input terminal 202. The outputs from the output terminals 103, 104, and 105 of FIG. 2, namely the index serving as the envelope quantization output, the pitch, and the V/UV determination output, are input to the input terminals 203, 204, and 205, respectively. The index serving as UV (unvoiced sound) data from the output terminal 107 of FIG. 2 is input to the input terminal 207.

【００９７】入力端子２０３からのエンベロープ量子化
出力としてのインデクスは、逆ベクトル量子化器２１２
に送られて逆ベクトル量子化され、ＬＰＣ残差のスペク
トルエンベロープが求められて有声音合成部２１１に送
られる。有声音合成部２１１は、サイン波合成により有
声音部分のＬＰＣ（線形予測符号化）残差を合成するも
のであり、この有声音合成部２１１には入力端子２０４
及び２０５からのピッチ及びidVUV判定パラメータも供
給されている。有声音合成部２１１からの有声音のＬＰ
Ｃ残差は、ＬＰＣ合成フィルタ２１４に送られる。ま
た、入力端子２０７からのＵＶデータのインデクスは、
無声音合成部２２０に送られて、雑音符号帳を参照する
ことにより無声音部分のＬＰＣ残差が取り出される。こ
のＬＰＣ残差もＬＰＣ合成フィルタ２１４に送られる。
ＬＰＣ合成フィルタ２１４では、上記有声音部分のＬＰ
Ｃ残差と無声音部分のＬＰＣ残差とがそれぞれ独立に、
ＬＰＣ合成処理が施される。あるいは、有声音部分のＬ
ＰＣ残差と無声音部分のＬＰＣ残差とが加算されたもの
に対してＬＰＣ合成処理を施すようにしてもよい。ここ
で入力端子２０２からのＬＳＰのインデクスは、ＬＰＣ
パラメータ再生部２１３に送られて、ＬＰＣのαパラメ
ータが取り出され、これがＬＰＣ合成フィルタ２１４に
送られる。ＬＰＣ合成フィルタ２１４によりＬＰＣ合成
されて得られた音声信号は、出力端子２０１より取り出
される。The index serving as the envelope quantization output from the input terminal 203 is sent to the inverse vector quantizer 212 and inverse vector quantized, and the resulting spectral envelope of the LPC residual is sent to the voiced sound synthesis unit 211. The voiced sound synthesis unit 211 synthesizes the LPC (linear predictive coding) residual of the voiced portion by sine wave synthesis, and is also supplied with the pitch and the idVUV determination parameter from the input terminals 204 and 205. The voiced LPC residual from the voiced sound synthesis unit 211 is sent to the LPC synthesis filter 214. The index of the UV data from the input terminal 207 is sent to the unvoiced sound synthesis unit 220, where the LPC residual of the unvoiced portion is extracted by referring to the noise codebook. This LPC residual is also sent to the LPC synthesis filter 214. In the LPC synthesis filter 214, the LPC residual of the voiced portion and the LPC residual of the unvoiced portion are each subjected to LPC synthesis independently. Alternatively, LPC synthesis may be applied to the sum of the voiced and unvoiced LPC residuals. The LSP index from the input terminal 202 is sent to the LPC parameter reproducing unit 213, where the α parameters of the LPC are extracted and sent to the LPC synthesis filter 214. The speech signal obtained by LPC synthesis in the LPC synthesis filter 214 is taken out from the output terminal 201.

【００９８】ここで、入力端子２０５に供給されたidVU
V判定パラメータと入力端子２０７に供給された上記Ｕ
Ｖデータとしての雑音符号帳のシェイプインデクス及び
ゲインインデクスは、上記ＬＰＣパラメータ生成部２１
３でのＬＰＣパラメータの再生を制御するＬＰＣパラメ
ータ再生制御部２４０に送られる。The idVUV determination parameter supplied to the input terminal 205 and the shape index and gain index of the noise codebook supplied to the input terminal 207 as the UV data are sent to the LPC parameter reproduction control unit 240, which controls the reproduction of the LPC parameters in the LPC parameter reproducing unit 213.

【００９９】このＬＰＣ再生制御部２４０により制御さ
れ、ＬＰＣパラメータ再生部２１３は、背景雑音信号生
成用のＬＰＣを生成し、ＬＰＣ合成フィルタ２１４に送
る。The LPC reproduction section 213 controls the LPC reproduction section 240 to generate an LPC for generating a background noise signal and sends it to the LPC synthesis filter 214.

【０１００】次に、図８は、上記図７に示した音声復号
化装置３１のより具体的な構成を示している。この図８
において、上記図７の各部と対応する部分には、同じ指
示符号を付している。Next, FIG. 8 shows a more specific configuration of the speech decoding device 31 shown in FIG. 7. In FIG. 8, parts corresponding to those in FIG. 7 are denoted by the same reference numerals.

【０１０１】この図８において、入力端子２０２には、
上記図２、３の出力端子１０２からの出力に相当するＬ
ＳＰのベクトル量子化出力、いわゆるコードブックのイ
ンデクスが供給されている。In FIG. 8, the input terminal 202 is supplied with the vector quantization output of the LSP, the so-called codebook index, corresponding to the output from the output terminal 102 in FIGS. 2 and 3.

【０１０２】このＬＳＰのインデクスは、ＬＰＣパラメ
ータ再生部２１３のＬＳＰの逆ベクトル量子化器２３１
に送られてＬＳＰ（線スペクトル対）データに逆ベクト
ル量子化され、スイッチ２４３を介してＬＳＰ補間回路
２３２、２３３に送られてＬＳＰの補間処理が施された
後、ＬＳＰ→α変換回路２３４、２３５でＬＰＣ（線形
予測符号）のαパラメータに変換され、このαパラメー
タがＬＰＣ合成フィルタ２１４に送られる。ここで、Ｌ
ＳＰ補間回路２３２及びＬＳＰ→α変換回路２３４は有
声音（Ｖ）用であり、ＬＳＰ補間回路２３３及びＬＳＰ
→α変換回路２３５は無声音（ＵＶ）用である。またＬ
ＰＣ合成フィルタ２１４は、有声音部分のＬＰＣ合成フ
ィルタ２３６と、無声音部分のＬＰＣ合成フィルタ２３
７とを分離している。すなわち、有声音部分と無声音部
分とでＬＰＣの係数補間を独立に行うようにして、有声
音から無声音への遷移部や、無声音から有声音への遷移
部で、全く性質の異なるＬＳＰ同士を補間することによ
る悪影響を防止している。This LSP index is sent to the LSP inverse vector quantizer 231 of the LPC parameter reproducing unit 213 and inverse vector quantized into LSP (line spectrum pair) data, which is sent via the switch 243 to the LSP interpolation circuits 232 and 233 for LSP interpolation and then converted into LPC (linear predictive coding) α parameters by the LSP→α conversion circuits 234 and 235; these α parameters are sent to the LPC synthesis filter 214. Here, the LSP interpolation circuit 232 and the LSP→α conversion circuit 234 are for voiced sound (V), and the LSP interpolation circuit 233 and the LSP→α conversion circuit 235 are for unvoiced sound (UV). The LPC synthesis filter 214 is divided into an LPC synthesis filter 236 for the voiced portion and an LPC synthesis filter 237 for the unvoiced portion. That is, LPC coefficient interpolation is performed independently for the voiced and unvoiced portions, which prevents the adverse effects of interpolating between LSPs of completely different character at transitions from voiced to unvoiced sound and from unvoiced to voiced sound.

【０１０３】また、図８の入力端子２０３には、上記図
２、図３のエンコーダ側の端子１０３からの出力に対応
するスペクトルエンベロープ（Ａｍ）の重み付けベクト
ル量子化されたコードインデクスデータが供給され、入
力端子２０４には、上記図２、図３の端子１０４からの
ピッチのデータが供給され、入力端子２０５には、上記
図２、図３の端子１０５からのidVUV判定パラメータが
供給されている。The input terminal 203 in FIG. 8 is supplied with the weighted-vector-quantized code index data of the spectral envelope (Am), corresponding to the output from the encoder-side terminal 103 in FIGS. 2 and 3; the input terminal 204 is supplied with the pitch data from the terminal 104 in FIGS. 2 and 3; and the input terminal 205 is supplied with the idVUV determination parameter from the terminal 105 in FIGS. 2 and 3.

【０１０４】入力端子２０３からのスペクトルエンベロ
ープＡｍのベクトル量子化されたインデクスデータは、
逆ベクトル量子化器２１２に送られて逆ベクトル量子化
が施され、上記データ数変換に対応する逆変換が施され
て、スペクトルエンベロープのデータとなって、有声音
合成部２１１のサイン波合成回路２１５に送られてい
る。The vector-quantized index data of the spectral envelope Am from the input terminal 203 is sent to the inverse vector quantizer 212, where it undergoes inverse vector quantization and an inverse conversion corresponding to the data number conversion described above to become spectral envelope data, which is sent to the sine wave synthesis circuit 215 of the voiced sound synthesis unit 211.

【０１０５】なお、エンコード時にスペクトルのベクト
ル量子化に先だってフレーム間差分をとっている場合に
は、ここでの逆ベクトル量子化後にフレーム間差分の復
号を行ってからデータ数変換を行い、スペクトルエンベ
ロープのデータを得る。If an inter-frame difference was taken prior to the vector quantization of the spectrum at encoding time, the inter-frame difference is decoded after the inverse vector quantization here, and the data number conversion is then performed to obtain the spectral envelope data.

【０１０６】サイン波合成回路２１５には、入力端子２
０４からのピッチ及び入力端子２０５からの上記idVUV
判定パラメータが供給されている。サイン波合成回路２
１５からは、上述した図２、図３のＬＰＣ逆フィルタ１
１１からの出力に相当するＬＰＣ残差データが取り出さ
れ、これが加算器２１８に送られている。このサイン波
合成の具体的な手法については、例えば本件出願人が先
に提案した、特願平４−９１４２２号の明細書及び図
面、あるいは特願平６−１９８４５１号の明細書及び図
面に開示されている。The sine wave synthesis circuit 215 is supplied with the pitch from the input terminal 204 and the idVUV determination parameter from the input terminal 205. The sine wave synthesis circuit 215 outputs LPC residual data corresponding to the output from the LPC inverse filter 111 of FIGS. 2 and 3 described above, and this is sent to the adder 218. A specific method for this sine wave synthesis is disclosed, for example, in the specification and drawings of Japanese Patent Application No. 4-91422 or of Japanese Patent Application No. 6-198451, both previously proposed by the present applicant.

【０１０７】また、逆ベクトル量子化器２１２からのエ
ンベロープのデータと、入力端子２０４、２０５からの
ピッチ、idVUV判定パラメータとは、有声音（Ｖ）部分
のノイズ加算のためのノイズ合成回路２１６に送られて
いる。このノイズ合成回路２１６からの出力は、重み付
き重畳加算回路２１７を介して加算器２１８に送ってい
る。これは、サイン波合成によって有声音のＬＰＣ合成
フィルタへの入力となるエクサイテイション（Excitati
on：励起、励振）を作ると、男声等の低いピッチの音で
鼻づまり感がある点、及びＶ（有声音）とＵＶ（無声
音）とで音質が急激に変化し不自然に感じる場合がある
点を考慮し、有声音部分のＬＰＣ合成フィルタ入力すな
わちエクサイテイションについて、音声符号化データに
基づくパラメータ、例えばピッチ、スペクトルエンベロ
ープ振幅、フレーム内の最大振幅、残差信号のレベル等
を考慮したノイズをＬＰＣ残差信号の有声音部分に加え
ているものである。The envelope data from the inverse vector quantizer 212 and the pitch and idVUV determination parameter from the input terminals 204 and 205 are sent to the noise synthesis circuit 216, which adds noise to the voiced (V) portion. The output of the noise synthesis circuit 216 is sent to the adder 218 via the weighted overlap-add circuit 217. The reason is as follows: when the excitation that serves as the input to the LPC synthesis filter for voiced sound is produced by sine wave synthesis, low-pitched sounds such as male voices can sound nasal, and the sound quality can change abruptly between V (voiced) and UV (unvoiced) and feel unnatural. To address this, noise that takes into account parameters based on the encoded speech data, such as the pitch, the spectral envelope amplitude, the maximum amplitude within the frame, and the residual signal level, is added to the voiced portion of the LPC residual signal, that is, to the excitation input to the LPC synthesis filter for the voiced portion.

【０１０８】加算器２１８からの加算出力は、ＬＰＣ合
成フィルタ２１４の有声音用の合成フィルタ２３６に送
られてＬＰＣの合成処理が施されることにより時間波形
データとなり、さらに有声音用ポストフィルタ２３８ｖ
でフィルタ処理された後、加算器２３９に送られる。The sum output from the adder 218 is sent to the synthesis filter 236 for voiced sound in the LPC synthesis filter 214, where LPC synthesis turns it into time waveform data; it is then filtered by the post filter 238v for voiced sound and sent to the adder 239.

【０１０９】次に、図８の入力端子２０７ｓ及び２０７
ｇには、上記図３の出力端子１０７ｓ及び１０７ｇから
のＵＶデータとしてのシェイプインデクス及びゲインイ
ンデクスがそれぞれ供給され、無声音合成部２２０に送
られている。端子２０７ｓからのシェイプインデクス
は、無声音合成部２２０の雑音符号帳２２１に、端子２
０７ｇからのゲインインデクスはゲイン回路２２２にそ
れぞれ送られている。雑音符号帳２２１から読み出され
た代表値出力は、無声音のＬＰＣ残差に相当するノイズ
信号成分であり、これがゲイン回路２２２で所定のゲイ
ンの振幅となり、窓かけ回路２２３に送られて、上記有
声音部分とのつなぎを円滑化するための窓かけ処理が施
される。Next, the shape index and the gain index serving as UV data from the output terminals 107s and 107g of FIG. 3 are supplied to the input terminals 207s and 207g of FIG. 8, respectively, and sent to the unvoiced sound synthesis unit 220. The shape index from the terminal 207s is sent to the noise codebook 221 of the unvoiced sound synthesis unit 220, and the gain index from the terminal 207g is sent to the gain circuit 222. The representative value output read from the noise codebook 221 is a noise signal component corresponding to the LPC residual of unvoiced sound; it is given an amplitude of a predetermined gain in the gain circuit 222 and sent to the windowing circuit 223, where it is windowed to smooth the junction with the voiced portion.

【０１１０】窓かけ回路２２３からの出力は、無声音合
成部２２０からの出力として、ＬＰＣ合成フィルタ２１
４のＵＶ（無声音）用の合成フィルタ２３７に送られ
る。合成フィルタ２３７では、ＬＰＣ合成処理が施され
ることにより無声音部分の時間波形データとなり、この
無声音部分の時間波形データは無声音用ポストフィルタ
２３８ｕでフィルタ処理された後、加算器２３９に送ら
れる。The output of the windowing circuit 223 is sent, as the output of the unvoiced sound synthesis unit 220, to the synthesis filter 237 for UV (unvoiced sound) in the LPC synthesis filter 214. LPC synthesis in the synthesis filter 237 yields the time waveform data of the unvoiced portion, which is filtered by the post filter 238u for unvoiced sound and then sent to the adder 239.

【０１１１】加算器２３９では、有声音用ポストフィル
タ２３８ｖからの有声音部分の時間波形信号と、無声音
用ポストフィルタ２３８ｕからの無声音部分の時間波形
データとが加算され、出力端子２０１より取り出され
る。In the adder 239, the time waveform signal of the voiced sound portion from the voiced sound post filter 238v and the time waveform data of the unvoiced sound portion from the unvoiced sound post filter 238u are added, and the result is taken out from the output terminal 201.
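
The two parallel synthesis paths and the final addition can be sketched as follows. This is an illustrative all-pole filter only: the sign convention A(z) = 1 + Σ α_k z^(-k), the omission of the post filters 238v and 238u, and the absence of filter state carried across frames are all simplifying assumptions, and the function names are hypothetical.

```python
def lpc_synthesis(residual, alpha):
    """All-pole synthesis 1/A(z), assuming A(z) = 1 + sum_k alpha[k-1] * z^-k."""
    out = []
    for n, e in enumerate(residual):
        acc = e
        for k, a in enumerate(alpha, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]  # feedback from earlier output samples
        out.append(acc)
    return out


def decode_frame(voiced_residual, unvoiced_residual, alpha_v, alpha_uv):
    """Filter the V and UV residuals separately, then add the two waveforms."""
    v = lpc_synthesis(voiced_residual, alpha_v)      # stands in for filter 236
    uv = lpc_synthesis(unvoiced_residual, alpha_uv)  # stands in for filter 237
    return [x + y for x, y in zip(v, uv)]            # stands in for adder 239
```

Keeping separate coefficient sets for the two paths mirrors the independent V and UV interpolation described earlier.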

【０１１２】また、ＬＰＣパラメータ再生部２１３内部
には、背景雑音信号生成用に用いられるＬＰＣパラメー
タを再生するためのＬＳＰ補間回路２４５と、ＬＳＰ→
α変換回路２４７も上記スイッチ２４３の後段に設けら
れている。さらに、上記ＬＳＰの逆量子化器２３１によ
り得られた上記prevLSP1とprevLSP2を保持しておくため
のＲＡＭ２４４と、８フレームの間隔があいている上記
prevLSP1とprevLSP2との補間をフレーム間隔に補正する
ためのフレーム補間部２４５も備えている。Inside the LPC parameter reproducing unit 213, an LSP interpolation circuit 246 and an LSP→α conversion circuit 247 for reproducing the LPC parameters used to generate the background noise signal are also provided downstream of the switch 243. The unit further includes a RAM 244 that holds prevLSP1 and prevLSP2 obtained by the LSP inverse quantizer 231, and a frame interpolation unit 245 that converts the interpolation between prevLSP1 and prevLSP2, which are eight frames apart, to the frame interval.

【０１１３】ＬＳＰ→α変換回路２４７からのBGN用の
αパラメータは、ＬＰＣ合成フィルタ２１４の無声音部
分のＬＰＣ合成フィルタ２３７に送られる。The α parameter for BGN from the LSP → α conversion circuit 247 is sent to the LPC synthesis filter 237 of the unvoiced sound portion of the LPC synthesis filter 214.

【０１１４】また、上記ＬＰＣパラメータ再生制御部２
４０は、入力端子２０７ｇからの上記ＵＶデータ用のゲ
インインデスクが、前回送られたインデックス＋２より
小さいか否かを判定するインデックス判定部２４２と、
このインデクス判定部２４２からの判定結果と上記入力
端子２０５から供給されるidVUV判定パラメータとに基
づいて上記スイッチ２４３の切り換えを制御する切り換
え制御部２４１とを備えてなる。The LPC parameter reproduction control unit 240 comprises an index determination unit 242, which determines whether the gain index for the UV data from the input terminal 207g is smaller than the previously sent index + 2, and a switching control unit 241, which controls the switching of the switch 243 based on the determination result from the index determination unit 242 and the idVUV determination parameter supplied from the input terminal 205.

【０１１５】上記idVUV判定パラメータが１であると
き、すなわち背景雑音区間であることを示す上記基本パ
ラメータとなるモードビットを受信したときのこの音声
復号化装置３１の動作を、図９に示すフローチャートを
用いて説明する。The operation of the speech decoding device 31 when the idVUV determination parameter is 1, that is, when it receives the mode bits serving as the basic parameter indicating a background noise section, is described with reference to the flowchart shown in FIG. 9.

【０１１６】先ず、ステップＳ２１でidVUV判定パラメ
ータが１であるとき、ＬＰＣパラメータ再生制御部２４
０の切り換え制御部２４１は、切り換えスイッチ２４３
をオフにする。そして、ステップＳ２２に進み、ＬＳＰ
補間回路２４６で上記ＲＡＭ２４４に保持されたPrevLS
P１とPrevLSP2をフレーム補間回路２４５を通して得た
フレーム毎の直線補間値を使ってBGN用のＬＳＰを求め
る。そして、このBGN用のＬＳＰは、UV用の合成フィル
タ２３７に供給され、背景雑音が合成される。First, when the idVUV determination parameter is 1 in step S21, the switching control unit 241 of the LPC parameter reproduction control unit 240 turns off the changeover switch 243. The process then proceeds to step S22, where the LSP interpolation circuit 246 obtains the LSP for background noise (BGN) using the per-frame linear interpolation values derived, through the frame interpolation circuit 245, from PrevLSP1 and PrevLSP2 held in the RAM 244. This LSP for BGN is supplied to the synthesis filter 237 for UV, and the background noise is synthesized.

【０１１７】なお、音声符号化装置２０からは、背景雑
音区間と判断されたフレームが連続８フレームとなる
と、次の９フレーム目は通常のＵＶデータ用のシェイプ
インデクス、ゲインインデクス及びＬＳＰパラメータが
送られてくる。ここで、音声符号化装置２０では、９フ
レーム目に本当にidVUV判定パラメータがＵＶになるこ
とがないとは限らない。そこで、音声復号化装置３１側
では、本当のＵＶ用データなのか、あるいは単に９フレ
ーム目に送られたＵＶ用の全パラメータなのかを判断す
る必要がある。When the frames judged to be a background noise section reach eight in a row, the speech encoding device 20 sends the shape index, gain index, and LSP parameters for ordinary UV data in the following ninth frame. However, it cannot be ruled out that the idVUV determination parameter in the speech encoding device 20 genuinely becomes UV at the ninth frame. The speech decoding device 31 therefore needs to determine whether the data is true UV data or simply the full set of UV parameters sent at the ninth frame.
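
The encoder-side schedule described here can be sketched as below. It assumes that the routine ninth frame is labelled with idVUV = 0 (ordinary UV), which is what creates the ambiguity the decoder must resolve, and that the count restarts after each routine frame. The function name `frames_to_transmit` and the tuple layout are hypothetical.

```python
def frames_to_transmit(id_vuv_sequence, hold=8):
    """Map per-frame decisions to (transmitted idVUV, full parameters sent?).

    idVUV values follow the text: 0 = unvoiced, 1 = background noise,
    2 and 3 = voiced. Only the mode value is sent during the hold period.
    """
    run = 0  # background noise frames since the last full parameter set
    out = []
    for id_vuv in id_vuv_sequence:
        if id_vuv == 1:
            if run == hold:
                out.append((0, True))   # routine update, sent as ordinary UV
                run = 0
            else:
                run += 1
                out.append((1, False))  # mode value only
        else:
            run = 0
            out.append((id_vuv, True))  # speech frames carry full parameters
    return out
```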

【０１１８】そこで、ＬＰＣパラメータ再生制御部２４
０では、ステップＳ２３でidＶＵＶ＝０であるとき、ス
テップＳ２４に進み、インデクス判定部２４２により入
力端子２０７ｇを介して送られてきたＵＶデータ用のゲ
インインデクスを調べ、ステップＳ２６のルーティンの
処理か、ステップＳ２７の本来のＵＶの処理かを判断す
る。具体的には、上述したように、入力端子２０７ｇか
らの上記ＵＶデータ用のゲインインデスクが、前回送ら
れたインデクス＋２より小さいか否かを判定する。９フ
レーム目に送られてきたＵＶデータ用のインデクスが音
声としてのＵＶ用インデクスであれば前回送られたゲイ
ンインデクスのパラメータよりも大きいはずである。Therefore, when idVUV = 0 in step S23, the LPC parameter reproduction control unit 240 proceeds to step S24, where the index determination unit 242 examines the gain index for the UV data sent via the input terminal 207g and decides between the routine processing of step S26 and the ordinary UV processing of step S27. Specifically, as described above, it determines whether the gain index for the UV data from the input terminal 207g is smaller than the previously sent index + 2. If the UV data index sent at the ninth frame is a UV index for actual speech, it should be larger than the previously sent gain index.

【０１１９】ステップＳ２４で上記ゲインインデクスが
前回送られてきたインデクス＋２より小さいとなれば、
これは９フレーム目に送られたＵＶデータ用の、すなわ
ちルーティン用のデータであると判断し、ステップＳ２
６に進み、切り換え制御部２４１により、スイッチ２４
３をＬＳＰ補間回路２４６側に接続し、上記PrevLSP１
とPrevLSP2を用いた直線補間により求めた値に変えて、
ＵＶ用に送られたＬＳＰ逆量子化部２３１からのＬＳＰ
パラメータをＬＳＰ補間回路２４６に供給する。ＬＳＰ
補間回路２４６では、このアップデートされたＬＳＰパ
ラメータを、そのままＬＳＰ→α変換回路２４７に供給
する。そして、ＬＳＰ→α変換回路２４７からのBGN用
のαパラメータは、ＬＰＣ合成フィルタ２１４の無声音
部分のＬＰＣ合成フィルタ２３７に送られ、９フレーム
目には８フレーム間の背景雑音とは異なった背景雑音が
得られることになる。このため、背景雑音の不自然さを
緩和することができる。If the gain index is found in step S24 to be smaller than the previously sent index + 2, the data is judged to be the UV data sent at the ninth frame, that is, routine data, and the process proceeds to step S26. The switching control unit 241 connects the switch 243 to the LSP interpolation circuit 246 side, and the LSP parameters sent for UV from the LSP inverse quantizer 231 are supplied to the LSP interpolation circuit 246 in place of the values obtained by linear interpolation using PrevLSP1 and PrevLSP2. The LSP interpolation circuit 246 supplies these updated LSP parameters as they are to the LSP→α conversion circuit 247. The α parameters for BGN from the LSP→α conversion circuit 247 are sent to the LPC synthesis filter 237 for the unvoiced portion of the LPC synthesis filter 214, so that the ninth frame yields background noise different from that of the preceding eight frames. This reduces the unnaturalness of the background noise.

【０１２０】また、ステップＳ２４で上記ゲインインデ
クスが前回送られてきたインデクス＋２以上より大きい
となれば、これは９フレーム目に送られてきたのは、本
当のＵＶ用のパラメータであると判断し、ステップＳ２
７に進む。ステップＳ２７では、切り換え制御部２４１
がスイッチ２４３をＵＶ用のＬＳＰ補間回路２３３に切
り換えて、通常のＵＶ用のＬＳＰ補間により得られたＬ
ＳＰを使った無声音の合成が行われる。If the gain index is found in step S24 to be equal to or greater than the previously sent index + 2, what was sent at the ninth frame is judged to be true UV parameters, and the process proceeds to step S27. In step S27, the switching control unit 241 switches the switch 243 to the LSP interpolation circuit 233 for UV, and unvoiced sound is synthesized using the LSP obtained by ordinary LSP interpolation for UV.

【０１２１】一方、上記ステップＳ２３でidVUV判定パ
ラメータが０でないと判断すると、ステップＳ２５に進
み、ＵＶ用のＬＳＰからαパラメータを変換し、合成フ
ィルタ２３６で有声音を合成する。On the other hand, if the idVUV determination parameter is found in step S23 not to be 0, the process proceeds to step S25, where the LSP for UV is converted into α parameters and voiced sound is synthesized by the synthesis filter 236.
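
The branching of steps S21 to S27 can be summarized in a short sketch. The function name `select_lsp_source` and the returned labels are hypothetical, and the sketch assumes the gain index test is applied to every idVUV = 0 frame without tracking whether the decoder was previously in the background noise mode.

```python
def select_lsp_source(id_vuv, gain_index, prev_gain_index):
    """Return a label for the branch taken in the FIG. 9 flow."""
    if id_vuv == 1:
        # S21/S22: switch 243 off, interpolate the held LSPs.
        return "interpolate_held_lsp"
    if id_vuv == 0:
        # S24: gain indexes are assumed sorted in ascending order.
        if gain_index < prev_gain_index + 2:
            return "routine_bgn_update"  # S26: new LSP feeds circuit 246
        return "true_unvoiced"           # S27: ordinary UV interpolation
    return "voiced"                      # S25
```

The margin of 2 on the gain index is what lets a quiet routine frame be told apart from a genuine, louder unvoiced frame.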

【０１２２】以上のように、音声復号化装置３１では、
idVUV判定パラメータが１であるとき、すなわち他の携
帯電話装置の音声符号化装置が背景雑音区間を検出した
ときには、８フレームをおいて伝送されてきた複数種類
のパラメータを用いて上記背景雑音を生成すると共に、
上記８フレーム中には過去に送られたパラメータを用い
て上記背景雑音を生成する。このため、背景雑音の不自
然さを緩和することができる。また、９フレーム目に偶
然に、本当の無声音区間が検出されて伝送されてきて
も、正確に判断できるので、高品質な音声を復号でき
る。As described above, when the idVUV determination parameter is 1, that is, when the speech encoding device of the other mobile phone device has detected a background noise section, the speech decoding device 31 generates the background noise using the plural types of parameters transmitted at intervals of eight frames, and during those eight frames generates the background noise using parameters sent in the past. This reduces the unnaturalness of the background noise. Moreover, even if a true unvoiced section happens to be detected and transmitted at the ninth frame, it can be identified correctly, so high-quality speech can be decoded.

【０１２３】ＬＳＰは８フレーム分のディレイが生じる
ことになるが、背景雑音モード中は完全に滑らかに繋が
ることになり、急激にＬＳＰが変化して異音を発生する
ことがなくなる。また、音声の子音部は背景雑音より高
いレベルを有することが多いので、音声の子音部を誤っ
て背景雑音として処理してしまうことを防げる。The LSP incurs a delay of eight frames, but the transitions are completely smooth during the background noise mode, and the LSP no longer changes abruptly and produces abnormal sounds. In addition, because the consonant portions of speech usually have a higher level than the background noise, consonants are prevented from being mistakenly processed as background noise.

【０１２４】そして、再び背景雑音モードに入ったらpr
evLsp1、prevLsp2を線形補間することにより、現在のフ
レームのＬＳＰとする。When the background noise mode is entered again, prevLsp1 and prevLsp2 are linearly interpolated to obtain the LSP of the current frame.

【０１２５】ところで、モードビットが充分にあり、こ
れを一つのモードとして送ることができれば、このよう
な処理は必要ない。また、prevLsp1とprevLsp2の補間方
法は線形補間以外にも様々なものが考えられる。Incidentally, if there were enough mode bits to send this as a separate mode, such processing would be unnecessary. Various methods other than linear interpolation are also conceivable for interpolating between prevLsp1 and prevLsp2.

【０１２６】[0126]

【発明の効果】本発明に係る音声符号化方法及び装置
は、可変レート符号化を効率良く実現する。The speech encoding method and apparatus according to the present invention realize variable rate encoding efficiently.

【０１２７】また、本発明に係る音声復号化方法及び装
置は、可変レート符号化を実現する音声符号化方法及び
装置により符号化された符号化データを用いて、背景雑
音を不自然感を緩和して生成できる。The speech decoding method and apparatus according to the present invention can generate background noise with reduced unnaturalness by using encoded data produced by a speech encoding method and apparatus that implement variable-rate encoding.

[Brief description of the drawings]

【図１】本発明に係る音声符号化方法及び装置、並びに
音声復号化方法及び装置の実施の形態となる携帯電話装
置の構成を示すブロック図である。FIG. 1 is a block diagram showing a configuration of a mobile phone device which is an embodiment of a speech encoding method and device and a speech decoding method and device according to the present invention.

【図２】上記携帯電話装置を構成する音声符号化装置の
基本的な構成を示すブロック図である。FIG. 2 is a block diagram showing a basic configuration of a speech encoding device constituting the mobile phone device.

【図３】上記図２に示した音声符号化装置の詳細な構成
を示すブロック図である。FIG. 3 is a block diagram showing a detailed configuration of the speech encoding device shown in FIG. 2.

【図４】上記音声符号化装置内部にあって入力信号を判
定する入力信号判定装置の構成を示すブロック図であ
る。FIG. 4 is a block diagram illustrating a configuration of an input signal determination device that determines an input signal inside the speech encoding device.

【図５】上記図４に示した入力信号判定装置を構成する
最小レベル演算部のアルゴリズムを説明するためのフロ
ーチャートである。FIG. 5 is a flowchart for explaining the algorithm of the minimum level calculation unit included in the input signal determination device shown in FIG. 4.

【図６】上記図４に示した入力信号判定装置を構成する
リファレンスレベル演算部を説明するための特性図であ
る。FIG. 6 is a characteristic diagram for explaining the reference level calculation unit included in the input signal determination device shown in FIG. 4.

【図７】上記携帯電話装置を構成する音声復号化装置の
基本的な構成を示すブロック図である。FIG. 7 is a block diagram showing a basic configuration of a speech decoding device constituting the mobile phone device.

【図８】上記図７に示した音声復号化装置の詳細な構成
を示すブロック図である。FIG. 8 is a block diagram showing a detailed configuration of the speech decoding device shown in FIG. 7.

【図９】上記音声復号化装置の動作を説明するためのフ
ローチャートである。FIG. 9 is a flowchart for explaining the operation of the speech decoding apparatus.

[Explanation of symbols]

２０音声符号化装置、２１、１１５入力信号判定
部、３１音声復号化装置、２４０ＬＰＣパラメータ
再生制御部、２４１切り換え制御部、２４２インデク
ス判定部 Reference Signs List: 20 speech encoding device; 21, 115 input signal determination unit; 31 speech decoding device; 240 LPC parameter reproduction control unit; 241 switching control unit; 242 index determination unit

Claims

[Claims]

1. A speech encoding method for encoding, at a variable rate based on a determination result for each section, an input signal consisting of a speech signal section divided into voiced or unvoiced sections and a background noise section, the method comprising a speech encoding step of dividing the input signal into predetermined encoding units on the time axis, encoding each encoding unit, and outputting a plurality of types of speech encoding parameters, wherein, when the determination result indicates the background noise section, the speech encoding step outputs the plurality of types of parameters after a predetermined time.

2. The speech encoding method according to claim 1, wherein in the speech encoding step, the determination result of each section is always output as a basic parameter even during the predetermined time.

3. The speech encoding method according to claim 1, wherein the speech encoding step includes: a short-term prediction residual calculation step of obtaining a short-term prediction residual of the input signal; a sine wave analysis encoding step of performing sine wave analysis encoding on the obtained short-term prediction residual; and a waveform encoding step of encoding the input signal by waveform encoding.

4. The speech encoding method according to claim 3, wherein the input signal is encoded by the sine wave analysis encoding step when the speech signal section is voiced, and by the waveform encoding step when the speech signal section is unvoiced.

5. The speech encoding method according to claim 3, wherein, in the speech encoding step, differential quantization is not performed in the short-term prediction residual calculation step when the current frame is in the background noise section or the preceding frame was in the background noise section.

6. A speech encoding apparatus for encoding, at a variable rate based on a determination result for each section, an input signal consisting of a speech signal section divided into voiced or unvoiced sections and a background noise section, the apparatus comprising a speech encoding unit that divides the input signal into predetermined encoding units on the time axis, encodes each encoding unit, and outputs a plurality of types of speech encoding parameters, wherein, when the determination result indicates the background noise section, the speech encoding unit outputs the plurality of types of parameters after a predetermined time.

7. A speech decoding method for decoding speech encoded data transmitted after an input signal, consisting of a speech signal section divided into voiced or unvoiced sections and a background noise section, has been encoded at a variable rate based on a determination result for each section, wherein, during the background noise section, the background noise is generated using a plurality of types of parameters transmitted at a predetermined time interval, and during the predetermined time the background noise is generated using parameters sent in the past.

8. The speech decoding method according to claim 7, wherein the speech encoded data is generated by a speech encoding step comprising: a short-term prediction residual calculation step of obtaining a short-term prediction residual of an input speech signal; a sine wave analysis encoding step of performing sine wave analysis encoding on the obtained short-term prediction residual; and a waveform encoding step of encoding the input speech signal by waveform encoding.

9. The speech decoding method according to claim 8, wherein the speech encoded data is encoded by the sine wave analysis encoding step when the speech signal section is voiced, and by the waveform encoding step when the speech signal section is unvoiced.

10. The speech decoding method according to claim 7, wherein the past parameter used to generate the background noise during the predetermined time is at least a short-term prediction coding coefficient calculated by the short-term prediction residual calculation step.

11. The speech decoding method according to claim 7, wherein the background noise is generated according to a difference between a previous value of an encoded output from the waveform encoding step and a previous value of the plurality of types of parameters transmitted after the predetermined time.

12. The speech decoding method according to claim 11, wherein an encoded output from said waveform encoding step is a gain index based on a short-term predicted encoding coefficient.

13. A speech decoding apparatus for decoding speech encoded data transmitted after an input signal, consisting of a speech signal section divided into voiced or unvoiced sections and a background noise section, has been encoded at a variable rate based on a determination result for each section, the apparatus comprising a speech decoding unit that, during the background noise section, generates the background noise using a plurality of types of parameters transmitted at a predetermined time interval and, during the predetermined time, generates the background noise using parameters sent in the past.