JP3537008B2

JP3537008B2 - Speech coding communication system and its transmission / reception device.

Info

Publication number: JP3537008B2
Application number: JP20180995A
Authority: JP
Inventors: 功手嶋
Original assignee: Hitachi Kokusai Electric Inc
Current assignee: Hitachi Kokusai Electric Inc
Priority date: 1995-07-17
Filing date: 1995-07-17
Publication date: 2004-06-14
Anticipated expiration: 2015-07-17
Also published as: JPH0934499A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a speech coding communication system. In particular, it relates to an analysis-synthesis speech coding communication system, and to its transmitting and receiving devices, which, when a speech signal is converted into digital form and transmitted, reduces the amount of information so as to lower the transmission rate, narrow the bandwidth, and make effective use of the transmission band.

[0002]

[Prior art] In low-rate speech coding aimed at the effective use of radio resources, hybrid coding typified by CELP (code excited linear prediction) is the mainstream at rates of around several kbps, and it achieves good reproduced speech quality. FIG. 1 is a conceptual diagram of the main part of the CELP coding scheme. CELP, also called code-excited LPC coding, drives a speech synthesis filter using noise from a codebook as the sound source. It extracts the optimum noise drive signal that minimizes the error signal between the input speech and the synthesized speech, and multiplexes and transmits the corresponding code index and gain information G as parameters.

[0003] In the region of about 2400 bps, on the other hand, analysis-synthesis coding typified by the LPC (linear predictive coding) scheme is the mainstream. FIG. 2 is a block diagram of LPC. Based on a voiced/unvoiced decision, the LPC scheme switches the driving sound source between noise and a single pulse having the pitch period, and it encodes and transmits the LPC coefficients, the pitch period, and power information.

[0004]

[Problem to be solved by the invention] In the conventional configuration described above, however, as the LPC block diagram shows, analysis-synthesis coding can allocate only a limited amount of information to the sound source and therefore models the sound source simply, so the reproduced speech is generally of low quality. Moreover, reducing the bit rate to 2400 bps or less requires a longer analysis frame, which degrades quality further. The present invention has been made in view of these circumstances, and its object is to provide a speech coding communication system with a high compression ratio and high naturalness.

[0005]

[Means for solving the problem] The speech coding communication system of the present invention is an analysis-synthesis speech coding communication system that multiplexes and transmits (a) the code index of an optimum vector selected by pattern matching between the frame-unit line spectrum pair parameters of a discretized and framed input speech signal and code vectors obtained in advance by training and stored in a codebook, and (b) the voiced/unvoiced information, the pitch period of voiced sections, and the power information of a frame-unit sound source signal obtained by removing the line spectrum pair parameters from the input speech signal. The system comprises: dividing means for dividing the frame-unit sound source signal into a plurality of subframes on the time axis; representative pitch period information output means for determining voiced/unvoiced for each of the subframes, extracting the pitch periods of voiced sections, and detecting and encoding a representative pitch period from among them; a plurality of pitch period difference information output means for quantizing the difference between the pitch period of the voiced section of each subframe and the representative pitch period and encoding it together with the voiced/unvoiced information; representative power information output means for calculating the power of each of the subframes and detecting and encoding a representative power from among them; a plurality of power difference information output means for quantizing the difference between the power of each subframe and the representative power; and multiplexing means for multiplexing the code index, the voiced/unvoiced information of each subframe, the representative pitch period information, the plurality of pitch period difference information, the representative power information, and the plurality of power difference information, and sending the result to a transmission path. The system is characterized in that it transmits, as a speech coded signal, a multiplex of: the code index of the optimum vector of the line spectrum pair parameters for each frame; the representative pitch period information of the subframes into which the sound source information for each frame, excluding the line spectrum pair parameters, is divided; the pitch period difference information and voiced/unvoiced information of the voiced sections; the representative power information of the subframes; and the power difference information of each subframe.

The speech coding transmitting device of the present invention comprises: a line spectrum pair parameter analyzer that extracts line spectrum pair parameters from a discretized and framed input speech signal; a codebook that holds, as code vectors, line spectrum pair parameters learned in advance by training; a vector quantizer that outputs the code index of the optimum vector selected by matching the line spectrum pair parameters extracted by the line spectrum pair parameter analyzer against the code vectors of the codebook; an inverse filter that calculates, in frame units, a sound source signal obtained by removing the line spectrum pair parameters from the input speech signal; a subframe divider that divides the frame-unit sound source signal into a plurality of subframes on the time axis; a plurality of subframe pitch extractors that determine voiced/unvoiced for each of the divided subframes and extract the pitch period of voiced sections; a representative pitch detector that determines a representative pitch period from among the extracted pitch periods of the voiced sections of the subframes; a representative pitch encoder that encodes the representative pitch period and outputs representative pitch period information; a plurality of subframe pitch difference quantizers that each quantize the difference between the representative pitch period and the pitch period of the voiced section of a subframe and output pitch period difference information encoded together with the voiced/unvoiced information; a plurality of subframe power extractors that calculate the power of each of the divided subframes; a representative power detector that determines a representative power from among the calculated subframe powers; a representative power encoder that encodes the representative power and outputs representative power information; a plurality of subframe power difference quantizers that each output power difference information obtained by quantizing the difference between the representative power and the power of a subframe; and a multiplexer that multiplexes the code index, the representative pitch period information, the pitch period difference information and voiced/unvoiced information, the representative power information, and the power difference information, and sends the result to a transmission path.

The speech coding receiving device of the present invention receives, via a transmission path, a speech coded signal in which are multiplexed the code index of the optimum vector of the line spectrum pair parameters for each frame, the representative pitch period information of the subframes into which the sound source information for each frame, excluding the line spectrum pair parameters, is divided, the pitch period difference information and voiced/unvoiced information of the voiced sections, the representative power information of the subframes, and the power difference information of each subframe; it decodes this signal and outputs reproduced speech. The receiving device comprises: a separator that separates the speech coded signal into the code index, the representative pitch period information, the pitch period difference information and voiced/unvoiced information of the voiced section of each subframe, the representative power information, and the power difference information of each subframe; a representative pitch decoder that decodes the separated representative pitch period information and outputs the representative pitch period; a plurality of subframe pitch decoders that decode the pitch period difference information of each subframe with the representative pitch period as the reference and output the pitch period of the voiced section of each subframe; a representative power decoder that decodes the separated representative power information and outputs the representative power; a plurality of subframe power decoders that decode the power difference information of each subframe with the representative power as the reference and output the power of each subframe; a plurality of pitch reproducers that, when a subframe is voiced, reproduce a sound source signal from the decoded pitch period and power; a noise generator that, when a subframe is unvoiced, outputs noise serving as the sound source signal corresponding to the decoded power; a switch that, based on the voiced/unvoiced information and pitch period difference information of each subframe output from the separator, selects the output of the pitch reproducers or of the noise generator subframe by subframe according to the voiced/unvoiced information; a codebook searcher that reads the line spectrum pair parameters corresponding to the code index output from the separator out of a line spectrum pair parameter codebook having the same contents as that on the transmitting side, and outputs them; and a synthesis filter that reproduces and outputs a speech signal from the voiced or unvoiced sound source signal output from the switch and the line spectrum pair parameters output from the codebook searcher.

[0006] That is, to solve the above problem, the gist of the present invention is to model the sound source of an analysis-synthesis speech coding scheme more finely on the time axis. The voiced/unvoiced decision is thereby made at a finer granularity and, for voiced sound, the time variation of the pitch is represented in finer detail, which improves the reproduced speech quality and its naturalness.

[0007]

[Embodiments of the invention] In the speech coding communication system of the present invention, spectral parameters are first extracted by spectral analysis from the discretized and framed input speech signal. Using these spectral parameters, the input speech signal is passed through an inverse filter to calculate the prediction residual signal, that is, the sound source signal, and the sound source signal is divided into a plurality of subframes. For each subframe, a voiced/unvoiced decision is made and, when the subframe is voiced, its pitch period is detected. From the detected pitch periods, a representative pitch serving as the reference is selected and encoded, the difference between the representative pitch and the detected pitch of each subframe is quantized, and each result is input to the multiplexer.
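The inverse-filtering step described above can be sketched as follows. This is a minimal illustration only: it assumes the spectral parameters have already been converted to direct-form predictor coefficients (the name `lpc` and the zero initial state are assumptions of this sketch, not details given in the patent, which works with LSP parameters).

```python
def inverse_filter(speech, lpc):
    """Return the prediction residual e[n] = s[n] - sum_k a_k * s[n-k].

    `lpc` holds predictor coefficients a_1..a_p (an assumed representation).
    Samples before the start of `speech` are treated as zero.
    """
    residual = []
    for n, sample in enumerate(speech):
        prediction = 0.0
        for k, a_k in enumerate(lpc, start=1):
            if n - k >= 0:
                prediction += a_k * speech[n - k]
        residual.append(sample - prediction)
    return residual


# A signal that a first-order predictor models perfectly after the first sample.
example_residual = inverse_filter([1.0, 0.5, 0.25, 0.125], [0.5])
```

In this toy case only the first sample survives in the residual, which is the sense in which the inverse filter leaves behind the sound source signal.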

[0008] Meanwhile, the power of each subframe is calculated as a sum of squares. The largest of the calculated powers is selected and encoded as the reference power (representative power), the difference between the representative power and the calculated power of each subframe is quantized, and each result is input to the multiplexer.

[0009] Further, the spectral parameters are pattern-matched against code vectors that were obtained in advance by training and stored in a codebook; the optimum vector is selected and its index is input to the multiplexer. The multiplexer inputs described above are multiplexed and sent to the transmission path.

[0010] In decoding the multiplexed signal received via the transmission path, the received representative pitch serving as the reference is decoded, the differentially quantized pitch of each subframe is decoded, the representative power serving as the reference is decoded, the differentially quantized power of each subframe is decoded, and the sound source signal is reproduced. Further, the code vector for the received index is read out of the codebook and used as the spectral parameters, and the reproduced sound source signal is passed through a synthesis filter to obtain the reproduced speech output.

[0011]

[Example] FIGS. 3 and 4 are block diagrams showing an embodiment of the present invention; FIG. 3 shows the transmitting-side configuration and FIG. 4 the receiving-side configuration. The description follows FIGS. 3 and 4. In FIG. 3 (transmitting side), 1 is an LSP parameter analyzer, 2 is an inverse filter, 3 is a subframe divider, 4 denotes the pitch extractors for the three subframes, 5 is a representative pitch detector, 6 is a representative pitch encoder, 7 denotes the pitch difference quantizers for the three subframes, 8 denotes the power extractors for the three subframes, 9 is a representative power detector, 10 is a representative power encoder, 11 denotes the power difference quantizers for the three subframes, and 14 is a multiplexer.

[0012] In FIG. 4 (receiving side), 15 is a separator, 16 is a representative pitch decoder, 17 is a switch, 18 denotes the pitch decoders for the three subframes, 19 is a representative power decoder, 20 denotes the power decoders for the three subframes, 21 denotes the pitch reproducers for the three subframes, 22 is a noise generator, 23 is a codebook searcher, 12 is a codebook with the same contents as on the transmitting side, 24 is a coefficient interpolator, and 25 is a synthesis filter.

[0013] On the transmitting side in FIG. 3, the speech signal is first discretized and framed. The analysis frame length is set to about 45 ms, which makes a low bit rate easier to achieve. The present invention uses LSP (line spectrum pair) analysis for the spectral parameter analysis: the LSP parameter analyzer 1 extracts the LSP parameters from the speech signal frame by frame. The inverse filter 2 calculates the sound source signal (prediction residual) by removing from the input speech signal the LSP parameters obtained by the LSP parameter analyzer 1. This sound source signal is then analyzed and modeled. To preserve naturalness, the prediction residual is divided into a plurality of subframes, and in each subframe a voiced/unvoiced decision is made and the pitch period of the voiced section is detected. To this end, the subframe divider 3 divides the sound source signal into a plurality of subframes. Two to five subframes is preferable; this embodiment uses three.
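As a rough illustration of the framing arithmetic, the sketch below splits one 45 ms frame into three subframes. The 8 kHz sampling rate and the choice of equal-length, non-overlapping subframes are assumptions made for this example; the patent states neither.

```python
SAMPLE_RATE_HZ = 8000  # assumption: the sampling rate is not given in the patent
FRAME_MS = 45          # analysis frame length from the description
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # samples per frame


def split_into_subframes(frame, num_subframes=3):
    """Split a frame into contiguous, equal-length subframes (assumed layout)."""
    if len(frame) % num_subframes != 0:
        raise ValueError("frame length must be a multiple of num_subframes")
    size = len(frame) // num_subframes
    return [frame[i * size:(i + 1) * size] for i in range(num_subframes)]


subframes = split_into_subframes(list(range(FRAME_LEN)))
```

Under these assumptions each subframe covers 15 ms, so the voiced/unvoiced decision and the pitch estimate are refreshed three times per frame.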

[0014] Next, three subframe pitch extractors 4, one provided for each subframe, each perform the voiced/unvoiced decision and detect the pitch period. Conventionally, every detected pitch period is encoded, but that would greatly increase the amount of information allocated to the sound source and would not contribute to a lower bit rate. Instead, the voiced pitches obtained for the subframes are compared, one representative pitch is determined, encoded, and transmitted, and for the other pitches either the difference from the quantized representative pitch or unvoiced information is encoded and transmitted. That is, the representative pitch detector 5 determines the representative pitch from among the pitch periods of the subframes obtained by the subframe pitch extractors 4. The representative pitch could be chosen as the maximum, the middle value, the minimum, or the mean; however, if the middle value or the mean were used, differences of both signs would have to be handled, so the present invention uses the maximum as the representative value. The representative pitch detected in this way is encoded by the representative pitch encoder 6.
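The representative-pitch selection can be sketched in a few lines. Marking unvoiced subframes with `None` and returning `None` when every subframe is unvoiced are conventions of this sketch; the patent does not say how an all-unvoiced frame is handled.

```python
def representative_pitch(subframe_pitches):
    """Return the largest voiced pitch period among the subframes.

    Unvoiced subframes are given as None (a convention of this sketch).
    Returns None if no subframe is voiced.
    """
    voiced = [p for p in subframe_pitches if p is not None]
    return max(voiced) if voiced else None


rep = representative_pitch([52, None, 55])
differences = [None if p is None else p - rep for p in [52, None, 55]]
```

Because the maximum is used as the reference, every difference is zero or negative, which is why only one sign needs to be coded.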

[0015] Meanwhile, the three subframe pitch difference quantizers 7 each quantize the difference between the representative pitch encoded by the representative pitch encoder 6 and the detected pitch period of the corresponding subframe, and encode it together with the voiced/unvoiced information.

[0016] In quantizing the difference, the number of allocated bits must be considered so that the scheme contributes to a low bit rate. Because the differences between subframes are small, about 2 bits suffice. In this embodiment, therefore, 2 bits are allocated jointly to the difference and the voiced/unvoiced flag: the binary code is “00” for an unvoiced subframe, “01” when the difference is 0 to −1, “10” when the difference is −2 to −3, and “11” when the difference is −4 or beyond. Encoding the sound source signal of each subframe in this way expresses the fluctuation of the pitch over time, and so improves naturalness, without greatly increasing the amount of information.
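The 2-bit mapping described in this paragraph can be written out as a small function. This is a sketch only: it assumes integer pitch periods, represents an unvoiced subframe as `None`, and reads "−4 or beyond" as any difference of −4 or lower; the function name is hypothetical.

```python
def encode_pitch_difference(pitch, representative):
    """Map a subframe pitch to the 2-bit code from paragraph [0016].

    pitch is None for an unvoiced subframe (a convention of this sketch).
    The difference is pitch - representative, which is never positive
    because the representative is the maximum voiced pitch.
    """
    if pitch is None:
        return 0b00                      # unvoiced
    diff = pitch - representative
    if diff > 0:
        raise ValueError("pitch exceeds the representative (maximum) pitch")
    if diff >= -1:
        return 0b01                      # difference 0 to -1
    if diff >= -3:
        return 0b10                      # difference -2 to -3
    return 0b11                          # difference -4 or beyond


codes = [encode_pitch_difference(p, 55) for p in [52, None, 55]]
```

With three subframes this costs 6 bits per frame on top of the representative pitch, and no separate voiced/unvoiced bit is needed.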

[0017] Meanwhile, the sound source signals of the (three) subframes produced by the subframe divider 3 are input to the three subframe power calculators 8, and each calculates the power as the sum of squares of its subframe data. As with the pitch, power is encoded by first obtaining and encoding a representative value and then quantizing the differences. That is, the representative power detector 9 determines the representative power from among the per-subframe power values obtained by the three subframe power calculators 8, and the representative power encoder 10 encodes the representative power thus determined. The three subframe power difference quantizers 11 quantize the difference between the representative power encoded by the representative power encoder 10 and the detected power of each subframe. As with the pitch, the differences between subframes are small, so 2 bits are allocated to each quantized difference.
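The power path can be sketched as below: the sum of squares per subframe, the largest value as the representative power (as stated in paragraph [0008]), and the raw differences that would then be quantized to 2 bits. The quantizer levels are not given in the patent, so this sketch stops at the unquantized differences; the function names are hypothetical.

```python
def subframe_power(samples):
    """Power of one subframe, computed as the sum of squared samples."""
    return sum(x * x for x in samples)


def power_parameters(subframes):
    """Return (representative power, per-subframe differences).

    The representative is the largest subframe power; each difference is
    power - representative, so it is zero or negative.
    """
    powers = [subframe_power(s) for s in subframes]
    representative = max(powers)
    return representative, [p - representative for p in powers]


rep_power, power_diffs = power_parameters([[1, 2], [3, 0], [0, 0]])
```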

[0018] Next, the LSP parameters obtained by the LSP parameter analyzer 1 are vector-quantized. For vector quantization, LSP parameters are learned in advance from training signals and held as code vectors in the LSP parameter codebook 12. The vector quantizer 13 matches the LSP parameters obtained by the LSP parameter analyzer 1 against the code vectors of the LSP parameter codebook 12, selects the optimum vector, and obtains its code index.
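A minimal sketch of the codebook matching follows. The patent says only that the optimum vector is selected by pattern matching; the squared Euclidean distance used here is an assumption, as are the tiny two-dimensional codebook and the function name.

```python
def quantize_lsp(lsp, codebook):
    """Return the index of the code vector nearest to `lsp`.

    Nearness is measured by squared Euclidean distance (an assumed
    criterion; the patent does not name the distortion measure).
    """
    def distance(vec):
        return sum((a - b) ** 2 for a, b in zip(lsp, vec))

    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


# Toy codebook for illustration; a real one would be trained on speech data.
toy_codebook = [[0.1, 0.2], [0.3, 0.5], [0.6, 0.9]]
index = quantize_lsp([0.32, 0.45], toy_codebook)
decoded_lsp = toy_codebook[index]  # what the receiver's codebook lookup returns
```

Only `index` is transmitted; the receiver recovers the LSP vector by looking it up in an identical codebook.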

[0019] The multiplexer 14 multiplexes the code index from the vector quantizer 13, the representative pitch information encoded by the representative pitch encoder 6, the combined pitch period and voiced/unvoiced information of each subframe encoded by the subframe pitch difference quantizers 7, the representative power information encoded by the representative power encoder 10, and the power information of each subframe encoded by the subframe power difference quantizers 11, and outputs the result to the transmission path.

[0020] Next, the receiving side in FIG. 4 is described. The multiplexed signal received via the transmission path is input to the separator 15, which separates it into the encoded representative pitch information, the encoded subframe pitch difference information of each subframe (which also carries the voiced/unvoiced information), the encoded representative power information, the encoded subframe power difference of each subframe, and the code index.

[0021] The representative pitch decoder 16 decodes the encoded representative pitch information. The switch 17 selects, subframe by subframe, the output of the three pitch reproducers 21 or the output of the noise generator 22, according to the voiced/unvoiced information carried in the differentially quantized subframe pitch information. When a subframe is voiced, the subframe pitch decoder 18 decodes the differentially quantized subframe pitch information with the representative pitch as the reference.

[0022] The representative power decoder 19 decodes the encoded representative power information. The subframe power decoders 20 decode the differentially quantized subframe power information with the decoded representative power as the reference. For voiced sound, the three pitch reproducers 21 reproduce the sound source signal from the subframe pitch information and the subframe power information; for unvoiced sound, the noise generator 22 reproduces the sound source signal from the subframe power information.
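One way to picture the reproduction of the sound source signal for a single subframe is sketched below. The pulse shape, the Gaussian noise, the scaling so that the sum of squares matches the decoded power, and the restart of the pulse train at the beginning of each subframe are all assumptions of this sketch; the patent does not specify them.

```python
import math
import random


def voiced_excitation(pitch_period, power, length):
    """Unit-spaced pulse train at the pitch period, scaled to the given power."""
    if pitch_period <= 0 or length <= 0:
        raise ValueError("pitch_period and length must be positive")
    positions = range(0, length, pitch_period)
    amplitude = math.sqrt(power / len(positions))
    signal = [0.0] * length
    for n in positions:
        signal[n] = amplitude
    return signal


def unvoiced_excitation(power, length, rng=None):
    """Gaussian noise scaled so that its sum of squares equals `power`."""
    rng = rng or random.Random()
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
    energy = sum(x * x for x in noise)
    scale = math.sqrt(power / energy) if energy > 0 else 0.0
    return [x * scale for x in noise]


def reproduce_subframe(voiced, pitch_period, power, length):
    """Select the pulse source or the noise source, as the switch does."""
    if voiced:
        return voiced_excitation(pitch_period, power, length)
    return unvoiced_excitation(power, length)
```

A call such as `reproduce_subframe(True, 40, 9.0, 120)` yields three pulses whose squared sum is 9.0, while `reproduce_subframe(False, None, 4.0, 120)` yields noise with the same kind of power match.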

[0023] The codebook searcher 23 refers to the codebook 12 and replaces the code index with the corresponding LSP parameters of the LSP parameter codebook. The synthesis filter 25 reproduces and outputs the speech signal from the voiced or unvoiced sound source signal output from the switch 17 and the coefficient-interpolated LSP parameters. Because an analysis frame as long as about 45 ms is expected to blur the spectral envelope, the LSP parameter interpolator 24 suppresses this degradation by linearly interpolating between the LSP parameters of the previous frame and those of the current frame over the first one-third of the analysis frame.
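The interpolation over the first third of the frame can be illustrated as follows. Interpolating per sample, with a weight that ramps from 0 to 1 across the first third and stays at 1 afterwards, is an assumption of this sketch; the patent states only that linear interpolation is applied over that interval.

```python
def interpolated_lsp(prev_lsp, cur_lsp, n, frame_len):
    """LSP vector to use at sample n (0-based) of the current frame.

    The weight on the current frame rises linearly from 0 to 1 over the
    first third of the frame (assumed per-sample ramp) and is 1 thereafter.
    """
    ramp_len = frame_len / 3
    w = min(n / ramp_len, 1.0)
    return [(1.0 - w) * p + w * c for p, c in zip(prev_lsp, cur_lsp)]


prev, cur = [0.2, 0.4], [0.4, 0.8]
at_start = interpolated_lsp(prev, cur, 0, 360)
at_mid_ramp = interpolated_lsp(prev, cur, 60, 360)
after_ramp = interpolated_lsp(prev, cur, 300, 360)
```

At the frame boundary the filter still uses the previous frame's parameters, so the spectral envelope changes gradually instead of jumping every 45 ms.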

[0024]

[Effects of the invention] As described above, according to the present invention, in speech coding communication that converts a speech signal into digital form for transmission, modeling the sound source of an analysis-synthesis speech coding scheme more finely on the time axis makes it possible to express the time variation of the pitch of the speech and to make the voiced/unvoiced decision at a finer granularity. The degradation of received and reproduced speech quality at low bit rates is therefore reduced and naturalness is improved, which is of great practical benefit.

[Brief description of the drawings]

FIG. 1 is a conceptual diagram of conventional CELP.

FIG. 2 is a block diagram of conventional LPC.

FIG. 3 is a block diagram showing an embodiment of the present invention (transmitting side).

FIG. 4 is a block diagram showing an embodiment of the present invention (receiving side).

[Explanation of symbols]

1 LSP parameter analyzer
2 Inverse filter
3 Subframe divider
4 Subframe pitch extractor
5 Representative pitch detector
6 Representative pitch encoder
7 Subframe pitch difference quantizer
8 Subframe power extractor
9 Representative power detector
10 Representative power encoder
11 Subframe power difference quantizer
12 Codebook
13 Vector quantizer
14 Multiplexer
15 Separator
16 Representative pitch decoder
17 Switch
18 Subframe pitch decoder
19 Representative power decoder
20 Subframe power decoder
21 Pitch reproducer
22 Noise generator
23 Codebook searcher
24 Coefficient interpolator
25 Synthesis filter

Claims

(57) [Claims]

1. A speech coding communication system of the analysis-synthesis type, in which the following are multiplexed and transmitted: a code index of an optimal vector selected by pattern matching between a line spectrum pair parameter, obtained for each frame of a discretized and framed input speech signal, and code vectors obtained in advance by training and stored in a codebook; voiced/unvoiced information of a per-frame sound source signal obtained by removing the line spectrum pair parameter from the input speech signal; a pitch period of a voiced section; and power information, the system comprising:
  dividing means for dividing the sound source signal of the frame into a plurality of subframes on the time axis;
  representative pitch period information output means for making a voiced/unvoiced decision for each of the plurality of subframes, extracting the pitch period of each voiced section, detecting a representative pitch period from among the extracted pitch periods, and encoding it;
  a plurality of pitch period difference information output means for quantizing the difference between the pitch period of the voiced section of each subframe and the representative pitch period and encoding it together with the voiced/unvoiced information;
  representative power information output means for calculating the power of each of the plurality of subframes, detecting a representative power from among the calculated powers, and encoding it;
  a plurality of power difference information output means for quantizing the difference between the power of each subframe and the representative power; and
  multiplexing means for multiplexing the code index, the voiced/unvoiced information of each subframe, the representative pitch period information, the plurality of pitch period difference information, the representative power information, and the plurality of power difference information, and sending the result to a transmission path,
  wherein the system transmits a speech-coded signal in which are multiplexed: the code index of the optimal vector of the line spectrum pair parameter for each frame; the representative pitch period information of the subframes of the per-frame sound source information from which the line spectrum pair parameter has been removed; the pitch period difference information of the voiced sections; the voiced/unvoiced information; the representative power information; and the power difference information of each subframe.
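
The representative-plus-difference coding of the subframe pitch periods recited above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claim does not say how the representative pitch period is chosen or how wide the difference code is, so the lower median, the 3-bit signed difference range, and the names `encode_pitch`, `decode_pitch`, `UNVOICED`, and `DIFF_BITS` are all hypothetical.

```python
from statistics import median_low

UNVOICED = 0    # hypothetical marker: a pitch period of 0 means "unvoiced subframe"
DIFF_BITS = 3   # assumed width of each signed difference code
DIFF_MIN = -(1 << (DIFF_BITS - 1))      # -4
DIFF_MAX = (1 << (DIFF_BITS - 1)) - 1   # +3


def encode_pitch(subframe_pitches):
    """Return (representative, voiced_flags, diff_codes) for one frame."""
    voiced = [p for p in subframe_pitches if p != UNVOICED]
    # Assumption: the representative is the lower median of the voiced periods.
    representative = median_low(voiced) if voiced else UNVOICED
    flags, diffs = [], []
    for p in subframe_pitches:
        is_voiced = p != UNVOICED
        flags.append(is_voiced)
        if is_voiced:
            # Quantize by clamping the difference into the signed code range.
            diffs.append(max(DIFF_MIN, min(DIFF_MAX, p - representative)))
        else:
            diffs.append(0)  # no pitch to code for an unvoiced subframe
    return representative, flags, diffs


def decode_pitch(representative, flags, diffs):
    """Rebuild per-subframe pitch periods from the transmitted values."""
    return [representative + d if v else UNVOICED for v, d in zip(flags, diffs)]
```

Only the representative needs a full-width code; each subframe then costs a voiced/unvoiced flag plus a short difference code, which is how the pitch contour within a frame can be carried at a low bit rate. A difference outside the assumed range is clamped, so the decoded pitch for that subframe is only approximate.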

2. A speech coding transmitting device comprising:
  a line spectrum pair parameter analyzer that extracts a line spectrum pair parameter from a discretized and framed input speech signal;
  a codebook in which line spectrum pair parameters learned in advance by training are stored as code vectors;
  a vector quantizer that outputs a code index of an optimal vector selected by pattern matching between the line spectrum pair parameter extracted by the line spectrum pair parameter analyzer and the code vectors of the codebook;
  an inverse filter that calculates, for each frame, a sound source signal by removing the line spectrum pair parameter from the input speech signal;
  a subframe divider that divides the sound source signal of the frame into a plurality of subframes on the time axis;
  a plurality of subframe pitch extractors that make a voiced/unvoiced decision for each of the divided subframes and extract the pitch period of each voiced section;
  a representative pitch detector that determines a representative pitch period from the extracted pitch periods of the voiced sections of the subframes;
  a representative pitch encoder that encodes the representative pitch period and outputs representative pitch period information;
  a plurality of subframe pitch difference quantizers that quantize the difference between the representative pitch period and the pitch period of the voiced section of each subframe, encode it together with the voiced/unvoiced information, and output pitch period difference information;
  a plurality of subframe power extractors that calculate the power of each of the divided subframes;
  a representative power detector that determines a representative power from among the calculated subframe powers;
  a representative power encoder that encodes the representative power and outputs representative power information;
  a plurality of subframe power difference quantizers that each output power difference information obtained by quantizing the difference between the representative power and the power of each subframe; and
  a multiplexer that multiplexes the code index, the representative pitch period information, the pitch period difference information and voiced/unvoiced information, the representative power information, and the power difference information, and sends the result to a transmission path.
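
The power path of the transmitting device (subframe divider, subframe power extractors, representative power detector and encoder, and subframe power difference quantizers) might look like the sketch below. It rests on assumptions that the claim does not state: power is expressed in dB, the representative is the maximum subframe power, the quantizer step is 2 dB, and each difference uses a 3-bit code counting steps below the representative. The names `split_subframes`, `power_db`, `encode_power`, and `decode_power` are hypothetical.

```python
import math

STEP_DB = 2.0     # assumed quantizer step size in dB
DIFF_LEVELS = 8   # assumed 3-bit difference code: 0..7 steps below the representative


def split_subframes(frame, n_sub):
    """Divide one frame into n_sub equal-length subframes on the time axis."""
    size = len(frame) // n_sub
    return [frame[i * size:(i + 1) * size] for i in range(n_sub)]


def power_db(samples):
    """Mean-square power of a subframe in dB (floored to avoid log of zero)."""
    mean_sq = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(mean_sq, 1e-10))


def encode_power(frame, n_sub=4):
    """Return (representative_code, diff_codes) for one frame."""
    powers = [power_db(s) for s in split_subframes(frame, n_sub)]
    # Assumption: the representative power is the largest subframe power.
    rep_code = round(max(powers) / STEP_DB)
    rep_db = rep_code * STEP_DB
    diff_codes = [
        min(DIFF_LEVELS - 1, max(0, round((rep_db - p) / STEP_DB)))
        for p in powers
    ]
    return rep_code, diff_codes


def decode_power(rep_code, diff_codes):
    """Rebuild per-subframe power in dB from the transmitted codes."""
    return [(rep_code - d) * STEP_DB for d in diff_codes]
```

Taking the maximum as the representative keeps every difference non-negative, so no sign bit is needed in the difference code; a subframe more than 14 dB below the representative saturates at the lowest level under these assumed settings.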

3. A speech-coded receiving device that receives, via a transmission path, a speech-coded signal in which are multiplexed a code index of an optimal vector of a line spectrum pair parameter for each frame, representative pitch period information of a plurality of subframes obtained by dividing the per-frame sound source information from which the line spectrum pair parameter has been removed, pitch period difference information of the voiced sections and voiced/unvoiced information, representative power information of the subframes, and power difference information of each subframe, and that decodes the signal to output reproduced speech, the device comprising:
  a separator that separates the speech-coded signal into the code index, the representative pitch period information, the pitch period difference information of the voiced section of each subframe and the voiced/unvoiced information, the representative power information, and the power difference information of each subframe, and outputs them;
  a representative pitch decoder that decodes the separated representative pitch period information and outputs a representative pitch period;
  a plurality of subframe pitch decoders that decode the pitch period difference information of each subframe on the basis of the representative pitch period and output the pitch period of the voiced section of each subframe;
  a representative power decoder that decodes the separated representative power information and outputs a representative power;
  a plurality of subframe power decoders that decode the power difference information of each subframe on the basis of the representative power and output the power of each subframe;
  a plurality of pitch regenerators that, when a subframe is voiced, reproduce the sound source signal from the decoded pitch period and power;
  a noise generator that, when a subframe is unvoiced, outputs noise serving as a sound source signal corresponding to the decoded power;
  a switch that selects and outputs, subframe by subframe, the output of the plurality of pitch regenerators or of the noise generator in accordance with the voiced/unvoiced indication obtained from the separated voiced/unvoiced information and pitch period difference information;
  a codebook searcher that reads out, from a line spectrum pair parameter codebook having the same contents as that of the transmitting side, the line spectrum pair parameter corresponding to the code index output from the separator, and outputs it; and
  a synthesis filter that reproduces and outputs a speech signal from the voiced or unvoiced sound source signal output from the switch and the line spectrum pair parameter output from the codebook searcher.
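
The sound source reconstruction in the receiving device (pitch regenerators, noise generator, and switch) can be sketched as below. This is only an illustration under assumptions: the voiced source is modeled as a unit impulse train, the unvoiced source as Gaussian noise, and "power" is taken to be the linear mean-square value of the subframe. The names `pulse_train`, `scale_to_power`, and `build_excitation` are hypothetical, and the codebook searcher, coefficient interpolation, and synthesis filter are omitted.

```python
import math
import random


def pulse_train(length, pitch_period, phase=0):
    """Unit impulses every pitch_period samples, starting at offset phase."""
    out = [0.0] * length
    n = phase
    while n < length:
        out[n] = 1.0
        n += pitch_period
    return out, n - length  # carry the pulse phase into the next subframe


def scale_to_power(samples, power):
    """Scale the samples so their mean-square value equals the decoded power."""
    mean_sq = sum(x * x for x in samples) / len(samples)
    if mean_sq == 0.0:
        return samples
    gain = math.sqrt(power / mean_sq)
    return [gain * x for x in samples]


def build_excitation(flags, pitches, powers, sub_len, seed=0):
    """Concatenate per-subframe sound source signals for one frame."""
    rng = random.Random(seed)
    excitation, phase = [], 0
    for voiced, pitch, power in zip(flags, pitches, powers):
        if voiced:   # the switch selects the pitch regenerator
            sub, phase = pulse_train(sub_len, pitch, phase)
        else:        # the switch selects the noise generator
            sub = [rng.gauss(0.0, 1.0) for _ in range(sub_len)]
            phase = 0
        excitation.extend(scale_to_power(sub, power))
    return excitation
```

Because the switch operates per subframe, a single frame can contain both voiced and unvoiced segments, and the pulse spacing can change from one subframe to the next. The resulting signal would then drive the synthesis filter configured from the decoded line spectrum pair parameter.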