JPH0229235B2


Info

Publication number: JPH0229235B2
Application number: JP57095926A
Authority: JP
Inventors: Satoru Taguchi; Masanori Kobayashi; Takayuki Ishikawa
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1982-06-04
Filing date: 1982-06-04
Publication date: 1990-06-28
Also published as: JPS58211795A

Description

[Detailed description of the invention]

The present invention relates to a linear predictive speech analysis and synthesis device.

Speech analysis and synthesis devices have recently been put into practical use following the establishment of linear predictive analysis (the LPC method).

LPC analysis approximates the spectral envelope of speech with an all-pole model. It is said to have two empirically known drawbacks: it underestimates formant bandwidths, and it approximates the third formant, which has relatively low energy, poorly.

The first drawback is thought to arise because the poles of the spectrum concentrate excessively at frequencies where energy is concentrated, such as the first formant.

Recently, to prevent the poles from concentrating at such specific frequencies, a band-split linear predictive analysis method has been studied. In this method the speech band is divided into a plurality of subbands and LPC analysis of a suitable order is performed on each subband, so that the poles are suitably distributed.

By choosing the band division appropriately, this method mitigates not only the first drawback but also the second, and it has the potential to become an effective means of improving the sound quality of LPC vocoders.

However, among ordinary speech sounds, unvoiced sounds, and fricatives in particular, do not have a steep spectral envelope to begin with. For such sounds the band division described above is not only ineffective; the extra processing required for band division may actually degrade sound quality.

An object of the present invention is to provide a linear predictive speech analysis and synthesis device that eliminates the above drawbacks of the prior art.

The device of the present invention is a linear predictive speech analysis and synthesis device that divides an input speech signal into a plurality of speech transmission bands and performs linear predictive analysis on each band. It has means for selectively executing band-split linear predictive analysis when the input speech signal is voiced and non-band-split linear predictive analysis when it is unvoiced.

Next, the present invention will be described in detail with reference to the drawings.

FIGS. 1, 2 and 3 are block diagrams showing an embodiment of the present invention: FIG. 1 shows the whole device, FIG. 2 the analysis side, and FIG. 3 the synthesis side.

As shown in FIG. 1, this embodiment consists of an analysis side 1, a synthesis side 2 and a transmission line 3.

As shown in FIG. 2, the analysis side 1 comprises a low-pass filter and A/D converter 101, a window processor 102, a Fourier transformer 103, a power spectrum memory 104, a low-band autocorrelation coefficient measuring device 105, a low-band linear prediction coefficient analyzer 106, a high-band autocorrelation coefficient measuring device 107, a high-band linear prediction coefficient analyzer 108, a full-band autocorrelation coefficient measuring device 109, a full-band linear prediction coefficient analyzer 110, a voiced/unvoiced discriminator 111, a pitch extractor 112 and an encoder 113.

As shown in FIG. 3, the synthesis side 2 comprises a decoder 201, a low-band LPC filter 202, a low-band interpolator 203, a low-band bandpass filter 204, a high-band LPC filter 205, a high-band interpolator 206, a frequency converter 207, a high-band bandpass filter 208, a low-band/high-band combiner 209, a full-band LPC filter 210, a pitch generator 211, a noise generator 212, an excitation source switch 213, a low-band variable gain amplifier 214, a high-band variable gain amplifier 215, a full-band variable gain amplifier 216, a mode switch 217, and a D/A converter and low-pass filter 218.

The speech waveform to be transmitted is input at the speech input terminal 1000 of the analysis side 1 shown in FIG. 2. The low-pass filter of the low-pass filter and A/D converter 101 cuts off high-frequency components above, for example, 3333 Hz. The filtered signal is sampled at a sampling frequency of, for example, 8000 Hz, quantized by the A/D converter into a digital signal of, for example, 12 bits per sample, and supplied to the window processor 102.

The window processor 102 first stores the input quantized speech signal in an internal memory. This memory holds, for example, 30 msec (240 samples) of the quantized speech signal, and window processing is performed by multiplying the stored samples by a window function such as a Hamming window or a rectangular window.

This window processing is repeated at a period of, for example, 10 msec, which is the basic analysis period (hereinafter, the basic frame period).
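
The framing just described can be sketched as follows. This is only an illustration under the example values in the text (8000 Hz sampling, 30 msec frames, 10 msec basic frame period, Hamming window); the names `hamming` and `windowed_frames` are hypothetical and do not come from the patent.

```python
import math

SAMPLE_RATE_HZ = 8000   # example sampling frequency from the text
FRAME_LEN = 240         # 30 msec at 8000 Hz
HOP_LEN = 80            # 10 msec basic frame period at 8000 Hz


def hamming(n):
    """Return an n-point Hamming window."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(samples, frame_len=FRAME_LEN, hop_len=HOP_LEN):
    """Yield one windowed frame per basic frame period."""
    window = hamming(frame_len)
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        yield [s * w for s, w in zip(frame, window)]
```

Successive frames overlap by 160 samples, so each input sample contributes to up to three analysis frames.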

The windowed speech waveform data is supplied, once per basic frame period, to the Fourier transformer 103, the voiced/unvoiced discriminator 111 and the pitch extractor 112.

The Fourier transformer 103 takes the Fourier transform of the windowed speech waveform data to obtain the spectrum component at each frequency, converts each component to a power spectrum component by squaring its absolute value, and stores the result in the power spectrum memory 104. The power spectrum data stored in the memory 104 can be read freely by the autocorrelation coefficient measuring devices 105, 107 and 109, which use it to measure autocorrelation coefficients as described below.
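
As a rough sketch of this step, the power spectrum of one windowed frame can be computed with a direct discrete Fourier transform. The transform length and algorithm are assumptions (the text does not state them), and `power_spectrum` is a hypothetical name; a practical implementation would use an FFT.

```python
import cmath


def power_spectrum(frame):
    """Return |X[k]|^2 for k = 0..N-1 using a direct DFT (O(N^2), for clarity only)."""
    n = len(frame)
    spectrum = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * cmath.pi * k * t / n)
        spectrum.append(abs(acc) ** 2)
    return spectrum
```

With a 240-point transform at 8000 Hz the bin spacing is 8000/240 Hz, so 1333 Hz and 3333 Hz fall on bins 40 and 100.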

The low-band autocorrelation coefficient measuring device 105 reads from the memory 104 the low-band part of the power spectrum, for example the components from 0 to 1333 Hz, applies an inverse Fourier transform to measure the autocorrelation coefficient at each delay within the required range, and supplies these coefficients to the low-band linear prediction coefficient analyzer 106.

The measuring device 105 also supplies the measured autocorrelation coefficient at delay 0 to the encoder 113 via the output line 1050 as the low-band short-time average power for this basic frame period.

From the supplied set of autocorrelation coefficient data, the analyzer 106 extracts K parameters up to a predetermined order by a linear predictive analysis method such as the autocorrelation method, and supplies the extracted low-band K parameters to the encoder 113.

The high-band autocorrelation coefficient measuring device 107 reads from the memory 104 the high-band part of the power spectrum, in the above example the components from 1333 Hz to 3333 Hz, applies an inverse Fourier transform to measure the autocorrelation coefficient at each delay within the required range, and supplies these coefficients to the high-band linear prediction coefficient analyzer 108. In computing this inverse Fourier transform, however, the power spectrum components from 1333 Hz to 3333 Hz are shifted down in frequency by 1333 Hz and treated as a power spectrum from 0 to 2000 Hz before the inverse transform is executed and the autocorrelation coefficients are measured.
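
A minimal sketch of the band-limited autocorrelation measurement follows. It assumes the power spectrum is held as DFT bins, that the selected band is treated as the one-sided spectrum of a baseband signal, and that the inverse transform is evaluated as a cosine sum; the patent specifies none of these details, and `band_autocorrelation` is a hypothetical name.

```python
def band_autocorrelation(power, lo_bin, hi_bin, max_lag):
    """Autocorrelation r[0..max_lag] of the band power[lo_bin..hi_bin].

    Slicing from lo_bin re-indexes the band from bin 0, which plays the role
    of the downward frequency shift described in the text.
    """
    band = power[lo_bin:hi_bin + 1]
    k_top = len(band) - 1          # bin at the top edge of the band
    m_len = 2 * k_top              # length of the implied symmetric spectrum
    r = []
    for lag in range(max_lag + 1):
        edge_sign = -1.0 if lag % 2 else 1.0
        acc = band[0] + edge_sign * band[k_top]
        for k in range(1, k_top):
            acc += 2.0 * band[k] * math.cos(math.pi * k * lag / k_top)
        r.append(acc / m_len)
    return r


import math  # used by band_autocorrelation above
```

With a 240-bin spectrum at 8000 Hz, the low band would be `band_autocorrelation(p, 0, 40, order)`, the high band `band_autocorrelation(p, 40, 100, order)` and the full band `band_autocorrelation(p, 0, 100, order)`. The zero-lag value `r[0]` is the quantity the text calls the short-time average power of the band.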

The measuring device 107 also supplies the measured autocorrelation coefficient at delay 0 to the encoder 113 via the output line 1070 as the high-band short-time average power for this basic frame period.

From the supplied set of autocorrelation coefficient data, the analyzer 108 extracts K parameters up to a predetermined order by a linear predictive analysis method such as the autocorrelation method, and supplies the extracted high-band K parameters to the encoder 113.

In addition, the full-band autocorrelation coefficient measuring device 109 reads from the memory 104 the power spectrum components over the full band, in the above example from 0 to 3333 Hz, applies an inverse Fourier transform to measure the autocorrelation coefficient at each delay within the required range, and supplies these coefficients to the full-band linear prediction coefficient analyzer 110. The measuring device 109 also supplies the measured autocorrelation coefficient at delay 0 to the encoder 113 via the output line 1090 as the full-band short-time average power for this basic frame period.

From the supplied set of autocorrelation coefficient data, the analyzer 110 extracts K parameters up to a predetermined order by a linear predictive analysis method such as the autocorrelation method, and supplies the extracted full-band K parameters to the encoder 113.

For details of the autocorrelation method mentioned above, see, for example, John Makhoul, "Linear Prediction: A Tutorial Review," Proceedings of the IEEE, Vol. 63, No. 4, pp. 561-580, April 1975.
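
For orientation, the autocorrelation method is commonly solved with the Levinson-Durbin recursion, which yields the K parameters (reflection coefficients) as a by-product. The sketch below is a generic version of that recursion, not code from the patent; it assumes the sign convention in which the predictor polynomial is A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, and `levinson_durbin` is a hypothetical name.

```python
def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    r     -- autocorrelation coefficients r[0..order]
    order -- prediction order p
    Returns (k, a, err): reflection coefficients k[0..p-1], predictor
    coefficients a[0..p] with a[0] == 1, and the residual energy.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        ki = -acc / err
        k.append(ki)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + ki * a[i - j]
        new_a[i] = ki
        a = new_a
        err *= 1.0 - ki * ki
    return k, a, err
```

Each analyzer (106, 108, 110) would apply a recursion of this kind to its own set of autocorrelation coefficients, possibly with a different order per band.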

Note also that, in the linear prediction coefficient analysis performed by the analyzers 106 and 108 for the low and high bands, the input speech signal extends, as described above, from 0 to 1333 Hz on the low-band side and from 0 to 2000 Hz on the high-band side. Its highest frequency is thus limited to 1/3 and 1/2, respectively, of the 4000 Hz maximum frequency determined by the original sampling period. The linear predictive analysis is therefore equivalent to one performed on a signal decimated so that its sampling period is three times and two times the original sampling period, respectively.

The voiced/unvoiced discriminator 111 receives the windowed speech waveform data, determines for each basic frame period whether the speech signal in the frame is voiced or unvoiced, and supplies the result to the encoder 113. Such a voiced/unvoiced discriminator can readily be constructed by applying, for example, Japanese Patent Application Laid-Open No. 54-94212 or No. 54-151303.

The pitch extractor 112 extracts the pitch frequency data for each basic frame from the supplied windowed speech waveform data and supplies it to the encoder 113.

The encoder 113 encodes the various data thus supplied to create transmission frames, and sends one transmission frame per basic frame to the synthesis side 2 via the transmission line 3. In creating a transmission frame, the following processing in particular is included.

Each transmission frame includes the voiced/unvoiced decision from the voiced/unvoiced discriminator 111. In a transmission frame carrying a voiced decision, the low-band K parameters and the high-band K parameters are encoded together as the K parameters to be transmitted, and the full-band K parameters are not used.

In a transmission frame carrying an unvoiced decision, by contrast, the full-band K parameters are encoded as the K parameters to be transmitted, and the high-band and low-band K parameters are not used.

The short-time average power is handled in the same way. In a transmission frame carrying a voiced decision, the low-band short-time average power and the high-band short-time average power are encoded together as the short-time average power to be transmitted, and the full-band short-time average power is not used.

In a transmission frame carrying an unvoiced decision, by contrast, the full-band short-time average power is encoded as the short-time average power to be transmitted, and the high-band and low-band short-time average powers are not used.

Thus, band-split K parameters and short-time average powers are sent to the synthesis side 2 for voiced segments, and non-band-split K parameters and short-time average power are sent for unvoiced segments.
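
The selection rule for the transmitted parameters can be summarized in a short sketch. The dictionary layout, the function name `build_transmission_frame` and the decision to carry the pitch value in every frame are illustrative assumptions; the patent does not describe the frame format or the quantization, which is omitted here.

```python
from typing import Dict, List


def build_transmission_frame(
    voiced: bool,
    pitch: float,
    low_k: List[float],
    high_k: List[float],
    full_k: List[float],
    low_power: float,
    high_power: float,
    full_power: float,
) -> Dict[str, object]:
    """Pick the parameter set to transmit for one basic frame (no quantization)."""
    # Assumption: pitch is carried in every frame; the text does not say
    # whether it is sent for unvoiced frames.
    frame: Dict[str, object] = {"voiced": voiced, "pitch": pitch}
    if voiced:
        # Voiced: band-split parameters only.
        frame["k"] = {"low": low_k, "high": high_k}
        frame["power"] = {"low": low_power, "high": high_power}
    else:
        # Unvoiced: full-band parameters only.
        frame["k"] = {"full": full_k}
        frame["power"] = {"full": full_power}
    return frame
```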

On the synthesis side 2, as shown in FIG. 3, the transmission frames sent over the transmission line 3 are supplied one after another to the decoder 201.

The decoder 201 decodes these transmission frames to recover the data produced on the analysis side, and distributes the data as follows.

First, the decoder examines the voiced/unvoiced decision. For a transmission frame carrying a voiced decision, it supplies the recovered low-band K parameters to the low-band LPC filter 202 and the recovered high-band K parameters to the high-band LPC filter 205. For a transmission frame carrying an unvoiced decision, it supplies the recovered full-band K parameters to the full-band LPC filter 210.

The decoder 201 also recovers the transmitted information specifying the pitch frequency and supplies it to the pitch generator 211 as a pitch frequency control signal.

The decoder 201 also recovers the voiced/unvoiced decision and supplies it via the output line 2010 to the excitation source switch 213 as a voiced/unvoiced switching signal and to the mode switch 217 as a mode switching signal.

Furthermore, for a transmission frame carrying a voiced decision, the decoder 201 supplies the recovered low-band and high-band short-time average powers, as low-band and high-band gain control information, via the output lines 2011 and 2012 to the low-band variable gain amplifier 214 and the high-band variable gain amplifier 215, respectively. For a transmission frame carrying an unvoiced decision, it supplies the recovered full-band short-time average power, as full-band gain control information, via the output line 2013 to the full-band variable gain amplifier 216.

The pitch generator 211 generates pitch pulse data at the specified frequency and supplies it to the excitation source switch 213.

The excitation source switch 213 selects the input from the pitch generator 211 when the voiced/unvoiced switching signal supplied via the line 2010 specifies a voiced sound, and the input from the noise generator 212 when it specifies an unvoiced sound, and supplies the selected data to the variable gain amplifiers 214, 215 and 216.

Each of the variable gain amplifiers 214, 215 and 216 weights the pitch pulse data or noise signal data thus supplied by the gain control information supplied via the lines 2011, 2012 and 2013, respectively, thereby variably amplifying it to produce excitation data. The amplifiers supply this excitation data via the excitation signal lines 2140, 2150 and 2160 to the low-band LPC filter 202, the high-band LPC filter 205 and the full-band LPC filter 210, respectively.
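
The excitation path (source selection followed by gain weighting) might look like the sketch below. The unit-amplitude pulse train, the Gaussian noise source, and the use of a single multiplicative `gain` are assumptions; the patent does not say how the gain is derived from the short-time average power, and `excitation` is a hypothetical name.

```python
import random
from typing import List, Optional


def excitation(voiced: bool, n_samples: int, gain: float,
               pitch_period: int = 0, seed: Optional[int] = None) -> List[float]:
    """Return gain-weighted excitation samples for one frame."""
    if voiced:
        if pitch_period <= 0:
            raise ValueError("pitch_period must be positive for voiced frames")
        # Pitch generator: one unit pulse every pitch_period samples.
        source = [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    else:
        # Noise generator: white Gaussian noise (an assumed noise model).
        rng = random.Random(seed)
        source = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Variable gain amplifier: weight the selected source by the gain.
    return [gain * s for s in source]
```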

The low-band LPC filter 202, having received the recovered low-band K parameters, internally converts them to α parameters and uses these α parameters as the filter coefficients of the LPC filter. From the excitation data supplied via the line 2140 and these filter coefficients it synthesizes low-band speech waveform data, which it supplies to the low-band interpolator 203.

Because of the decimation on the analysis side described above, the low-band speech waveform data synthesized from the low-band K parameters has a sampling period three times the normal sampling period. The low-band interpolator 203 interpolates this data by passing it through a 1333 Hz low-pass filter, producing speech waveform data at the normal sampling period, and supplies it to the low-band bandpass filter 204.
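
One common way to realize such an interpolator is to insert zeros between the samples and then low-pass filter at the original Nyquist frequency (1333 Hz for a factor of 3 at 8000 Hz). The sketch below uses a Hamming-windowed sinc filter; the filter design, its length and the name `interpolate` are assumptions, since the text only says that a 1333 Hz low-pass filter is used.

```python
import math


def interpolate(samples, factor, half_width=24):
    """Raise the sampling rate by an integer factor (zero-stuff, then low-pass)."""
    # Insert factor - 1 zeros after every input sample.
    stuffed = []
    for s in samples:
        stuffed.append(s)
        stuffed.extend([0.0] * (factor - 1))
    # Hamming-windowed sinc with cutoff at the original Nyquist frequency.
    taps = []
    for n in range(-half_width, half_width + 1):
        x = math.pi * n / factor
        sinc = 1.0 if n == 0 else math.sin(x) / x
        window = 0.54 + 0.46 * math.cos(math.pi * n / half_width)
        taps.append(sinc * window)
    # Zero-phase convolution so the output stays aligned with the input.
    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for j, h in enumerate(taps):
            idx = i - (j - half_width)
            if 0 <= idx < len(stuffed):
                acc += h * stuffed[idx]
        out.append(acc)
    return out
```

Because the sinc is zero at every multiple of `factor`, the original samples pass through unchanged and only the inserted positions are filled in.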

The low-band bandpass filter 204 passes the supplied data through a bandpass filter with a passband of, for example, 300 Hz to 1333 Hz to remove frequency components in unwanted bands, generating the low-band speech waveform data, and supplies it to one input of the low-band/high-band combiner 209.

The high-band LPC filter 205, having received the recovered high-band K parameters, internally converts them to α parameters and uses these α parameters as the filter coefficients of the LPC filter. From the excitation data supplied via the line 2150 and these filter coefficients it synthesizes high-band speech waveform data, which it supplies to the high-band interpolator 206.

Because of the analysis-side processing described above, the high-band K parameters are K parameters for a speech waveform obtained by frequency-shifting the 1333 Hz to 3333 Hz components of the original speech signal into the band from 0 to 2000 Hz and decimating the result to twice the normal sampling period. The speech waveform data synthesized from these K parameters therefore has a sampling period twice the normal one, and its frequencies are shifted down by 1333 Hz. The high-band interpolator 206 accordingly interpolates this data by passing it through a 2000 Hz low-pass filter, producing speech waveform data at the normal sampling period, and supplies it to the frequency converter 207.

The frequency converter 207 multiplies the supplied speech waveform data by a 1333 Hz sine wave to shift the frequencies of the speech waveform by 1333 Hz, and supplies the result to the high-band bandpass filter 208.
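
Multiplying by a sinusoid produces both a sum-frequency and a difference-frequency component, which is presumably why a bandpass filter follows the frequency converter. The sketch below illustrates this with a single tone; the helper names `shift_by_sine` and `tone_amplitude` are hypothetical, and the 240-sample analysis length is chosen only so that the example frequencies fall on exact DFT bins.

```python
import cmath
import math


def shift_by_sine(samples, shift_hz, sample_rate_hz):
    """Multiply the signal by a sine wave at shift_hz."""
    return [x * math.sin(2.0 * math.pi * shift_hz * n / sample_rate_hz)
            for n, x in enumerate(samples)]


def tone_amplitude(samples, bin_index):
    """Amplitude of the sinusoidal component at one DFT bin (0 < bin < N/2)."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * cmath.pi * bin_index * t / n)
              for t, x in enumerate(samples))
    return 2.0 * abs(acc) / n
```

For a 500 Hz tone (bin 15 of 240 at 8000 Hz) shifted by 8000/6 Hz, about 1333 Hz (bin 40), the product contains components at bins 55 and 25, each at half the original amplitude, and nothing remains at bin 15.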

The high-band bandpass filter 208 passes the supplied data through a bandpass filter with a passband of 1333 Hz to 3333 Hz to remove frequency components in unwanted bands, generating the high-band speech waveform data, and supplies it to the other input of the low-band/high-band combiner 209.

The low-band/high-band combiner 209 adds the supplied low-band and high-band speech waveform data. Its output 2090 is thus the synthesized speech for the case where band-split linear predictive analysis was performed, and it is supplied as one input of the mode switch 217.

The full-band LPC filter 210, having received the recovered full-band K parameters, internally converts them to α parameters and uses these α parameters as the filter coefficients of the LPC filter. From the excitation data supplied via the line 2160 and these filter coefficients it synthesizes full-band speech waveform data, which it supplies via the output line 2100 as the other input of the mode switch 217.

The conversion from K parameters to α parameters performed inside each of the LPC filters 202, 205 and 210 can readily be carried out by applying the autocorrelation method mentioned above or a similar method, and the LPC filter itself can readily be implemented as a recursive filter.
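
A minimal sketch of both steps is given below: the standard step-up recursion from reflection coefficients to predictor coefficients, and an all-pole recursive filter driven by the excitation. It assumes the sign convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, so that the synthesis filter is 1/A(z); the patent does not fix a convention, and the names `k_to_alpha` and `all_pole_filter` are hypothetical.

```python
def k_to_alpha(k):
    """Step-up recursion: reflection coefficients -> predictor coefficients a[0..p]."""
    a = [1.0]
    for i, ki in enumerate(k, start=1):
        prev = a + [0.0]
        a = [prev[j] + ki * prev[i - j] for j in range(i + 1)]
    return a


def all_pole_filter(a, excitation):
    """Recursive synthesis filter: y[n] = x[n] - sum_j a[j] * y[n - j]."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for j in range(1, len(a)):
            if n - j >= 0:
                y -= a[j] * out[n - j]
        out.append(y)
    return out
```

For a single K parameter of -0.5 the filter has one pole at 0.5, so its impulse response halves at every sample.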

In response to the mode switching signal supplied via the line 2010, the mode switch 217 selects the input from the output line 2090 when the transmission frame calls for voiced synthesis and the input from the output line 2100 when it calls for unvoiced synthesis, and outputs the selected input to the D/A converter and low-pass filter 218.

The D/A converter and low-pass filter 218 converts the supplied speech data to an analog speech signal with the D/A converter, cuts off components above 3333 Hz with the low-pass filter, and outputs the result from the output terminal 2000 as the synthesized speech signal.

As is clear from the above description, in the speech analysis and synthesis device of this embodiment, when the input speech is voiced, the analysis side transmits the data obtained by band-split linear predictive analysis to the synthesis side, which performs the corresponding speech synthesis. When the input speech is unvoiced, the analysis side transmits the data obtained by non-band-split linear predictive analysis to the synthesis side, which performs the corresponding speech synthesis.

As a result, the drawbacks of conventional LPC analysis described at the outset, namely underestimation of formant bandwidths and poor approximation of the third formant, which has relatively low energy, are mitigated, and linear predictive speech analysis and synthesis can be performed without the risk of the extra processing degrading sound quality for unvoiced sounds.

In the embodiment described above, the speech band is divided into two parts, a high band and a low band, for band-split operation, but this is only an example, and the number of divisions can be increased.

The dividing frequency of 1333 Hz used for the two-band case is likewise only an example, and the invention is not limited to it.

Similarly, the decimation ratios given are only examples.

In this embodiment, band division on the analysis side is performed by first obtaining the power spectrum of the full band and then dividing it into the individual bands. Alternatively, the processing may be done in the time domain: the input waveform is divided with bandpass filters, each band is frequency-shifted down to baseband, the resulting waveform is decimated according to its bandwidth, and linear predictive analysis is then performed.

As described above, the present invention makes it possible to improve the sound quality of a linear predictive speech analysis and synthesis device.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the whole of an embodiment of the present invention, FIG. 2 is a block diagram showing the analysis side of this embodiment, and FIG. 3 is a block diagram showing the synthesis side of this embodiment.

In the figures: 1, analysis side; 2, synthesis side; 3, transmission line; 101, low-pass filter and A/D converter; 102, window processor; 103, Fourier transformer; 104, power spectrum memory; 105, low-band autocorrelation coefficient measuring device; 106, low-band linear prediction coefficient analyzer; 107, high-band autocorrelation coefficient measuring device; 108, high-band linear prediction coefficient analyzer; 109, full-band autocorrelation coefficient measuring device; 110, full-band linear prediction coefficient analyzer; 111, voiced/unvoiced discriminator; 112, pitch extractor; 113, encoder; 201, decoder; 202, low-band LPC filter; 203, low-band interpolator; 204, low-band bandpass filter; 205, high-band LPC filter; 206, high-band interpolator; 207, frequency converter; 208, high-band bandpass filter; 209, low-band/high-band combiner; 210, full-band LPC filter; 211, pitch generator; 212, noise generator; 213, excitation source switch; 214, low-band variable gain amplifier; 215, high-band variable gain amplifier; 216, full-band variable gain amplifier; 217, mode switch; 218, D/A converter and low-pass filter.

Claims

[Claims]

1. A linear predictive speech analysis and synthesis device that divides an input speech signal into a plurality of speech transmission bands and performs linear predictive analysis on each band, the device comprising means for selectively executing band-split linear predictive analysis when the input speech signal is voiced and non-band-split linear predictive analysis when the input speech signal is unvoiced.