JP2537113B2

JP2537113B2 - Adaptive compression method of vocal tract parameter information in speech coder / decoder

Info

Publication number: JP2537113B2
Application number: JP4102318A
Authority: JP
Inventors: 忠由牧野
Original assignee: IDO TSUSHIN SHISUTEMU KAIHATSU KK
Current assignee: IDO TSUSHIN SHISUTEMU KAIHATSU KK
Priority date: 1992-03-30
Filing date: 1992-03-30
Publication date: 1996-09-25
Anticipated expiration: 2011-09-25
Also published as: JPH05282000A

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to speech coding, in which a speech (vocal tract) signal is digitized for transmission or storage and a transmitted or stored digital signal is converted back into an analog signal. It can be applied to telephone equipment such as telephones, mobile phones, and car phones, as well as to voice files and voice memories.

【０００２】[0002]

[Prior Art] Various techniques have long been developed for digitizing analog speech signals and compressing the information so that it can be transmitted or stored efficiently.

[0003] An example of an encoder and a decoder is shown in FIGS. 4a and 4b. The example is a vocoder that assumes two kinds of sound source, noise and a periodic component. For each fixed interval of the input speech, the encoder measures the vocal tract parameters of the speech signal, decides whether the sound is voiced or unvoiced, and, for voiced sound, measures the periodicity. It then transmits the vocal tract parameters, the voiced/unvoiced decision, and the periodic component. Based on the received parameters, the decoder reproduces speech by feeding a signal of that period (for voiced sound) or random noise (for unvoiced sound) into a vocal tract filter built from the measured vocal tract parameters.

[0004] FIG. 4a shows an example of the encoder 1. In the encoder 1 of FIG. 4a, the input speech is digitized by the AD converter 2 according to the sampling clock of the sampling clock generator 3. For each frame clock interval of the frame clock generator 4, the resulting signal sequence is fed to the vocal tract parameter measuring unit 5, the voiced/unvoiced detector 6, the pitch period detector 7, and the amplitude measuring unit 8, where the respective parameters are measured and encoded. As described in detail later, the vocal tract parameter measuring unit 5 can also be connected to the vector quantization unit 9 so that the vocal tract parameters are vector-quantized.

[0005] The operation of the encoder 1 is as follows. The input speech is digitized to 8 bits by the AD converter 2 according to the sampling clock of the sampling clock generator 3, for example an 8 kHz clock. If this digital signal were transmitted as it is, the transmission rate would be 8 bit × 8000 = 64 kbps.

[0006] Here, following the frame clock of the frame clock generator 4, the digitized speech signal is processed every 20 msec. The vocal tract parameter measuring unit 5 performs tenth-order prediction to obtain the vocal tract parameters. The voiced/unvoiced detector 6 encodes the voiced/unvoiced decision as a 1-bit parameter, the pitch period detector 7 encodes the pitch period as an 8-bit parameter, and the amplitude measuring unit 8 encodes the input level as an 8-bit parameter.

[0007] Various implementations of the vocal tract parameter measuring unit 5 have already been devised; the parameters can be obtained, for example, by linear prediction. Various methods have likewise been devised for the voiced/unvoiced decision and the pitch period. For example, the former can be obtained from the number of zero crossings of the input speech signal over a frame, and the latter from the autocorrelation of the input speech signal over a frame.
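The two measurements named in paragraph [0007] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the decision threshold `max_crossings`, and the lag search range are assumptions introduced here.

```python
import math

def zero_crossings(frame):
    """Count sign changes between consecutive samples of one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def is_voiced(frame, max_crossings):
    """Assumed rule: few zero crossings means voiced, many means unvoiced."""
    return zero_crossings(frame) <= max_crossings

def pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) that maximises the frame's autocorrelation."""
    def autocorr(lag):
        return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
    return max(range(min_lag, max_lag + 1), key=autocorr)

# A 20 ms frame at 8 kHz holds 160 samples; a 100 Hz sine has an 80-sample period.
frame = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
period = pitch_period(frame, 20, 120)
```

The plain autocorrelation used here favours short lags slightly because fewer samples overlap at long lags; a practical detector would normally normalise for that.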

[0008] The per-frame allocation in this example is: vocal tract parameters, tenth order × 8 bits; voiced/unvoiced parameter, 1 bit; pitch parameter, 8 bits; amplitude parameter, 8 bits. The transmission rate is therefore (10 × 8 + 1 + 8 + 8) × 1000 / 20 = 4850 bps, and the data are compressed to about 1/16 of the amount needed when the speech is transmitted with AD conversion alone.
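The arithmetic in paragraphs [0005] and [0008] can be checked with a few lines. The helper name `bitrate_bps` is an assumption made for this sketch; the bit counts and the 20 ms frame come from the text. Evaluated directly, 64000 / 4850 is roughly 13, so the "about 1/16" quoted in the text is best read as an approximate figure.

```python
def bitrate_bps(bits_per_frame, frame_ms):
    """Bits per second when bits_per_frame are sent every frame_ms milliseconds."""
    return bits_per_frame * 1000 / frame_ms

pcm_bps = 8 * 8000                       # 8-bit samples at 8 kHz
vocoder_bits = 10 * 8 + 1 + 8 + 8        # vocal tract + voiced/unvoiced + pitch + amplitude
vocoder_bps = bitrate_bps(vocoder_bits, 20)
ratio = pcm_bps / vocoder_bps            # compression factor relative to plain PCM
```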

[0009] FIG. 4b shows an example of the decoder 10. The decoder 10 of FIG. 4b has a pulse train generator 11 that generates pulses at a period matching the pitch period parameter sent from the encoder 1, a random noise generator 12, and a changeover switch 13 that, according to the transmitted voiced/unvoiced parameter, selects the pulse train from the pulse train generator 11 for voiced sound or the noise from the random noise generator 12 for unvoiced sound. The signal selected by the changeover switch 13 is processed by the vocal tract filter unit 14, which is controlled by the vocal tract parameters from the encoder 1. It is then converted back into an analog speech signal by the DA converter 15, whose reference signal is controlled by the transmitted amplitude parameter so that the original amplitude is restored.

[0010] The operation of the decoder 10 is as follows. As described above, four kinds of parameters are transmitted from the encoder 1: the vocal tract parameters, the pitch period parameter, the voiced/unvoiced parameter, and the amplitude parameter.

[0011] In FIG. 4b, these parameters are supplied to the vocal tract filter unit 14, the pulse train generator 11, the random noise generator 12, the changeover switch 13, and the DA converter 15. According to the parameters of each frame, a pulse train with the current pitch period becomes the input to the vocal tract filter unit 14 for voiced sound, while the random noise generated by the random noise generator 12 becomes the input for unvoiced sound. After filtering, the signal is converted into an analog speech signal by the DA converter 15, whose reference signal is controlled by the amplitude parameter. In a speech encoder 1 of this kind, it is known that vector quantization of, for example, the vocal tract parameters is effective for lowering the bit rate further.
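The decoder path described above (source selection, vocal tract filter, amplitude scaling) can be sketched as below. This is a simplified illustration under stated assumptions: the vocal tract filter is modelled as an all-pole recursive filter with the sign convention shown in the docstring, the amplitude is applied as a plain multiplication rather than through a DA converter reference, and all function names are hypothetical.

```python
import random

def excitation(voiced, pitch_period, n_samples, rng):
    """Pulse train at the pitch period for voiced frames, random noise otherwise."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def vocal_tract_filter(source, coeffs):
    """All-pole filter (assumed form): y[n] = x[n] + sum_k coeffs[k] * y[n-k-1]."""
    out = []
    for n, x in enumerate(source):
        acc = x
        for k, a in enumerate(coeffs):
            if n - k - 1 >= 0:
                acc += a * out[n - k - 1]
        out.append(acc)
    return out

def synthesize_frame(voiced, pitch_period, amplitude, coeffs, n_samples=160, rng=None):
    """Build one output frame from the four transmitted parameter kinds."""
    rng = rng or random.Random(0)
    source = excitation(voiced, pitch_period, n_samples, rng)
    return [amplitude * y for y in vocal_tract_filter(source, coeffs)]
```

The default of 160 samples corresponds to a 20 msec frame at 8 kHz, as in the example of the text.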

[0012] FIG. 4c shows an example configuration of the relevant parts when the vocal tract parameters are vector-quantized. In the encoder 1, linear prediction in the vocal tract parameter measuring unit 5 turns the input speech into, for example, tenth-order parameters. The vector quantization unit 9 computes the distortion between these values and each representative vector, selects the vector with the smallest distortion, and transmits its code. In other words, vector quantization treats a combination of tenth-order parameters as one vector, prepares a number of such vectors in advance, finds by computation the vector closest to the input combination, and transmits its number. For example, if a 1,248-bit vector is prepared for the tenth-order vocal tract parameters, the amount of transmitted data can be compressed to 1/10 of that needed when the tenth-order vocal tract parameters are quantized directly.
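The nearest-vector search of paragraph [0012] can be illustrated with a toy codebook. The squared-error distortion measure, the function names, and the three two-dimensional code vectors are assumptions chosen for brevity; the text uses tenth-order vectors and does not specify the distortion measure.

```python
def nearest_code(params, codebook):
    """Return the index of the codebook vector with the smallest distortion."""
    def distortion(vec):
        return sum((p - v) ** 2 for p, v in zip(params, vec))
    return min(range(len(codebook)), key=lambda i: distortion(codebook[i]))

def decode(index, codebook):
    """Decoder side: look the representative vector up by its transmitted index."""
    return codebook[index]

codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]   # toy representative vectors
index = nearest_code([0.9, 0.8], codebook)        # only this index is transmitted
restored = decode(index, codebook)
```

Only the index crosses the channel, which is why every speaker is mapped onto the same pre-stored average vectors.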

[0013] In the decoder 10, the compressed data arrive as the address of a vector, and the vector decoding unit 16 converts them back into the tenth-order parameters.

【００１４】[0014]

[Problems to Be Solved by the Invention] In the compression by vector quantization of the vocal tract parameters described above, average values are prepared in advance as vectors, and the vocal tract parameters of the input speech signal are replaced by the closest set of average vocal tract parameters. This method has the problems that it loses the individuality of the speaker's voice and degrades the sound quality.

【００１５】[0015]

[Means for Solving the Problems] The present invention has been proposed to solve the above problems. Two neural networks are provided for compressing the vocal tract parameters. The weights of one neural network are set to average values. The weights of the other neural network are initially set to the average values as well, but the network then learns using the vocal tract parameters obtained from the incoming speech as the teacher signal. At a suitable interval where the speech pauses, the weights are changed or the neural networks are exchanged, so that the compression is optimized and the speech quality is improved.

【００１６】[0016]

[Operation] According to the present invention, with two neural networks provided, one neural network initially compresses the vocal tract parameters while the other learns from the input parameters. When the difference between the average weights and the trained weights reaches or exceeds a certain amount, the weights are changed or the neural networks are switched at a suitable time, such as a pause in the speech. The vocal tract parameters are thereby optimized and the speech quality is improved.

【００１７】[0017]

[Embodiment] FIG. 1 shows the configuration of an embodiment of the present invention. In the encoder 17 shown in FIG. 1a, the input speech is digitized by the AD converter 18 according to the sampling clock of the sampling clock generator 19. For each frame clock interval of the frame clock generator 20, this signal is fed to the vocal tract parameter measuring unit 21, the voiced/unvoiced detector 22, the pitch period detector 23, and the amplitude measuring unit 24. The output of the vocal tract parameter measuring unit 21 is further connected to the fixed-weight neural network 25 and the learning neural network 26.

[0018] The weight comparison unit 27 compares the weights of the fixed-weight neural network 25 with the learned weights of the learning neural network 26 and, when a set difference arises, outputs the result to the weight control unit 28.

[0019] Based on the comparison result from the weight comparison unit 27, the voiced/unvoiced decision from the voiced/unvoiced detector 22, and the measurement from the amplitude measuring unit 24, the weight control unit 28 sends signals to switch (1) 29 and to the counter 30. Switch (2) 31 receives the result from the counter 30 and selects either the output of the pitch period detector 23 or the weights of the learning neural network 26 as its output.

[0020] The counter 30 counts the number of outputs of the weight control unit 28 and, when the count exceeds 40, sends a control signal to switch (2) 31.

[0021] Switch (2) 31 is a selector that, based on the signal from the counter 30, sends to the decoder as the vocal tract parameters either the parameters compressed by the fixed-weight neural network 25 or the parameters compressed by the learning neural network 26.

[0022] The weight switching unit 32 receives the weight control signal from the weight control unit 28 and the counter signal from the counter 30, and generates the weight switching signal for the decoder 33 (FIG. 1b).

[0023] The operation of the encoder 17 is as follows. The input speech is digitized to 8 bits by the AD converter 18, normally with an 8 kHz clock from the sampling clock generator 19. For each frame interval set by the frame clock generator 20, normally 20 msec, the parameters of the digitized speech signal are measured by the vocal tract parameter measuring unit 21, the voiced/unvoiced detector 22, the pitch period detector 23, and the amplitude measuring unit 24.

[0024] The vocal tract parameters are additionally compressed by the fixed-weight neural network 25, a network whose weights match the average characteristics of speech.

[0025] The principle of information compression by a neural network is briefly explained here. FIG. 3a shows a three-layer hourglass-type neural network. For simplicity, the input layer is taken to have four units 1-1 to 1-4, the intermediate layer two units 2-1 to 2-2, and the output layer four units 3-1 to 3-4.

[0026] The information x1 to x4 input to the input layer is passed to the intermediate layer through the weights wij, and the output of the intermediate layer is restored to the original information at the output layer 3-1 to 3-4 through the weights wik. Here i is the index of the input-layer unit (i = 1 for unit 1-1 through i = 4 for unit 1-4), and j is likewise the index of the intermediate-layer unit (j = 1 for unit 2-1 and j = 2 for unit 2-2). If the neural network is split in two at the intermediate layer, the result is as shown in FIG. 3b. That is, the input information is compressed into two values at the intermediate layer; these values are transmitted, and the receiving side can restore the original information.
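The split hourglass network of FIGS. 3a and 3b can be sketched with two small weight matrices. This is an illustration only: the units are assumed to be linear (the text does not state an activation function), and the hand-picked weights are hypothetical values chosen so that this particular input is restored exactly.

```python
def layer(inputs, weights):
    """One fully connected layer; weights[j][i] links input unit i to output unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

# Sending side: input layer (4 units) to intermediate layer (2 units).
w_enc = [[0.5, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.5]]
# Receiving side: intermediate layer (2 units) to output layer (4 units).
w_dec = [[1.0, 0.0],
         [1.0, 0.0],
         [0.0, 1.0],
         [0.0, 1.0]]

x = [2.0, 2.0, 6.0, 6.0]
code = layer(x, w_enc)         # the two values that are actually transmitted
restored = layer(code, w_dec)  # reconstruction on the receiving side
```

In general the reconstruction is only approximate; how close it gets depends on how well the weights match the statistics of the input, which is the motivation for adapting them to the speaker.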

[0027] FIG. 2a shows an example of the configuration of the fixed-weight neural network 25 of FIG. 1a. In FIG. 2a, α1 to α10 denote the tenth-order vocal tract parameters measured by the vocal tract parameter measuring unit 21 of FIG. 1a. These ten parameters are input to the ten input-layer units 1-1 to 1-10 of the neural network. Each input-layer unit is connected through weights wij to the four intermediate-layer units 2-1 to 2-4, whose values form the output of the neural network. The operation in FIG. 2a thus compresses ten parameters into four.

[0028] The parameters measured by the vocal tract parameter measuring unit 21 are also input to the learning neural network 26. The learning neural network 26 takes in the incoming vocal tract parameters as teacher signals, for example about 1000 frames of them, and computes the weights from these values. With 20 ms frames, 1000 frames correspond to about 20 seconds of speech, an amount that can reasonably be expected in a single conversation as long as it need not be continuous.

[0029] Once the learned weights have been obtained in this way, if they differ from the weights held in the fixed-weight neural network 25 when a call is made with the device, the weight data are transmitted in the 8-bit pitch information field during silent or unvoiced sections. Switching the weights after all of them have been transmitted makes it possible to optimize the weights.

[0030] The operation of the encoder 17 of FIG. 1a in transmitting the learned weights is as follows. First, the weight comparison unit 27 compares the learned weight values with the average weight values and, if the comparison value exceeds a set value, applies a control signal to the weight control unit 28.
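The comparison step of paragraph [0030] might look like the sketch below. The text does not say how the "comparison value" is computed, so the largest absolute difference between corresponding weights is used here purely as an assumption; the function name and the flat-list weight layout are likewise hypothetical.

```python
def weights_differ(average, learned, threshold):
    """True when the learned weights have drifted from the average ones.

    Assumed measure: the largest absolute difference between corresponding
    weights, compared against a fixed threshold.
    """
    diff = max(abs(a - b) for a, b in zip(average, learned))
    return diff > threshold

average = [0.10, 0.20, 0.30, 0.40]
learned = [0.10, 0.25, 0.30, 0.90]
request_update = weights_differ(average, learned, threshold=0.2)
```

Other measures, such as a sum of squared differences, would fit the description equally well.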

[0031] When the weight control unit 28 has received the weight control signal from the weight comparison unit 27, and either the voiced/unvoiced detector 22 reports unvoiced sound or the level measured by the amplitude measuring unit 24 is at or below a set value, it operates switch (1) 29 so that the learned weights are transmitted to the decoder 33.

[0032] The counter 30 determines whether the learned weights have been sent the required number of times, in the example above whether all 40 have been sent. When transmission is complete, it controls switch (2) 31, changes the weights of the fixed-weight neural network 25 to the learned weights, and changes the weights on the decoder 33 side as well.
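The encoder-side behaviour of paragraphs [0031] and [0032] can be sketched as a small state machine that reuses the pitch slot for weight data whenever a frame is unvoiced or quiet. The class name, the `silence_level` threshold, and the returned tuple layout are assumptions made for this illustration; the weights are treated as already quantized slot values.

```python
class WeightSender:
    """Sends learned weights one per frame in the pitch slot during idle frames."""

    def __init__(self, learned_weights):
        self.pending = list(learned_weights)
        self.total = len(self.pending)
        self.sent = 0                      # plays the role of the counter

    def pitch_slot(self, voiced, amplitude, pitch, silence_level):
        """Return (slot_value, is_weight, switch_now) for one frame."""
        idle = (not voiced) or amplitude <= silence_level
        if idle and self.sent < self.total:
            value = self.pending[self.sent]
            self.sent += 1
            # switch_now becomes True once the final weight has gone out
            return value, True, self.sent == self.total
        return pitch, False, False
```

A usage example with two weights: a voiced frame carries the pitch as usual, and the next two idle frames carry the weights, the second of which also signals the switch.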

[0033] As a result, until learning has taken place, the encoder 17 of FIG. 1a sends four kinds of parameters: the vocal tract parameters, the voiced/unvoiced parameter, the pitch period parameter, and the amplitude parameter. Matching the vector quantization of the conventional example above, the amount of information for each can be set as follows: vocal tract parameters, 4 × 2 bits; voiced/unvoiced parameter, 1 bit; pitch period parameter, 8 bits; amplitude parameter, 8 bits; control parameter, 2 bits.
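Summing the allocation listed in paragraph [0033] gives the per-frame payload and, with the 20 msec frame used throughout, the resulting bit rate. The rate itself is not stated in the text; it is derived here from the listed figures, and the dictionary keys are labels chosen for this sketch.

```python
frame_bits = {
    "vocal_tract": 4 * 2,   # four compressed parameters, 2 bits each
    "voiced_unvoiced": 1,
    "pitch_period": 8,      # this slot is reused for weight data during idle frames
    "amplitude": 8,
    "control": 2,
}

bits_per_frame = sum(frame_bits.values())
rate_bps = bits_per_frame * 1000 / 20   # one frame every 20 msec
```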

[0034] The weight switching unit 32 receives the weight control signal from the weight control unit 28 and the output of the counter 30. While the average weights are initially in use, it sends out the signals that time the transmission of the adaptive weights in the pitch period slot and the switching of the weights.

[0035] After learning, if the learning has had an effect, the learned weights are transmitted in the pitch period parameter field during unvoiced or silent sections, and once all the weights have been transmitted the weights are switched by the control signal.

[0036] FIG. 1b shows an embodiment of the decoder 33. In the decoder 33 of FIG. 1b, the control signal from the encoder 17 is decoded by the weight control signal decoder 34, and this control signal controls the pitch/weight changeover switch 35 to switch between the pitch period and the adaptive weight parameters.

[0037] The vocal tract parameters are sent to the neural network decoding unit 36. Two sets of weights, the average weights 37 and the adaptive weights 38, are connected to the neural network decoding unit 36, and switch (3) 39 selects between them.

[0038] The adaptive weights 38 receive the output of the pitch/weight changeover switch 35, which is controlled by the weight control signal decoder 34.

[0039] The pulse train generator 40 receives the pitch period signal from the pitch/weight changeover switch 35.

[0040] Switch (4) 41 is controlled by the voiced/unvoiced parameter. It selects the pulse train from the pulse train generator 40 for voiced sound and the random signal from the random noise generator 42 for unvoiced sound. The selected signal is input to the vocal tract filter 43, and the output of the vocal tract filter 43 is converted into an analog signal by the DA converter 44, whose reference signal is controlled by the amplitude parameter, to become the speech output.

[0041] The decoder 33 also has a clock generator 45 that supplies the clock.

[0042] FIG. 2b shows the basic configuration of the neural network decoding unit 36 of FIG. 1b. Here β1 to β4 are the compressed vocal tract parameters transmitted from the encoder 17; these values are decoded into ten vocal tract parameters by the weights w. The weights w are adaptively controlled according to the speech input.

[0043] The neural network decoding unit 36 has two sets of weights, the average weights 37 and the adaptive weights 38. The average weights 37 are the average weights, and the adaptive weights 38 are the learned weights. Under control of the control parameter, the adaptive weights 38 receive the signal in the pitch parameter slot as weight data when the control parameter indicates a weight update. The same control signal also controls switch (3) 39 to switch the weights.

[0044] The decoder 33 of FIG. 1b operates as follows. Using the weight control signal sent from the encoder 17, it determines whether the signal in the parameter slot is a pitch period or an adaptive weight. If it is a pitch period, it controls the pulse train generated by the pulse train generator 40. If it is an adaptive weight parameter, it successively updates the values of the adaptive weights 38 and, when all updates are complete, switches the weights from the average values to the adaptive values.

[0045] The vocal tract parameters are decoded by the neural network decoding unit 36, and the vocal tract filter 43 is built from the decoded parameters. The voiced/unvoiced parameter determines whether the frame is voiced or unvoiced. For unvoiced sound, the random noise generated by the random noise generator 42 is input to the vocal tract filter 43; for voiced sound, the pulse train generated by the pulse train generator 40 is input.

[0046] The output of the vocal tract filter 43 is passed to the DA converter 44, whose reference signal is controlled by the amplitude parameter, and the DA converter 44 reproduces a speech signal whose level matches the amplitude of the speech input to the encoder 17.

[0047] The adaptive vocal tract parameters obtained with the neural network of the present invention are compared below with conventional vocal tract parameters. In principle, the vector quantization given as the conventional example could also be made adaptive by providing a plurality of vector quantization units.

[0048] However, consider the amount of data that must be transmitted when adaptation is performed. With the neural network method of the present invention, changing the weight values and sending the control signals requires 40 × 2 × 8 = 640 bits, whereas conventional vector quantization requires 1024 × 10 × 8 = 81920 bits.

[0049] Furthermore, when the learned weight data are transmitted to the decoder 33 as in the present invention, changing the neural network weights takes 640 ÷ 8 = 80 frames, whereas conventional vector quantization takes 81920 ÷ 8 = 10240 frames.

[0050] Suppose the data are changed during silent and unvoiced sections. Silent sections are considered to make up on average 30% of speech and unvoiced sections 20%, so only about half of the frames are available, which doubles the time required. The weight change for the neural network then takes 80 × 0.02 × 2 = 3.2 seconds, while the vector change for vector quantization takes 10240 × 0.02 × 2 = 409.6 seconds. The latter can be completed only when a conversation continues for quite a long time, so it is not practical.
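The figures in paragraphs [0048] to [0050] can be reproduced with a short calculation. The constant and function names are assumptions made for this sketch; the bit counts, the 8-bit slot, the 20 msec frame, and the 30% and 20% fractions are taken from the text.

```python
FRAME_S = 0.02                 # 20 msec frame
SLOT_BITS = 8                  # pitch slot reused for update data
IDLE_FRACTION = 0.3 + 0.2      # silent plus unvoiced share of the speech

def update_frames(total_bits):
    """Number of idle frames needed to carry total_bits through the 8-bit slot."""
    return total_bits // SLOT_BITS

def update_seconds(total_bits):
    """Elapsed time when only the idle fraction of frames can carry update data."""
    return update_frames(total_bits) * FRAME_S / IDLE_FRACTION

nn_bits = 40 * 2 * 8           # neural network weights
vq_bits = 1024 * 10 * 8        # full vector quantization codebook
```

Dividing by the idle fraction of 0.5 is the same as the multiplication by 2 that appears in the text.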

[0051] Here the learning of the weights starts from the average weights, but several sets of weights could also be prepared in advance, for example by choosing between weights for male speech and weights for female speech. In addition, the embodiment uses a single learning neural network, but two to five neural networks could be provided, for example in a home telephone, and switched according to the user.

【００５２】[0052]

[Effects of the Invention] As described above, compared with the information compression by vector quantization shown in the conventional example, the present invention performs code compression that is optimized for the characteristics of the speech input to the encoder. This makes it possible to transmit the individual characteristics of the speaker and to improve the speech quality.

[Brief description of drawings]

FIG. 1: 1a is a block diagram of a speech encoder according to the present invention. 1b is a block diagram of a speech decoder according to the present invention.

FIG. 2: 2a is an explanatory diagram of the basic configuration of the encoding unit of the neural network according to the present invention. 2b is an explanatory diagram of the basic configuration of the decoding unit of the neural network according to the present invention.

FIG. 3: 3a is a diagram of the principle of code compression in the neural network according to the present invention. 3b is a diagram of the principle of code compression in the neural network according to the present invention.

FIG. 4: 4a is a block diagram of a conventional encoder. 4b is a block diagram of a conventional decoder. 4c is a block diagram of the relevant parts when conventional vector quantization of the vocal tract parameters is performed.

[Explanation of symbols]

25: fixed-weight neural network
26: learning neural network
27: weight comparison unit
28: weight control unit
29: switch (1)
30: counter
31: switch (2)
32: weight switching unit

Claims

(57) [Claims]

1. A method for compressing vocal tract parameter information in speech coding, wherein two neural networks are used to compress the vocal tract parameters, one neural network applying average weights and the other neural network learning from the parameters input to it, and wherein, starting from the neural network with the average weights, the method has a function of detecting a section that does not affect the conversation and switching to the neural network with the learned weights.