JPS6117197A

JPS6117197A - Voice analysis/synthesization method and apparatus

Info

Publication number: JPS6117197A
Application number: JP59138703A
Authority: JP
Inventors: 隆夫中島; 茂原　宏
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1984-07-04
Filing date: 1984-07-04
Publication date: 1986-01-25

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field of the Invention]

The present invention relates to a speech analysis and synthesis method and apparatus that analyze speech to extract feature parameters and synthesize speech based on those feature parameters.

[Technical background of the invention]

Speech analysis and synthesis apparatuses based on analysis-synthesis methods such as LPC, PARCOR, LSP, and formant methods compress and store speech information by analyzing speech and extracting feature parameters. The original speech is then synthesized from these feature parameters. To compress the amount of information further, the extracted feature parameters are quantized and the resulting quantization parameters are stored.

In this quantization, a linear transformation is often used, in which the feature parameter and the quantization parameter are linearly related. If N bits are allocated to the quantization parameter, its value, expressed as a non-negative number, ranges from 0 to 2^N - 1. However, owing to the statistical properties of the distribution of the feature parameter, values near 0 and near 2^N - 1 may occur very rarely. If the number of bits N is then reduced, the feature parameter is also distributed beyond 2^N - 1 for the new N, so N cannot simply be reduced. An unused range inevitably results, and an extra amount of information is needed to store and transmit the feature parameters.
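The waste described above can be made concrete with a small Python sketch. The function names are hypothetical, and the range 2 to 9 is borrowed from the FIG. 2 example given later in this description; the sketch only counts codes and is not part of the disclosed apparatus.

```python
def linear_bits_needed(max_value: int) -> int:
    """Smallest N such that the linear code range 0 .. 2**N - 1 reaches max_value."""
    return max(1, max_value.bit_length())


def code_usage(lo: int, hi: int, n_bits: int) -> float:
    """Fraction of the 2**n_bits linear codes that values lo..hi actually occupy."""
    return (hi - lo + 1) / (1 << n_bits)


# Feature values 2..9: a linear code must reach 9, so it needs 4 bits,
# although only 8 distinct values (enough for 3 bits) ever occur.
needed = linear_bits_needed(9)
usage = code_usage(2, 9, needed)
distinct_values = 9 - 2 + 1
```

Here half of the 4-bit codes are never used, which is the unused range that the invention aims to eliminate.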

In view of this, some or all of the feature parameters are quantized after a nonlinear transformation, so that the feature parameters are represented by quantization parameters with the minimum necessary number of bits. With such nonlinear quantization, however, restoring the feature parameters from the quantization parameters requires the inverse of a complicated nonlinear transformation. In speech analysis and synthesis apparatuses implemented as LSIs, this complicated inverse transformation is usually realized with a ROM, which increases the circuit scale.

[Object of the Invention]

The present invention has been made in view of the above circumstances. Its object is to provide a speech analysis and synthesis method and apparatus that can perform low-distortion quantization with a small amount of information and obtain high-quality synthesized speech without increasing the circuit scale.

[Summary of the Invention]

To achieve the above object, the speech analysis and synthesis method according to the present invention is characterized by: detecting whether a part of the quantization parameter of a speech feature parameter is a specific pattern; switching, based on whether the specific pattern is detected, between the quantization parameter and a conversion parameter obtained by converting the minimum necessary part of the quantization parameter; and synthesizing speech using the switched parameter as a synthesis parameter.

The speech synthesis apparatus according to the present invention is characterized by comprising: pattern detection means for detecting whether a part of the quantization parameter is a specific pattern; conversion means for converting another part of the quantization parameter and outputting a conversion parameter; switching means for switching between the quantization parameter and the conversion parameter according to the detection result of the pattern detection means; and synthesis means for synthesizing speech using the parameter selected by the switching means as a synthesis parameter.

[Embodiments of the Invention]

The present invention is described below on the basis of the illustrated embodiment. FIG. 1 shows a speech analysis and synthesis apparatus according to one embodiment of the present invention.

The speech signal input to the analysis circuit 10 is analyzed by a method such as LPC, PARCOR, LSP, or formant analysis, and a feature parameter 11 is output. The feature parameter 11 is M bits wide and is input to a quantization circuit 12 and a comparison circuit 13. The quantization circuit 12 quantizes the input M-bit feature parameter 11 and outputs an N-bit quantization parameter 15. The comparison circuit 13 outputs a control signal 14, determined by the value of the feature parameter 11, to the quantization circuit 12 so as to switch the operation of the quantization circuit 12.

The quantization parameter 15, obtained by further compressing the information of the feature parameter 11 in this way, is stored in the memory 20.

Speech is synthesized from the quantization parameters stored in the memory 20 as follows. Of the quantization parameter 21 read from the memory 20, the upper N' bits are input to a pattern detection circuit 31 and the lower N - N' bits are input to a conversion ROM 32. The pattern detection circuit 31 detects whether the upper N' bits of the quantization parameter 21 are a specific pattern and outputs a detection signal 27 to a switching circuit 34. Meanwhile, the conversion ROM 32 converts the lower N - N' bits of the quantization parameter into an M-bit conversion parameter 24 and outputs it. A bit extension circuit 33 outputs a predetermined additional parameter 25 of M - N bits. This additional parameter 25 is added to the quantization parameter 21, and the resulting M-bit parameter 26 is input to the switching circuit 34. The switching circuit 34 switches between the parameter 26 and the conversion parameter 24 and outputs the selected one as an M-bit synthesis parameter 27.

When the pattern detection circuit 31 detects the specific pattern, the parameter 26 is selected by the detection signal 27 and output as the synthesis parameter 27. Conversely, when the specific pattern is not detected, the conversion parameter 24 is selected and output as the synthesis parameter 27. The synthesis circuit 35 synthesizes speech based on this synthesis parameter 27.

Next, the embodiment is described in more detail using the specific example shown in FIGS. 2(a), 2(b), and 2(c). Here, a 4-bit feature parameter is compressed into a 3-bit quantization parameter. In this example, each parameter is treated as an unsigned integer.

The feature parameter shown in FIG. 2(a) takes values from 2 to 9. Quantization from the feature parameter to the quantization parameter is performed by the quantization circuit 12. In this example, quantization is performed by deleting the most significant bit of the 4-bit feature parameter (FIG. 2(b)). For speech synthesis, the upper 2 bits of the quantization parameter 21 stored in the memory 20 are input to the pattern detection circuit 31. The pattern detection circuit 31 detects whether these upper 2 bits are the pattern "00" and, based on the result, outputs a different detection signal 27 to the switching circuit 34. Meanwhile, the conversion ROM 32 receives the lowest 1 bit of the quantization parameter 21 and outputs a 4-bit conversion parameter 24 as shown in FIG. 2(c). The bit extension circuit 33 outputs a 1-bit additional parameter 25 of "0", which is added to the quantization parameter 21 to form a 4-bit parameter 26.

The switching circuit 34 switches between the parameter 26 and the conversion parameter 24 according to the detection signal 27. That is, when the pattern "00" is detected, it selects the conversion parameter 24; when the pattern "00" is not detected, it selects the parameter 26. The synthesis parameter 27 shown in FIG. 2(a) is thereby generated from the quantization parameter 21. In this way, a 4-bit synthesis parameter is obtained from a 3-bit quantization parameter.
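The FIG. 2 example can be traced end to end with a minimal Python sketch. The function names are hypothetical. The figures are not reproduced in this text, so the conversion ROM contents (low bit 0 gives 8, low bit 1 gives 9) are an assumption inferred from the requirement that the values 2 to 9 be recovered, as is the placement of the additional "0" bit at the most significant position.

```python
# Assumed contents of the conversion ROM 32 (FIG. 2(c) is not reproduced here):
# indexed by the lowest bit of the quantization parameter.
CONVERSION_ROM = {0: 0b1000, 1: 0b1001}


def quantize(feature: int) -> int:
    """Quantization circuit 12: delete the MSB of a 4-bit feature parameter (2..9)."""
    if not 2 <= feature <= 9:
        raise ValueError("feature parameter must be in 2..9 for this example")
    return feature & 0b111


def synthesis_parameter(q: int) -> int:
    """Rebuild the 4-bit synthesis parameter from a 3-bit quantization parameter."""
    upper2 = (q >> 1) & 0b11                  # input to pattern detection circuit 31
    conversion_parameter = CONVERSION_ROM[q & 0b1]   # conversion ROM 32 output
    extended_parameter = q                    # "0" added as the new MSB (assumed position)
    # Switching circuit 34: pattern "00" selects the ROM output.
    return conversion_parameter if upper2 == 0b00 else extended_parameter


round_trip = {v: synthesis_parameter(quantize(v)) for v in range(2, 10)}
```

Under these assumptions every value from 2 to 9 survives the round trip, and the ROM needs only two entries because it handles just the codes that the "00" pattern marks.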

Next, the embodiment is described in more detail on the basis of another specific example, shown in FIGS. 3(a), 3(b), and 3(c). In this example, a 5-bit feature parameter taking values from 2 to 16 is compressed into a 3-bit quantization parameter. This quantization is performed by the quantization circuit 12 and the comparison circuit 13. The comparison circuit 13 compares the value of the feature parameter 11 with "7". When the value is "7" or less, the upper 2 bits of the 5-bit feature parameter 11 are deleted; when it is greater than "7", "000" or "001" is output according to the value of the feature parameter 11, as shown in FIG. 3(b). For speech synthesis, the upper 2 bits of the quantization parameter 21 stored in the memory 20 are input to the pattern detection circuit 31. The pattern detection circuit 31 detects whether these upper 2 bits are the pattern "00" and, based on the result, outputs a different detection signal 27 to the switching circuit 34. Meanwhile, the conversion ROM 32 receives the lowest 1 bit of the quantization parameter 21 and outputs a 5-bit conversion parameter 24 as shown in FIG. 3(c). The bit extension circuit 33 outputs a 2-bit additional parameter 25 of "00", which is added to the quantization parameter 21 to form a 5-bit parameter 26.

The switching circuit 34 switches between the parameter 26 and the conversion parameter 24 according to the detection signal 27. That is, when the pattern "00" is detected, it selects the conversion parameter 24; when the pattern "00" is not detected, it selects the parameter 26. The synthesis parameter 27 shown in FIG. 3(a) is thereby generated from the quantization parameter 21.

In this way, a nonlinearly converted 5-bit synthesis parameter is obtained from a 3-bit quantization parameter.
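The FIG. 3 example mixes an exact (linear) region with a coarse (nonlinear) one, which a short Python sketch may make easier to follow. FIGS. 3(b) and 3(c) are not reproduced in this text, so the split point `SPLIT` and the two ROM values below are purely hypothetical placeholders, as are the function names; only the comparison with 7, the "000"/"001" codes, the "00" pattern test, and the bit widths come from the description.

```python
SPLIT = 12                       # hypothetical: 8..11 -> "000", 12..16 -> "001"
CONVERSION_ROM = {0: 10, 1: 14}  # hypothetical 5-bit representative values


def quantize(feature: int) -> int:
    """Quantization circuit 12 steered by comparison circuit 13 (5 bits -> 3 bits)."""
    if not 2 <= feature <= 16:
        raise ValueError("feature parameter must be in 2..16 for this example")
    if feature <= 7:
        return feature & 0b111           # delete the upper 2 bits
    return 0b000 if feature < SPLIT else 0b001


def synthesis_parameter(q: int) -> int:
    """Rebuild a 5-bit synthesis parameter from a 3-bit quantization parameter."""
    if (q >> 1) & 0b11 == 0b00:          # pattern "00" detected
        return CONVERSION_ROM[q & 0b1]   # nonlinear region via conversion ROM 32
    return q                             # "00" added as upper bits: value unchanged


decoded = {v: synthesis_parameter(quantize(v)) for v in range(2, 17)}
```

With these placeholder values, 2 to 7 are reproduced exactly and 8 to 16 collapse onto two representative levels, so the quantization step is fine where it is linear and coarse where the ROM takes over.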

Thus, according to this embodiment, the synthesis parameter can be generated from the quantization parameter with an extremely simple circuit, which is effective. In particular, the conversion ROM need only be used for the nonlinear portion, so nonlinear conversion is possible with a small-capacity conversion ROM. Quantization that combines linear and nonlinear portions arbitrarily is also possible.

For example, when speech is analyzed and synthesized by the PARCOR method, pitch data has conventionally been linearly quantized with 7 bits. In this case, pitch data in the range 0 to 127 can be quantized. When human voices are actually analyzed, however, pitch data of 15 or less is never obtained, so the range 0 to 15 is wasted. On the other hand, low male voices can have pitch data of 130 or more, which conventionally could not be quantized correctly. According to the above embodiment, pitch data in the range 16 to 143 can be quantized with the same 7 bits, so high-quality synthesized speech is obtained without increasing the amount of information in the quantization parameter and with almost no increase in circuit scale.
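One way the 7-bit pitch case could work is sketched below in Python. The text gives only the ranges (0 to 15 unused, 16 to 143 covered), so the specific mapping is an assumption modeled on the FIG. 2 example: values 128 to 143 reuse the otherwise idle codes 0 to 15, the decoder tests the upper 3 bits for "000", and a 16-entry ROM restores them. All names are hypothetical.

```python
# Assumed 16-entry conversion ROM: low 4 bits -> pitch values 128..143.
CONVERSION_ROM = [128 + low4 for low4 in range(16)]


def quantize_pitch(pitch: int) -> int:
    """Keep the low 7 bits, so 128..143 land on the unused codes 0..15."""
    if not 16 <= pitch <= 143:
        raise ValueError("pitch must be in 16..143 for this sketch")
    return pitch & 0x7F


def synthesize_pitch(q: int) -> int:
    """Rebuild an 8-bit pitch value from the 7-bit quantization parameter."""
    if (q >> 4) == 0b000:              # upper 3 bits "000": code in 0..15
        return CONVERSION_ROM[q & 0xF]
    return q                           # "0" added as MSB: value unchanged


codes = {p: quantize_pitch(p) for p in range(16, 144)}
```

All 128 codes are occupied in this arrangement, and the ROM covers only the 16 wrapped values rather than the whole range.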

As another example, a feature parameter taking values from 16 to 160 can be quantized with 7 bits as follows. Values 16 to 127 of the feature parameter are quantized as they are, and values 128 to 160 are logarithmically compressed, converted to values 0 to 15, and used as the quantization parameter. To generate the synthesis parameter, the 0-to-15 portion of the quantization parameter is detected by the pattern detection circuit 31, and the conversion ROM 32 performs an exponentially expanding conversion, yielding synthesis parameters from 16 to 160.
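The logarithmic variant can be sketched in Python as follows. The description does not give the compression curve, so the geometric spacing between 128 and 160 used here is an assumption, and the function names are hypothetical. The sketch precomputes the 16-entry ROM, which corresponds to the exponential expansion performed by the conversion ROM 32.

```python
import math

LO, HI, LEVELS = 128, 160, 16
LOG_SPAN = math.log(HI / LO)

# Assumed conversion ROM: 16 geometrically spaced levels from 128 to 160.
CONVERSION_ROM = [round(LO * math.exp(LOG_SPAN * k / (LEVELS - 1)))
                  for k in range(LEVELS)]


def quantize(feature: int) -> int:
    """7-bit code: 16..127 unchanged, 128..160 log-compressed onto 0..15."""
    if not 16 <= feature <= HI:
        raise ValueError("feature parameter must be in 16..160 for this sketch")
    if feature <= 127:
        return feature
    return round((LEVELS - 1) * math.log(feature / LO) / LOG_SPAN)


def synthesis_parameter(q: int) -> int:
    """Codes 0..15 go through the ROM (exponential expansion); others pass through."""
    return CONVERSION_ROM[q] if q < LEVELS else q


errors = [abs(synthesis_parameter(quantize(v)) - v) for v in range(LO, HI + 1)]
```

Thirty-three input values share sixteen codes in the upper region, so some error is unavoidable there, while the region 16 to 127 stays exact.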

[Effect of the Invention]

As described above, according to the present invention, low-distortion quantization can be performed with a small amount of information without increasing the circuit scale.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of a speech analysis and synthesis apparatus according to an embodiment of the present invention. FIGS. 2(a), (b), and (c) are diagrams showing specific examples of the feature parameter and synthesis parameter, the quantization parameter, and the conversion ROM of the apparatus. FIGS. 3(a), (b), and (c) are diagrams showing other specific examples of the feature parameter and synthesis parameter, the quantization parameter, and the conversion ROM of the apparatus.

10: analysis circuit, 11: feature parameter, 12: quantization circuit, 13: comparison circuit, 15: quantization parameter, 20: memory, 21: quantization parameter, 27: synthesis parameter, 31: pattern detection circuit, 32: conversion ROM, 33: bit extension circuit, 34: switching circuit, 35: synthesis circuit.

Claims

[Claims]

1. A speech analysis and synthesis method characterized by: quantizing a feature parameter obtained by analyzing speech such that a part of the resulting quantization parameter takes a specific pattern according to the value of the feature parameter; detecting whether said part of the quantization parameter is the specific pattern; switching, based on whether the specific pattern is detected, between the quantization parameter and a conversion parameter obtained by converting at least a minimum necessary part of the quantization parameter; and synthesizing speech using the switched parameter as a synthesis parameter.

2. A speech analysis and synthesis apparatus comprising: analysis means for analyzing speech and extracting a feature parameter; quantization means for quantizing the feature parameter extracted by the analysis means, such that a part of the resulting quantization parameter takes a specific pattern according to the value of the feature parameter, and generating the quantization parameter; pattern detection means for detecting whether said part of the quantization parameter is the specific pattern; conversion means for receiving another part of the quantization parameter and outputting a conversion parameter corresponding to that other part; switching means for switching between the quantization parameter and the conversion parameter according to the detection result of the pattern detection means; and synthesis means for synthesizing speech using the parameter switched by the switching means as a synthesis parameter.

3. The speech analysis and synthesis apparatus according to claim 2, wherein upper bits of the quantization parameter are used as said part.

4. The speech analysis and synthesis apparatus according to claim 2 or 3, wherein the number of bits of the synthesis parameter is larger than the number of bits of the quantization parameter.

5. A speech analysis and synthesis apparatus comprising: analysis means for analyzing speech and extracting a feature parameter; quantization means for quantizing the feature parameter extracted by the analysis means, such that a part of the resulting quantization parameter takes a specific pattern according to the value of the feature parameter, and generating the quantization parameter; pattern detection means for detecting whether said part of the quantization parameter is the specific pattern; bit extension means for adding a predetermined parameter to the quantization parameter and outputting an extended parameter obtained by extending the quantization parameter; conversion means for receiving another part of the quantization parameter and outputting a conversion parameter corresponding to that other part; switching means for switching between the extended parameter and the conversion parameter according to the detection result of the pattern detection means; and synthesis means for synthesizing speech using the parameter switched by the switching means as a synthesis parameter.

6. The speech analysis and synthesis apparatus according to claim 5, wherein upper bits of the quantization parameter are used as said part.

7. The speech analysis and synthesis apparatus according to claim 5 or 6, wherein the number of bits of the synthesis parameter is larger than the number of bits of the quantization parameter.