JPS5853359B2

JPS5853359B2 - speech synthesizer

Info

Publication number: JPS5853359B2
Application number: JP55064974A
Authority: JP
Inventors: 憲正山田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1980-05-15
Filing date: 1980-05-15
Publication date: 1983-11-29
Also published as: JPS56161598A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a digital speech synthesizer, and in particular to its frequency (pitch) parameter decoding method.

A digital speech synthesizer restores the original speech by decoding parameters that have been digitally encoded at a low bit count. These parameters are obtained by analyzing the original speech and extracting the sound source information and spectral envelope information that characterize it.

A conventional device of this type is shown in FIG. 1.

In the figure, 1 is a pitch data (reciprocal of frequency) conversion ROM, 2 is a voiced-sound pulse generator, 3 is an unvoiced-sound pulse generator, 4 is a changeover switch, 5 is a multiplier, 6 is an amplitude conversion ROM, 7 is a lattice-type digital filter, 8 is a digital filter coefficient conversion ROM, 9 is a DA converter, and 10 is a speaker.

11 is a terminal to which the voiced-sound pitch control input is applied, 12 is an input terminal for the signal that switches between voiced and unvoiced sound, 13 is a sound source amplitude control input terminal, and 14 is a digital filter coefficient input terminal.

Next, the operation will be explained.

Of the speech parameters, the encoded pitch control parameter for voiced sound is applied to the pitch control input terminal 11.

Voiced-sound pulses whose period corresponds to the pitch data decoded by the pitch conversion ROM 1 are sent from the pulse generator 2 through the changeover switch 4. The encoded data input at the sound source amplitude control input terminal 13 is decoded by the amplitude conversion ROM 6, and the multiplier 5 controls the amplitude according to the decoded amplitude data.

The unvoiced sound source from the unvoiced-sound pulse generator 3, on the other hand, is sent to the multiplier 5 through the changeover switch 4.

The sound source signal whose amplitude has been controlled by the multiplier 5 is supplied to the lattice-type digital filter 7.

The coefficients of the digital filter are input in encoded form at the digital filter coefficient input terminal 14 and are controlled by the data decoded by the conversion ROM 8.

The output of the digital filter 7 is passed through the DA converter 9, and the speaker 10 produces the speech.
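The signal path described above (sound source selection, amplitude scaling, lattice filtering) can be sketched in a few lines of Python. This is a minimal illustration, not the circuit of FIG. 1: the function names, the use of white noise as the unvoiced source, and the sign convention of the lattice coefficients are all assumptions, since the description does not specify them.

```python
import random

def excitation(voiced, period, n, seed=0):
    """Return n samples of a unit pulse train (voiced) or white noise (unvoiced)."""
    if voiced:
        # One pulse every `period` samples, as from pulse generator 2.
        return [1.0 if i % period == 0 else 0.0 for i in range(n)]
    rng = random.Random(seed)  # stand-in for the unvoiced source 3 (assumption)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def lattice_synthesis(x, k):
    """All-pole lattice filter driven by x, with reflection coefficients k[0..p-1].

    Assumed convention: f[i-1] = f[i] - k[i]*b[i-1](n-1),
                        b[i]   = b[i-1](n-1) + k[i]*f[i-1].
    """
    p = len(k)
    b = [0.0] * (p + 1)  # b[i] holds the backward signal of stage i from the previous sample
    out = []
    for sample in x:
        f = sample
        for i in range(p, 0, -1):
            f = f - k[i - 1] * b[i - 1]
            b[i] = b[i - 1] + k[i - 1] * f
        b[0] = f
        out.append(f)
    return out

def synthesize_frame(voiced, period, amplitude, k, n):
    """Select the source (switch 4), scale it (multiplier 5), filter it (filter 7)."""
    source = [amplitude * s for s in excitation(voiced, period, n)]
    return lattice_synthesis(source, k)
```

With a single coefficient `k = [0.5]` the filter reduces to `y[n] = x[n] - 0.5 * y[n-1]`, which is a convenient check on the recursion.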

Of the sound source information, the pitch parameter is generally given as an integer multiple of the speech sampling period. At a sampling frequency of 10 kHz, a typical female voice spans pitch periods of 20 to 70 samples (140 to 500 Hz) and a typical male voice 40 to 140 samples (70 to 250 Hz), so about 7 bits are required to quantize pitch data covering both.
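The figures in this paragraph follow from simple arithmetic, sketched below. The function names are illustrative, and the bit count assumes uniform quantization with one code per integer pitch period, which is the simplest reading of the text.

```python
import math

FS_HZ = 10_000  # sampling frequency given in the description

def period_to_hz(period_samples):
    """Fundamental frequency for a pitch period expressed in samples."""
    return FS_HZ / period_samples

def bits_for_range(shortest, longest):
    """Bits needed to give every integer period in [shortest, longest] its own code."""
    return math.ceil(math.log2(longest - shortest + 1))

for label, lo, hi in [("female", 20, 70), ("male", 40, 140), ("both", 20, 140)]:
    print(f"{label}: {period_to_hz(hi):.0f} to {period_to_hz(lo):.0f} Hz, "
          f"{bits_for_range(lo, hi)} bits")
```

The combined range of 20 to 140 samples holds 121 distinct periods, hence the 7 bits quoted above.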

Even with non-linear quantization that uses larger quantization steps at low frequencies, about 6 bits are needed to prevent degradation of sound quality, and the conversion ROM then requires 2^6 = 64 words.

In a system with 10-bit words this amounts to 640 bits, a disadvantageously large ROM capacity. In addition, the pitch parameter requires 600 bits/second at a frame period of 10 ms, and still 300 bits/second at a frame period of 20 ms.

The present invention was made to eliminate these drawbacks of the conventional device. By providing means for multiplying the data of the pitch parameter conversion ROM by a constant according to information specifying the frequency band, it provides a speech synthesizer in which the capacity of the pitch period conversion ROM is reduced and the amount of information is decreased.

An embodiment of the present invention will be described below with reference to the drawings.

In FIG. 2, 15 is a circuit that doubles the pitch data, and 16 is an input terminal for a signal specifying the frequency band.

The parts numbered 1 to 14 are the same as in FIG. 1, and their explanation is omitted.

Of the speech parameters, the pitch control parameter for voiced sound is input to the pitch control input terminal 11.

The pitch data decoded by the pitch conversion ROM 1 is sent to the circuit 15, which can double it. Depending on the information at the input terminal 16, which specifies the frequency band, the data is multiplied by 2 or by 1 and sent to the voiced-sound pulse generator 2, which generates voiced-sound pulses corresponding to the pitch data.

For example, suppose that the pitch conversion ROM contains pitch data corresponding to a female voice.

A pitch range of 20 to 70 is sufficient for a female voice, so 6-bit quantization is enough. Because sound quality degrades little even when the quantization step is made larger at low frequencies, about 5 bits suffice.

For a male voice, a pitch range about twice that of the female range is sufficient. This range is obtained by the circuit 15, which doubles the pitch data under control of the input terminal 16 specifying the frequency band.

The doubling circuit can be realized easily, for example by a shift operation.
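The decoding path of this embodiment (ROM lookup followed by an optional one-bit left shift) might look as follows in software. The ROM contents here are placeholders: the description does not list the table, so a geometric spacing from 20 to 70 samples is assumed simply because it gives larger steps at long periods (low frequencies). The names `build_pitch_rom`, `decode_pitch`, and the boolean `low_band` standing in for the signal at terminal 16 are likewise illustrative.

```python
def build_pitch_rom(code_bits=5, shortest=20, longest=70):
    """Hypothetical ROM contents: 2**code_bits pitch periods, geometrically spaced."""
    size = 1 << code_bits
    ratio = longest / shortest
    return [round(shortest * ratio ** (i / (size - 1))) for i in range(size)]

PITCH_ROM = build_pitch_rom()  # 32 words covering the female range

def decode_pitch(code, low_band):
    """Look up the pitch period, then double it when the low (male) band is selected."""
    period = PITCH_ROM[code]                   # pitch conversion ROM 1
    return period << 1 if low_band else period  # circuit 15: shift left by one bit

print(decode_pitch(0, False), decode_pitch(31, False))  # ends of the stored range
print(decode_pitch(0, True), decode_pitch(31, True))    # same codes, low band selected
```

Because the shift is applied after the lookup, the same 32-word table serves both bands, and the doubled range of 40 to 140 samples matches the male range given earlier.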

In this example the pitch conversion ROM needs only 2^5 = 32 words, and the information rate of the pitch parameter falls to 500 bits/second at a frame period of 10 ms and 250 bits/second at a frame period of 20 ms, both lower than in the conventional device.
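The ROM sizes and bit rates quoted for the conventional 6-bit code and the 5-bit code of this example can be checked with the short calculation below. The helper names are illustrative, and the 10-bit word width is carried over from the conventional example as an assumption, since the description does not state a word width for the embodiment.

```python
def rom_bits(code_bits, word_bits=10):
    """Total ROM size in bits: one word per code value."""
    return (2 ** code_bits) * word_bits

def pitch_bitrate(code_bits, frame_ms):
    """Bits per second spent on the pitch parameter, one code per frame."""
    return code_bits * 1000 // frame_ms

for label, code_bits in [("conventional", 6), ("this example", 5)]:
    print(f"{label}: {2 ** code_bits} words, {rom_bits(code_bits)} bits of ROM, "
          f"{pitch_bitrate(code_bits, 10)} bit/s at 10 ms, "
          f"{pitch_bitrate(code_bits, 20)} bit/s at 20 ms")
```

Dropping one bit from the code halves the table and saves one bit per frame, which is where the 600 to 500 bits/second and 300 to 250 bits/second reductions come from.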

In the above embodiment, a doubling circuit based on a shift operation is used as the means for multiplying the pitch data by a constant, but a 1/2 circuit or a multiplier that applies an arbitrary coefficient may be used instead.

As described above, according to the present invention the speech synthesizer is provided with means for multiplying the data of the pitch parameter conversion ROM by a constant according to information specifying the frequency band. This reduces both the capacity of the pitch conversion ROM and the amount of information.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a conventional speech synthesizer, and FIG. 2 is a block diagram showing a speech synthesizer according to an embodiment of the present invention.

1 is a pitch parameter conversion ROM, 15 is a constant multiplication circuit, and 16 is an input terminal for specifying a frequency band.

Claims

[Claims]

1. A digital speech synthesizer that uses parameters obtained by encoding and transmitting speech feature parameters and decoding them with a conversion ROM, characterized in that it includes means for multiplying the data of a pitch parameter conversion ROM by a constant according to information specifying a frequency band.