JPS593497A

JPS593497A - Fundamental frequency control system for rule synthesization system

Info

Publication number: JPS593497A
Application number: JP57112881A
Authority: JP
Inventors: 金盛　亨
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-06-30
Filing date: 1982-06-30
Publication date: 1984-01-10
Also published as: JPH0358519B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a speech synthesis device that uses the rule synthesis method. It is designed to obtain sufficient performance through simple control.

[Prior Art of the Invention]

Speech output devices can be broadly divided into two output methods. The first stores in advance every word or sentence to be output and plays them back in sequence, in the desired combination. When many kinds of sentences must be output, this method needs an enormous amount of memory.

In contrast, the output method addressed by this application prepares parameterized phonological units of natural speech, only as many kinds as are needed. It then synthesizes arbitrary words or sentences from those units; this is the rule synthesis method. Besides arranging the phonological units in sequence, it can control their pitch and amplitude to make the speech more natural.

[Problems with the Prior Art]

In the conventional method, a smooth time-varying fundamental frequency pattern close to human speech could only be obtained by setting the fundamental frequency at short intervals of about 10 msec. Those values had to be computed with complex means such as function approximation.

Furthermore, these fundamental frequency calculations are performed on the logarithm of the frequency. The reason is that operating on the logarithm of frequency gives a better approximation of human auditory perception.

However, the so-called speech synthesis LSIs that generate the audio signal from this fundamental frequency information all take the fundamental period (the reciprocal of the frequency) as input. The logarithm of the frequency must therefore be converted to a period, which requires a complicated circuit and adds conversion processing time.

Furthermore, when two phonological units are connected, the final amplitude of the first may differ from the initial amplitude of the second. In that case, the amplitude must be interpolated between them so that the transition is smooth.

The amplitude information is generally given as a parameter in a format that further compresses a logarithmic representation. To interpolate it, the conventional approach does the following:
1. Convert the parameter to an integer.
2. Interpolate.
3. Convert the result back to the original parameter format.
4. Feed it to the speech synthesis LSI.

This again makes the required circuitry and processing time large.

[Object of the Invention]

The object of the present invention is to solve these problems. It provides a speech synthesis control method that processes at high speed with a simple circuit while still achieving sufficient performance.

[Structure of the invention]

To overcome the above drawbacks, the present invention first gives the information that controls the fundamental frequency only as a rough approximation. It then smooths that (digital) information with a digital filter, obtaining smoothness close to that of a fine approximation.

Second, a table lookup method converts the fundamental frequency information into parameters based on the pitch period. This simplifies the hardware and makes the characteristics easy to change.

Third, amplitude information is interpolated without first converting it to an integer value. The amplitude parameter is treated as a pure binary number and interpolated as is, which gives fast, high-quality interpolation.

[Embodiments of the invention]

Fig. 1 is an explanatory diagram of the rule synthesis method, using the synthesis of the word "ヤマガタ" (YAMAGATA) as an example. There are several ways to divide speech into phonological units. Here, so-called VCV (vowel-consonant-vowel) combinations are used.

As shown in Fig. 1(a), "YAMAGATA" is obtained by connecting five phonological units: "YA", "AMA", "AGA", "ATA" and "A". Adjacent units are always joined at the same vowel, so a natural connection is relatively easy to obtain.
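The decomposition of "YAMAGATA" into VCV units can be illustrated with a short sketch. It assumes romanized input made only of (consonant)+vowel morae. The function names `to_morae` and `to_vcv_units` are hypothetical and do not describe the patent's means 2.

```python
VOWELS = set("AIUEO")


def to_morae(text: str) -> list[str]:
    """Split a romanized string such as 'YAMAGATA' into (C)V morae."""
    morae, current = [], ""
    for ch in text.upper():
        current += ch
        if ch in VOWELS:
            morae.append(current)
            current = ""
    if current:
        raise ValueError(f"trailing consonant(s) without a vowel: {current!r}")
    return morae


def to_vcv_units(text: str) -> list[str]:
    """Hypothetical VCV segmentation: adjacent units overlap on a shared vowel."""
    morae = to_morae(text)
    if not morae:
        return []
    units = [morae[0]]                    # leading (C)V unit, e.g. 'YA'
    for prev, cur in zip(morae, morae[1:]):
        units.append(prev[-1] + cur)      # vowel of previous mora + next CV
    units.append(morae[-1][-1])           # trailing vowel, e.g. 'A'
    return units


print(to_vcv_units("YAMAGATA"))  # ['YA', 'AMA', 'AGA', 'ATA', 'A']
```

Every internal boundary falls in the middle of a vowel, which is why the text says the units connect naturally.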

Fig. 1(b) shows the fundamental frequency information (logarithmic representation), with the accent on "マ" (MA).

Fig. 1(c) shows the same fundamental frequency as a pitch (period) representation. Segments that are straight lines in the logarithmic representation become curves when expressed as a period.

Fig. 2 is a block diagram of an embodiment of the present invention. Its components are:
- 1: a means for inputting the word or sentence to be synthesized as character codes.
- 2: a means for converting the character string into sequence information of phonological units (VCV).
- 3: a means for decomposing and arranging that sequence into V and C sequence information.
- 4: a means for specifying accent and intonation patterns.
- 5: a prosody table that supplies parameters according to intonation and the like.
- 6: a means for creating time-series patterns of the fundamental frequency and of the connection times between phonological units.
- 7: a digital filter.
- 8: a conversion table.
- 9: a file in which each phonological unit (VCV) is stored as parameters sampled at a period of, for example, 10 msec.
- 10: a means for combining the VCV parameters.
- 11: a means for synthesizing the speech signal according to the parameters.
- 12: a speaker.

Prosody table 5 is indexed with intonation information and the like to obtain fundamental frequency information (log f). Conventionally, a sufficiently smooth log f curve was obtained in one of two ways:
- The table information was prepared in detail, at a period comparable to the sampling period above.
- The table information was kept coarse, and the gaps were interpolated with a relatively complex predetermined function.

In the present invention, by contrast, the information in table 5 is kept coarse, for example one value roughly every 100 msec. Interpolation uses a simple function such as linear interpolation. The result is then smoothed by digital filter 7.
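Expanding coarse log f targets to frame-rate values by linear interpolation might look like the sketch below. It assumes the targets are (time in msec, value) breakpoints about 100 msec apart and that one value is needed per 10 msec frame. The function name, the breakpoint values and the requirement that segment lengths be whole multiples of the frame period are all hypothetical.

```python
def linear_contour(breakpoints: list[tuple[int, float]], frame_ms: int = 10) -> list[float]:
    """Expand coarse (time_ms, log_f) breakpoints into one value per frame.

    Assumes breakpoints are sorted by time and that each segment length
    is a whole multiple of frame_ms.
    """
    frames: list[float] = []
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        n = (t1 - t0) // frame_ms
        for k in range(n):
            frames.append(v0 + (v1 - v0) * k / n)   # straight line between targets
    frames.append(breakpoints[-1][1])               # include the final target
    return frames


# Hypothetical rise-fall accent: targets every 100 msec, frames every 10 msec.
contour = linear_contour([(0, 100.0), (100, 140.0), (200, 120.0)])
print(len(contour), contour[0], contour[10], contour[20])
```

The output still has corners at each breakpoint. Smoothing those corners is left to the digital filter.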

Fig. 3 shows an embodiment of digital filter 7. The signal path is:
1. Subtractor 31 takes the difference between the input X and the delayed output $YZ^{-1}$.
2. Amplifier 32 multiplies this difference by $a$ ($1 > a > 0$).
3. Adder 33 adds $YZ^{-1}$ to the result to form the output Y.
4. Delay element 34 delays Y by one sampling period (for example, 10 msec) and feeds it back to subtractor 31 and adder 33.

The transfer characteristic of this circuit is

$$\frac{Y}{X} = \frac{a}{1 - (1 - a)Z^{-1}},$$

which has a pole at $Z = 1 - a$. Mapping to the s-plane gives $Z = e^{-2\pi f T}$, where $f$ is the pole frequency and $T$ is the sampling period.

For example, with $a = 1/4$ and $T = 0.01$ s,

$$f = -\frac{\ln(1 - a)}{2\pi T} \approx 4.6\ \text{Hz}.$$

In other words, the filter is a low-pass filter with a cutoff frequency of 4.6 Hz and a slope of -6 dB/oct.
low-pass filter tonal.

With such a filter, even a straight-line approximation like the solid line in Fig. 1(b) yields smooth log f information like the dotted line in the same figure.

The logarithmic fundamental frequency information obtained this way is converted to period information by conversion table 8. Fig. 4 shows an embodiment of the table, implemented in a ROM (read-only memory). The input is, for example, an 8-bit binary number ranging over 0 to 255 (decimal). The output is a 7-bit binary number ranging over 111 to 21 (decimal).
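The exact ROM contents are not given, but a plausible table can be sketched. Assume the 8-bit input is proportional to log f and the 7-bit output is the pitch period. Because the period is 1/f, log(period) = -log f, so the period falls exponentially across the input range from 111 to 21. Under this assumption the table spans a frequency ratio of 111/21, about 5.3:1. The function name and the exponential shape are assumptions.

```python
def build_period_table(n_in: int = 256, p_max: int = 111, p_min: int = 21) -> list[int]:
    """Hypothetical ROM contents for conversion table 8.

    Index i is taken to be linear in log f, so the period is exponential in i:
    p(i) = p_max * (p_min / p_max) ** (i / (n_in - 1)), rounded to an integer.
    """
    ratio = p_min / p_max
    return [round(p_max * ratio ** (i / (n_in - 1))) for i in range(n_in)]


ROM = build_period_table()
print(ROM[0], ROM[255])  # 111 21
```

Changing the voice then means supplying a table built with different endpoints or a different curve, much as the text describes swapping the ROM.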

With this table conversion method, the device can be given arbitrary characteristics simply by replacing or switching the ROM. For example, changing from a male voice to a female voice is easy.

Amplitude interpolation between phonological units is performed as follows.

VCV file 9 stores each phonological unit as a time series of parameter sets, one every 10 msec. Amplitude information is one of these parameters.

Even at the same vowel "A", the amplitudes generally differ. Examples are the end of "YA" and the beginning of "AMA" in Fig. 1(a). When the units are connected, the amplitude must therefore be interpolated so that they join smoothly. The amplitude information among the parameters is generally represented as shown in the following table.

That is, the decimal integer is expressed as a normalized binary floating-point number. Because the number is normalized, the first digit after the binary point is always "1". That digit is therefore omitted to reduce the number of bits.
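The original format table is not reproduced here. A split of 3 exponent bits followed by 2 stored mantissa bits is, however, consistent with the worked example later in this description, where "10001" stands for 10 and "10101" for 20. The sketch below assumes that split: a code `eeemm` means 0.1mm (binary) × 2^eee, with the leading "1" after the binary point restored on decoding. The field widths and function names are assumptions. `encode` truncates low-order bits.

```python
MAN_BITS = 2   # assumed: 2 stored mantissa bits (leading 1 omitted)
EXP_MAX = 7    # assumed: 3-bit exponent field


def decode(code: int) -> float:
    """Hypothetical 5-bit code eeemm -> 0.1mm (binary) * 2**eee."""
    e = code >> MAN_BITS
    m = code & ((1 << MAN_BITS) - 1)
    significand = (1 << MAN_BITS) | m          # restore the omitted leading 1
    return significand * 2.0 ** (e - MAN_BITS - 1)


def encode(value: int) -> int:
    """Normalize a positive integer and drop the leading 1 (low bits truncated)."""
    if value < 1:
        raise ValueError("value must be a positive integer")
    e = value.bit_length()                     # value = 0.1xxx (binary) * 2**e
    if e > EXP_MAX:
        raise ValueError("value too large for a 3-bit exponent")
    shift = e - (MAN_BITS + 1)
    significand = value >> shift if shift >= 0 else value << -shift
    return (e << MAN_BITS) | (significand & ((1 << MAN_BITS) - 1))


print(decode(0b10001), decode(0b10101))  # 10.0 20.0
print(bin(encode(10)), bin(encode(20)))  # 0b10001 0b10101
```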

For example, suppose the final amplitude parameter of one phonological unit is "10001" and the initial amplitude parameter of the next is "10101". Conventionally, each parameter was first converted to its integer value, "10" and "20" respectively. The two integers were linearly interpolated, and the results were converted back to the parameter format.

In the present invention, by contrast, the parameter is treated directly as a pure binary number and linearly interpolated as is. In the example above, the difference between "10001" and "10101" is "00100". To interpolate linearly over four steps, each successive point is offset by "00001".
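Interpolating the raw codes and decoding the results reproduces the decimal values given in the text. The sketch uses the same assumed 3-bit exponent / 2-bit mantissa split, and the function names are hypothetical. For contrast, the last line shows the conventional route, which interpolates the decoded integers instead.

```python
MAN_BITS = 2  # assumed field split eee|mm, consistent with the worked example


def decode(code: int) -> float:
    """Hypothetical code eeemm -> 0.1mm (binary) * 2**eee."""
    e, m = code >> MAN_BITS, code & ((1 << MAN_BITS) - 1)
    return ((1 << MAN_BITS) | m) * 2.0 ** (e - MAN_BITS - 1)


def interpolate_codes(start: int, end: int, steps: int) -> list[int]:
    """Linear interpolation directly on the raw codes, treated as pure binary."""
    return [start + (end - start) * k // steps for k in range(steps + 1)]


codes = interpolate_codes(0b10001, 0b10101, 4)
print([format(c, "05b") for c in codes])  # ['10001', '10010', '10011', '10100', '10101']
print([decode(c) for c in codes])         # [10.0, 12.0, 14.0, 16.0, 20.0]

# Conventional route for comparison: interpolate the decoded integers.
print([10 + (20 - 10) * k / 4 for k in range(5)])  # [10.0, 12.5, 15.0, 17.5, 20.0]
```

Crossing the exponent boundary (from "10011" to "10100") doubles the step size in the linear domain. That is where the exponential character of the interpolation comes from.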

Thus, as in the preceding table, the parameter is interpolated from "10001" to "10101" in steps of "00001". Converted to decimal, the values are "10", "12", "14", "16" and "20". The steps in the linear domain grow as the value increases, so the interpolation is exponential in character.

This method does more than avoid wasteful conversions; it also suits auditory characteristics better. Human hearing perceives a logarithmic change in frequency, amplitude and the like as only a linear change. Logarithmic interpolation is therefore the better choice.

Fig. 5 shows an example of the interpolation circuit, which can be regarded as part of VCV combining unit 10.

The final amplitude parameter of the preceding phonological unit is set in register 41, and the initial amplitude parameter of the next unit is set in register 42. Subtractor 43 takes their difference, treating each as a pure binary number.

Means 6 determines the connection time between phonological units and supplies it as a number of sample points. Divider 44 divides the difference by this connection time. Adder 45 then successively adds the quotient to the previous final amplitude value.
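The datapath of Fig. 5 can be modelled in software:
- Register 41 holds the previous unit's final code.
- Register 42 holds the next unit's initial code.
- Subtractor 43 forms their difference.
- Divider 44 divides the difference by the number of connection samples.
- Adder 45 accumulates the quotient.

The text does not say how the divider handles remainders, so truncation toward zero is assumed here. The function name is hypothetical.

```python
def interpolation_circuit(reg41: int, reg42: int, n_samples: int) -> list[int]:
    """Hypothetical model of Fig. 5; returns the codes for the connection frames."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    diff = reg42 - reg41                  # subtractor 43, codes as pure binary
    quotient = abs(diff) // n_samples     # divider 44, assumed to truncate
    if diff < 0:
        quotient = -quotient
    frames, acc = [], reg41
    for _ in range(n_samples):
        acc += quotient                   # adder 45 accumulates one step per frame
        frames.append(acc)
    return frames


print([format(c, "05b") for c in interpolation_circuit(0b10001, 0b10101, 4)])
# ['10010', '10011', '10100', '10101']
```

With a truncating divider, a difference that is not a multiple of the sample count leaves the last frame short of register 42's value. For example, going from 17 to 22 over 4 samples ends at 21. This is a property of the sketch's assumption, not of the patent text.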

[Effect of the invention]

As described above, the present invention performs simple processing in a simple circuit and still obtains characteristics equal to or better than those of the conventional method. This makes it effective in reducing the cost of speech output devices.

[Brief explanation of the drawing]

Fig. 1 is an explanatory diagram of the rule synthesis method. Fig. 2 is a schematic block diagram of an embodiment of the present invention. Fig. 3 is a block diagram of an embodiment of the digital filter. Fig. 4 is a block diagram of an embodiment of the conversion table. Fig. 5 is a block diagram of an example of the interpolation circuit.

Claims

[Claims]

An interpolation control method for a rule synthesis method, the rule synthesis method comprising:
- sampling a plurality of phonological units at a predetermined period;
- converting at least frequency information or amplitude information into a normalized binary floating-point representation;
- storing that representation in a format with the first digit after the binary point deleted; and
- connecting the desired phonological units to synthesize speech.

The interpolation control method is characterized in that, when information in said format is interpolated, the exponent part and the fractional part (with its first digit deleted) are together regarded as the upper digits of a pure binary number and linearly interpolated.