JP2910035B2

JP2910035B2 - Speech synthesizer

Info

Publication number: JP2910035B2
Application number: JP63066584A
Authority: JP
Inventors: 英行高木; 紀代原
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1988-03-18
Filing date: 1988-03-18
Publication date: 1999-06-23
Anticipated expiration: 2014-06-23
Also published as: JPH01238697A

Abstract

PURPOSE: To synthesize speech in various speaking styles from phonetic symbols and accent type symbols by constituting a prosody pattern estimating section from a plurality of multi-input one-output linear signal processing sections, each provided with a weight coefficient memory, a data input section, and weight multiplying and data adding means. CONSTITUTION: In each multi-input one-output signal processing section, data xi are supplied to an input section 1001, multiplied in a multiplier 1003 by the weight coefficients wi stored in a memory 1002, and summed by an adder 1004. A threshold processing section 2000 is further provided to limit the output of the linear signal processing section to a specified range. By determining the weight coefficients appropriately, the input phonetic symbols and accent type symbols are converted into the output prosody pattern. Many pairs of phonetic symbols and accent type symbols with their corresponding prosody patterns are prepared, and learning is repeated by gradually changing the weight coefficients until the prosody patterns reach optimum values. With this constitution, speech rich in dialectal characteristics and personal expression is synthesized.

Description

[Detailed description of the invention]

Field of industrial application

The present invention relates to a speech synthesizer that converts an input character string into speech.

Prior art

Speech synthesizers that take a character string as input and convert it into speech have been actively developed in recent years, and prototypes have been presented at academic conferences and reported in newspapers. As shown in FIG. 1, this type of speech synthesizer basically consists of a language processing unit 1 that converts a character string into phonetic symbols and accent type symbols, a prosody pattern estimation unit 2 that estimates the prosody pattern of the speech from the phonetic symbols and accent type symbols, and a speech synthesis unit 3 that synthesizes speech from this information. Simple devices sometimes omit the language processing unit 1 and accept phonetic symbols and accent type symbols directly as input.

Conventional ways of realizing a speech synthesizer of this configuration are described below.

The language processing unit 1 is realized in basically the same way as the kana-kanji conversion used in Japanese word processors. Kana-kanji conversion performs morphological analysis on an input kana character string and, for each independent word, looks up the kanji code corresponding to the reading code in an independent-word dictionary, producing a sentence in mixed kanji and kana. The language processing unit 1, in contrast, performs morphological analysis on the kana character string and converts it into phonetic symbols; for each independent word it retrieves from the independent-word dictionary the accent type symbol corresponding to the reading, instead of the kanji code, yielding phonetic symbols plus an accent type symbol. When the input to the language processing unit 1 is a sentence in mixed kanji and kana, the independent-word dictionary is used in the opposite direction to kana-kanji conversion: the dictionary is searched by kanji code to obtain the reading code, which is then converted into the phonetic symbols and accent type symbol described above. For example, 「私は」 → 「わたしは」 → "WATASIWA + flat type". Accent type symbols are described in, for example, the NHK-edited "日本語アクセント辞書" (Japanese accent dictionary) published by 日本放送出版協会 (Japan Broadcast Publishing Association); for instance, the accents of 「箸」 (chopsticks) and 「橋」 (bridge) are distinguished as "head-high type" and "tail-high type", respectively.
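The dictionary lookup described above can be illustrated with a toy Python sketch. It is not the conventional implementation: the dictionary holds only the three words mentioned in the text, the kana table and all function and variable names are hypothetical, and morphological analysis is replaced by passing the word and its particle separately.

```python
# Toy stand-in for the independent-word dictionary:
# kanji -> (reading in kana, accent type). Only words from the text are listed.
WORD_DICT = {
    "私": ("わたし", "flat"),
    "箸": ("はし", "head-high"),
    "橋": ("はし", "tail-high"),
}

# Hypothetical kana -> phonetic-symbol table covering only the kana used here.
KANA_TO_PHONETIC = {"わ": "WA", "た": "TA", "し": "SI", "は": "HA"}

# The particle は is pronounced "wa", so it gets its own table.
PARTICLE_PHONETIC = {"は": "WA"}


def convert(word, particle=""):
    """Return (phonetic symbols, accent type) for a kanji word plus optional particle."""
    reading, accent = WORD_DICT[word]
    phonetic = "".join(KANA_TO_PHONETIC[kana] for kana in reading)
    phonetic += PARTICLE_PHONETIC.get(particle, "")
    return phonetic, accent


print(convert("私", "は"))  # ('WATASIWA', 'flat')
print(convert("箸"), convert("橋"))
```

The two entries for はし share the same phonetic symbols and differ only in accent type, which is why the accent type symbol has to be passed on to the prosody pattern estimation unit.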

The prosody pattern estimation unit 2 takes the phonetic symbols plus accent type symbol obtained by the language processing unit 1 and outputs prosody patterns that determine the naturalness of the speech, such as the pitch pattern, formant pattern, phoneme durations, and intensity pattern. Conventionally, the prosody pattern is estimated either by fitting a numerical model such as the one in Fujisaki and Sudo, "Fundamental frequency patterns of Japanese word accent and a model of their generation mechanism", Journal of the Acoustical Society of Japan, vol. 27, no. 9, 1971, or by applying empirically obtained rules such as those in Higuchi and Yamamoto, "Control of prosodic features in a rule-based synthesis experiment system", Proceedings of the Acoustical Society of Japan Spring Meeting, 1986, 2-2-14. The speech synthesis unit 3 is described in, for example, Yamamoto et al., "Prototype of a speech synthesis-by-rule device using phonemes as synthesis units", Proceedings of the Acoustical Society of Japan Spring Meeting, 1987, 3-6-2. That is, acoustic parameters are created from the formant frequencies prepared for each phonetic symbol obtained by the language processing unit 1 and from the pitch frequency, phoneme durations, and intensity obtained from the prosody pattern estimation unit 2, and these parameters drive a formant synthesizer.

Problems to be solved by the invention

However, with a prosody pattern estimation unit based on the methods above, it is very difficult to express dialects or individual speaking styles. Even with a numerical model of the pitch pattern, rebuilding the model for the Osaka dialect or the Hakata dialect requires an expert to spend a very long time analyzing actual speech data. Obtaining a sufficient set of experience-based rules takes even longer. The same applies to estimating the prosody pattern of speaker A's or speaker B's way of speaking. In practice, therefore, only one speaking style, following one representative prosody pattern, could be synthesized.

In view of the above, an object of the present invention is to provide a speech synthesizer that synthesizes speech in various dialects and individual speaking styles by making the prosody pattern easy to estimate.

Means for solving the problems

The present invention is a speech synthesizer having a prosody pattern estimation unit in which a plurality of multi-input one-output signal processing units are arranged as a network.

Operation

With the above configuration, the prosody pattern estimation unit outputs various prosody patterns from phonetic symbols and accent type symbols, so that speech in various speaking styles is synthesized.

Embodiment

FIG. 2 shows the general configuration of the prosody pattern estimation unit according to the present invention, and FIG. 3 shows a specific configuration example. The operation when a pitch pattern is estimated as the prosody pattern is described below with reference to FIG. 3.

FIG. 3 shows an example of a three-layer prosody pattern estimation unit 2 that takes phonetic symbols and accent type symbols as input and estimates the pitch pattern for seven moras. There are no connections within a layer, and signals propagate only toward the upper layer. The input phonetic symbols and accent type symbol are divided into 13 inputs per mora. When several phonetic symbols are assigned to one input unit 101, they are distinguished by the magnitude of the input value. For example, a unit receives 0 if it does not apply, -1 for mora positions outside the number of input moras, and, if it applies, a value in the range 0 to 1 allotted according to the number of phonetic symbols handled by that unit. If a mora is /nya/, for instance, the second unit receives 1/3 for the palatalized sound, the third unit receives 6/8 for /n/, the eighth unit receives 1 for the vowel, and the remaining 10 units receive 0, which gives the input data for one mora. Each unit 100 in an upper layer is a multi-input one-output signal processing unit that receives inputs from all units of the layer below. As shown in FIG. 3, the input layer 101 consists of 91 units (13 inputs per mora × 7 moras), the intermediate layer 102 of 87 units, and the output layer 103 of 84 units. The 91 units 101-u of the input layer 101 are connected to the 87 units 102-u of the intermediate layer 102, which in turn are connected to the 84 units 103-u of the output layer 103.
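The input coding can be sketched as follows. This is only an illustration under assumptions: the text gives the unit values for /nya/ but not the full assignment of phonetic symbols and accent types to the 13 units, so the caller supplies the active units directly, and the -1 value is applied to every unit of an unused mora slot, which is one reading of the description. All names are hypothetical.

```python
UNITS_PER_MORA = 13  # inputs per mora, from the text
MAX_MORAS = 7        # FIG. 3 handles seven moras, giving 13 * 7 = 91 inputs


def encode_mora(active_units):
    """Build the 13 input values for one mora.

    active_units maps a 1-based unit number to its value in (0, 1];
    every other unit of the mora is 0 (not applicable).
    """
    values = [0.0] * UNITS_PER_MORA
    for unit, value in active_units.items():
        values[unit - 1] = value
    return values


def encode_utterance(moras):
    """Concatenate the mora codes and pad unused mora slots with -1."""
    if len(moras) > MAX_MORAS:
        raise ValueError("at most 7 moras are supported")
    inputs = []
    for mora in moras:
        inputs.extend(encode_mora(mora))
    inputs.extend([-1.0] * (UNITS_PER_MORA * (MAX_MORAS - len(moras))))
    return inputs


# /nya/ as given in the text: unit 2 = 1/3, unit 3 = 6/8, unit 8 = 1.
NYA = {2: 1 / 3, 3: 6 / 8, 8: 1.0}

x = encode_utterance([NYA, NYA])
print(len(x))  # 91
```

A two-mora input therefore fills 26 units with mora codes and marks the remaining 65 units as unused.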

Each connection is multiplied by a weight obtained beforehand by learning, and the output layer 103 yields, for each mora, pitch values at 6 points for the consonant and 6 points for the vowel. Each mora therefore takes 12 output units 103-u, giving 12 units/mora × 7 moras = 84 output units.
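The layout of the 84 outputs can be made concrete with a short sketch. The ordering is an assumption: the text states 6 consonant points and 6 vowel points per mora but not how they are arranged across the output units, so the code assumes mora-major order with the consonant points first. The function name is hypothetical.

```python
POINTS_PER_PART = 6   # pitch points for the consonant, and again for the vowel
MORAS = 7


def split_output(y):
    """Split the 84 output values into per-mora consonant and vowel pitch points."""
    per_mora = 2 * POINTS_PER_PART
    if len(y) != per_mora * MORAS:
        raise ValueError("expected 84 output values")
    result = []
    for m in range(MORAS):
        chunk = y[m * per_mora:(m + 1) * per_mora]
        result.append({
            "consonant": chunk[:POINTS_PER_PART],
            "vowel": chunk[POINTS_PER_PART:],
        })
    return result


pitch = split_output(list(range(84)))
print(len(pitch), pitch[0]["consonant"], pitch[6]["vowel"])
```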

FIG. 4 shows the specific configuration of a linear signal processing unit, that is, a multi-input one-output signal processing unit 100 of the prosody pattern estimation unit 2 that performs only linear operations. In FIG. 4, 1001 is the input section of the multi-input one-output signal processing unit 100, 1002 is a memory that stores the weight coefficients applied to the inputs from the input section 1001, 1003 are multipliers that multiply each input from the input section 1001 by the corresponding weight coefficient in the memory 1002, and 1004 is an adder that sums the outputs of the multipliers 1003. In other words, if the input values to the input section 1001 are x_i and the weight coefficients stored in the memory 1002 are w_i, the multi-input one-output signal processing unit 100 of FIG. 4 computes

$$y = \sum_i w_i x_i \tag{1}$$

FIG. 5 shows the specific configuration of a nonlinear signal processing unit, that is, a multi-input one-output signal processing unit 100 of the prosody pattern estimation unit 2 that also performs a nonlinear operation. In FIG. 5, 1000 is the linear signal processing unit described with FIG. 4, and 2000 is a threshold processing unit that limits the output of the linear signal processing unit to a fixed range. FIG. 6 shows an example of the input/output characteristic of the threshold processing unit 2000. For example, a threshold processing unit 2000 that limits the output to the range (0, 1) has the input/output characteristic

$$O = \frac{1}{1 + \exp(-I)}$$

where I and O are the input and output of the threshold processing unit 2000.
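The two kinds of unit can be sketched in a few lines of Python. The function names are illustrative and do not come from the patent: `linear_unit` follows equation (1) (multipliers 1003 and adder 1004), and `threshold` follows the (0, 1) characteristic given for the threshold processing unit 2000.

```python
import math


def linear_unit(weights, inputs):
    """Equation (1): multiply each input by its weight and sum the products."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))


def threshold(i):
    """Limit a value to the open range (0, 1): O = 1 / (1 + exp(-I))."""
    return 1.0 / (1.0 + math.exp(-i))


def nonlinear_unit(weights, inputs):
    """Linear signal processing unit 1000 followed by threshold processing unit 2000."""
    return threshold(linear_unit(weights, inputs))


y = linear_unit([0.5, -1.0, 2.0], [1.0, 2.0, 0.5])
print(y)               # -0.5
print(threshold(0.0))  # 0.5
print(nonlinear_unit([0.5, -1.0, 2.0], [1.0, 2.0, 0.5]))
```

Note that `math.exp` overflows for very large negative inputs, so a practical implementation would clamp the input first; the sketch omits this for brevity.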

The prosody pattern estimation unit 2 configured as above outputs the pitch pattern that is ultimately required. Other prosody patterns can be estimated in exactly the same way.

Next, we explain why the above configuration can estimate the prosody pattern, and how it can be made to estimate the pattern accurately. The prosody pattern estimation unit 2 is a multi-layer network, and the relationship between its input and output depends only on the weight coefficients stored in the memories 1002. The input phonetic symbols and accent type symbols are naturally strongly correlated with the output prosody pattern, so if the weight coefficients can be determined appropriately, the input phonetic symbols and accent type symbols can be converted into the output prosody pattern. This is why the prosody pattern can be estimated.

The second question, how the prosody pattern can be estimated accurately, reduces to the question of how appropriate weight coefficients can be determined. This is solved by starting from arbitrary weight coefficients, changing them gradually, and repeating the learning until the prosody pattern produced from the phonetic symbols and accent type symbols reaches its optimum values. One such learning algorithm is backpropagation (D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning Representations by Back-Propagating Errors," Nature, vol. 323, pp. 533-536, Oct. 9, 1986). The mathematical proof is left to that reference. As training data, a large number of pairs are prepared, each consisting of phonetic symbols and accent type symbols together with the corresponding prosody pattern estimated by other means; the two are used as input and output, and the input/output relationship is learned repeatedly with the backpropagation algorithm.
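The learning procedure can be illustrated with a minimal pure-Python sketch of backpropagation for a three-layer network without bias terms. It is a sketch under assumptions, not the procedure used in the patent: the layer sizes are reduced from 91-87-84 to 4-3-2, the two training pairs are made up, and the squared-error loss, learning rate, and initial weight range are illustrative choices.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, w_hid, w_out):
    """Propagate x through the intermediate and output layers (no bias terms)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    y = [sigmoid(sum(w * hi for w, hi in zip(row, h))) for row in w_out]
    return h, y


def train_step(x, target, w_hid, w_out, lr=0.5):
    """One backpropagation update; returns the squared error before the update."""
    h, y = forward(x, w_hid, w_out)
    # Error signal of each output unit for E = 0.5 * sum((y - t)^2).
    d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
    # Propagate the error signals back to the intermediate layer,
    # using the output weights as they were before this update.
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
             for j, hj in enumerate(h)]
    for k, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] -= lr * d_out[k] * h[j]
    for j, row in enumerate(w_hid):
        for i in range(len(row)):
            row[i] -= lr * d_hid[j] * x[i]
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))


def init_weights(n_units, n_inputs, rng):
    """Start from arbitrary small weights, one row per unit."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_units)]


rng = random.Random(0)
n_in, n_hid, n_out = 4, 3, 2  # the embodiment uses 91, 87, and 84 units
w_hid = init_weights(n_hid, n_in, rng)
w_out = init_weights(n_out, n_hid, rng)

# Made-up (input, target) pairs standing in for (symbols, measured pitch) data.
data = [([1.0, 0.0, 0.0, 1.0], [0.9, 0.1]),
        ([0.0, 1.0, 1.0, 0.0], [0.1, 0.9])]

losses = []
for epoch in range(2000):
    losses.append(sum(train_step(x, t, w_hid, w_out) for x, t in data))

print("first epoch error:", losses[0])
print("last epoch error: ", losses[-1])
```

Repeating the update over the prepared pairs gradually changes the weights from their arbitrary starting values, which corresponds to the repeated learning described above.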

Experimental results are as follows. In the experiment, the prosody pattern estimation unit 2 was configured, as in FIG. 3, as a three-layer network of input, intermediate, and output layers and trained to estimate the pitch pattern. The multi-input one-output signal processing units 100 in the layers were 84 and 87 nonlinear signal processing units (the output layer and the intermediate layer, respectively). The number of weight coefficients stored in the memories 1002 is therefore (84 × 87) + (87 × 91). FIG. 7 shows the input and output of the prosody pattern estimation unit 2. In FIG. 7 the dots are the measured pitch and the line is the output of the prosody pattern estimation unit 2. A pitch pattern very close to the measured one is estimated simply by inputting the phonetic symbols and accent type symbols. This shows that the dialect and individuality of this speaker can be expressed very well.
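The weight count quoted above can be checked with a few lines of arithmetic; the variable names are illustrative.

```python
N_INPUT, N_HIDDEN, N_OUTPUT = 91, 87, 84  # layer sizes from FIG. 3

hidden_weights = N_HIDDEN * N_INPUT    # 87 x 91: input layer to intermediate layer
output_weights = N_OUTPUT * N_HIDDEN   # 84 x 87: intermediate layer to output layer
total_weights = hidden_weights + output_weights

print(hidden_weights, output_weights, total_weights)  # 7917 7308 15225
```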

Thus, according to this embodiment, providing a prosody pattern estimation unit composed of at least a plurality of network-connected multi-input one-output signal processing units makes it possible to estimate the corresponding prosody pattern from phonetic symbols and accent type symbols.

In the prosody pattern estimation unit 2 of the embodiment, each multi-input one-output signal processing unit in an upper layer is connected to all units in the layer below. Full connection is not essential, however, and partial connection is also acceptable.

In the embodiment, the number of weight coefficients stored in the memories 1002 equals the number of unit-to-unit connections, but each multi-input one-output signal processing unit 100 may additionally receive a constant input of 1 with its own weight. In that case the number of stored weight coefficients increases by the number of units. This constant input of 1 changes equation (1) into

$$y = w_0 + \sum_i w_i x_i \tag{2}$$

That is, it removes the constraint in equation (1) that the output always passes through the origin, which increases the expressive power. The estimation ability of the prosody pattern estimation unit 2 can therefore be improved further.
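A brief sketch shows the effect of the constant input. The function name is hypothetical, and the count of extra weights assumes that only the 87 intermediate and 84 output units carry weights, since the input units merely receive data.

```python
def linear_unit_with_bias(w0, weights, inputs):
    """Equation (2): a constant input of 1 weighted by w0, plus the weighted sum."""
    return w0 + sum(w * x for w, x in zip(weights, inputs))


# With all inputs at zero, equation (1) can only give 0,
# while equation (2) gives w0, so the output need not pass through the origin.
at_origin = linear_unit_with_bias(0.3, [1.0, 2.0], [0.0, 0.0])
print(at_origin)  # 0.3

extra_weights = 87 + 84                         # one w0 per weighted unit (assumption)
total_with_bias = 84 * 87 + 87 * 91 + extra_weights
print(extra_weights, total_with_bias)           # 171 15396
```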

Effects of the invention

As described above, according to the present invention, configuring the prosody pattern estimation unit as a signal processing network of multi-input one-output signal processing units makes the prosody pattern easy to estimate, so that speech rich in dialectal and individual expression can be synthesized. The practical value of the invention is therefore great.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech synthesizer according to one embodiment of the present invention; FIGS. 2 and 3 are block diagrams of the prosody pattern estimation unit in the device; FIG. 4 is a block diagram of the linear signal processing unit in the device; FIG. 5 is a block diagram of the nonlinear signal processing unit in the device; FIG. 6 is an input/output characteristic diagram of the threshold processing unit in the device; and FIG. 7 shows an example of pitch patterns estimated by the prosody pattern estimation unit in the device.

1: language processing unit; 2: prosody pattern estimation unit; 3: speech synthesis unit; 100: multi-input one-output signal processing unit; 101: input section of the prosody pattern estimation unit; 1000: linear signal processing unit; 1001: input section of the multi-input one-output signal processing unit; 1002: memory; 1003: multiplier; 1004: adder; 2000: threshold processing unit.

Continuation of front page

(56) References cited: JP-A-63-46498 (JP, A); JP-A-61-259295 (JP, A); T. J. Sejnowski and C. R. Rosenberg, "Parallel Networks that Learn to Pronounce English Text", Complex Systems, 1 (1987), pp. 145-168; 高木英行 (Takagi) and 原紀代 (Hara), "Prosody control using neural networks" (1988), Proceedings of the Acoustical Society of Japan Spring Meeting, 1988, -I-.

(58) Fields searched (Int. Cl.6, DB name): G10L 3/00; G10L 5/02; G10L 5/04; G06F 15/18 560

Claims

(57) [Claims]

1. A speech synthesizer comprising: a language processing unit (1) that outputs phonetic symbols and an accent type for each mora from an input character string; a pitch pattern estimating unit (2) that receives the output of the language processing unit (1) as input and estimates and outputs a pitch pattern; and a speech synthesis unit (3) that receives the output of the language processing unit (1) and the output of the pitch pattern estimating unit (2) as inputs and synthesizes a speech signal, wherein the pitch pattern estimating unit (2) comprises a signal processing network including at least an input layer (101), an intermediate layer (102), and an output layer (103); the input layer (101) includes a plurality of input layer units (101-u), and a signal from the language processing unit (1) is input to the input layer units (101-u); the intermediate layer (102) includes a plurality of intermediate layer units (102-u), which apply weighting processing, addition processing, and threshold processing that limits values to a certain range to the numerical values output from the input layer units (101-u), and output the results to the output layer units (103-u); and the output layer (103) includes a plurality of output layer units (103-u), which receive the values output from the intermediate layer units (102-u) as input and output the values forming the pitch pattern.