JPH06214585A

JPH06214585A - Voice synthesizer

Info

Publication number: JPH06214585A
Application number: JP5006010A
Authority: JP
Inventors: Takaaki Arai; 孝章新居; Hiroyuki Tsuboi; 宏之坪井; Yasuki Yamashita; 泰樹山下
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-01-18
Filing date: 1993-01-18
Publication date: 1994-08-05

Abstract

PURPOSE: To reduce the user's workload when using the voice media of a computer by providing a means of rule-based speech synthesis that controls prosody in accordance with the freely varying speaking rate of natural speech, in dialogue with the computer or in other uses. CONSTITUTION: The synthesizer consists of a language processing section 12 which extracts linguistic information from the content information of the speech to be synthesized; a speaking rate deciding section 22 which decides the speaking rate; a phoneme processing section 14 which generates phoneme information from the linguistic information; a prosody processing section 16 which generates prosody information and which contains a prosody processing symbol generating section 24, which generates prosody symbols, and a prosody processing controlling section 26, which controls the section 24 based on the speaking rate output from the section 22; a synthesis parameter generating section 18 which generates synthesis parameters from the phoneme information and the prosody information; and a voice waveform generating section 20 which generates voice waveforms from the synthesis parameters.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a speech synthesizer that synthesizes speech based on fixed rules.

[0002]

[Prior Art] In recent years, research has been conducted on various forms of human-computer interaction through the input, output, and processing of multimedia such as text, speech, graphics, and video.

[0003] Among the media used in dialogue between people, speech carries not only the linguistic information that conveys content but also affective information such as emotion and speaker-identity information such as male or female, so a variety of information can be conveyed easily and freely. Being able to convey such information is also important for natural dialogue between humans and computers.

[0004] Meanwhile, dramatic recent improvements in memory capacity and computing power have led to workstations and personal computers that can handle multimedia, and high-quality speech and audio signals can now be input and output on standard systems without special hardware. Under these circumstances, speech input/output technology has become important as an input/output medium for computers.

[0005] Among speech input/output technologies, work has long been under way on speech synthesis, a technology that converts the characters of arbitrary sentences or words into speech. Hardware dedicated to rule-based speech synthesis has now been developed, and applications such as reading text aloud are spreading. Such technology has made it possible, as a form of media conversion, to convert characters, which are linguistic information, into speech signals. However, compared with the output of a synthesizer that edits and concatenates actually uttered natural speech, current rule-synthesized speech is unnatural and its content is hard to understand.

[0006] This is because rule-based synthesis expresses phoneme concatenation, duration, pitch changes, and so on as rules, and synthesizes speech according to those rules from the analysis of the input character string. Causes of degraded sound quality include the quality of the synthesis parameters and the control of accent and intonation.

[0007] As a prosody control method, a method that controls the intonation of Japanese declarative and interrogative sentences by linearly interpolating point-pitch information for each syllable has been studied (Hakoda, "A Study of Pitch Parameter Control Methods in Text-to-Speech Synthesis," Acoustical Society of Japan, Speech Research Group Material SP88-7 (1988)). As a control method that models the pitch change between syllables with a time-varying model rather than linear interpolation, the "fundamental frequency pattern generation process model," based on a critically damped second-order linear system, has also been studied (Hirose, Fujisaki, Kawai, Yamaguchi, "Synthesis of Text Speech Based on a Fundamental Frequency Pattern Generation Process Model," Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. J72-A, No. 1, pp. 32-40 (1989-1)).

[0008] Furthermore, to remove the monotony and mechanical feel of synthesized speech such as text-to-speech output, a method has been studied that controls what is called prosodic information, namely pitch (fundamental frequency), amplitude, and the durations associated with emphasis and de-emphasis, with the aim of synthesizing speech that has local emphasis and de-emphasis (Takeda, Ichikawa, "Creation and Evaluation of Prominence Generation Rules for Japanese Sentence Speech," Journal of the Acoustical Society of Japan, Vol. 47, No. 6, pp. 397-404 (1991)).

[0009]

[Problems to Be Solved by the Invention] Prosody control in text-to-speech conversion has been studied through the various control methods above, and the quality of synthesized speech has improved. However, in human dialogue and reading aloud with natural speech, the speaking rate varies widely: people speak slowly and carefully, or quickly, modulating the pace as they go. That is, a sentence or passage is not uttered at a constant rate throughout, and prosody control differs from part to part within the utterance. Moreover, even when the same sentence is uttered, prosody control such as intonation differs with the rate, so the prosody changes according to the speaking rate.

[0010] For example, when the sentence 「彼があの大きな車を運転する」 ("He drives that big car") is uttered, the phrase speech sections are as follows. A phrase speech section is a section uttered in one breath. The solid lines in Fig. 2 show the phrase component of the fundamental frequency pattern of the speech; they were obtained by extracting the phrase speech sections from speech uttered by a male subject. Fig. 2(b) shows an utterance at 7 morae/sec. Here "/" denotes a boundary between phrase speech sections.

[0011]

[Equation 1]

Next, when the speaking rate is increased, for example to 10 morae/sec, the utterance is organized into phrase speech sections as shown in Fig. 2(a).

[0012]

[Equation 2]

When the speaking rate is reduced, for example to 4 morae/sec, the phrase speech sections become, as shown in Fig. 2(c),

[Equation 3]

[0013] In this way, the division into phrase speech sections changes with the speaking rate, which yields natural, smooth speech.

[0014] Conventional speech synthesizers, however, target fields such as reading documents aloud for text-to-speech conversion. Their prosody control method is determined by the content of the text to be read, regardless of the speaking rate that is set, and the prosody control does not change with that rate. In response to a change in the speaking rate of the synthesized speech, they only change the control of phoneme duration and pause length on the phonemic side and scale the prosody linearly in time. Consequently, setting the speaking rate fast or slow unavoidably produced unnatural synthesized speech. Nor was it possible, in rule-based synthesis for dialogue and the like, to synthesize natural speech accompanied by changes in speaking rate.

[0015] In actual natural speech, prosody is controlled according to the speaking rate as described above, and the phrase division may change as pauses are inserted. Realizing a speech synthesizer that can control prosodic information in step with changes in speaking rate, in dialogue with a computer and in various other uses, makes it possible to synthesize natural, easily understood speech and reduces the burden on the user.

[0016] The present invention was made in view of the above circumstances, and its object is to provide a means of rule-based speech synthesis that can control prosodic information in accordance with the freely varying speaking rate of natural speech.

[0017]

[Means for Solving the Problems] The speech synthesizer according to claim 1 of the present invention comprises a language processing unit that extracts linguistic information from the content information of the speech to be synthesized, a phoneme processing unit that generates phoneme information from the linguistic information, a prosody processing unit that generates prosody information, a synthesis parameter generation unit that generates synthesis parameters from the generated phoneme information and prosody information, and a speech waveform generation unit that generates a speech waveform from the synthesis parameters. It further comprises a speaking rate determination unit that determines the speaking rate, and a prosody processing control unit that controls the prosody processing unit based on the speaking rate information output from the speaking rate determination unit.

[0018] The speech synthesizer according to claim 2 is the synthesizer of claim 1 in which the prosody processing control unit controls the segmentation of the speech tone component based on the speaking rate information output from the speaking rate determination unit.

[0019]

[Operation] With the above apparatus, when speech at different speaking rates is synthesized by rule, prosody such as intonation can be controlled in a way that depends on the speaking rate of the synthesized speech, which conventional rule-based synthesizers could not do. This makes it possible to synthesize speech with the varied pacing found in human dialogue and reading aloud, where the rate changes freely between slow, careful delivery and rapid delivery.

[0020] That is, even for the same sentence, the segmentation into speech tone units, each having one speech tone component, can be changed according to the speaking rate when controlling prosody such as intonation.

[0021] Therefore, beyond the fields that conventional speech synthesizers targeted, such as text-to-speech conversion and reading documents aloud, it becomes possible to synthesize by rule natural speech accompanied by changes in speaking rate, for example in human-computer dialogue, where the rate speeds up or slows down.

[0022] Thus, realizing a speech synthesizer that can control prosodic information in step with changes in speaking rate, in dialogue with a computer and in various other uses, allows a computer to synthesize natural, easily understood speech and reduces the user's burden when using the computer's voice media.

[0023]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0024] As shown in Fig. 1, the sentence speech rule synthesizer 10 of this embodiment consists of a language processing unit 12, a phoneme processing unit 14, a prosody processing unit 16, a synthesis parameter generation unit 18, a speech waveform generation unit 20, and a speaking rate determination unit 22. The prosody processing unit 16 in turn consists of a prosody processing symbol generation unit 24 and a prosody processing control unit 26.

[0025] That is, the speech rule synthesizer 10 processes the utterance content information of the speech to be synthesized in the language processing unit 12 to determine the information needed for synthesis, and the phoneme processing unit 14 and the prosody processing unit 16 generate a phoneme symbol string with prosodic components added. From this symbol string the synthesis parameter generation unit 18 generates a synthesis parameter sequence, and the speech waveform generation unit 20 generates the output waveform. The synthesizer 10 also has a speaking rate determination unit 22, which determines the speaking rate from inputs such as an externally supplied rate, the utterance content information, and information from the language processing unit 12, and a prosody processing control unit 26 in the prosody processing unit 16, which controls prosody processing based on the speaking rate information output from the speaking rate determination unit 22. Passing the information from the prosody processing control unit 26 to the prosody processing symbol generation unit 24 makes it possible to synthesize by rule natural speech suited to the speaking rate.
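
The data flow from the speaking rate determination unit 22 through the prosody processing control unit 26 to the prosody processing symbol generation unit 24 can be sketched as below. This is only an illustrative skeleton: the function names, the `ProsodyControl` fields, the rule that maps a rate to a maximum section length, and the greedy grouping are all hypothetical stand-ins for the units of Fig. 1, and the output is not meant to reproduce the segmentations of Fig. 2.

```python
from dataclasses import dataclass


@dataclass
class ProsodyControl:
    """Control information passed from unit 26 to unit 24 (hypothetical fields)."""
    rate: float            # speaking rate in morae per second
    max_phrase_morae: int  # upper bound on the length of one phrase speech section


def determine_rate(external_rate=None, default=7.0):
    """Stand-in for unit 22: use an externally supplied rate if there is one."""
    return external_rate if external_rate is not None else default


def prosody_control(rate):
    """Stand-in for unit 26: faster speech allows longer one-breath sections."""
    # The factor 2 is an invented placeholder, not a value from the patent.
    return ProsodyControl(rate=rate, max_phrase_morae=round(2 * rate))


def generate_sections(bunsetsu_morae, control):
    """Stand-in for unit 24: greedily group bunsetsu into phrase speech sections."""
    sections, current, length = [], [], 0
    for morae in bunsetsu_morae:
        if current and length + morae > control.max_phrase_morae:
            sections.append(current)
            current, length = [], 0
        current.append(morae)
        length += morae
    if current:
        sections.append(current)
    return sections


def synthesize_plan(bunsetsu_morae, external_rate=None):
    rate = determine_rate(external_rate)            # unit 22
    control = prosody_control(rate)                 # unit 26
    return generate_sections(bunsetsu_morae, control)  # unit 24
```

Calling `synthesize_plan([3, 2, 4, 4, 6], external_rate=4.0)` with assumed mora counts for five bunsetsu yields more, shorter sections than the same call with `external_rate=10.0`, which is the qualitative behavior the embodiment aims for.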

[0026] Next, each processing unit is described in detail.

[0027] In text-to-speech conversion, the language processing unit 12 analyzes the input sentence of mixed kanji and kana and determines the information needed for speech processing: word and bunsetsu (phrase) boundaries, kanji readings, word accents, dependency relations, parts of speech, inflected forms, and so on.

[0028] For example, by performing morphological analysis, syntactic analysis, semantic analysis, and the like, the language processing unit 12 outputs the information needed for speech processing, that is, word and bunsetsu boundaries, kanji readings, word accents, dependency relations, parts of speech, and inflected forms.

[0029] Morphological analysis divides the input sentence into words and yields grammatical information such as each word's part of speech and inflection, together with word and bunsetsu boundaries. Kanji readings and word accents can also be obtained from the dictionary lookups made during morphological analysis. The dictionary used here is one annotated with the reading and accent of each word.

[0030] Syntactic analysis builds a syntax tree or the like over the words and bunsetsu obtained by morphological analysis and analyzes it to obtain the dependency relations between words and bunsetsu and the degree of coupling between bunsetsu. For example, syntactic analysis of the example sentence used earlier yields syntactic information such as that shown in Fig. 3.

[0031] Semantic analysis outputs the semantic structure of each sentence from the result of syntactic analysis, which includes the semantic attributes of the words.

[0032] In rule-based synthesis from concepts, on the other hand, the language processing unit 12 analyzes a conceptual representation of the content to be synthesized and outputs information such as syntactic boundaries, parts of speech, readings, and accents. Specifically, it performs word dependency processing, processing of the relations centered on the predicate, grammatical processing of the predicate, and processing of the connections between sentences to obtain the above information.

[0033] The speaking rate determination unit 22 determines the speaking rate information that the prosody processing unit 16 uses to perform prosody control according to the speaking rate.

[0034] The input to the speaking rate determination unit 22 can be a speaking rate supplied from outside, or information output from the language processing unit 12, for example when a speaking rate specification is included in the utterance content information.

[0035] The speaking rate determination unit 22 determines the speaking rate either from a rate supplied from outside or from a rate specification contained in the utterance content information. To determine the rate from the information obtained from the language processing unit 12, the rate is set so that, for example, an embedded clause identified from syntactic information is uttered faster than the other parts, or semantically unimportant parts are uttered faster.
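
The three sources of rate information named above can be combined as in the following sketch. The precedence (an explicit specification in the content wins over the external setting, which wins over the default), the segment dictionary keys, and the speed-up factor are assumptions made for illustration; the text does not specify how the sources are prioritized or how much faster embedded or unimportant parts should be.

```python
def decide_rates(segments, external_rate=None, base_rate=7.0, speedup=1.2):
    """Return one speaking rate (morae/sec) per segment.

    Each segment is a dict with optional, hypothetical keys:
      'rate'      - rate specified inside the utterance content
      'embedded'  - True if the segment is an embedded clause
      'important' - False if the segment is semantically unimportant
    """
    rates = []
    for seg in segments:
        if seg.get("rate") is not None:
            # A rate written into the utterance content is used as is.
            rate = seg["rate"]
        else:
            # Otherwise start from the external setting, or the default.
            rate = external_rate if external_rate is not None else base_rate
            # Embedded clauses and unimportant parts are uttered faster.
            if seg.get("embedded") or not seg.get("important", True):
                rate *= speedup
        rates.append(rate)
    return rates
```

For instance, with `external_rate=10.0` and `speedup=1.5`, a plain segment keeps 10.0 morae/sec while an embedded clause is raised to 15.0 morae/sec.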

[0036] Determining the speaking rate in this way produces the information that the prosody processing unit 16 needs to perform control according to the speaking rate.

[0037] The phoneme processing unit 14 generates the phonological attributes of words and the like from information output by the language processing unit 12, such as kanji readings and dictionary entries. Using these together with the prosodic information output by the prosody processing unit 16, such as phrase symbols, pause symbols, and markers of semantic breaks, it determines the reading of each word in the sentence to be synthesized according to well-known phonological rules such as nasalization, devoicing, and rendaku (sequential voicing), and outputs a phone symbol string. It also outputs to the prosody processing unit 16 the phone and phonological attribute information needed there to determine accent types.

[0038] As shown in Fig. 4, the prosody processing unit 16 consists of the prosody processing control unit 26 and the prosody processing symbol generation unit 24.

[0039] Based on the speaking rate information output from the speaking rate determination unit 22, the prosody processing control unit 26 controls the prosody processing symbol generation unit 24 so that it performs processing suited to the speaking rate. This allows the division into speech tone components obtained in the prosody processing unit 16 to be changed according to the speaking rate.

[0040] The prosody processing symbol generation unit 24 takes as input the phoneme information generated by the phoneme processing unit 14, such as the phone symbol string and word phonological attributes; the linguistic information output by the language processing unit 12, such as bunsetsu boundaries, word accents, and grammatical information; and the control information on speaking rate from the prosody processing control unit 26. From these it determines prosodic components such as the fundamental frequency pattern, power, duration, and pause positions.

[0041] For the generation of prosodic symbols, for example, the parameters that describe prosodic features, namely the length and magnitude of pauses and the magnitudes of phrase commands and accent commands, are defined as prosodic symbols, and the prosodic components can be determined using generation rules for pause and phrase symbols and rules for generating accent symbols. Sentences may also be classified by type, such as declarative, interrogative, and imperative, with accent and phrase rules written for each type.

[0042] The division into speech tone components, that is, the phrase speech section, is in many cases a unit that forms one semantic whole. Here, a chain of morphemes that always carries one accent component regardless of context, such as an independent word combined with its attached words, is called a "prosodic word"; a syntax tree with prosodic words as its leaves is shown in Fig. 3. Among the boundaries between prosodic words, a boundary at which the word on the left directly modifies the word on the right is called a "right-dependent boundary," and any other boundary is called a "non-right-dependent boundary." A word chain that is delimited on both sides by non-right-dependent boundaries and contains only right-dependent boundaries is called a "maximal right-dependent chain." A maximal right-dependent chain forms one semantic whole and is in most cases uttered in one breath. The boundaries of phrase speech sections can therefore be regarded as corresponding to the boundaries of maximal right-dependent chains. In this way, phrase speech sections can be generated from linguistic information such as bunsetsu boundaries and dependency relations.
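
The definition of a maximal right-dependent chain translates directly into a single pass over the boundaries between prosodic words, as in this sketch. The function name and the input representation (one Boolean per boundary, true where the left word directly modifies the right word) are assumptions; the dependency flags in the usage note are an illustrative reading of the example sentence and are not taken from Fig. 3.

```python
def maximal_right_dependent_chains(words, right_dependent):
    """Split prosodic words into maximal right-dependent chains.

    words           - list of prosodic words in sentence order
    right_dependent - right_dependent[i] is True when the boundary between
                      words[i] and words[i + 1] is a right-dependent boundary
    """
    if not words:
        return []
    if len(right_dependent) != len(words) - 1:
        raise ValueError("need exactly one flag per boundary between words")
    chains = [[words[0]]]
    for word, dependent in zip(words[1:], right_dependent):
        if dependent:
            # Right-dependent boundary: the chain continues.
            chains[-1].append(word)
        else:
            # Non-right-dependent boundary: a new chain starts here.
            chains.append([word])
    return chains
```

With the romanized words `["kare-ga", "ano", "ookina", "kuruma-o", "untensuru"]` and assumed flags `[False, False, True, True]`, the function returns three chains, the last of which is `["ookina", "kuruma-o", "untensuru"]`.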

[0043] The prosody processing symbol generation unit 24 generates prosodic symbols using prosodic symbol generation rules such as those above, and generates phrase speech sections using the phrase speech section generation rules.

[0044] Next, based on the speaking rate information determined by the speaking rate determination unit 22, the prosody processing control unit 26 controls the prosody processing symbol generation unit 24 so that prosody processing suited to the speaking rate is performed.

[0045] For example, to change the division into phrase speech sections according to the speaking rate, control is exercised by giving the prosody processing symbol generation unit 24 rate-dependent thresholds and weights for use in the phrase speech section generation rules. Generation rules for phrase speech sections that use rate-dependent thresholds and weights are given below.

[0046] Here, phrase speech sections are generated from the maximal right-dependent chains, which indicate semantic units; the strength of tonal coupling determined by dependency relations; rate-dependent thresholds on the number of morae in a phrase speech section; and rate-dependent weights on each of these.

[0047] As for the weighting, the faster the speaking rate, the larger the unit uttered in one breath, and this often carries more weight than the semantic information. Conversely, when the speaking rate is slow, the utterance units become shorter, and speech is often delivered carefully so that the meaning is easy to follow. Phrase speech sections are therefore generated by the following rules.

[0048] (1) Boundaries between sentences, clause sequences, and clauses, and the boundaries of maximal right-dependent chains, are taken as boundaries of phrase speech sections.

[0049] (2) The boundaries I of the maximal right-dependent chains are obtained. At a boundary I, the value I(n), the degree to which the boundary becomes a phrase speech section boundary, is set to 1. At bunsetsu boundaries other than I, I(n) = 0. Here n denotes the position of the boundary between bunsetsu, counted from the beginning of the sentence.

[0050] (3) The strength K(n) of tonal coupling due to the dependency relation between bunsetsu is obtained from the table in Fig. 9.

[0051] (4) Thresholds A and B are determined from the speaking rate information. Each threshold is looked up in the table of Fig. 8 according to the speaking rate.

[0052] (5) At a boundary I, if the number of morae in the immediately preceding maximal right-dependent chain is less than threshold A, the boundary is not treated as a threshold boundary S; that is, S(n) = 0. Likewise, S(n) = 0 at bunsetsu boundaries.

[0053] (6) For each boundary I, if the number of morae in the maximal right-dependent chain exceeds threshold B, positions within the chain are chosen such that each resulting piece is no longer than threshold B and the pieces have as nearly equal numbers of morae as possible. At these positions the threshold boundary is set, S(n) = 1.

[0054] (7) The rate-dependent weights Ig, Kg, and Sg for the respective boundaries are obtained from the table in Fig. 7, and the probability P(n) that each boundary between bunsetsu becomes a phrase speech section boundary is computed by the following equation.

[0055]

[Equation 4]

(8) A boundary is taken as a phrase speech section boundary when P(n) is greater than the prescribed value T. Here T = 0.65.
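
Steps (7) and (8) can be sketched as follows. Equation 4 is not reproduced in this text, so the weighted sum used for P(n) is an assumed form, and in particular the way K(n) enters (its sign and scale) is a guess. The weight values in the usage note are invented and do not come from the table in Fig. 7. Only the threshold T = 0.65 is taken from step (8). Boundaries are indexed from 0 here for convenience.

```python
T = 0.65  # prescribed value from step (8)


def phrase_boundaries(I, K, S, Ig, Kg, Sg, threshold=T):
    """Return the indices n of bunsetsu boundaries chosen as section boundaries.

    I, K, S    - per-boundary values I(n), K(n), S(n) from steps (2) to (6)
    Ig, Kg, Sg - rate-dependent weights from step (7)
    """
    if not (len(I) == len(K) == len(S)):
        raise ValueError("I, K and S must have one value per boundary")
    chosen = []
    for n, (i_n, k_n, s_n) in enumerate(zip(I, K, S)):
        # ASSUMED form of Equation 4: a plain weighted sum of the three terms.
        p_n = Ig * i_n + Kg * k_n + Sg * s_n
        if p_n > threshold:
            chosen.append(n)
    return chosen
```

With `I = [1, 0, 0, 1]`, `K = [0.2, 0.8, 0.5, 0.1]`, and `S = [0, 0, 1, 0]`, the invented weights `(0.8, 0.1, 0.7)` select boundaries 0, 2, and 3, while the invented weights `(0.6, 0.1, 0.3)` select none, which shows how changing only the weights changes the segmentation of the same sentence.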

[0056] Here, changing the above thresholds A and B and the weights according to the speaking rate information makes it possible to change the division into phrase speech sections to suit the speaking rate. Natural prosodic information suited to the speaking rate can thus be generated.
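
The effect of the rate-dependent threshold B in step (6) can be illustrated with the sketch below, which splits an over-long chain at bunsetsu boundaries into pieces of at most B morae with lengths as even as possible. The brute-force search, the choice to use the fewest cuts that satisfy the limit, and the threshold values in `thresholds_for_rate` are all assumptions; the actual values are in the table of Fig. 8, which is not reproduced here.

```python
from itertools import combinations


def thresholds_for_rate(rate):
    """Return (A, B) for a speaking rate in morae/sec (invented placeholder values)."""
    if rate >= 9:
        return 6, 20
    if rate >= 6:
        return 4, 14
    return 3, 8


def split_chain(morae, B):
    """Return cut positions (indices between bunsetsu) for one chain.

    morae - mora count of each bunsetsu in the chain
    B     - maximum number of morae allowed in one piece
    """
    if sum(morae) <= B:
        return []  # the chain already fits, so no threshold boundary is needed
    n = len(morae)
    for cuts_needed in range(1, n):
        best = None
        for cuts in combinations(range(1, n), cuts_needed):
            edges = (0, *cuts, n)
            sizes = [sum(morae[a:b]) for a, b in zip(edges, edges[1:])]
            if max(sizes) <= B:
                spread = max(sizes) - min(sizes)  # smaller means more even
                if best is None or spread < best[0]:
                    best = (spread, list(cuts))
        if best is not None:
            return best[1]
    # Some single bunsetsu exceeds B, so cut at every boundary.
    return list(range(1, n))
```

For a chain with assumed mora counts `[4, 4, 6]`, a slow rate of 4 morae/sec gives B = 8 under the placeholder table and one cut after the second bunsetsu, while a fast rate of 10 morae/sec gives B = 20 and no cut.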

[0057] In this embodiment, prosody control suited to the speaking rate is made possible by adding to the above prosodic symbol generation rules a prosody processing control unit 26 that controls prosody processing based on the speaking rate information output from the speaking rate determination unit 22.

[0058] Using the prosodic components determined in this way, the phonetic symbol string output by the phoneme processing unit 14 is divided into phrase speech sections, and a phoneme symbol string is generated to which is added a prosodic symbol string indicating, for each phrase speech section, prosodic information such as the pause component and the rise and fall of the phrase component. This string is output to the synthesis parameter generation unit 18.

[0059] As shown in FIG. 5, the synthesis parameter generation unit 18 generates a transfer characteristic parameter time series, which controls the characteristics of the synthesis filter of the speech synthesizer 10, on the basis of the phonetic symbols produced by the phoneme processing unit 14 and the pause symbols produced by the prosody processing unit 16. At the same time, it generates a fundamental frequency time series, which controls the sound source frequency of the speech synthesizer 10, on the basis of the speech tone symbols and accent symbols produced by the prosody processing unit 16.

[0060] That is, the time series of transfer characteristic parameters is generated by connecting the syllable parameter sequences corresponding to the phonetic symbols and pause symbols in the order of the symbols. Meanwhile, commands for the fundamental frequency pattern generation model are generated according to the accent symbols and speech tone symbols, and the fundamental frequency time series is generated by inputting these commands to the model.

[0061] The syllable parameter sequences and the transfer characteristic parameter sequence are composed of sequences of synthesis filter parameters of the speech synthesizer 10. A syllable parameter sequence represents the change in the spectral characteristics of a syllable as a series of synthesis filter parameters at fixed time intervals. In a pause section there is no speech output, so the sound source gain among the synthesis parameters is set to zero. The transfer characteristic parameter sequence is formed by connecting the syllable parameter sequences according to the set speaking rate and the phoneme durations. At each connection, the parameters are interpolated so that each one changes smoothly, as in natural speech.
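The connection step described above can be sketched in Python as follows. The frame layout (a list whose first element is the source gain), the function names, and the use of linear interpolation are assumptions made for illustration; the text only states that parameters are interpolated so that they change smoothly.

```python
def lerp_frames(frame_a, frame_b, steps):
    """Return `steps` intermediate frames moving linearly from frame_a to frame_b."""
    return [
        [x + (y - x) * k / (steps + 1) for x, y in zip(frame_a, frame_b)]
        for k in range(1, steps + 1)
    ]


def pause_frames(n_frames, template_frame):
    """Pause section: copy the template frame but force the source gain to zero.

    Assumption: element 0 of a frame is the sound source gain.
    """
    return [[0.0] + list(template_frame[1:]) for _ in range(n_frames)]


def connect_units(units, transition_frames=2):
    """Concatenate syllable parameter sequences with interpolated joins.

    units is a list of sequences; each sequence is a list of frames, and
    each frame is a list of synthesis filter parameters.
    """
    out = []
    for unit in units:
        if out and transition_frames > 0:
            # Bridge the last emitted frame and the first frame of the next unit.
            out.extend(lerp_frames(out[-1], unit[0], transition_frames))
        out.extend(unit)
    return out
```

With frames of the form `[gain, F1]`, connecting `[[1.0, 500.0]]` and `[[1.0, 800.0]]` with two transition frames inserts frames at 600 Hz and 700 Hz between them. A faster speaking rate could be modeled by passing fewer frames per unit, which is left out of this sketch.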

[0062] A fundamental frequency time series for controlling the sound source frequency of the speech synthesizer 10 is generated on the basis of the speech tone symbols and accent symbols, which are output by the prosody processing unit 16 to control the speech tone component and the accent component. The case described here is one in which the speech tone symbols and accent symbols are input to a fundamental frequency pattern generation process model to generate the fundamental frequency time series.

[0063] The fundamental frequency pattern generation process model represents the generated fundamental frequency as two components, (1) a gently declining curve running from the beginning to the end of the sentence (the speech tone component) and (2) a curve with sharp local rises and falls (the accent component), each modeled as the response of a critically damped second-order linear system. The time variation of the logarithmic fundamental frequency is expressed as the sum of these two components. The fundamental frequency F0(t), where t is time, is formulated as follows.

[0064]

[Equation 5]

Here, Ln() denotes the logarithm; SUM[i = n, m]() denotes the sum over the parameter i from n to m; F0(t) is the fundamental frequency; Fmin is the minimum fundamental frequency; Gpi(t) is the impulse response function of the speech tone component control mechanism; Gaj(t) is the step response function of the accent component control mechanism; Api and Aaj are the magnitudes of the speech tone component command and the accent component command, respectively; T0i is the time of the speech tone component command; and T1j and T2j are the start and end times of the accent command.

[0065] Here, for t > 0, Gpi(t) and Gaj(t) are

[Equation 6]

and for t < 0, Gpi(t) = Gaj(t) = 0.

[0066] Here, min[n, m] denotes taking the smaller of n and m, which means that the value does not exceed TH (TH <= 1.0). exp() is the exponential function, and Xi and Yj are the natural angular frequencies of the speech tone component control mechanism and the accent component control mechanism, respectively.
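The following Python sketch shows how a fundamental frequency value could be computed from such commands. Because Equations 5 and 6 are not reproduced in the text, the sketch assumes the commonly used critically damped second-order forms that match the symbol definitions given here: Ln F0(t) = Ln Fmin + SUM(Api Gpi(t - T0i)) + SUM(Aaj (Gaj(t - T1j) - Gaj(t - T2j))), with Gpi(t) = Xi^2 t exp(-Xi t) and Gaj(t) = min[1 - (1 + Yj t) exp(-Yj t), TH]. The function names, the command tuples, and the numeric values in the usage note are illustrative assumptions, not values from the patent.

```python
import math


def phrase_response(t, x):
    """Assumed impulse response Gp(t) = x^2 * t * exp(-x*t) for t > 0, else 0."""
    return x * x * t * math.exp(-x * t) if t > 0 else 0.0


def accent_response(t, y, th=0.9):
    """Assumed step response Ga(t) = min[1 - (1 + y*t) * exp(-y*t), TH] for t > 0."""
    if t <= 0:
        return 0.0
    return min(1.0 - (1.0 + y * t) * math.exp(-y * t), th)


def fundamental_frequency(t, fmin, tone_commands, accent_commands, th=0.9):
    """Evaluate F0(t) as the exponential of a sum of log-domain components.

    tone_commands:   iterable of (Ap, T0, X) tuples
    accent_commands: iterable of (Aa, T1, T2, Y) tuples
    """
    log_f0 = math.log(fmin)
    for ap, t0, x in tone_commands:
        log_f0 += ap * phrase_response(t - t0, x)
    for aa, t1, t2, y in accent_commands:
        # An accent command is a step that switches on at T1 and off at T2.
        log_f0 += aa * (accent_response(t - t1, y, th) - accent_response(t - t2, y, th))
    return math.exp(log_f0)
```

As a usage example, `fundamental_frequency(0.3, 80.0, [(0.5, 0.0, 3.0)], [(0.4, 0.2, 0.6, 20.0)])` evaluates one speech tone command and one accent command at t = 0.3 s; before any command has started, the function returns Fmin.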

[0067] To generate a fundamental frequency time series from the fundamental frequency pattern generation process model, each parameter must be set. The parameter values are initialized by best-approximation estimation, in which the parameters are varied so as to minimize the error between natural speech and the model.

[0068] The speech waveform generation unit 20 drives the synthesis filter of the speech synthesizer 10 and generates a speech waveform on the basis of the outputs of the synthesis parameter generation unit: the transfer characteristic parameter time series, which controls the characteristics of the synthesis filter, and the fundamental frequency time series, which controls the sound source frequency.

[0069] As an example, speech waveform generation by a formant synthesizer 100 such as that shown in FIG. 6 is described.

[0070] The voiced sound source is a pulse generator, and the noise source is a noise generator. In FIG. 6, RGP denotes the glottal resonance circuit, RGZ the glottal anti-resonance circuit, A1 to A5 amplitude control circuits, AV the amplitude control circuit for the voiced source, AH and AF the amplitude control circuits for the unvoiced source, and AB the bypass amplitude control circuit. F0 denotes the fundamental frequency, and F1 to F5 denote resonance circuits controlled by resonance frequency and bandwidth.

[0071] The formant synthesizer 100 generates the source waveform while controlling the frequency of the voiced source and the gains of the voiced and unvoiced sources on the basis of the fundamental frequency time series and the source gain parameters in the transfer characteristic parameter time series. It then feeds the source waveform into the synthesis filters, which are controlled by the center frequency and bandwidth parameters in the transfer characteristic parameter time series, obtains the output waveform of each filter, and takes the sum of those output waveforms as the output of the synthesizer 100. For example, the sampling frequency can be 12 kHz with the synthesis parameters updated every 8 msec; an impulse passed through a low-pass filter can be used as the voiced source, and a random number generator as the unvoiced source.
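A much simplified Python sketch of this kind of parallel formant synthesis is given below. It is not the circuit of FIG. 6: it omits the glottal resonance and anti-resonance circuits, the bypass path, and the low-pass filtering of the impulse source, and it uses a standard two-pole digital resonator of the form found in Klatt-style formant synthesizers, which the patent does not specify. The frame dictionary keys and all names are hypothetical. At 12 kHz with an 8 msec update interval, each parameter frame covers 96 samples.

```python
import math
import random

SAMPLE_RATE = 12_000  # Hz, from the example in the text
FRAME_MS = 8          # parameter update interval in msec
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 96 samples


class Resonator:
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""

    def __init__(self):
        self.a = self.b = self.c = 0.0
        self.y1 = self.y2 = 0.0

    def tune(self, freq_hz, bandwidth_hz):
        period = 1.0 / SAMPLE_RATE
        self.c = -math.exp(-2.0 * math.pi * bandwidth_hz * period)
        self.b = (2.0 * math.exp(-math.pi * bandwidth_hz * period)
                  * math.cos(2.0 * math.pi * freq_hz * period))
        self.a = 1.0 - self.b - self.c  # unity gain at 0 Hz

    def step(self, x):
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y


def synthesize(frames, seed=0):
    """Render a non-empty list of parameter frames to a list of samples.

    Each frame is a dict with hypothetical keys: "f0" (Hz), "av" (voiced
    gain), "ah" (unvoiced gain), "formants" (list of (freq, bandwidth, amp)).
    """
    rng = random.Random(seed)
    bank = [Resonator() for _ in range(max(len(f["formants"]) for f in frames))]
    out, phase = [], 0.0
    for frame in frames:
        for res, (freq, bw, _amp) in zip(bank, frame["formants"]):
            res.tune(freq, bw)  # update filter coefficients once per frame
        for _ in range(SAMPLES_PER_FRAME):
            phase += frame["f0"] / SAMPLE_RATE
            pulse = 0.0
            if phase >= 1.0:  # emit one impulse per pitch period
                phase -= 1.0
                pulse = 1.0
            source = frame["av"] * pulse + frame["ah"] * rng.uniform(-1.0, 1.0)
            # Parallel structure: sum the outputs of the individual filters.
            out.append(sum(amp * res.step(source)
                           for res, (_f, _b, amp) in zip(bank, frame["formants"])))
    return out
```

Passing five identical frames yields 480 samples (40 msec of audio). A pause can be represented by a frame whose "av" and "ah" gains are both zero.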

[0072] Waveform generation by the formant synthesizer 100 has been described here, but any generally known synthesizer configuration, type of sound source, sampling frequency, and so on may be used.

[0073] A feature of the synthesizer 10 is that, by changing the division into phrase speech sections according to the speaking rate, it can generate synthesized speech that is easy to listen to at any speaking rate.

[0074] In natural speech, prosody is controlled according to the speaking rate; for example, the division into phrase speech sections changes as shown in FIG. 2. To convert the utterance content into natural, smooth synthesized speech suited to the speaking rate, the division into phrase speech sections must be changed according to the speaking rate.

[0075] To change the division into phrase speech sections according to the speaking rate, when the rate is fast, several phrase speech sections are joined into a single phrase speech section; when the rate is slow, a single phrase speech section is split into several phrase speech sections at points where the sentence breaks structurally or semantically.

[0076] The change in the division into phrase speech sections according to the speaking rate is now described with a specific example.

[0077] Here, the rules given above for changing the division into phrase speech sections according to the speaking rate are applied, and the processing is explained concretely using the example sentence introduced earlier.

[0078] From the results of syntactic analysis and related processing, the syntax tree of the example sentence is as shown in FIG. 3, and the boundaries of the maximal right-dependency chains are therefore as follows. Here, "|" marks a boundary of a maximal right-dependency chain. The proportion with which each position between bunsetsu can become a boundary is also shown.

[0079]

[Equation 7]

The strength of the tonal coupling determined by the dependency relation between each pair of bunsetsu is as follows. A larger value indicates weaker coupling.

[0080]

[Equation 8]

The boundaries S obtained from the speaking rate thresholds are then as follows. The speaking rate here is 7 mora/sec, and from the table in FIG. 8 the thresholds are A = 3 moras and B = 13 moras. Threshold boundaries are marked with "‖".

[Equation 9]

In this case, under the phrase speech section generation rules, the maximal right-dependency chain boundary after 「彼が」 (kare ga, "he") becomes a boundary S. However, 「あの」 (ano, "that") has only 2 moras, so by the generation rules the boundary after it is not treated as a boundary. The final maximal right-dependency chain boundary satisfies the generation rules and therefore becomes a phrase speech section boundary, giving the division shown above.

[0081] Calculating from this the probability that each boundary between bunsetsu becomes a phrase speech section boundary gives the following; applying the specified probability value, the phrase speech section boundaries are the positions marked with "/".

[0082]

[Equation 10]

The division into phrase speech sections at a faster speaking rate is shown next. When the speaking rate is fast, each threshold becomes larger. At a speaking rate of 10 mora/sec, the table in FIG. 8 gives threshold A = 6 moras and B = 15 moras, and the phrase speech sections of the sentence above are as follows.

[0083]

[Equation 11]

When the speaking rate is slow, each threshold is made smaller. At 4 mora/sec, A = 2 moras and B = 12 moras, and the phrase speech sections of the sentence above are as follows.

[0084]

[Equation 12]

By generating the phrase speech sections of the example sentence in this way, synthesized speech can be produced whose phrase speech sections match those of natural speech as shown in FIG. 2. In other words, prosody can be controlled according to the speaking rate.
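The dependence of the thresholds on the speaking rate in this worked example can be summarized with a small Python lookup. Only the three rate and threshold pairs quoted in the text are included; the full contents of FIG. 8 are not reproduced here, and choosing the nearest listed rate for other inputs is an assumption of this sketch, as are the names.

```python
# Speaking rate (mora/sec) -> (threshold A, threshold B) in moras,
# limited to the three cases quoted in the worked example.
RATE_THRESHOLDS = {
    4: (2, 12),
    7: (3, 13),
    10: (6, 15),
}


def thresholds_for_rate(rate):
    """Return (A, B) for the listed speaking rate nearest to `rate`.

    Assumption: rates between the listed values snap to the nearest entry.
    """
    nearest = min(RATE_THRESHOLDS, key=lambda listed: abs(listed - rate))
    return RATE_THRESHOLDS[nearest]
```

Both thresholds grow with the speaking rate, so a fast utterance suppresses more short-chain boundaries under rule (5) and tolerates longer unbroken chains under rule (6), which yields fewer and longer phrase speech sections.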

[0085]

[Effects of the Invention] As described above, when speech at different speaking rates is synthesized by rule, the present invention makes possible prosody control, such as intonation control, that depends on the speaking rate of the synthesized speech, which conventional rule-based speech synthesizers could not achieve. Speech can thus be synthesized in accordance with the various changes in speaking rate found in natural human speech. That is, even for the same sentence, the segmentation into speech tone units, each having one speech tone component, can be changed according to the speaking rate in prosody control such as intonation control.

[0086] That is, by providing a rule-based speech synthesizer with a speaking rate determination unit, the speaking rate can be determined; and by providing it with a prosody processing control unit that controls prosody in accordance with the determined speaking rate, the division into phrase speech sections can be changed according to the speaking rate.

[0087] Consequently, the invention is not limited to the fields targeted by conventional speech synthesizers, such as text-to-speech conversion and reading documents aloud. In rule-based synthesis for human-computer dialogue, where the speaking rate speeds up or slows down, it becomes possible to synthesize by rule natural speech whose speaking rate varies, for example speech uttered slowly and politely or uttered quickly.

[0088] Thus, by realizing a speech synthesizer that can control prosodic information as the speaking rate changes in dialogue with a computer and in various other uses, natural and easily understood speech can be synthesized by computer, and the burden on users of a computer's voice media can be reduced.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a sentence speech rule synthesizer according to an embodiment of the present invention.

FIG. 2 shows the phrase components of the fundamental frequency pattern of speech: (a) at a speaking rate of 10 mora/sec, (b) at 7 mora/sec, and (c) at 4 mora/sec.

FIG. 3 is the syntax tree of the sentence 「彼があの大きな車を運転する」 ("He drives that big car").

FIG. 4 is a block diagram of the prosody processing unit.

FIG. 5 is a block diagram of the synthesis parameter generation unit.

FIG. 6 is a block diagram of the formant synthesizer.

FIG. 7 is a table showing the boundary weights.

FIG. 8 is a table giving the thresholds A and B for each speaking rate.

FIG. 9 is a table showing the strength of tonal coupling.
[Explanation of Symbols]

10: sentence speech rule synthesizer
12: language processing unit
14: phoneme processing unit
16: prosody processing unit
18: synthesis parameter generation unit
20: speech waveform generation unit
22: speaking rate determination unit
24: prosody processing symbol generation unit
26: prosody processing control unit

Claims

1. A speech synthesizer having a language processing unit that extracts linguistic information from the content information of speech to be synthesized, a phoneme processing unit that generates phoneme information from the linguistic information, a prosody processing unit that generates prosodic information, a synthesis parameter generation unit that generates synthesis parameters from the generated phoneme information and prosodic information, and a speech waveform generation unit that generates a speech waveform from the synthesis parameters, the speech synthesizer comprising a speaking rate determination unit that determines the speaking rate, and a prosody processing control unit that controls the prosody processing unit on the basis of the speaking rate information output from the speaking rate determination unit.

2. The speech synthesizer according to claim 1, further comprising a prosody processing control unit that controls the segmentation of speech tone components on the basis of the speaking rate information output from the speaking rate determination unit.