JPH01186996A - Sentence intonation processing method for voice synthesizing device - Google Patents

Sentence intonation processing method for voice synthesizing device

Info

Publication number
JPH01186996A
JPH01186996A (application JP63011248A / JP1124888A)
Authority
JP
Japan
Prior art keywords
intonation
sentence
boundaries
word
syllables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP63011248A
Other languages
Japanese (ja)
Inventor
Yoshimichi Okuno
義道 奥野
Current Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Original Assignee
Meidensha Corp
Meidensha Electric Manufacturing Co Ltd
Priority date
Filing date
Publication date
Application filed by Meidensha Corp, Meidensha Electric Manufacturing Co Ltd filed Critical Meidensha Corp
Priority to JP63011248A priority Critical patent/JPH01186996A/en
Publication of JPH01186996A publication Critical patent/JPH01186996A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To obtain synthesized speech with an intonation approximating a human voice by determining breathing positions according to the clause and phrase boundaries of a text, and adding the exhalation intonation determined from those breathing positions to the basic intonation and word accents to obtain the sentence intonation.

CONSTITUTION: A text analysis device 5b refers to a grammar dictionary 6 and a word dictionary 7 to perform sentence analysis of a Japanese input text, such as clause division and phrase division. For the text thus divided into clauses and phrases, it finds the clause and phrase boundaries and the number of syllables in each segment, and determines the breathing positions in the sentence from these boundaries and syllable counts. An exhalation intonation with a logarithmic characteristic, starting at each breathing position and ending according to the syllable count, is generated and superimposed on the basic intonation and word accents, and pauses and the like are added to obtain the prosodic information for one sentence. Natural-sounding speech is thereby synthesized.

Description

DETAILED DESCRIPTION OF THE INVENTION

A. Field of Industrial Application

The present invention relates to a speech synthesis device based on the rule-synthesis method, and more particularly to a method for processing sentence intonation.

B. Summary of the Invention

In obtaining the prosodic information of a text through sentence analysis, the present invention determines breathing positions from the boundaries at which the text is divided into clauses and phrases, and adds the exhalation intonation determined from these breathing positions to the basic intonation and word accents to obtain the sentence intonation, thereby producing synthesized speech whose intonation approximates a human voice.

C. Prior Art

A speech synthesis device based on the rule-synthesis method is configured, for example, as shown in FIG. 3. The text analysis unit 1 performs sentence analysis on the character string of a Japanese input text using a dictionary 1a and a text analysis device 1b. In addition to a dictionary for converting words to their kana readings, the dictionary 1a contains a Japanese grammar dictionary for dividing text into clauses and phrases, as well as a rule dictionary for word accents and basic intonation. The text analysis device 1b refers to the dictionary 1a to convert the input text into a string of phonemic symbols (phonemes or syllables) and to generate prosodic information such as word accents and basic intonation.

The speech synthesis rule unit 2 consists of a file 2a and a parameter generation device 2b. The file 2a stores the feature parameters of the phonemic units together with their concatenation rules and the control rules for prosodic information. The parameter generation device 2b generates a control parameter sequence that links the feature parameters for the phonemic information with information such as their durations, and also generates a sound-source pattern sequence in which the pitch, energy, and intonation of the sound source have been processed according to the prosodic information.

The speech generation unit 3 consists of a sound-source generation device 3a, a speech synthesis digital filter 3b, and a speech converter 3c. The sound-source generation device 3a produces a source signal with the pitch, energy, and so on specified by the sound-source pattern sequence. The parameters of the digital filter 3b — PARCOR coefficients, transfer function, or formant frequencies — are set according to the control parameter sequence, and the filter's response to the source signal yields the synthesized speech data sequence. The speech converter 3c converts the filter output into an analog signal to obtain a speech waveform, which is output as synthesized speech through an electro-acoustic transducer such as a loudspeaker.

In a speech synthesis device of this kind, the prosodic information is assembled into a single sentence intonation by combining, for the text data, the basic intonation with word accents, stress, pauses, durations, and so on. In this sentence-intonation creation process, the Japanese input text is converted into a phonetic string with word boundaries: syntactic analysis against the dictionary 1a divides it into clauses, phrases, and sentences, and morpheme classification (a morpheme being the smallest linguistically meaningful unit) segments it into word sequences. For this phonetic string, the following are then determined: the basic intonation, derived from the number of syllables (moras) in each punctuation-delimited unit; the accent of each word; pauses at phrase boundaries; word-internal stress derived from prefixes, suffixes, and the like; and the duration of each phoneme.
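As a rough illustration of the step "basic intonation derived from the mora count", the sketch below generates a baseline pitch that declines logarithmically over the moras of one punctuation-delimited unit. The concrete pitch levels and the exact decay formula are assumptions for illustration; the patent specifies only that the decline follows a logarithmic characteristic set by the mora count.

```python
import math

def basic_intonation(n_moras, start=1.0, end=0.5):
    """Baseline pitch declining logarithmically from the unit's onset.

    n_moras: moras in one punctuation-delimited unit.
    start/end: hypothetical relative pitch levels (the patent
    names no concrete values).
    """
    if n_moras == 1:
        return [start]
    # log(1 + i) / log(n_moras) runs from 0 at the onset to 1 at the
    # final mora, giving a front-loaded (logarithmic) decline.
    return [start - (start - end) * math.log(1 + i) / math.log(n_moras)
            for i in range(n_moras)]
```

The front-loaded shape means the pitch drops fastest near the start of the unit, a common model of declination in Japanese utterances.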

D. Problems to Be Solved by the Invention

In the conventional sentence-intonation processing method, the pitch frequency obtained by superimposing the basic intonation and the word accents is dominant, so the sentence intonation tends to be monotonous and inferior in quality to the human voice. That is, owing to the nature of the human vocal apparatus, the amplitude, pitch, and formant frequencies of the human voice vary greatly over time, and it was difficult to obtain synthesized speech approaching the natural voice with the conventional basic intonation and word accents alone.

It is an object of the present invention to provide a sentence-intonation processing method that eliminates the monotony of sentence intonation and yields an intonation closer to the natural voice.

E. Means and Operation for Solving the Problems

To achieve the above object, the present invention determines breathing positions from the clause and phrase boundaries of the Japanese input text, derives from each breathing position an exhalation intonation that falls with a logarithmic characteristic, and adds this exhalation intonation to the basic intonation and the word accents to obtain the sentence intonation. By incorporating the exhalation intonation present in the human voice, a sentence intonation close to the natural voice is obtained.

F. Embodiment

FIG. 1 is a block diagram of a text analysis unit showing an embodiment of the present invention. As in the conventional device, the dictionary 5a comprises a Japanese grammar dictionary 6 and a word dictionary 7 (accents, durations). The text analysis device 5b refers to the grammar dictionary 6 and the word dictionary 7 to perform, as before, kana-reading conversion of the words in the Japanese input text.

It also performs sentence analysis such as clause division, phrase division, and determination of word-accent positions. For the text thus divided into clauses and phrases, the text analysis device 5b finds the clause and phrase boundaries together with the number of syllables in each segment, and determines the breathing positions in the sentence from these boundaries and syllable counts. The text analysis device 5b then generates an exhalation intonation with a logarithmic characteristic whose starting point is each breathing position and whose end point is set by the syllable count, superimposes this exhalation intonation on the basic intonation and word accents, and further adds pauses and the like to obtain the prosodic information for one sentence. The phonemic information is created in the same way as in the conventional device.
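The breath-position and exhalation-intonation steps could be sketched as below. The syllable threshold and the pitch levels are hypothetical placeholders: the patent states only that breathing positions come from clause/phrase boundaries and syllable counts, and that the contour decays logarithmically from each breathing position.

```python
import math

def breath_positions(boundaries, min_gap=7):
    """Choose breathing positions from clause/phrase boundaries.

    boundaries: sorted syllable indices of clause and phrase
    boundaries. A breath is taken at a boundary once at least
    `min_gap` syllables (a hypothetical threshold) have elapsed
    since the previous breath.
    """
    positions = [0]  # a sentence begins with an inhalation
    for b in boundaries:
        if b - positions[-1] >= min_gap:
            positions.append(b)
    return positions

def exhalation_contour(n, peak=1.0, floor=0.2):
    """Pitch offsets that rise at the breathing position, then decay
    logarithmically over the n syllables of the breath group."""
    if n <= 1:
        return [peak] * n
    return [peak - (peak - floor) * math.log(1 + i) / math.log(n)
            for i in range(n)]
```

One full exhalation contour would be generated per breath group (the span between consecutive breathing positions), so longer groups decay over more syllables, mirroring the "time constant set by the syllable count" in the description.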

FIG. 2 shows how the sentence intonation is created in the text analysis device 5b. When the Japanese input text is "学校の桜がきれいにさいた。" ("The cherry blossoms at school bloomed beautifully."), the basic intonation falls in pitch from its onset with a logarithmic characteristic determined by the syllable count; the exhalation intonation rises at each clause boundary and then falls with a logarithmic characteristic whose time constant depends on the syllable count; the word accents are added to these; and the result is rounded by filter processing to produce a single sentence intonation.
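The superposition and "rounding by filter processing" described for FIG. 2 could be sketched as a per-syllable sum of the three contours followed by a moving-average smoother. The window length is a hypothetical choice; the patent does not specify the filter.

```python
def sentence_intonation(basic, exhalation, accents, win=3):
    """Sum the three per-syllable contours, then smooth ('round')
    the result with a simple moving-average filter of length win."""
    total = [b + e + a for b, e, a in zip(basic, exhalation, accents)]
    half = win // 2
    smoothed = []
    for i in range(len(total)):
        seg = total[max(0, i - half): i + half + 1]  # clipped window
        smoothed.append(sum(seg) / len(seg))
    return smoothed
```

The moving average spreads each word-accent peak over its neighbors, which is one plausible reading of the rounding step that removes abrupt pitch jumps before the contour drives the sound-source generator.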

Accordingly, the sentence intonation is determined with the exhalation intonation superimposed at the breathing positions obtained from the clause and phrase boundaries and the syllable counts. This imparts pitch-frequency variations resembling the breathing motion of human speech, so synthesized speech closer to the natural voice can be obtained.

When the input text is a long sentence, exhalation intonation is likewise added at the clause and phrase boundaries, while the basic intonation is handled by dividing it between the subject and the predicate and assigning it to each.

G. Effects of the Invention

As described above, according to the present invention the sentence intonation is obtained including the exhalation intonation, so synthesized speech can be produced whose sentence intonation carries the exhalation-induced intonation variations present in the human voice, and natural-sounding speech can be synthesized.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing an embodiment of the present invention; FIG. 2 is a diagram illustrating the intonation waveforms of the embodiment; and FIG. 3 is a block diagram of a speech synthesis device based on the rule-synthesis method.

5a: dictionary; 5b: text analysis device; 6: grammar dictionary; 7: word dictionary.

Claims (1)

(1) In a speech synthesis device based on the rule-synthesis method, which obtains phonemic information and prosodic information from a Japanese input text, obtains control parameters of a digital filter corresponding to the phonemic information and a sound-source pattern of a sound-source generation device corresponding to the prosodic information, and thereby obtains a speech signal corresponding to the text: a sentence-intonation processing method for a speech synthesis device, characterized in that breathing positions are determined from the clause and phrase boundaries of the Japanese input text, an exhalation intonation that falls with a logarithmic characteristic is derived from each breathing position, and the sentence intonation is obtained by adding this exhalation intonation, the basic intonation, and the word accents.
JP63011248A 1988-01-21 1988-01-21 Sentence intonation processing method for voice synthesizing device Pending JPH01186996A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63011248A JPH01186996A (en) 1988-01-21 1988-01-21 Sentence intonation processing method for voice synthesizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63011248A JPH01186996A (en) 1988-01-21 1988-01-21 Sentence intonation processing method for voice synthesizing device

Publications (1)

Publication Number Publication Date
JPH01186996A true JPH01186996A (en) 1989-07-26

Family

ID=11772638

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63011248A Pending JPH01186996A (en) 1988-01-21 1988-01-21 Sentence intonation processing method for voice synthesizing device

Country Status (1)

Country Link
JP (1) JPH01186996A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6157997A (en) * 1984-08-29 1986-03-25 富士通株式会社 Voice synthesization system
JPS6250900A (en) * 1985-08-30 1987-03-05 富士通株式会社 Voice editing output system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6157997A (en) * 1984-08-29 1986-03-25 富士通株式会社 Voice synthesization system
JPS6250900A (en) * 1985-08-30 1987-03-05 富士通株式会社 Voice editing output system

Similar Documents

Publication Publication Date Title
US8219398B2 (en) Computerized speech synthesizer for synthesizing speech from text
KR20070004788A (en) Prosodic speech text codes and their use in computerized speech systems
Kayte et al. A Corpus-Based Concatenative Speech Synthesis System for Marathi
US6829577B1 (en) Generating non-stationary additive noise for addition to synthesized speech
Peterson et al. Objectives and techniques of speech synthesis
JPH0580791A (en) Device and method for speech rule synthesis
KR0149622B1 (en) Language training system based on language synthesis
JPH01186996A (en) Sentence intonation processing method for voice synthesizing device
KR0134707B1 (en) Voice synthesizer
JP3394281B2 (en) Speech synthesis method and rule synthesizer
KR20040015605A (en) Method and apparatus for synthesizing virtual song
JP3397406B2 (en) Voice synthesis device and voice synthesis method
Muralishankar et al. Human touch to Tamil speech synthesizer
JPH0667685A (en) Speech synthesizing device
JPH01186997A (en) Sentence intonation processing method for voice synthesizing device
JPH01321496A (en) Speech synthesizing device
JPH02236600A (en) Circuit for giving emotion of synthesized voice information
JPH01186998A (en) Sentence intonation processing method for voice synthesizing device
JPH08328578A (en) Text voice synthesizer
JPH08171394A (en) Speech synthesizer
JPH01112297A (en) Voice synthesizer
JP2573587B2 (en) Pitch pattern generator
Shi A speech synthesis-by-rule system for Modern Standard Chinese
JPH06138894A (en) Device and method for voice synthesis
KR19980065482A (en) Speech synthesis method to change the speaking style