JPH1097290A - Speech synthesizer

Info

Publication number: JPH1097290A
Application number: JP8251646A
Authority: JP
Inventors: Hideji Nishida; 秀治西田; Hiroyuki Hirai; 啓之平井; Masanori Miyatake; 正典宮武; Hiroki Onishi; 宏樹大西
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1996-09-24
Filing date: 1996-09-24
Publication date: 1998-04-14

Abstract

PROBLEM TO BE SOLVED: To output a synthesized speech waveform of superior quality by reading from a waveform memory the optimum unit speech waveforms corresponding to first phoneme symbol substrings divided in a predetermined priority order, and connecting them. SOLUTION: A text-to-speech synthesizer 10 includes a microcomputer 12. The microcomputer 12 receives an input character string consisting of text data and, using a text analysis dictionary 14, converts it into a phoneme symbol string (the first phoneme symbol string) made up of first phoneme symbol substrings, and also generates a pitch pattern and a power pattern for the input character string. The microcomputer 12 then shapes, connects, and edits unit speech waveforms registered in a speech waveform database 16 according to the pitch pattern and power pattern, and outputs the resulting synthesized speech. Language information corresponding to each phoneme symbol is added to a second phoneme symbol string, which is divided in the same predetermined priority order.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech synthesizer, and more particularly to a speech synthesizer that is used for voice guidance, voice response, text reading, and the like, and that synthesizes and outputs a speech waveform according to a phoneme symbol string corresponding to an input character string.

[0002]

[Prior Art] A conventional, general speech synthesizer is disclosed in Japanese Patent Application Laid-Open No. 7-92997. For the phoneme sequence to be synthesized, it selects speech units from those contained in a database, using the prosodic information attached to each speech unit so that acoustic features serve as the selection criterion, and connects the selected units.

[0003] FIG. 6 shows a specific configuration example of the above speech synthesizer.

[0004] In FIG. 6, 100 denotes an input terminal, 101 a preprocessing unit, 102 a selection criterion parameter setting unit, 103 a segment selection unit, 104 a condition setting unit, 105 a segment parameter table, 106 a segment file, 107 a segment connection unit, and 108 an output terminal.

[0005] The condition setting unit 104 sets the various segment environment suitability conditions used during segment selection in the segment selection unit 103; these conditions can be added, changed, and deleted.

[0006] Next, the processing of the conventional speech synthesizer is described with reference to FIG. 6.

[0007] The preprocessing unit 101 divides the input character string into phoneme units. The selection criterion parameter setting unit 102 sets, based on those phoneme units, the selection criterion parameters used to select waveform segments, which are the synthesis parameters. The segment selection unit 103 calculates the squared error between the selection criterion parameters and the segment parameters fetched from the segment parameter table 105, selects segment parameters in ascending order of squared error to generate primary candidates, and then determines, as the optimum segment for the phoneme unit, the segment whose parameters best satisfy the segment environment suitability conditions set in the condition setting unit 104.

[0008] Here, the "segment environment suitability conditions" are: (1) the phonemic environment from which the segment was extracted matches the phonemic environment at synthesis time, or their manners of articulation are similar; (2) the ordering of the average pitch values agrees with the ordering of the selection criterion parameters; and (3) the sign of the pitch slope (positive, negative, or zero) matches the sign of the selection criterion parameter. The optimum segment is thus determined from prosodic information.
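The two-stage selection of the conventional device (rank by squared error, then apply the environment conditions) can be sketched as follows. This is a minimal illustration, not the method of Japanese Patent Application Laid-Open No. 7-92997 itself: the `Segment` type, the parameter vectors, the candidate count `k`, and the rule of counting satisfied conditions are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Segment:
    name: str                 # hypothetical segment identifier
    params: Sequence[float]   # hypothetical prosodic parameters (pitch, power)


def squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_segment(target: Sequence[float],
                   table: List[Segment],
                   conditions: List[Callable[[Segment], bool]],
                   k: int = 3) -> Segment:
    # Stage 1: primary candidates, in ascending order of squared error.
    primary = sorted(table, key=lambda s: squared_error(target, s.params))[:k]
    # Stage 2: keep the candidate that satisfies the most conditions.
    # max() returns the first maximum, so ties go to the smaller error.
    return max(primary, key=lambda s: sum(c(s) for c in conditions))


table = [
    Segment("a1", [120.0, 0.5]),
    Segment("a2", [118.0, 0.4]),
    Segment("a3", [200.0, 0.9]),
]
# Assumed condition: the power parameter must be at least 0.45.
best = select_segment([118.0, 0.45], table, [lambda s: s.params[1] >= 0.45], k=2)
print(best.name)
```

In this toy run the nearest segment by squared error is `a2`, but it fails the assumed condition, so the second-nearest candidate `a1` is chosen.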

[0009] Next, the segment connection unit 107 extracts the determined optimum segment for each phoneme unit from the segment file 106, connects the segments phoneme unit by phoneme unit, and outputs synthesized speech.

[0010]

[Problems to Be Solved by the Invention] However, word-final vowels in Japanese have distinctive acoustic characteristics: they are easily devoiced, and their overall power is low. The conventional speech synthesizer selects speech units using only prosodic information, which can evaluate acoustic features such as the average pitch period and average power of a speech unit. For cases such as Japanese word endings, the optimum speech unit therefore cannot be selected from prosodic information alone.

[0011] In addition, each speech unit is decomposed into syllables, and a speech unit is selected per syllable according to the selection criterion. The continuity of the speech waveform over a longer phoneme symbol string (the second phoneme symbol string), such as a whole word, therefore cannot be exploited, which also hinders improvement of the quality of the synthesized speech.

[0012] Therefore, the main object of the present invention is to provide a speech synthesizer capable of outputting synthesized speech of excellent quality.

[0013]

[Means for Solving the Problems] The present invention is a speech synthesizer that synthesizes unit speech waveforms corresponding to a plurality of first phoneme symbol substrings contained in a first phoneme symbol string corresponding to an input character string, and outputs a synthesized speech waveform, the speech synthesizer comprising: dividing means for dividing the first phoneme symbol string into a plurality of first phoneme symbol substrings in a predetermined priority order; a waveform memory storing a second phoneme symbol string that contains second phoneme symbol substrings divided in the predetermined priority order, together with speech waveforms that contain the unit speech waveforms corresponding to the second phoneme symbol substrings; waveform reading means for reading the unit speech waveforms corresponding to the first phoneme symbol substrings from the waveform memory; and waveform connecting means for connecting the unit speech waveforms read from the waveform memory to generate the synthesized speech waveform, wherein language information corresponding to each phoneme symbol is added to the second phoneme symbol string for each phoneme symbol.

[0014] Further, language information indicating whether or not the substring is a word ending is added to each second phoneme symbol substring, and when the unit speech waveform corresponding to the second phoneme symbol substring that matches a first phoneme symbol substring is read from the waveform memory, if the first phoneme symbol substring is a word ending, a unit speech waveform that is a word ending is selected for the corresponding second phoneme symbol substring based on the language information.

[0015] Further, language information indicating whether or not the substring is a word ending is added to each second phoneme symbol substring, and when the unit speech waveform corresponding to the second phoneme symbol substring that matches a first phoneme symbol substring is read from the waveform memory, if the first phoneme symbol substring is not a word ending, a unit speech waveform that is not a word ending is selected for the corresponding second phoneme symbol substring based on the language information.

[0016] Furthermore, the predetermined priority order is: silent portions, then unvoiced portions, then voiced portions.

[0017]

[Embodiments of the Invention] An embodiment of the present invention is described with reference to FIGS. 1 to 5.

[0018] Referring to FIG. 1, a text-to-speech synthesizer 10 includes a microcomputer 12. The microcomputer 12 receives an input character string consisting of text data. Using a text analysis dictionary 14, it first converts the input character string into a phoneme symbol string (the first phoneme symbol string) made up of first phoneme symbol substrings for which division points have been set, and also generates a pitch pattern and a power pattern for the input character string.

[0019] The first phoneme symbol string is preferably divided into first phoneme symbol substrings in a predetermined priority order, for example silent portions, then unvoiced portions, then voiced portions.

[0020] Next, the microcomputer 12 shapes, connects, and edits the unit speech waveforms registered in a speech waveform database 16 based on the pitch pattern and the power pattern, and outputs the synthesized speech generated in this way.

[0021] The speech waveform database 16 holds the speech waveforms and, for each speech waveform, phoneme label information, prosodic information representing the acoustic characteristics near the waveform connection points, and language information indicating whether or not the waveform is a word ending. The phoneme label information includes a phoneme symbol string (the second phoneme symbol string) and a symbol string number. As a specific example, FIG. 2 lists the information registered in the speech waveform database 16. A "-" in a phoneme symbol string represents a silent section of 5 msec or more.
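One database entry as described in this paragraph could be modeled as the record below. This is only a sketch: the field names, the list-of-floats waveform, and the contents of the `prosody` dictionary are assumptions, since the source names the four kinds of information but does not specify their layout (FIG. 2 is not reproduced here).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WaveformEntry:
    symbol_string: str          # phoneme label information: second phoneme symbol string
    string_number: int          # phoneme label information: symbol string number
    waveform: List[float]       # speech waveform samples (representation assumed)
    prosody: Dict[str, float]   # prosodic information near connection points (fields assumed)
    is_ending: bool             # language information: word ending or not


entry = WaveformEntry(
    symbol_string="-ameno-",
    string_number=1,
    waveform=[0.0, 0.1, -0.1],
    prosody={"pitch_period_ms": 8.0},
    is_ending=False,
)
# Each "-" marks a silent section of 5 msec or more.
print(entry.symbol_string.count("-"))
```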

[0022] Like the first phoneme symbol substrings, the phoneme symbol strings registered in the speech waveform database 16 (the second phoneme symbol strings) consist of second phoneme symbol substrings divided in a predetermined priority order, for example silent portions, then unvoiced portions, then voiced portions.

[0023] FIG. 3 shows the algorithm for generating the phoneme symbol string, the power pattern, and the pitch pattern corresponding to the input character string.

[0024] First, in step S1, the microcomputer 12 writes the input character string into a memory 12a one sentence at a time. Next, in step S3, it performs morphological analysis of the character string. The text analysis dictionary 14 stores the written form of each word together with information such as the corresponding phoneme symbol string (reading), accent, and part of speech, and this information is used to analyze which words make up the input character string.

[0025] Then, in step S5, the phoneme symbol string of the input character string is generated based on the analysis result.
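Steps S3 and S5 can be illustrated with a toy dictionary lookup. The sketch below assumes greedy longest-match segmentation and a four-entry dictionary; the source does not say which morphological analysis method is used with the text analysis dictionary 14, so both are stand-ins, and the pause marks ("-") are not generated here.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary: written form -> reading (phoneme symbols).
DICTIONARY: Dict[str, str] = {
    "雨": "ame",
    "の": "no",
    "ため": "tame",
    "です": "desu",
}


def analyze(text: str, dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Greedy longest-match segmentation into (word, reading) pairs (step S3)."""
    result = []
    i = 0
    while i < len(text):
        # Try the longest remaining prefix first.
        for j in range(len(text), i, -1):
            word = text[i:j]
            if word in dictionary:
                result.append((word, dictionary[word]))
                i = j
                break
        else:
            raise ValueError(f"no dictionary entry at position {i}")
    return result


def to_phonemes(text: str, dictionary: Dict[str, str]) -> str:
    """Concatenate the readings of the analyzed words (step S5)."""
    return "".join(reading for _, reading in analyze(text, dictionary))


print(to_phonemes("雨のためです", DICTIONARY))
```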

[0026] After that, in step S7, the pause (PAUSE) information of the input character string is analyzed using the text analysis dictionary 14, and in step S9 the power pattern of the input character string is generated from the result.

[0027] Further, in step S11, the accent information of the input character string is analyzed using the text analysis dictionary 14, and in step S13 the pitch pattern of the input character string is generated from the result.

[0028] Here, the power pattern is calculated with the well-known Quantification Theory Type I model, and the pitch pattern with the well-known Fujisaki model (Shizuo Hiki, ed., "Speech Information Processing", University of Tokyo Press, 1973).

[0029] Next, FIG. 4 shows the algorithm for generating the output speech based on the phoneme symbol string, the power pattern, and the pitch pattern corresponding to the input character string.

[0030] First, in step S15, the microcomputer 12 determines the division points of the phoneme symbol string corresponding to the input character string and divides this phoneme symbol string into a plurality of phoneme symbol substrings.

[0031] Next, in step S17, the substring number n is set to "1", and in step S19 the unit speech waveform and label information corresponding to the n-th phoneme symbol substring are extracted from the speech waveform database 16.

[0032] Then, in step S21, the phoneme duration and gain of the unit speech waveform are corrected by waveform shaping so as to match the power pattern corresponding to the input character string.

[0033] After that, in step S23, the pitch of the unit speech waveform is corrected by waveform shaping so as to follow the pitch pattern corresponding to the input character string.

[0034] Then the waveforms are connected in step S25, and the connected synthesized speech waveform is stored in a memory 12b in step S27.

[0035] After that, the substring number n is incremented in step S29, and step S31 determines whether an n-th unit speech waveform exists. If "YES", the process returns to step S19; if "NO", the synthesized speech waveform is converted into an analog speech waveform and output in step S33. The data conversion in step S33 used the well-known PSOLA method (F. Charpentier et al., "Pitch-Synchronous Waveform Processing Techniques for Text-to-Speech Synthesis Using Diphones", Proc. Eurospeech '89).
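The loop over steps S17 to S31 can be sketched as below. Only the control flow follows the text. The waveform representation (lists of floats), the `shape_gain` helper, and the single target RMS value per substring are assumptions; duration correction is omitted, and the pitch modification of step S23 is left as a placeholder rather than an implementation of PSOLA.

```python
import math
from typing import Dict, List


def rms(wave: List[float]) -> float:
    """Root-mean-square amplitude of a waveform."""
    return math.sqrt(sum(x * x for x in wave) / len(wave))


def shape_gain(wave: List[float], target_rms: float) -> List[float]:
    """Step S21 (gain only): scale the unit so its RMS matches the target."""
    current = rms(wave)
    if current == 0.0:
        return list(wave)
    return [x * target_rms / current for x in wave]


def shape_pitch(wave: List[float]) -> List[float]:
    """Step S23 placeholder: a real system would modify the pitch here."""
    return list(wave)


def synthesize(substrings: List[str],
               database: Dict[str, List[float]],
               power_pattern: List[float]) -> List[float]:
    output: List[float] = []                       # plays the role of memory 12b
    n = 0                                          # S17 (0-based here)
    while n < len(substrings):                     # S31: does an n-th unit exist?
        unit = database[substrings[n]]             # S19
        unit = shape_gain(unit, power_pattern[n])  # S21
        unit = shape_pitch(unit)                   # S23
        output.extend(unit)                        # S25 and S27
        n += 1                                     # S29
    return output


db = {"-ameno-": [0.5, -0.5, 0.5, -0.5], "tame": [1.0, -1.0]}
wave = synthesize(["-ameno-", "tame"], db, [1.0, 0.5])
print(len(wave))
```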

[0036] Step S15, which is the characteristic feature of the present invention, is described in detail below.

[0037] In this embodiment, for every phoneme symbol substring that can be formed from the possible combinations of division points in the input phoneme symbol string, an evaluation score is calculated with the evaluation function "score" given below, and the division points are taken from the combination that minimizes the accumulated evaluation scores of its phoneme symbol substrings.
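The exhaustive form of this search (score every way of cutting the string and keep the cheapest) can be sketched as follows. The per-substring cost is passed in as a function. The toy cost used in the example, which only rewards long substrings found in a small hypothetical inventory, is an assumption and not the patent's evaluation function.

```python
from itertools import combinations
from typing import Callable, List, Tuple


def all_segmentations(s: str) -> List[List[str]]:
    """Every way to cut s into contiguous, non-empty substrings."""
    result = []
    positions = range(1, len(s))
    for k in range(len(s)):
        for cuts in combinations(positions, k):
            bounds = (0,) + cuts + (len(s),)
            result.append([s[a:b] for a, b in zip(bounds, bounds[1:])])
    return result


def best_segmentation(s: str,
                      cost: Callable[[str], float]) -> Tuple[float, List[str]]:
    """Return the segmentation with the smallest accumulated cost."""
    scored = ((sum(cost(p) for p in seg), seg) for seg in all_segmentations(s))
    return min(scored, key=lambda pair: pair[0])


# Hypothetical inventory of stored substrings.
INVENTORY = {"-ameno-", "tame", "desu-", "ame"}


def toy_cost(part: str) -> float:
    """Stored substrings cost -length; anything else costs a flat 9."""
    return -len(part) if part in INVENTORY else 9


total, parts = best_segmentation("-ameno-tamedesu-", toy_cost)
print(total, parts)
```

A string of n symbols has 2^(n-1) segmentations, so this brute-force form is only practical for short inputs.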

[0038] The evaluation function "score" is the weighted sum, with weights w1 to w5, of five values: type, determined by the priority of the division point; link, determined by the kinds of phonemes before and after the division point; len, determined by the length of the divided substring in phonemes; f0, determined by the difference between the pitch period at the waveform connection point corresponding to the division point and its theoretical value; and term, which quantifies whether or not the selected phoneme symbol substring is a word ending. The weights w1 to w5 are each real constants in the range 0 to 10.

[0039] Evaluation function:

score = w1*type + w2*link + w3*len + w4*f0 + w5*term

type = 0 (the division point has the first priority)
type = 1 (the division point has the second priority)
type = 3 (the division point has the third priority)
type = 9 (otherwise)
link = 0 (the kinds of phonemes before and after the division point match)
link = 9 (otherwise)
len = -(number of phoneme symbols in the phoneme symbol substring delimited by the division points)
f0 = |log(pitch period of the actual waveform) - log(theoretical pitch period)| / log(theoretical pitch period)
term = 0 (the input substring is not a word ending and the selected substring is not a word ending)
term = 1 (the input substring is a word ending and the selected substring is also a word ending)
term = 9 (otherwise)

The method of determining the division points is described below for the input character string /-ameno-tamedesu-/ (雨のためです, "it is because of the rain").

[0040] For simplicity of explanation, this embodiment uses w1 = 1, w2 = 1, w3 = 1, w4 = 1, and w5 = 1. Combinations of phoneme symbol substrings are explored by the tree search shown in FIG. 5.
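The evaluation function, with the unit weights used in this embodiment as defaults, can be written out directly. The sketch below follows the definitions in the description; the function and argument names are chosen for illustration, and the natural logarithm is used because the source does not state the base (the ratio in `f0_term` does not depend on the base).

```python
import math

TYPE_BY_PRIORITY = {1: 0, 2: 1, 3: 3}   # division-point priority rank -> type value


def type_term(priority_rank: int) -> int:
    """0, 1 or 3 for priority ranks 1 to 3, and 9 for anything else."""
    return TYPE_BY_PRIORITY.get(priority_rank, 9)


def link_term(phonemes_match: bool) -> int:
    """0 when the phoneme kinds around the division point match, else 9."""
    return 0 if phonemes_match else 9


def len_term(substring: str) -> int:
    """Negative symbol count, so longer substrings lower the score."""
    return -len(substring)


def f0_term(actual_period: float, theoretical_period: float) -> float:
    """Relative log-domain pitch-period mismatch at the connection point."""
    return (abs(math.log(actual_period) - math.log(theoretical_period))
            / math.log(theoretical_period))


def term_term(input_is_ending: bool, selected_is_ending: bool) -> int:
    """0 or 1 when the ending flags agree, 9 when they disagree."""
    if not input_is_ending and not selected_is_ending:
        return 0
    if input_is_ending and selected_is_ending:
        return 1
    return 9


def score(type_, link, len_, f0, term, w=(1, 1, 1, 1, 1)):
    """Weighted sum of the five terms; w holds w1 to w5 (each in 0 to 10)."""
    w1, w2, w3, w4, w5 = w
    return w1 * type_ + w2 * link + w3 * len_ + w4 * f0 + w5 * term


# Example term values: type = 9, link = 0, len = -10, f0 = 1.2, term = 0.
print(round(score(type_=9, link=0, len_=-10, f0=1.2, term=0), 6))
```

Note that `f0_term` divides by the logarithm of the theoretical pitch period, so it is undefined when that period equals 1 in the chosen unit.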

[0041] In FIG. 5, the score value is shown below each selected phoneme symbol substring (each of these substrings is assumed to exist in the label information of the speech waveform database 16 and to have been selected such that the phonemes before and after every division point match). For convenience of explanation, the states in which the individual phoneme symbol substrings have been selected are called "node 0" to "node 8".

[0042] First, at node 0, the label information of the speech waveform database 16 is searched for phoneme symbol substrings that begin with /-/ (silence) and continue as /-ameno.../. From these, a predetermined number m (two in this embodiment) of substrings are selected in ascending order of score value, and m nodes are created at the next lower level.

[0043] In FIG. 5, node 1 /-ameno-/ and node 4 /-ameno-tam/ were selected. The score value of /-ameno-/ is obtained as follows.
type = 9: the substring ends at a division point outside the priority order.

[0044] link = 0: the following phoneme symbol matches (t).

[0045] len = -10
f0 = 1.2: the pitch differs by a factor of 1.2
term = 0: the input substring is not a word ending and the selected substring is not a word ending.

[0046] score = 9 + 0 - 10 + 1.2 + 0 = 0.2
The score value of /-ameno-tam/ is obtained as follows.
type = 0: the substring ends at a division point of the first priority.

[0047] link = 0: the following phoneme symbol matches (m).

[0048] len = -7
f0 = 1.3: the pitch differs by a factor of 1.3
term = 0: the input substring is not a word ending and the selected substring is not a word ending.

[0049] score = 0 + 0 - 7 + 1.3 + 0 = -5.7
Nodes 1 and 4 are therefore taken as the candidate phoneme substrings.

[0050] The cumulative score at each node is therefore:
cumulative score at node 1 = 0.2
cumulative score at node 4 = -5.7
At each division, the m search paths with the smallest cumulative scores are kept, so in this embodiment the phoneme substrings of node 1 and node 4 both remain as candidates. In the next search step, nodes 2, 3, 5, and 6 become candidates:
cumulative score at node 2 = -1.6
cumulative score at node 3 = -5.2
cumulative score at node 5 = -4.5
cumulative score at node 6 = -6.6

[0051] If two candidates tie, the one with the smaller score value at its own node takes precedence. As a result, nodes 3 and 6 remain as candidates. Because division is complete at node 3, its cumulative score always remains as a candidate. No further search is made from nodes 2 and 5. Division is repeated in the same way, and, as shown in FIG. 5, the nodes remaining as candidates are node 3, node 7, and node 8, with the following cumulative scores:
cumulative score at node 3 = -5.2
cumulative score at node 7 = -2.5
cumulative score at node 8 = -9.1
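The pruned tree search in the preceding paragraphs keeps only the m best partial segmentations at each level, which is a form of beam search. A minimal sketch follows, assuming a hypothetical inventory of stored substrings with fixed, precomputed scores. The strings and numbers are invented for the example and do not reproduce FIG. 5; in the description the score of a substring also depends on its context (f0 and term), and tie handling is simplified here.

```python
from typing import Dict, List, Tuple

PathState = Tuple[float, List[str], int]   # (cumulative score, substrings, position)


def beam_search(target: str,
                inventory: Dict[str, float],
                m: int = 2) -> Tuple[float, List[str]]:
    """Segment target using inventory substrings, keeping m paths per level."""
    beam: List[PathState] = [(0.0, [], 0)]
    finished: List[PathState] = []
    while beam:
        expanded: List[PathState] = []
        for total, parts, pos in beam:
            # Stored substrings that continue the target from pos.
            matches = [(s, sc) for s, sc in inventory.items()
                       if target.startswith(s, pos)]
            # Create at most m child nodes, lowest score first.
            for s, sc in sorted(matches, key=lambda x: x[1])[:m]:
                expanded.append((total + sc, parts + [s], pos + len(s)))
        # Paths that cover the whole target always stay as candidates.
        finished += [p for p in expanded if p[2] == len(target)]
        open_paths = [p for p in expanded if p[2] < len(target)]
        # Prune: keep the m open paths with the smallest cumulative score.
        beam = sorted(open_paths, key=lambda p: p[0])[:m]
    if not finished:
        raise ValueError("target cannot be covered by the inventory")
    total, parts, _ = min(finished, key=lambda p: p[0])
    return total, parts


inventory = {
    "-ameno-": 0.5,
    "-ameno-tam": -5.0,
    "tame": -2.0,
    "tamedesu-": -4.0,
    "edesu-": -3.0,
    "desu-": 1.0,
}
total, parts = beam_search("-ameno-tamedesu-", inventory, m=2)
print(total, parts)
```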

[0052] Comparing the scores at node 7 and node 8, the score at node 7 is calculated as follows.

[0053] type = 0: the substring ends at a division point of the first priority.

[0054] link = 0: this is the end of the sentence, so nothing is connected after it.

[0055] len = -6
f0 = 1.1: the pitch differs by a factor of 1.1
term = 9: the input substring is a word ending, but the selected substring is not.

[0056] score = 0 + 0 - 6 + 1.1 + 9 = 4.1
At node 8, the score is calculated as follows.

[0057] type = 0: the substring ends at a division point of the first priority.

[0058] link = 0: this is the end of the sentence, so nothing is connected after it.

[0059] len = -6
f0 = 1.5: the pitch differs by a factor of 1.5
term = 1: the input substring is a word ending and the selected substring is also a word ending.

[0060] score = 0 + 0 - 6 + 1.1 + 9 = -2.5
If there were no term component, node 7, where the pitch periods before and after the connection point are closely matched, would be selected in the end, and a waveform taken from the middle of a sentence would be used for the word ending /desu-/, making the synthesized speech sound unnatural.

[0061] Therefore, the phoneme division obtained by the search down to node 8, which has the smallest score once the language information on whether or not a substring is a word ending is taken into account, is optimal. The actual division is determined as /-ameno-/-tame/edesu-/, and a word-ending speech waveform in the database 16 is used as the speech waveform of /edesu-/.

[0062]

[Effects of the Invention] As is clear from the above description, according to the present invention, the optimum unit speech waveforms corresponding to the first phoneme symbol substrings divided in the predetermined priority order are read from the waveform memory by the reading means and connected by the waveform connecting means, so that a synthesized speech waveform of excellent quality can be output.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the speech synthesizer of the present invention.

FIG. 2 is a diagram showing the contents of the speech waveform database.

FIG. 3 is a flowchart showing part of the operation of the embodiment.

FIG. 4 is a flowchart showing part of the operation of the embodiment.

FIG. 5 is a flowchart showing part of the operation of the embodiment.

FIG. 6 is a block diagram showing a conventional speech synthesizer.

[Explanation of symbols]

10: text-to-speech synthesizer; 12: microcomputer; 14: text analysis dictionary; 16: speech waveform database

Continued from the front page: (72) Inventor Hiroki Onishi, c/o Sanyo Electric Co., Ltd., 2-5-5 Keihan-Hondori, Moriguchi-shi, Osaka

Claims

[Claims]

1. A speech synthesizer that synthesizes unit speech waveforms corresponding to a plurality of first phoneme symbol substrings contained in a first phoneme symbol string corresponding to an input character string, and outputs a synthesized speech waveform, the speech synthesizer comprising: dividing means for dividing the first phoneme symbol string into a plurality of first phoneme symbol substrings in a predetermined priority order; a waveform memory storing a second phoneme symbol string that contains second phoneme symbol substrings divided in the predetermined priority order, together with speech waveforms that contain the unit speech waveforms corresponding to the second phoneme symbol substrings; waveform reading means for reading the unit speech waveforms corresponding to the first phoneme symbol substrings from the waveform memory; and waveform connecting means for connecting the unit speech waveforms read from the waveform memory to generate the synthesized speech waveform, wherein language information corresponding to each phoneme symbol is added to the second phoneme symbol string for each phoneme symbol.

2. The speech synthesizer according to claim 1, wherein language information indicating whether or not the substring is a word ending is added to each second phoneme symbol substring, and when the unit speech waveform corresponding to the second phoneme symbol substring that matches a first phoneme symbol substring is read from the waveform memory, if the first phoneme symbol substring is a word ending, a unit speech waveform that is a word ending is selected for the corresponding second phoneme symbol substring based on the language information.

3. The speech synthesizer according to claim 1, wherein language information indicating whether or not the substring is a word ending is added to each second phoneme symbol substring, and when the unit speech waveform corresponding to the second phoneme symbol substring that matches a first phoneme symbol substring is read from the waveform memory, if the first phoneme symbol substring is not a word ending, a unit speech waveform that is not a word ending is selected for the corresponding second phoneme symbol substring based on the language information.

4. The speech synthesizer according to claim 1, wherein the predetermined priority order is: silent portions, then unvoiced portions, then voiced portions.