JPS6315297A

JPS6315297A - Voice synthesizer

Info

Publication number: JPS6315297A
Application number: JP61159946A
Authority: JP
Inventors: 義幸原
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1986-07-08
Filing date: 1986-07-08
Publication date: 1988-01-22
Anticipated expiration: 2014-02-10
Also published as: JP2856731B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Object of the Invention]

(Industrial Field of Application) The present invention relates to a speech synthesizer that converts a symbol string into synthesized speech using speech synthesis by rule.

(Prior Art) In a speech synthesizer that converts a symbol string into synthesized speech by rule, it is generally considered essential that the synthesized speech carry prosodic features if the words and sentences being converted are to be reliably understood as speech.

Conventional methods for adding prosodic features to synthesized speech include one in which the input symbol string contains both phonological information and prosodic information and the two are synthesized together, and one in which only phonological information is input and the prosodic information is estimated from it using an accent dictionary or the like.

(Problems to Be Solved by the Invention) In the former method, the prosodic features of words and sentences can be made exactly as desired, so natural, smooth synthesized speech can be obtained. However, prosodic information must be entered in addition to phonological information, which takes a great deal of effort.

The latter method, on the other hand, requires an accent dictionary or the like with a very large memory capacity to estimate prosodic information from phonological information. Because prosodic information cannot be estimated for every possible word and sentence, conventional devices estimate it only for specific words and phrases, such as proper nouns including personal names, addresses, and company names. In that case, however, when any other word or sentence is input, the resulting synthesized speech lacks prosodic features and may be understood with the wrong meaning.

The present invention was devised in view of the above circumstances. Its object is to provide a speech synthesizer that can easily add prosodic features to any word or sentence and thereby produce synthesized speech that is close to natural speech and highly intelligible.

[Constitution of the Invention]

(Means for Solving the Problems) The speech synthesizer of the present invention comprises at least: input means to which a symbol string is input, the symbol string consisting either of phonological information alone or of phonological information with prosodic information attached; determination means for determining whether the symbol string input to the input means consists only of phonological information or has prosodic information attached; storage means for storing a large amount of prosodic information; and selection means for selecting, when the determination means finds that the symbol string consists only of phonological information, the prosodic information stored in the storage means that corresponds to that phonological information.

(Operation) In the speech synthesizer of the present invention, when the symbol string input to the input means consists only of phonological information, the determination means detects this and the selection means selects the prosodic information stored in the storage means that corresponds to that phonological information. Prosodic features can therefore be added easily to any word or sentence, and synthesized speech that is close to natural speech and highly intelligible can be obtained.

(Embodiment) An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram of a speech synthesizer according to an embodiment of the present invention.

In the figure, reference numeral 1 denotes a symbol string input section. It receives either a symbol string of phonological information, that is, the character string of a word or sentence, or a symbol string consisting of phonological information and prosodic information, that is, one to which symbols representing the accent type, pause length, or connection information (hereinafter "accent type, etc.") have been added. Reference numeral 2 denotes an input mode determination section that determines, from the symbol string input to the symbol string input section 1, whether the string consists only of phonological information or contains prosodic information.

If the symbol string consists only of phonological information, the input mode determination section 2 judges it to be in the automatic accent type generation mode and outputs it to the automatic accent type generation section A. If the symbol string contains prosodic information, the section judges it to be in the accent type setting mode and outputs it to the accent type setting section B.
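A minimal sketch of the check performed by the input mode determination section 2, assuming that the prosodic symbols are exactly the accent nucleus mark "^", the pause mark "、" and the phrase boundary "/" used in the FIG. 2 examples. The names `determine_input_mode` and `InputMode` are hypothetical; the patent does not specify how the check is implemented.

```python
from enum import Enum

# Assumption: these are the only symbols that carry prosodic information
# (accent nucleus, pause, phrase boundary).
PROSODIC_SYMBOLS = frozenset("^、/")


class InputMode(Enum):
    AUTOMATIC_ACCENT = "A"  # phonological information only -> accent dictionary path
    ACCENT_SETTING = "B"    # prosodic symbols present -> use the attached prosody


def determine_input_mode(symbols: str) -> InputMode:
    """Hypothetical stand-in for the input mode determination section 2."""
    if any(ch in PROSODIC_SYMBOLS for ch in symbols):
        return InputMode.ACCENT_SETTING
    return InputMode.AUTOMATIC_ACCENT


if __name__ == "__main__":
    print(determine_input_mode("アカイハナ^ガ/、^サイテイル"))  # InputMode.ACCENT_SETTING
    print(determine_input_mode("カブシキガイシャトウシバ"))      # InputMode.AUTOMATIC_ACCENT
```

A single prosodic symbol anywhere in the string is enough to route the whole string to section B, so the check is one linear scan.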

The automatic accent type generation section A contains a collation section 3 and an accent dictionary 4; the collation section 3 reads the phonological information and prosodic information corresponding to the symbol string out of the accent dictionary 4. An accent testing section 6 and a phoneme sequence testing section 7 are connected to the output of the collation section 3. The accent testing section 6 receives the prosodic information from the collation section 3, tests it, and generates and outputs a prosodic sequence. The phoneme sequence testing section 7 receives the phonological information, determines from it whether each phoneme is nasalized or devoiced, and generates and outputs a phoneme sequence.
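The patent does not give the rules applied by the phoneme sequence testing section 7. The sketch below assumes two simplified textbook approximations for Tokyo Japanese: a /g/ that is not word-initial is nasalized to [ŋ], and a high vowel /i/ or /u/ between voiceless consonants is devoiced (marked here, by an arbitrary convention, with an upper-case letter). The input is a hypothetical list of romanized syllables, and `derive_phoneme_sequence` is a hypothetical name; real systems handle many more exceptions.

```python
from typing import List, Optional

# Simplified assumption: syllables starting with these letters have a voiceless onset.
VOICELESS_ONSETS = ("k", "s", "t", "h", "p", "f", "c")
HIGH_VOWELS = ("i", "u")


def has_voiceless_onset(syllable: Optional[str]) -> bool:
    return syllable is not None and syllable[0] in VOICELESS_ONSETS


def derive_phoneme_sequence(syllables: List[str]) -> str:
    out = []
    for idx, syl in enumerate(syllables):
        nxt = syllables[idx + 1] if idx + 1 < len(syllables) else None
        # Nasalization: a non-initial /g/ is realized as [ŋ].
        if idx > 0 and syl.startswith("g"):
            syl = "ŋ" + syl[1:]
        # Devoicing: a high vowel after a voiceless consonant and before another
        # voiceless consonant is devoiced (shown as an upper-case vowel).
        if (len(syl) > 1 and syl[-1] in HIGH_VOWELS
                and has_voiceless_onset(syl) and has_voiceless_onset(nxt)):
            syl = syl[:-1] + syl[-1].upper()
        out.append(syl)
    return "".join(out)


if __name__ == "__main__":
    print(derive_phoneme_sequence(["a", "ka", "i", "ha", "na", "ga"]))  # akaihanaŋa
    print(derive_phoneme_sequence(["ka", "bu", "si", "ki", "ga", "i", "sja"]))  # kabusIkiŋaisja
```

In "kabusiki" only the /i/ of "si" is devoiced under these rules, because the /u/ of "bu" follows the voiced consonant /b/.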

The accent type setting section B, on the other hand, is connected to a prosodic information testing section 5 and to the phoneme sequence testing section 7 described above.

That is, in the accent type setting section B as well, the phonological information is input to the phoneme sequence testing section 7, which tests for nasalization and devoicing in the same way as above and generates and outputs a phoneme sequence. The prosodic information testing section 5 generates a prosodic sequence from the prosodic information output by the input mode determination section 2 and outputs it.

A synthesis parameter generation section 8 is arranged downstream of both the automatic accent type generation section A and the accent type setting section B. The synthesis parameter generation section 8 receives a prosodic sequence from the accent testing section 6 or the prosodic information testing section 5 and generates a prosodic parameter string from it. It also receives a phoneme sequence from the phoneme sequence testing section 7, retrieves the speech segments for the phonemes in that sequence from a speech segment file 9 connected to it, and generates a phoneme parameter string.
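How section 8 turns a prosodic sequence into a prosodic parameter string is not described. One plausible first step, sketched here as an assumption, is to expand each accent type into the Tokyo-type high/low pitch pattern of its morae: type 0 (flat) is low on the first mora and high afterwards, type 1 is high only on the first mora, and type n ≥ 2 is low on the first mora, high through mora n, and low afterwards. `pitch_pattern` is a hypothetical name.

```python
from typing import List


def pitch_pattern(num_morae: int, accent_type: int) -> List[str]:
    """Return one 'H' or 'L' per mora for a word of the given accent type."""
    if num_morae < 1 or not 0 <= accent_type <= num_morae:
        raise ValueError("accent type must lie between 0 and the number of morae")
    pattern = []
    for k in range(1, num_morae + 1):  # k is the 1-based mora index
        if accent_type == 0:
            high = k > 1                     # flat: rises after mora 1, never falls
        elif accent_type == 1:
            high = k == 1                    # head-high: falls after mora 1
        else:
            high = 1 < k <= accent_type      # rises after mora 1, falls after mora n
        pattern.append("H" if high else "L")
    return pattern


if __name__ == "__main__":
    print("".join(pitch_pattern(6, 5)))  # アカイハナガ, type 5 -> LHHHHL
    print("".join(pitch_pattern(5, 0)))  # サイテイル, type 0 -> LHHHH
```

A real synthesizer would still have to map these H/L targets onto an F0 contour with timing; that step is not shown.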

A speech synthesis section 10 is connected to the output of the synthesis parameter generation section 8; it synthesizes the phoneme parameter string and the prosodic parameter string and outputs synthesized speech.

The following describes the operation of this device when the symbol string "アカイハナ^ガ/、^サイテイル" (akai hana ga, saite iru: "a red flower is blooming") shown in FIG. 2(A) is input.

Here, "^" indicates that the accent nucleus is on the character immediately before it. If no character precedes "^", the word that begins with "^" is taken to have the flat accent type, which is denoted 0. The symbol "、" indicates a pause length, and "/" marks the boundary between words, that is, one accent phrase. A double slash "//", however, indicates that the words are separated rather than connected.
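The notation above can be parsed into per-phrase records of accent type, pause length and connection information. This is a sketch under stated assumptions: each "、" adds a pause length of 10 (the 100 ms value seen in the FIG. 2(A) example), small kana such as ャ do not count as separate morae, and a phrase with no "^" is treated as flat. `Phrase` and `parse_symbol_string` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List

PAUSE_UNIT = 10  # assumption: one "、" = pause length 10 (10 x 10 ms = 100 ms)
SMALL_KANA = frozenset("ァィゥェォャュョヮ")  # merge with the preceding kana into one mora


@dataclass
class Phrase:
    kana: str          # phrase text with the "^" mark removed
    accent_type: int   # mora index of the accent nucleus, 0 = flat
    pause: int         # pause length after the phrase
    connection: int    # 1 = connected to the next phrase, 0 = not connected / last


def accent_type_of(marked: str) -> int:
    # "^" follows the accent-nucleus mora, so the type is the number of morae before it.
    # Nothing before "^" means flat (type 0); a phrase without "^" is also assumed flat.
    head, mark, _ = marked.partition("^")
    if not mark:
        return 0
    return sum(1 for ch in head if ch not in SMALL_KANA)


def parse_symbol_string(symbols: str) -> List[Phrase]:
    parts = re.split(r"(//|/)", symbols)  # keep separators; "//" is tried before "/"
    texts, seps = parts[0::2], parts[1::2]
    phrases: List[Phrase] = []
    for i, raw in enumerate(texts):
        marked = raw.strip("、")
        if not marked:  # only pause marks: attach them to the previous phrase
            if phrases:
                phrases[-1].pause += raw.count("、") * PAUSE_UNIT
            continue
        leading = len(raw) - len(raw.lstrip("、"))
        trailing = len(raw) - len(raw.rstrip("、"))
        if phrases:  # a pause written after a boundary belongs to the preceding phrase
            phrases[-1].pause += leading * PAUSE_UNIT
        connection = 1 if i < len(seps) and seps[i] == "/" else 0
        phrases.append(Phrase(marked.replace("^", ""), accent_type_of(marked),
                              trailing * PAUSE_UNIT, connection))
    return phrases


if __name__ == "__main__":
    for phrase in parse_symbol_string("アカイハナ^ガ/、^サイテイル"):
        print(phrase)
```

Applied to the FIG. 2(A) string, it yields ("アカイハナガ", type 5, pause 10, connection 1) and ("サイテイル", type 0, pause 0, connection 0), the values the text reports for the prosodic information testing section 5.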

The symbol string "アカイハナ^ガ/、^サイテイル" shown in FIG. 2(A) is input to the symbol string input section 1, and the input mode determination section 2 determines whether the string consists only of a character string, that is, only phonological information, or also contains symbols such as the accent type, that is, prosodic information. Because this symbol string contains prosodic information, the accent type setting mode is selected and the string is output to the accent type setting section B. The phoneme sequence testing section 7 converts "アカイハナガ" and "サイテイル" into the phoneme sequences [akaihanaŋa] and [saiteiru], respectively, while the prosodic information testing section 5 determines that the word "アカイハナガ" has accent type 5, pause length 10, and connection information 1, and that the word "サイテイル" has accent type 0, pause length 0, and connection information 0. A pause length of 10 denotes a 100 ms pause between words; connection information 1 indicates that the word is connected to the following word, and connection information 0 indicates that it is not connected or that no word follows.

On the other hand, when the symbol string "カブシキガイシャトウシバ" (Kabushiki Gaisha Toshiba) is input to the symbol string input section 1 as shown in FIG. 2(B), the string contains none of the symbols "^", "、", and "/", that is, no prosodic information, so the input mode determination section 2 determines that it consists only of a character string, that is, only phonological information. The symbol string is therefore judged to be in the automatic accent type generation mode and is output to the automatic accent type generation section A. The collation section 3 in section A then collates the symbol string "カブシキガイシャトウシバ" against the contents of the accent dictionary 4, which are, for example, as shown in FIG. 3. In this case, the phonological information and prosodic information corresponding to the words "カブシキガイシャ" and "トウシバ" are read out. From the phonological information, the phoneme sequence testing section 7 obtains the phoneme sequences [kabusikiŋaisja] and [tosiba] for "カブシキガイシャ" and "トウシバ", respectively, as shown in FIG. 2(B). The prosodic information, meanwhile, is input to the accent testing section 6, which, as shown in FIG. 2(B), determines that the word "カブシキガイシャ" has accent type 5, pause length 0, and connection information 1, and that the word "トウシバ" has accent type 0, pause length 0, and connection information 0.
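The dictionary lookup by the collation section 3 can be sketched as a greedy longest-match segmentation. The matching strategy is an assumption (the patent only says the string is collated with the dictionary), and the two toy entries carry only the accent types reported in FIG. 2(B); an actual accent dictionary (FIG. 3) would also store phonological information. `ACCENT_DICTIONARY` and `collate` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: kana -> accent type (values as reported in FIG. 2(B)).
ACCENT_DICTIONARY: Dict[str, int] = {
    "カブシキガイシャ": 5,
    "トウシバ": 0,
}


def collate(symbols: str, dictionary: Dict[str, int]) -> List[Tuple[str, int, int, int]]:
    """Greedy longest-match lookup returning (word, accent type, pause, connection)."""
    words: List[Tuple[str, int]] = []
    longest = max(map(len, dictionary))
    pos = 0
    while pos < len(symbols):
        # Try the longest possible dictionary word first, then shorter ones.
        for length in range(min(longest, len(symbols) - pos), 0, -1):
            candidate = symbols[pos:pos + length]
            if candidate in dictionary:
                words.append((candidate, dictionary[candidate]))
                pos += length
                break
        else:
            raise KeyError(f"no dictionary entry matches at position {pos}: {symbols[pos:]!r}")
    # Assumption: dictionary words are joined with no pause, and every word except
    # the last is connected to the word that follows it.
    return [(word, accent, 0, 1 if i < len(words) - 1 else 0)
            for i, (word, accent) in enumerate(words)]


if __name__ == "__main__":
    for entry in collate("カブシキガイシャトウシバ", ACCENT_DICTIONARY):
        print(entry)
```

A string containing an unregistered word raises `KeyError` here, which is the situation that the accent type setting mode of section B is meant to cover.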

The phoneme sequences and prosodic sequences obtained in this way are input to the synthesis parameter generation section 8 and converted into a phoneme parameter string and a prosodic parameter string, and synthesized speech is output from the speech synthesis section 10.

According to this embodiment, therefore, even for a word or sentence that is not registered in the accent dictionary 4, the input mode determination section 2 switches to the accent type setting mode and outputs the string to the accent type setting section B, and section B generates the prosodic information. As a result, synthesized speech without prosodic features is never output, and the speech is not understood with the wrong meaning.

[Effects of the Invention] As described above, the speech synthesizer of the present invention can easily add prosodic features to any word or sentence and can produce synthesized speech that closely resembles natural speech and is highly intelligible.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a speech synthesizer according to an embodiment of the present invention, FIG. 2 shows the prosodic information and phoneme sequences for input symbol strings in this embodiment, and FIG. 3 shows the contents of the accent dictionary.

2 ... input mode determination section
3 ... collation section

Claims

[Claims]

(1) A speech synthesizer comprising at least: input means to which a symbol string is input, the symbol string consisting either of phonological information alone or of phonological information with prosodic information attached; determination means for determining whether the symbol string input to the input means consists only of phonological information or has prosodic information attached; storage means for storing a large amount of prosodic information; and selection means for selecting, when the determination means determines that the symbol string consists only of phonological information, the prosodic information stored in the storage means that corresponds to that phonological information.