JP2839488B2 - Speech synthesizer

Info

Publication number: JP2839488B2
Application number: JP62050476A
Authority: JP
Inventors: 義幸原
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1987-03-05
Filing date: 1987-03-05
Publication date: 1998-12-16
Anticipated expiration: 2013-12-16
Also published as: JPS63216100A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Object of the Invention]

(Industrial Application Field)
The present invention relates to a speech synthesizer that can synthesize natural-sounding speech, by simple processing, from input word information given as a code in the form of a symbol string.

(Prior Art)
In recent years, speech has been synthesized by rule: prosodic parameters and phoneme parameters are obtained for input word information expressed as a symbol string, and speech is synthesized according to them. Together with speech recognition technology, such synthesis is an important technology for realizing a highly natural man-machine interface. In rule-based synthesis of this kind, correctly reproducing the accent specific to each word is an important issue for obtaining natural-sounding synthesized speech.

Conventional speech synthesizers therefore register, as an accent dictionary, the accent type specific to each simple word and the accent information of compound words formed by combining several simple words, and match the input against this dictionary to obtain the accent specific to that input.

Such conventional speech synthesizers, however, have the drawback that they cannot process alphanumeric characters and the like.

(Problems to be Solved by the Invention)
As described above, conventional speech synthesizers cannot process special codes that represent alphanumeric characters and the like.

The present invention was made in view of this problem. It aims to process not only ordinary character codes but also special codes representing alphanumeric characters and the like, and to match against the accent dictionary efficiently.
The object is thereby to provide a speech synthesizer that outputs synthesized speech with higher intelligibility and naturalness than before.

[Structure of the Invention]

(Means for Solving the Problems)
To achieve this object, the speech synthesizer of the present invention comprises: input means for inputting symbolized word information consisting of special codes and character codes, each corresponding to a word; conversion means for converting a special code input by the input means into a first character code giving the reading of the corresponding word, and for converting the character codes in the word information delimited by the special codes into a second character code giving the reading of the corresponding word; and silence code data insertion means for inserting silence code data between the first character code and the second character code.

(Operation)
According to the speech synthesizer of the present invention, a special code and the character codes delimited by special codes in the word information are converted into the first and second character codes, respectively, each giving the reading of the corresponding word. In addition, the silence code data insertion means automatically inserts silence code data between the first and second character codes. When the first and second character codes are converted into speech, a silent portion is therefore inserted between them, so that synthesized speech with high intelligibility and naturalness can be output.

(Embodiment)
An embodiment of the present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a speech synthesizer according to an embodiment of the present invention.

As shown in FIG. 1, the speech synthesizer comprises a symbol string input unit 1, a code conversion unit 2, a conversion table 3, a word matching unit 4, an accent dictionary 5, a compound word accent test unit 6, a prosodic parameter generation unit 7, a phoneme sequence test unit 8, a speech unit file 9, a phoneme parameter generation unit 10, and a speech synthesis unit 11.

The symbol string input unit 1 inputs word information expressed as a symbol string consisting of special codes and character codes.
The code conversion unit 2 converts the special codes and character codes input from the symbol string input unit 1 into predetermined character code strings. Each special code is converted into one character code string, and the character codes sandwiched between special codes are converted, as a group, into one character code string. In addition, a (space) is inserted between a character code string converted from a special code and a character code string converted as a group from the character codes sandwiched between special codes.

The conversion table 3 registers the character code strings corresponding to the special codes and character codes. FIG. 2 shows the special codes in the conversion table 3 and the character code strings corresponding to them. For example, the special code "28" corresponds to the character code string "カッコ" (kakko, "parenthesis"), and the special code "80" corresponds to "カブシキガイシャ" (kabushiki-gaisha, "stock company").

The word matching unit 4 matches the character code strings obtained from the code conversion unit 2 against the accent dictionary 5 to obtain the accent information corresponding to each character code string. If the character code string contains parts delimited by (space), each part is matched against the accent dictionary 5 separately. The accent information obtained by the word matching unit 4 is input to the compound word accent test unit 6, the prosodic parameter generation unit 7, and the phoneme sequence test unit 8.

When the matched input words form a compound word, the compound word accent test unit 6 determines the accent of the compound word according to the types of the words and the accent information found by matching. This yields, among other things, information on how the accent changes when several simple words form a compound word.
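The conversion rule described for the code conversion unit 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation. It assumes that the ordinary character codes are JIS X 0201 single-byte katakana values, that the voiced sound mark is merged into the preceding kana, and that a (space) goes between every pair of adjacent segments. The names `SPACE`, `SPECIAL_TABLE`, `CHAR_TABLE`, and `convert` are hypothetical, and `␣` merely stands in for the (space) code.

```python
import unicodedata

# Hypothetical stand-in for the (space) code inserted between segments.
SPACE = "␣"

# Special codes and their readings, as given for conversion table 3.
SPECIAL_TABLE = {0x28: "カッコ", 0x29: "カッコ", 0x80: "カブシキガイシャ"}

# Assumption: ordinary codes are JIS X 0201 single-byte katakana.
# 0xDE (voiced sound mark) is mapped to the combining mark U+3099 so that
# NFC normalization can merge it into the preceding kana.
CHAR_TABLE = {
    0xB3: "ウ", 0xBC: "シ", 0xC0: "タ", 0xC4: "ト",
    0xCA: "ハ", 0xD7: "ラ", 0xDE: "\u3099",
}


def convert(codes):
    """Convert a code sequence into readings separated by SPACE."""
    segments = []
    group = []  # ordinary character codes collected between special codes

    def flush():
        # Emit the pending group of ordinary characters as one segment.
        if group:
            segments.append(unicodedata.normalize("NFC", "".join(group)))
            group.clear()

    for code in codes:
        if code in SPECIAL_TABLE:
            flush()
            segments.append(SPECIAL_TABLE[code])  # one special code, one segment
        else:
            group.append(CHAR_TABLE[code])
    flush()
    # Assumption: a SPACE separates every pair of adjacent segments.
    return SPACE.join(segments)


print(convert([0x80, 0xC4, 0xB3, 0xBC, 0xCA, 0xDE]))
# カブシキガイシャ␣トウシバ
```

Grouping the ordinary codes before any dictionary lookup means each segment can later be matched as a whole word, which is the property the word matching unit 4 relies on.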
The prosodic parameter generation unit 7 obtains the prosodic parameters of the character code sequence whose accent has been determined in this way, and supplies the prosodic parameter sequence to the speech synthesis unit 11. When the code obtained from the code conversion unit 2 is (space), it supplies a silence parameter to the speech synthesis unit 11 for a fixed period of time.

The phoneme sequence test unit 8 determines, from the phoneme information of the character code sequence obtained by the word matching unit 4, whether each phoneme is nasalized or devoiced. According to the result, it generates a phoneme sequence corresponding to the character code sequence and supplies it to the phoneme parameter generation unit 10. The phoneme parameter generation unit 10 refers to the speech unit file 9 to generate a phoneme parameter sequence corresponding to the phoneme sequence. When the code obtained from the code conversion unit 2 is (space), however, it generates a phoneme parameter sequence representing silence without referring to the speech unit file 9.

The speech synthesis unit 11 synthesizes speech for the word indicated by the input symbol string according to the generated phoneme parameter sequence and the prosodic parameter sequence described above.

The operation of this embodiment is described next. Suppose that the symbol string "80, C4, B3, BC, CA, DE, 28, CA, D7, C0, DE, 29" is input to the symbol string input unit 1, as shown in FIG. 3(a). In this string, "80", "28", and "29" are special codes and the rest are ordinary character codes. The code conversion unit 2 converts this symbol string into character code strings; FIG. 3 shows the conversion. The code conversion unit 2 refers to the conversion table 3 as follows: "80" is converted to "カブシキガイシャ"; "C4", "B3", "BC", "CA", and "DE" are converted to "ト", "ウ", "シ", "ハ", and "゛", respectively; "28" is converted to "カッコ"; "CA", "D7", "C0", and "DE" are converted to "ハ", "ラ", "タ", and "゛", respectively; and "29" is converted to "カッコ". A (space) is inserted between "カブシキガイシャ" and "トウシバ" (Toshiba), and likewise between "トウシバ" and "カッコ". As the figure shows, each special code is itself converted into one character code string, whereas the ordinary character codes sandwiched between special codes are treated after conversion as one group. That is, "トウシバ" is treated as a single character code string.

Next, the character code string obtained by the code conversion unit 2 is input to the word matching unit 4, where it is matched against the accent dictionary 5 to obtain accent information. This matching is performed for each word delimited by (space). First the string "カブシキガイシャ" is matched against the accent dictionary 5 and its accent information is obtained; then the string "トウシバ" is matched and its accent information is obtained. Because accent matching is performed word by word in this way, even if, for example, the character code string "トウシバ" is not registered in the accent dictionary 5, the accent determination of the other words is unaffected.

The compound word accent test unit 6 determines the accent of compound words according to the character code strings and accent information, and provides the accent information for cases where several simple words form a compound word. Based on the character code strings and accent information obtained in this way, the prosodic parameter generation unit 7, the phoneme sequence test unit 8, the speech unit file 9, and the phoneme parameter generation unit 10 generate the prosodic parameters and phoneme parameters. Because the character code string contains (space), a parameter sequence representing a pause (for example, a 250 ms silent interval) between the words adjacent to each (space) is also generated.
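The word-by-word matching and pause generation in this walkthrough can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the actual unit design: `SPACE` (`␣`) is a hypothetical stand-in for the (space) code, `ACCENT_DICT` holds placeholder labels rather than real accent data, and `plan` is a hypothetical helper that returns a flat list of word and pause events instead of real prosodic or phoneme parameters.

```python
SPACE = "␣"      # hypothetical stand-in for the (space) code
PAUSE_MS = 250   # example pause length mentioned in the text

# Placeholder accent dictionary. The values are illustrative labels only,
# not accent data taken from the patent. "トウシバ" is deliberately left out.
ACCENT_DICT = {"カブシキガイシャ": "accent-A", "カッコ": "accent-B"}


def plan(converted):
    """Match each (space)-delimited word separately and emit pause events."""
    events = []
    for index, word in enumerate(converted.split(SPACE)):
        if index > 0:
            # Each (space) becomes a silent interval between adjacent words.
            events.append(("pause", PAUSE_MS))
        # An unregistered word yields None and does not disturb its neighbours.
        events.append(("word", word, ACCENT_DICT.get(word)))
    return events


for event in plan("カブシキガイシャ␣トウシバ␣カッコ"):
    print(event)
```

In this sketch the unregistered word simply carries `None` as its accent information, while the words on either side still receive their own entries, which mirrors the isolation property described above.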
These parameters are input to the speech synthesis unit 11, which outputs the synthesized speech.

As described above, in this embodiment, when the input symbol string contains special codes, each special code is converted into a character code string, the character codes sandwiched between special codes are converted into a character code string as a group, and each converted character code string undergoes word matching separately. The accent information specific to the word each string represents can therefore be obtained simply and accurately. Furthermore, inserting a pause between a special code and the adjacent code string has significant practical benefits, such as improved intelligibility.

The present invention is not limited to the embodiment described above. For example, a symbol other than (space) may be used to represent the pause. Likewise, the correspondence between special codes and character code strings is of course not limited to that of the embodiment.

[Effects of the Invention]
As described in detail above, according to the present invention, a special code and the character codes delimited by special codes are converted into the first and second character codes, respectively, each giving the reading of the corresponding word. Silence code data is also inserted automatically between the first and second character codes to generate a silent portion, which improves the intelligibility and naturalness of the output synthesized speech.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing the configuration of a speech synthesizer according to an embodiment of the present invention, FIG. 2 is a diagram showing a configuration example of the conversion table, and FIG. 3 is a diagram showing an example of an input symbol string and the character code strings obtained by converting it.

1 ... symbol string input unit
2 ... code conversion unit
3 ... conversion table
4 ... word matching unit
5 ... accent dictionary
6 ... compound word accent test unit
7 ... prosodic parameter generation unit
8 ... phoneme sequence test unit
9 ... speech unit file
10 ... phoneme parameter generation unit
11 ... speech synthesis unit

Claims

(57) [Claims] A speech synthesizer comprising: input means for inputting symbolized word information consisting of special codes and character codes, each corresponding to a word; conversion means for converting a special code input by the input means into a first character code giving the reading of the corresponding word, and for converting the character codes in the word information delimited by the special codes into a second character code giving the reading of the corresponding word; and silence code data insertion means for inserting silence code data between the first character code and the second character code.