JPH02205896A

JPH02205896A - Voice synthesizing device

Info

Publication number: JPH02205896A
Application number: JP1025146A
Authority: JP
Inventors: Shigetoshi Saito; 成利斉藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1989-02-03
Filing date: 1989-02-03
Publication date: 1990-08-15

Abstract

PURPOSE: To generate synthesized voice with a tone quality suited to the purpose of use by providing a voice element piece file for proper nouns and a voice element piece file for sentences and selectively switching between them in accordance with the content of the voice synthesis. CONSTITUTION: Phonetic information obtained in a character information analyzing part 1 is input to a voice parameter generating part 2. When only a proper noun is synthesized, the voice element piece file 3 for proper nouns is referred to in order to generate a voice parameter corresponding to the input phonetic information; when a sentence is synthesized, the voice element piece file 4 for sentences is referred to instead. Metrical (prosodic) information obtained in the character information analyzing part 1 is input to a metrical parameter generating part 5, which generates a metrical parameter corresponding to the input metrical information. A voice synthesizing part 6 generates and outputs synthesized voice based on prescribed synthesizing rules in accordance with the input voice parameters and metrical parameters. The synthesized voice is thus generated with a tone quality suited to the purpose of use.

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Field of Industrial Application) The present invention relates to a speech synthesis device that uses a rule synthesis method to generate speech from arbitrary input character information.

(Prior Art) Various speech synthesis techniques have been proposed, one of which is the rule synthesis method for speech.

The rule synthesis method analyzes arbitrary input character information to obtain its phonological information and prosodic information, and outputs synthesized speech corresponding to the input character information based on predetermined rules. With this method, synthesized speech for any word or phrase can be generated easily.

When only proper nouns such as personal names or company names are output as speech, the rule-synthesized speech should be highly clear so that the content can be understood without ambiguity. When a weather forecast or a message is output as speech, however, synthesized speech that is smooth and not halting is desired rather than high clarity.

Thus, although the required sound quality differs between outputting only proper nouns and outputting sentences, a conventional device could output only one of the two kinds of synthesized speech and could not selectively output both.

(Problems to be Solved by the Invention) As described above, the required sound quality differs between outputting only proper nouns and outputting sentences, yet only one of the two kinds of synthesized speech can be output and the two cannot be output selectively. The present invention was made to solve this problem, and its object is to provide a speech synthesis device that can change the sound quality depending on, for example, whether only proper nouns or sentences are output as speech.

[Structure of the Invention] (Means for Solving the Problems) The present invention is a speech synthesis device configured to obtain phonological information and prosodic information by analyzing input character information, to generate speech parameters corresponding to the obtained phonological information by referring to a speech segment file in which speech segment parameters are stored, to generate prosodic parameters corresponding to the obtained prosodic information, and to synthesize speech by rule according to the generated speech parameters and prosodic parameters. The device is characterized in that it has a plurality of such speech segment files and selectively switches the speech segment file referred to for generating the speech parameters according to the content of the speech to be synthesized and output.

(Function) For example, by providing both a speech segment file for proper nouns and a speech segment file for sentences and selectively switching between them according to the content to be synthesized, the speech segment file referred to for generating speech parameters can be switched depending on whether the speech to be output consists only of proper nouns or is a sentence. Consequently, highly clear synthesized speech is obtained for proper nouns alone, smooth and natural synthesized speech is obtained for sentences, and synthesized speech with a sound quality suited to each purpose of use can be generated.

(Embodiment) An embodiment of the present invention is described below with reference to the drawing.

In FIG. 1, reference numeral 1 denotes a character information analysis section, which analyzes input character information to obtain its phonological information and prosodic information. The phonological information obtained by the character information analysis section 1 is input to a speech parameter generation section 2. When only proper nouns are to be synthesized, the speech parameter generation section 2 generates speech parameters corresponding to the input phonological information by referring to a speech segment file 3 for proper nouns; when a sentence is to be synthesized, it generates them by referring to a speech segment file 4 for sentences.

Meanwhile, the prosodic information obtained by the character information analysis section 1 is input to a prosodic parameter generation section 5.

The prosodic parameter generation section 5 generates prosodic parameters corresponding to the input prosodic information.

The speech parameters generated by the speech parameter generation section 2 and the prosodic parameters generated by the prosodic parameter generation section 5 are each input to a speech synthesis section 6. The speech synthesis section 6 generates and outputs synthesized speech based on predetermined synthesis rules in accordance with the input speech parameters and prosodic parameters.
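The data flow of FIG. 1 can be pictured with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the specification: the toy segment-file contents, the space-separated syllable input, the accent rule, the pitch values, and all function names. The sketch only shows how sections 1, 2, 5, and 6 connect and where the choice between files 3 and 4 enters.

```python
from dataclasses import dataclass

# Hypothetical toy segment files: syllable -> parameter vector.
# Real files would hold speech segment parameters for each syllable.
PROPER_NOUN_FILE = {"na": [1.0], "ka": [2.0], "ma": [3.0]}  # stands for file 3
SENTENCE_FILE = {"na": [1.1], "ka": [2.1], "ma": [3.1]}     # stands for file 4


@dataclass
class Analysis:
    syllables: list  # phonological information
    accents: list    # prosodic information (toy: one flag per syllable)


def analyze_text(text):
    """Section 1 (toy): split space-separated syllables, accent the first one."""
    syllables = text.split()
    accents = [i == 0 for i in range(len(syllables))]
    return Analysis(syllables, accents)


def generate_speech_params(syllables, segment_file):
    """Section 2: look up each syllable in the selected segment file."""
    return [segment_file[s] for s in syllables]


def generate_prosody_params(accents, base_hz=120.0, accent_hz=150.0):
    """Section 5 (toy): map accent flags to pitch targets in Hz."""
    return [accent_hz if a else base_hz for a in accents]


def synthesize(speech_params, prosody_params):
    """Section 6 (toy): pair the parameters instead of producing audio."""
    return list(zip(speech_params, prosody_params))


def run_pipeline(text, proper_noun_only):
    """Run sections 1, 2, 5, and 6 in order with the chosen segment file."""
    analysis = analyze_text(text)
    segment_file = PROPER_NOUN_FILE if proper_noun_only else SENTENCE_FILE
    speech = generate_speech_params(analysis.syllables, segment_file)
    prosody = generate_prosody_params(analysis.accents)
    return synthesize(speech, prosody)
```

Calling `run_pipeline("na ka ma", proper_noun_only=True)` draws every speech parameter from the proper-noun file, while the prosodic parameters are the same for either file, which mirrors the independence of sections 2 and 5 in the block diagram.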

Whether the speech segment file 3 for proper nouns or the speech segment file 4 for sentences is referred to may be selected, for example, by a software switch when the device is first initialized, or may be switched by symbol input (for example, inputting the symbol $ selects the speech segment file 3 for proper nouns, and inputting the symbol # selects the speech segment file 4 for sentences).
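The two selection methods can be sketched as a small state holder in Python. The symbols $ and # are the examples given in the text; the class name, the method name, and the choice of the sentence file as the default are assumptions made for illustration.

```python
PROPER_NOUN = "proper_noun"  # stands for speech segment file 3
SENTENCE = "sentence"        # stands for speech segment file 4

# Switching symbols taken from the example in the text.
SWITCH_SYMBOLS = {"$": PROPER_NOUN, "#": SENTENCE}


class SegmentFileSelector:
    """Tracks which speech segment file is currently selected."""

    def __init__(self, initial=SENTENCE):
        # Corresponds to the software switch set at initialization.
        if initial not in (PROPER_NOUN, SENTENCE):
            raise ValueError(f"unknown segment file: {initial!r}")
        self.current = initial

    def feed(self, char):
        """Switch files if char is a switching symbol.

        Returns True when the character was consumed as a switching
        symbol, and False when it is ordinary input to be synthesized.
        """
        if char in SWITCH_SYMBOLS:
            self.current = SWITCH_SYMBOLS[char]
            return True
        return False
```

A caller would create the selector once at start-up, pass each input character through `feed`, and read `current` whenever the speech parameter generation section needs to know which file to consult.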

As an example of synthesizing only proper nouns, consider a device that reads out addresses, names, and telephone numbers stored in a database one after another. In this case, it is important to output clear synthesized speech.

On the other hand, when synthesizing sentences such as weather forecasts and messages, it is important to generate synthesized speech that is not only clear but also smooth.

The required sound quality therefore differs between synthesizing only proper nouns and synthesizing sentences, but a single device provided with both speech segment files can be used for either purpose.

An example of how the proper-noun-level and sentence-level speech segment files are created is described next. At the proper noun level, the clarity of contracted sounds (yōon) such as "ja" and "cha" is also important, so the speech segment file for each syllable is created from an isolated utterance of that syllable.

In addition, the parameters are analyzed with the high-frequency range of the speech emphasized.

At the sentence level, on the other hand, generating smooth synthesized speech is important, so the syllable speech segment file is created from syllables cut out of natural speech of words of two, three, four, or more syllables. For example, to extract the segment parameters of the syllable /ka/, a word containing this syllable, such as "nakama", is uttered naturally, the portion /ka/ is cut out of its phoneme sequence /nakama/, its parameters are extracted, and the speech segment file is created. The high-frequency range is also emphasized less in this analysis than at the proper noun level. Because sentences are thus synthesized using a speech segment file obtained by analyzing continuously uttered natural speech, the synthesized speech is smooth.
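The two preparation steps described above, cutting a syllable out of a recorded word and emphasizing the high-frequency range before analysis, can be sketched in Python. The specification does not say how the emphasis is performed; a first-order pre-emphasis filter is assumed here as one common choice. The coefficient values, the boundary format, and the function names are likewise hypothetical.

```python
# Hypothetical pre-emphasis coefficients: stronger for the proper-noun file,
# weaker for the sentence file. The actual values are not given in the source.
PROPER_NOUN_COEFF = 0.97
SENTENCE_COEFF = 0.5


def cut_syllable(samples, boundaries, syllable):
    """Return the samples of the first segment labelled `syllable`.

    `boundaries` is a list of (label, start, end) tuples with sample
    indices, end exclusive. How the boundaries are obtained (for example
    by hand labelling) is outside this sketch.
    """
    for label, start, end in boundaries:
        if label == syllable:
            return samples[start:end]
    raise KeyError(syllable)


def pre_emphasize(samples, coeff):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1], with x[-1] = 0.

    A larger coefficient attenuates low frequencies more strongly, so the
    high-frequency range is emphasized more in relative terms.
    """
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out
```

For the sentence-level file, the /ka/ portion of a recording of "nakama" would first be cut out with `cut_syllable` and then passed through `pre_emphasize` with the weaker coefficient before parameter analysis. For the proper-noun-level file, an isolated recording of the syllable would be filtered with the stronger coefficient.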

Cepstrum parameters, LPC parameters, PARCOR parameters, formant parameters, and the like can be used as the speech segment parameters.

In the embodiment described above, speech is synthesized by referring, for proper nouns, to a speech segment file that yields highly clear synthesized speech and, for sentences, to a speech segment file that yields smooth and natural synthesized speech. If the user requires it, however, a switching symbol for the speech segment file can be input even within a sentence to select the portion that should be output clearly, so that the speech segment file for proper nouns is referred to and only that portion is synthesized more clearly.
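In-sentence switching can be illustrated by splitting the input text into runs, each tagged with the segment file to use. This is a minimal Python sketch that reuses the example symbols $ and # from the embodiment; the function name, the mode labels, and the default starting mode are assumptions.

```python
def split_runs(text, initial="sentence"):
    """Split text into (mode, run) pairs at the switching symbols.

    "$" switches to the proper-noun file and "#" back to the sentence
    file; the symbols themselves are not part of any run.
    """
    modes = {"$": "proper_noun", "#": "sentence"}
    runs = []
    mode = initial
    buf = []
    for ch in text:
        if ch in modes:
            if buf:  # close the run collected so far
                runs.append((mode, "".join(buf)))
                buf = []
            mode = modes[ch]
        else:
            buf.append(ch)
    if buf:
        runs.append((mode, "".join(buf)))
    return runs
```

For the input `"Call $Saito# tomorrow"`, the name is tagged for the proper-noun file and the surrounding words for the sentence file, so only the name would be synthesized with the clearer segments.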

[Effects of the Invention] As described in detail above, the speech synthesis device of the present invention is provided with, for example, both a speech segment file for proper nouns and a speech segment file for sentences and selectively switches between them according to the content to be synthesized, so that the speech segment file referred to for generating speech parameters can be switched depending on whether the speech to be output consists only of proper nouns or is a sentence. Consequently, highly clear synthesized speech is obtained for proper nouns alone, smooth and natural synthesized speech is obtained for sentences, and synthesized speech with a sound quality suited to each purpose of use can be generated.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing an embodiment of the present invention. 1: character information analysis section; 2: speech parameter generation section; 3: speech segment file for proper nouns; 4: speech segment file for sentences; 5: prosodic parameter generation section; 6: speech synthesis section. Agent for the applicant: Patent attorney Takehiko Suzue

Claims

[Claims] 1. A speech synthesis device configured to obtain phonological information and prosodic information by analyzing input character information, to generate speech parameters corresponding to the obtained phonological information by referring to a speech segment file in which speech segment parameters are stored, to generate prosodic parameters corresponding to the obtained prosodic information, and to synthesize speech by rule according to the generated speech parameters and prosodic parameters, characterized in that the device comprises a plurality of the speech segment files and selectively switches the speech segment file referred to for generating the speech parameters according to the content of the speech to be synthesized and output.