JP2001343987A

JP2001343987A - Method and device for voice synthesis

Info

Publication number: JP2001343987A
Application number: JP2000162242A
Authority: JP
Inventors: Makoto Hashimoto; 誠橋本
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2000-05-31
Filing date: 2000-05-31
Publication date: 2001-12-14

Abstract

PROBLEM TO BE SOLVED: In text-to-speech conversion, which generates synthesized speech from text, conventionally known sound prolongation processing cannot switch between prolongation and non-prolongation for the same word according to whether it should be spoken with prolongation, as in normal speech, or articulated clearly. SOLUTION: A sound prolongation setting means is provided. Using information on the prolongation candidate positions, where sounds may be prolonged, together with language information, it switches prolongation so that sounds are not prolonged for clear articulation and are prolonged for normal speech.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech synthesis method, and an apparatus therefor, that generates synthesized speech for input text information by concatenating speech units. More specifically, it relates to setting whether a long-sound speech unit is adopted at specific positions in the speech where the sound may be prolonged.

[0002]

[Description of the Related Art] In recent years, speech synthesis methods and apparatuses have been developed that generate synthesized speech for a sentence when text written in mixed kanji and kana is input, and efforts to improve the quality of the synthesized speech are continuing.

[0003] One important factor that determines the quality of such synthesized speech is how long sounds are handled.

[0004] For example, for speech synthesis methods that concatenate the speech unit corresponding to each character of hiragana or katakana text given as the reading information of a sentence, it has been proposed to improve the quality of the synthesized speech based on information on whether a long sound can be omitted. Specifically, for the utterance of the word 「コンピューター」 ("computer"), the long-sound mark 「ー」 at the end of the word is given information indicating that it can be omitted, whereas the 「ー」 located before the end is given information indicating that it cannot be omitted. As a result, the synthesized utterances 「コンピューター」 and 「コンピュータ」 are allowed, but 「コンピュター」 and 「コンピュタ」 are prohibited (Japanese Unexamined Patent Publication No. H3-156663).
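The prior-art scheme above can be illustrated with a minimal sketch, assuming each kana of a reading carries a boolean "omittable" flag. The function name `allowed_readings` and the data layout are assumptions made for illustration; the cited publication may represent this information differently.

```python
from itertools import product
from typing import List, Set, Tuple

def allowed_readings(marked: List[Tuple[str, bool]]) -> Set[str]:
    """Enumerate the utterances permitted by per-kana omission flags.

    marked holds (kana, omittable) pairs; only kana flagged as
    omittable may be dropped from the utterance.
    """
    options = [("", kana) if omittable else (kana,) for kana, omittable in marked]
    return {"".join(choice) for choice in product(*options)}

# Hypothetical marking of コンピューター: only the final ー may be omitted.
computer = [("コ", False), ("ン", False), ("ピ", False), ("ュ", False),
            ("ー", False), ("タ", False), ("ー", True)]
variants = allowed_readings(computer)
```

With this marking, `variants` contains 「コンピューター」 and 「コンピュータ」 but not 「コンピュター」 or 「コンピュタ」.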

[0005] Owing to the characteristics of long sounds in Japanese pronunciation, and apart from the addition or omission of long sounds handled by such conventional apparatuses, there are cases where it is preferable, at a specific position in the speech, to adopt a long-sound speech unit in place of the speech unit for the reading information.

[0006] For example, the reading of the word 「経験」 (keiken, "experience") is written 「ケイケン」 in katakana, but pronouncing it as 「ケーケン」 gives better speech quality and sounds more natural to the listener. This follows the pronunciation rule that an 「イ」 immediately after an e-row sound (「エ」, 「ケ」, 「セ」, ...) may be pronounced as a long sound.

[0007] Likewise, under the pronunciation rule that an 「ウ」 immediately after an o-row sound (「オ」, 「コ」, 「ソ」, ...) may be changed to a long sound, the word 「報告」 (houkoku, "report"), whose reading is given as 「ホウコク」, sounds more natural to the listener when pronounced 「ホーコク」. These pronunciation rules are described in detail in references such as the pronunciation and accent dictionary (「発音アクセント辞典」) published by NHK.
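The two pronunciation rules (an 「イ」 after an e-row sound, an 「ウ」 after an o-row sound) can be sketched as a simple check over a katakana reading. The kana sets below are hand-written and not exhaustive, and the names `E_ROW`, `O_ROW`, `candidate_positions`, and `prolong_all` are illustrative assumptions rather than anything defined in this publication.

```python
from typing import List

# Assumed, non-exhaustive kana sets for the e-row and o-row (katakana).
E_ROW = set("エケセテネヘメレゲゼデベペ")
O_ROW = set("オコソトノホモヨロゴゾドボポョ")

def candidate_positions(reading: str) -> List[int]:
    """Return indices of kana that the two rules allow to become a long sound."""
    positions = []
    for i in range(1, len(reading)):
        prev, cur = reading[i - 1], reading[i]
        if (cur == "イ" and prev in E_ROW) or (cur == "ウ" and prev in O_ROW):
            positions.append(i)
    return positions

def prolong_all(reading: str) -> str:
    """Replace every candidate kana with the long-sound mark ー."""
    chars = list(reading)
    for i in candidate_positions(reading):
        chars[i] = "ー"
    return "".join(chars)
```

Applied to the examples in the text, `prolong_all("ケイケン")` gives 「ケーケン」 and `prolong_all("ホウコク")` gives 「ホーコク」.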

[0008] Accordingly, conventional speech synthesis methods have normally performed synthesis by assigning long-sound speech to specific readings based on the pronunciation rules.

[0009] On the other hand, depending on the application of the speech synthesizer, for example when it is used as learning material for kanji readings, it may be necessary to synthesize the word 「経験」 exactly as its reading 「ケイケン」, so that it is heard sound by sound as 「ケ」, 「イ」, 「ケ」, 「ン」.

[0010] In conventional speech synthesizers, however, whether to prolong specific reading information for which a change to a long sound is permitted has been determined uniquely in the course of analyzing the input reading information (for example, text information). For example, when a language dictionary with a conversion table function is used that changes all such reading information into long-sound information, every sound of a reading for which the change is permitted is prolonged.

[0011] Thus, in conventional speech synthesizers, the user has not been able to freely set whether specific reading information is changed into long-sound information (hereinafter referred to as prolongation).

[0012] Meanwhile, even for the same sound that can be prolonged (in units such as phonemes, words, or phrases), prolongation may or may not be preferable depending on where the sound occurs in the sentence, or on differences in its context such as its language information (part of speech, whether it is a nominal (taigen), whether it is an independent word). That is, prolongation makes the speech easier to hear in some cases and harder to hear in others.

[0013] Thus, with conventional prolongation processing, even when the same word should in some cases be prolonged as in normal speech and in others be pronounced clearly sound by sound according to its reading, it has not been possible to switch freely between prolongation and non-prolongation.

[0014]

[Problems to Be Solved by the Invention] To eliminate the above drawback in a conventional speech synthesis method or apparatus, one conceivable approach is to register both the prolonged reading and the non-prolonged reading in the language dictionary so that prolongation or non-prolongation can be specified. With this approach, however, the output remains fixed to either prolongation or non-prolongation unless the specification is changed each time, so free setting has not been achieved. In addition, the language dictionary becomes enormous as the number of registered words increases.

[0015] Furthermore, in speech synthesis systems mounted on small devices and the like, the input text is sometimes katakana text (a katakana code string) rather than mixed kanji and kana. In such cases the language dictionary for registering headwords, readings, parts of speech, and so on is often omitted, which has made switching between prolongation and non-prolongation difficult.

[0016] Accordingly, an object of the present invention is to provide a speech synthesis method and apparatus that, for words and phrases containing prolongation candidates, can accommodate both the case where they should be prolonged as in normal pronunciation and the case where each sound should be pronounced clearly according to the reading.

[0017]

[Means for Solving the Problems] In the speech synthesis method of the present invention, prolongation candidate positions are set in advance based on the reading information of the speech. These are positions at which prolongation processing, which changes the reading information of a sound other than a long sound into long-sound information corresponding to a long sound, can be performed. Whether the prolongation processing is executed at each candidate position can then be set. Because prolongation can thus be applied selectively to the candidates, a speech synthesis method that produces easily understood speech suited to the application and purpose of the synthesized speech can be realized.

[0018] The speech synthesis method of the present invention also comprises: an analysis process that generates at least reading information by analyzing input text information; a prolongation candidate detection process that, based on the generated reading information, detects prolongation candidate positions at which prolongation processing (changing the reading information of a sound other than a long sound into long-sound information corresponding to a long sound) can be performed; a prolongation process that changes the reading information of given sounds into the long-sound information based on rules that set whether the reading information at each detected candidate position is changed into the long-sound information; and a unit editing process that edits speech units based on the prolongation-processed reading information to generate synthesized speech. By setting rules for prolongation in this way, synthesized speech that is easy to understand can be generated.
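The four processes of paragraph [0018] can be pictured as a small pipeline. The sketch below uses toy stand-ins throughout: the `READINGS` table, the kana sets, the rule signature, and the reduction of unit editing to a list of kana labels are all assumptions for illustration, not the method as actually implemented.

```python
from typing import Callable, List

# Toy stand-ins; none of these names or data come from the patent.
READINGS = {"経験": "ケイケン", "報告": "ホウコク"}
E_ROW = set("エケセテネヘメレゲゼデベペ")
O_ROW = set("オコソトノホモヨロゴゾドボポョ")

def analyze(text: str) -> str:
    """Analysis process: look up the katakana reading of a known word."""
    return READINGS[text]

def detect(reading: str) -> List[int]:
    """Candidate detection process: e-row + イ, or o-row + ウ."""
    return [i for i in range(1, len(reading))
            if (reading[i] == "イ" and reading[i - 1] in E_ROW)
            or (reading[i] == "ウ" and reading[i - 1] in O_ROW)]

def prolong(reading: str, positions: List[int],
            rule: Callable[[str, int], bool]) -> str:
    """Prolongation process: write ー only where the rule allows it."""
    chars = list(reading)
    for pos in positions:
        if rule(reading, pos):
            chars[pos] = "ー"
    return "".join(chars)

def edit_units(reading: str) -> List[str]:
    """Unit editing process, reduced here to one unit label per kana."""
    return list(reading)

def synthesize(text: str, rule: Callable[[str, int], bool]) -> List[str]:
    reading = analyze(text)
    return edit_units(prolong(reading, detect(reading), rule))

normal = synthesize("経験", lambda reading, pos: True)   # normal speech
clear = synthesize("経験", lambda reading, pos: False)   # clear articulation
```

Swapping only the rule changes the result from the prolonged units (ケ, ー, ケ, ン) to the sound-by-sound units (ケ, イ, ケ, ン), which is the switch the method is meant to provide.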

[0019] The rules in this case are based on position information of the prolongation candidate within the sentence. This position information includes the position of the sentence containing the candidate, the position of the bunsetsu (phrase) containing the candidate, the position of the word containing the candidate, the position of the mora containing the candidate, the position of the syllable containing the candidate, the position of the pause-delimited section containing the candidate, the position of the accent phrase containing the candidate, and the position of the breath group containing the candidate. Rules for prolongation in the Japanese language that depend on where the candidate occurs can thus be set, and natural synthesized speech can be generated.

[0020] The rules in this case are also based on the language information of the prolongation candidate in the sentence, namely part-of-speech information and whether the word is a nominal (taigen), a sentence-final nominal (taigen-dome), an independent word, an attached word, and so on. Rules for prolongation in the Japanese language that depend on the language information of the candidate can thus be set, and natural synthesized speech can be generated.

[0021] Further, the speech synthesis apparatus of the present invention comprises: an analysis processing unit that generates at least reading information by analyzing input text information; a prolongation candidate detection processing unit that, based on the generated reading information, detects prolongation candidate positions at which the reading information of a sound other than a long sound can be changed into long-sound information corresponding to a long sound; a prolongation setting unit holding rules that set whether the reading information at each detected candidate position is changed to a long sound; a prolongation processing unit that changes the reading information at the positions determined by the rules held in the prolongation setting unit into the long-sound information; and a speech synthesis unit that edits speech units based on the prolongation-processed reading information to generate synthesized speech. Rules for prolongation can therefore be set, and a speech synthesis apparatus that generates easily understood synthesized speech can be realized.

[0022]

[Embodiments of the Invention] Embodiments of the present invention are described with reference to FIGS. 1 to 6.

[0023] FIG. 1 is a block diagram showing the configuration of the speech synthesis apparatus of the present invention. The apparatus receives a sentence in mixed kanji and kana as text information 1a and generates synthesized speech 7a by a unit synthesis method, and it can switch between prolongation and non-prolongation at specific places in the synthesized speech 7a.

[0024] First, for the input text information 1a (a sentence in mixed kanji and kana), the reading, part of speech, and so on of each word are determined, and accents, pauses, and so on are decided. For this purpose the text analysis unit 1 includes a language dictionary (not shown), of the kind used in general morphological analysis systems, that stores reading, part-of-speech, accent, and pause information for each word; this information is assigned by matching the text information 1a against the dictionary. Here, detection is carried out down to the syllable level for the reading information.

[0025] The prolongation candidate detection unit 2 detects places that may be prolonged in the result of the analysis by the text analysis unit 1. As the detection method, a prolongation candidate list such as that shown in FIG. 2 is used, for example. The list stores combinations (of two or three syllables) of a reading for which prolongation is permitted under Japanese pronunciation rules and the immediately preceding reading that is its condition; for example, an 「イ」 immediately after an e-row sound and an 「ウ」 immediately after an o-row sound are included as prolongation candidates.

[0026] When such a prolongation candidate list is used, the reading information obtained from the text analysis unit 1, that is, the reading character string (for example, a katakana text string), is searched to find whether the character string from the second syllable onward matches a character string in the list. A matching part can be detected as a place that may be prolonged, that is, as a prolongation candidate position.
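A list-based search of this kind might look like the following sketch. The contents of FIG. 2 are not reproduced in this text, so `CANDIDATE_LIST` is a hypothetical excerpt, and treating the last kana of each matched entry as the candidate position is an interpretation rather than something the description states.

```python
from typing import List, Sequence

# Hypothetical excerpt of a prolongation candidate list (cf. FIG. 2):
# each entry is the conditioning kana followed by the kana that may be prolonged.
CANDIDATE_LIST = ("ケイ", "セイ", "ホウ", "コウ", "ドウ", "ゴウ", "キョウ")

def detect_candidates(reading: str,
                      candidate_list: Sequence[str] = CANDIDATE_LIST) -> List[int]:
    """Return the index of the prolongable kana for every list entry found."""
    positions = set()
    for entry in candidate_list:
        start = reading.find(entry)
        while start != -1:
            # The last kana of the matched entry is the candidate,
            # so it is always at the second position or later.
            positions.add(start + len(entry) - 1)
            start = reading.find(entry, start + 1)
    return sorted(positions)
```

For instance, `detect_candidates("キョウバシ")` returns `[2]`, while a reading such as 「ジュウタイ」, which matches no entry in this excerpt, yields an empty list.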

[0027] Alternatively, in the prolongation candidate detection unit 2, each word in the language dictionary used by the text analysis unit 1 may be given information on whether a specific part of that word may be prolonged, and when the input text information is analyzed, the information indicating the possibility of prolongation attached to the matched word in the language dictionary may be detected.
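The dictionary-based alternative can be sketched as a lexicon whose entries carry the candidate positions directly. The field names (`reading`, `pos`, `candidates`) and the entries are assumptions for illustration; the actual dictionary layout is not described here.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: each entry records which kana indices may be prolonged.
LEXICON: Dict[str, dict] = {
    "経験": {"reading": "ケイケン", "pos": "common noun", "candidates": [1]},
    "京橋": {"reading": "キョウバシ", "pos": "proper noun (place name)", "candidates": [2]},
    "付近": {"reading": "フキン", "pos": "common noun", "candidates": []},
}

def lookup_candidates(word: str) -> Tuple[str, List[int]]:
    """Return the reading and the pre-registered candidate positions of a word."""
    entry = LEXICON[word]
    return entry["reading"], list(entry["candidates"])
```

Compared with searching a candidate list at run time, this moves the detection work into dictionary construction.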

[0028] Rules for prolongation / non-prolongation of specific speech locations are set in advance in the prolongation setting unit 3. As shown in FIG. 3, these rules consist of multiple conditions. In this case, rule 1 states: "If the candidate syllable is other than the first syllable in its bunsetsu and the word is a proper noun, never prolong it." Rule 2 states: "If the candidate syllable is the final syllable of its bunsetsu and that bunsetsu is the final bunsetsu of the sentence, never prolong it in a common noun." Rule 3 states: "In the case of taigen-dome (a sentence ending in a nominal), never prolong a candidate other than the first syllable in the bunsetsu." Given rules 1 to 3, candidate syllables that match none of them can be regarded as subject to a rule that prolongation is performed.
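Rules 1 to 3 and the implied default can be expressed as predicates over a candidate's context. The `Candidate` fields and the part-of-speech labels below are assumed names chosen for this sketch; FIG. 3 itself is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    syllable_index: int        # 0-based index of the candidate within its bunsetsu
    is_last_syllable: bool     # candidate is the final syllable of the bunsetsu
    in_last_bunsetsu: bool     # the bunsetsu is the final one in the sentence
    part_of_speech: str        # e.g. "proper noun", "common noun" (assumed labels)
    is_taigen_dome: bool = False

def rule1(c: Candidate) -> bool:
    """Not the first syllable, and the word is a proper noun."""
    return c.syllable_index != 0 and c.part_of_speech == "proper noun"

def rule2(c: Candidate) -> bool:
    """Final syllable of the sentence-final bunsetsu, in a common noun."""
    return (c.is_last_syllable and c.in_last_bunsetsu
            and c.part_of_speech == "common noun")

def rule3(c: Candidate) -> bool:
    """Taigen-dome, and not the first syllable of the bunsetsu."""
    return c.is_taigen_dome and c.syllable_index != 0

NON_PROLONGATION_RULES = (rule1, rule2, rule3)

def should_prolong(c: Candidate) -> bool:
    """Prolong by default; any matching rule suppresses prolongation."""
    return not any(rule(c) for rule in NON_PROLONGATION_RULES)
```

Keeping each rule as a separate predicate makes it straightforward to add, remove, or reorder conditions without touching the default behaviour.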

[0029] The prolongation processing unit 4 decides prolongation / non-prolongation for each place that may be prolonged, based on the rules set in the prolongation setting unit 3. That is, in accordance with the rules, the reading information of particular candidates is changed to the long-sound reading information 「ー」.

[0030] The phoneme/prosody determination unit 5 uses the prolongation-processed reading information decided by the prolongation processing unit 4 to perform conversion into a phoneme symbol string and to determine prosodic information (for example, fundamental frequency, power, and phoneme duration) based on the accent, pause, and other information determined by the text analysis unit 1. As a result, prosodic information is given to each phoneme, as shown for example in FIG. 4. In FIG. 4, the fundamental frequency for each phoneme symbol is information representing the pitch of the sound, the phoneme duration is information representing its length, and the power coefficient is a weighting factor applied to the power of the speech unit. A power coefficient of 1.0 means that the power of the speech unit selected from the speech database is used as is.

[0031] Accordingly, for a prolonged syllable, the phoneme of the immediately preceding vowel is extended. For example, in the case of "e", its phoneme duration of 102 msec is extended, and variation of the fundamental frequency and of the power can be controlled along with the extended duration. This information can be set uniquely by the phoneme/prosody determination unit 5, but it can also be set per word, or so as to reflect the position of the word in the sentence, in the language dictionary of the text analysis unit 1.
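The per-phoneme prosodic information and the extension of the preceding vowel can be sketched with a small record type. Only the 102 msec duration for "e" comes from the text; the fundamental frequency value, the amount of extension, and all names are placeholders assumed for illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class PhonemeProsody:
    symbol: str
    f0_hz: float               # fundamental frequency (pitch)
    duration_ms: float         # phoneme duration (length)
    power_coeff: float = 1.0   # weight applied to the speech unit's power

def prolong_vowel(p: PhonemeProsody, extra_ms: float) -> PhonemeProsody:
    """Extend the vowel preceding a prolonged syllable by extra_ms (assumed amount)."""
    return replace(p, duration_ms=p.duration_ms + extra_ms)

def target_power(unit_power: float, p: PhonemeProsody) -> float:
    """Apply the power coefficient; 1.0 leaves the unit's power unchanged."""
    return unit_power * p.power_coeff

e = PhonemeProsody("e", f0_hz=120.0, duration_ms=102.0)  # f0 is a placeholder value
e_long = prolong_vowel(e, extra_ms=80.0)                 # 80 ms is a placeholder value
```

A fuller version would also adjust `f0_hz` and `power_coeff` over the extended span, since the text notes that both can be controlled together with the duration.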

[0032] The speech synthesis unit 7 selects appropriate speech units from the speech database 6 so as to match the phoneme symbol string obtained from the phoneme/prosody determination unit 5 and concatenates them, thereby generating synthesized speech 7a that corresponds to the text 1a and has undergone the prescribed prolongation processing.

[0033] The above describes, as an example, setting prolongation / non-prolongation based on rules concerning the candidate position within a bunsetsu (rules 1, 2, and 3), but embodiments of the present invention are not limited to this. In the present invention, prolongation / non-prolongation can also be set based on the position of the passage containing the candidate phoneme or the word that includes it, and likewise on the position of the sentence, bunsetsu, word, mora, syllable, pause-delimited section, accent phrase, or breath group containing the candidate. To take candidate positions into account at such finely subdivided units of speech, a language dictionary holding information on phonemes, morae, pause sections, accent phrases, breath groups, and the like is provided so that the respective positional relationships can be analyzed.

[0034] Next, an example of operation when non-prolongation is set only for specific places is described using the flowchart of FIG. 5.

[0035] First, when input of text is detected in step 10, the text analysis unit 1 proceeds to the text analysis of step 11. For example, if the input text is 「国道1号線、京橋付近、渋滞です。」 ("National Route 1 is congested near Kyobashi."), step 11 determines the kanji readings, accents, pause information, and so on. When the input is a sentence in mixed kanji and kana, the text analysis step refers to a language dictionary describing linguistic information such as headwords, parts of speech, readings, and phoneme symbol strings.

[0036] In step 12, the prolongation candidate detection unit 2 checks the output of the analysis in step 11 against the prolongation candidate list (FIG. 2) to determine whether it contains any part that may be prolonged. If a candidate exists, the process proceeds to step 13, where the prolongation / non-prolongation rules (FIG. 3) of the prolongation setting unit 3 are consulted to determine whether a prolongation / non-prolongation setting has been made; if so, the process proceeds to step 15. If no such setting has been made in step 13, the process proceeds to step 14, where the prolongation processing unit 4 performs normal prolongation processing, and then to step 17. In the case of FIG. 3, candidates matching none of rules 1, 2, and 3 undergo normal prolongation.

[0037] In step 15, the phoneme symbols at the extracted candidate positions are compared with the prolongation / non-prolongation setting conditions to determine whether any phoneme symbol at a candidate position matches a condition. If one does, in step 16 the prolongation processing unit 4 applies prolongation or non-prolongation to that place according to the settings, and the process proceeds to step 17. If none does, the process proceeds directly to step 17.
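The branching of steps 12 to 16 can be sketched as one function. Representing each setting as a (condition, prolong) pair and giving unmatched candidates normal prolongation are assumptions drawn from a reading of paragraphs [0036] and [0037]; the flowchart of FIG. 5 is not reproduced here.

```python
from typing import Callable, List, Tuple

# A setting pairs a condition on (reading, position) with the action to take:
# True means prolong, False means leave the kana as written. (Assumed layout.)
Setting = Tuple[Callable[[str, int], bool], bool]

def apply_settings(reading: str, candidates: List[int],
                   settings: List[Setting]) -> str:
    """Steps 12 to 16: decide prolongation for each detected candidate."""
    chars = list(reading)
    for pos in candidates:                   # step 12 supplied these positions
        prolong = True                       # step 14: normal prolongation by default
        for condition, action in settings:   # steps 13 and 15: look for a matching setting
            if condition(reading, pos):
                prolong = action             # step 16: follow the setting
                break
        if prolong:
            chars[pos] = "ー"
    return "".join(chars)                    # result is handed on to step 17
```

With an empty `settings` list every candidate is prolonged, which corresponds to the path through step 14; with no candidates the reading passes through unchanged.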

[0038] Steps 17 to 20 are the same as speech synthesis processing in an ordinary unit editing method. First, in step 17, for the input text containing the prolonged or non-prolonged parts processed in the preceding steps, the phoneme/prosody determination unit 5 determines phoneme information by conversion into a phoneme time series, and prosodic information (mainly fundamental frequency, power, and phoneme duration) for each phoneme or each short time interval. In step 18, the optimal speech units are selected from the speech database 6 based on the phoneme information and prosodic information. In step 19, the speech units are concatenated and prosody control is applied based on the prosodic information, and in step 20 the synthesized speech is generated.

[0039] This makes it possible to set prolongation / non-prolongation not only for the text as a whole but also for specific places, so cases where certain words should be pronounced clearly sound by sound can be handled flexibly, and the desired synthesized speech can be generated.

[0040] Next, as an example of applying the present invention, the case is described below in which the input text is 「国道１号線、京橋駅付近、渋滞です。」 and the rule in the prolongation setting unit 3 is, as shown in FIG. 6, "do not prolong place names only".

[0041] First, the text analysis unit 1 determines information such as the kanji readings, accents, and pauses. If accent information is represented by an upward arrow 「↑」 and pause information by a triangle 「△」, then for the above text, reading-based intermediate information such as 「こ↑くどういちごうせん△きょ↑うばしふ↓きん△じゅ↑うたいで↓す」 is generated.

[0042] At this time, part-of-speech information is also determined: 「京橋」 is a proper noun that is a place name; 「国道」, 「１号線」, 「付近」, and 「渋滞」 are common nouns; 「です」 is an auxiliary verb; and so on.

[0043] The prolongation candidate detection unit 2 refers to the prolongation candidate list of FIG. 2, or to the information registered in the language dictionary indicating the possibility of prolongation, and extracts the three words 「きょうばし」, 「こくどう」, and 「いちごうせん」 as prolongation target words containing candidate phonemes that may be prolonged.

[0044] The prolongation processing unit 4 analyzes the prolongation / non-prolongation settings and the prolongation target words. In this case only 「きょうばし」, a place name, is judged to match the settings, so 「きょうばし」 is not prolonged, while 「こくどう」 and 「いちごうせん」 are prolonged and converted to 「こくどー」 and 「いちごーせん」, respectively.
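This worked example can be reproduced with a short sketch, assuming hiragana readings, hand-written (non-exhaustive) kana sets, and simple part-of-speech labels. The word segmentation and the names `prolong`, `render`, and `NO_PROLONG_POS` are illustrative assumptions; accent and pause marks are omitted.

```python
from typing import List, Tuple

# Assumed, non-exhaustive hiragana sets for the e-row and o-row.
E_ROW = set("えけせてねへめれげぜでべぺ")
O_ROW = set("おこそとのほもよろごぞどぼぽょ")
NO_PROLONG_POS = ("place name",)   # the FIG. 6 setting: place names are not prolonged

def prolong(reading: str) -> str:
    """Replace い after an e-row kana and う after an o-row kana with ー."""
    chars = list(reading)
    for i in range(1, len(chars)):
        if (chars[i] == "い" and chars[i - 1] in E_ROW) or \
           (chars[i] == "う" and chars[i - 1] in O_ROW):
            chars[i] = "ー"
    return "".join(chars)

def render(words: List[Tuple[str, str]]) -> str:
    """Prolong each word unless its part of speech is excluded by the setting."""
    return "".join(reading if pos in NO_PROLONG_POS else prolong(reading)
                   for reading, pos in words)

sentence = [("こくどう", "common noun"), ("いちごうせん", "common noun"),
            ("きょうばし", "place name"), ("ふきん", "common noun"),
            ("じゅうたい", "common noun"), ("です", "auxiliary verb")]
result = render(sentence)
```

Here `result` is 「こくどーいちごーせんきょうばしふきんじゅうたいです」: the two common nouns with candidates are prolonged and the place name is left as written.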

[0045] Accordingly, the intermediate information generated at this stage is 「こ↑くどーいちごーせん△きょ↑うばしふ↓きん△じゅ↑うたいで↓す」. Based on this intermediate information and the part-of-speech information, the phoneme/prosody determination unit 5 determines the phoneme information and prosodic information (mainly fundamental frequency, power, and phoneme duration).

【００４６】Thereafter, based on the phoneme information and prosody information obtained from the phoneme/prosody determination unit 5, the speech synthesis unit 7 selects the optimum speech units from the speech database, concatenates them, and performs prosody control and related processing to generate the desired synthesized speech.

【００４７】In this embodiment, the prolongation/non-prolongation processing is performed before conversion into phoneme information, but it may also be performed after that conversion.

【００４８】The above illustrates a rule based on part-of-speech information, namely that the linguistic information on the word containing the prolongation candidate in the sentence corresponding to the reading information is "place name". As this linguistic information, rules may be based on part-of-speech information or on whether the word is a nominal (体言), a sentence-final nominal (体言止め), an independent word (自立語), or an attached word (付属語). This information is stored as dictionary information in the language dictionary.

【００４９】For example, when a sentence ends in a nominal (体言止め), prolongation can be prohibited even if that word contains a prolongation candidate, so that the nominal is emphasized and each sound is pronounced distinctly. Depending on the application of the speech synthesis device of the present invention (for example, language-learning materials), rules can be set to leave unprolonged the words whose reading should be uttered clearly, to leave unprolonged the words that should be uttered with emphasis, and to prolong the words that should be pronounced naturally and clearly. In this case, the rules can be based on the part-of-speech information of the word, on whether the word is a nominal, or on whether the word is an independent word or an attached word.
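One way to picture such configurable rules is as a list of predicates over the linguistic attributes of a word, where any matching predicate prohibits prolongation. The sketch below is only an assumption about how this could be organized; the `Token` fields and the rule function names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Token:
    reading: str
    pos: str               # part-of-speech label (hypothetical label names)
    is_nominal: bool       # 体言
    is_independent: bool   # 自立語 (False means 付属語)
    sentence_final: bool   # True if the word ends the sentence


def forbid_place_names(tok: Token) -> bool:
    """Rule: place names are pronounced sound by sound."""
    return tok.pos == "place_name"


def forbid_sentence_final_nominal(tok: Token) -> bool:
    """Rule: a nominal that ends the sentence (体言止め) is emphasized."""
    return tok.is_nominal and tok.sentence_final


def may_prolong(tok: Token, forbidding_rules) -> bool:
    """Prolongation is allowed only if no configured rule forbids it."""
    return not any(rule(tok) for rule in forbidding_rules)


rules = [forbid_place_names, forbid_sentence_final_nominal]
final_noun = Token("こくどう", "common_noun", True, True, True)
mid_noun = Token("こくどう", "common_noun", True, True, False)
print(may_prolong(final_noun, rules), may_prolong(mid_noun, rules))
```

Here the same word is left unprolonged when it ends the sentence and may be prolonged elsewhere. An application such as a language-learning tool could swap in a different rule list without touching the rest of the pipeline.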

【００５０】The prolongation candidate list of FIG. 2 shows candidates for Japanese syllables; to handle loanwords, further candidates may be added.

【００５１】It is known in linguistics that a phoneme or syllable that may be prolonged normally falls on the second or a later syllable of a phrase (文節). This may therefore be made a mandatory rule, and the device may be configured so that the partial rule "other than the first syllable of the phrase" need not be set among the configurable rules of FIG. 3.
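As a rough sketch of treating "never the first syllable of a phrase" as a built-in constraint rather than a configurable rule, candidate detection can simply skip any candidate that falls at the start of a phrase. The code assumes the same hypothetical candidate rule as before (an o-column kana followed by "う") and, for simplicity, uses one kana character as a stand-in for one syllable; `candidate_positions` is an invented name.

```python
# ASSUMPTION: hypothetical candidate rule, standing in for FIG. 2.
O_COLUMN = set("おこそとのほもよろごぞどぼぽょ")


def candidate_positions(phrases):
    """Return (phrase_index, char_index) for each "う" that may be prolonged.

    A "う" that opens a phrase is never reported, even if the preceding
    phrase ends in an o-column kana (mandatory rule).
    """
    positions = []
    prev = ""  # last kana of the preceding phrase
    for p_idx, phrase in enumerate(phrases):
        for c_idx, ch in enumerate(phrase):
            before = phrase[c_idx - 1] if c_idx > 0 else prev
            if ch == "う" and before in O_COLUMN:
                if c_idx == 0:
                    continue  # first syllable of the phrase: always excluded
                positions.append((p_idx, c_idx))
        if phrase:
            prev = phrase[-1]
    return positions


print(candidate_positions(["こくどうを"]))       # [(0, 3)]
print(candidate_positions(["あのこ", "うたう"]))  # []
```

In the second call the sequence "こ" + "う" spans a phrase boundary, so it is not reported as a candidate. Because the exclusion is hard-coded in detection, no user-facing rule for it is needed.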

【００５２】[0052]

【発明の効果】[Effects of the Invention] As is apparent from the above description, the present invention enables flexible prolongation/non-prolongation processing during speech synthesis. It can therefore handle cases in which a word that the prior art would always prolong should instead be pronounced distinctly, sound by sound, so that the desired synthesized speech can be generated.

[Brief description of the drawings]

【図１】FIG. 1 is a block diagram of the embodiment.

【図２】FIG. 2 is a schematic diagram showing the contents of the prolongation candidate list in the embodiment.

【図３】FIG. 3 is a schematic diagram showing the contents of the prolongation/non-prolongation rules in the embodiment.

【図４】FIG. 4 is a schematic diagram showing the output of the phoneme/prosody determination unit in the embodiment.

【図５】FIG. 5 is a flowchart of the embodiment.

【図６】FIG. 6 is a diagram showing the prolongation/non-prolongation rules in another embodiment.

[Explanation of symbols]

1a … Text
1 … Text analysis unit
2 … Prolongation position detection unit
3 … Prolongation setting unit
4 … Prolongation processing unit
5 … Phoneme/prosody determination unit
6 … Speech database
7 … Speech synthesis unit
7a … Synthesized speech

Claims

[Claims]

1. A speech synthesis method in which prolongation candidate positions, at which prolongation processing can be performed to change reading information of a sound that is not a long sound into long-sound information corresponding to a long sound, are defined in advance on the basis of the reading information of speech, and in which whether or not to execute the prolongation processing can be set for the prolongation candidate positions.

2. A speech synthesis method comprising: an analysis process of analyzing input text information to generate at least reading information; a prolongation candidate detection process of detecting, on the basis of the generated reading information, prolongation candidate positions at which prolongation processing can be performed to change reading information of a sound that is not a long sound into long-sound information corresponding to a long sound; a prolongation process of changing the reading information of predetermined sounds into the long-sound information on the basis of a rule that sets whether or not to change the reading information of each detected prolongation candidate position into the long-sound information; and a unit editing process of editing speech units on the basis of the prolonged reading information to generate synthesized speech.

3. The speech synthesis method according to claim 2, wherein the rule set in the prolongation process is based on position information on the prolongation candidate position within the reading information of the speech.

4. The speech synthesis method according to claim 3, wherein the position information on the prolongation candidate position is the position of the passage containing the prolongation candidate, the position of the sentence containing the prolongation candidate, the position of the phrase containing the prolongation candidate, the position of the word containing the prolongation candidate, the position of the mora containing the prolongation candidate, the position of the syllable containing the prolongation candidate, the position of the pause-delimited section containing the prolongation candidate, the position of the accent phrase containing the prolongation candidate, or the position of the breath group containing the prolongation candidate.

5. The speech synthesis method according to claim 2, wherein the rule set in the prolongation process is based on linguistic information on the word or phrase containing the prolongation candidate in the sentence corresponding to the reading information.

6. The speech synthesis method according to claim 5, wherein the linguistic information is part-of-speech information, or information on whether the word is a nominal, a sentence-final nominal, an independent word, or an attached word.

7. A speech synthesis device comprising: an analysis processing unit that analyzes input text information to generate at least reading information; a prolongation candidate detection processing unit that detects, on the basis of the generated reading information, prolongation candidate positions at which reading information of a sound that is not a long sound can be changed into long-sound information corresponding to a long sound; a prolongation setting unit that holds a rule setting whether or not to change the reading information of each detected prolongation candidate position into a long sound; a prolongation processing unit that changes the reading information at the positions determined by the rule held in the prolongation setting unit into the long-sound information; and a speech synthesis unit that edits speech units on the basis of the prolonged reading information to generate synthesized speech.