JP3638000B2 - Audio output device, audio output method, and recording medium therefor

Info

Publication number: JP3638000B2
Application number: JP09184298A
Authority: JP
Inventors: 奈穂子佐藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1998-04-03
Filing date: 1998-04-03
Publication date: 2005-04-13
Anticipated expiration: 2018-04-03
Also published as: JPH11288292A

Abstract

PROBLEM TO BE SOLVED: To set pauses without placing an unnatural pause inside a constituent, by calculating the distance between each dependency clause pair of the clauses of a text extracted through syntactic analysis and setting pause insertion positions based on that distance. SOLUTION: Input text is subjected to morphological analysis in a morphological analysis unit 21, which converts the text into a morpheme candidate string by referring to a morpheme dictionary 22a and a word connection table 22b. The morpheme candidate string then undergoes syntactic analysis in a syntax analysis unit 23, which produces a clause candidate string by referring to syntax analysis rules 24 such as part-of-speech connection information. Next, the head clause on which each clause depends is extracted by referring to the rules 24. Then an accent combination processing unit 25 performs accent combination processing by referring to accent combination rules, and a pause setting processing unit 27 performs pause setting processing by referring to pause setting rules.

Description

【０００１】
【発明の属する技術分野】
本発明は、電子化された入力文書を音声に変換する際の出力技術に関するものである。
【０００２】
【従来の技術】
音声出力装置の一例としてテキスト音声合成システムが挙げられる。
このシステムは入力されたテキストに対し、形態素辞書などを参照して一定のアルゴリズムにより候補中から選択した最適解に対して、読みを含む音韻を設定する。さらに、一定のルールに従ってアクセント位置、ポーズ位置、それぞれの型や長さを設定し、音声に変換するための制御記号列に変換する。この制御記号列を音声合成器に入力し、入力に応じた音声を出力する。
従来のテキスト音声合成システムにおける読み上げ時のポーズ位置設定には、テキスト中の句読点の位置に設定する他、入力テキストの総モーラ数に基づく方法や、２〜３文節間という局所的な文節係り受け関係とその結合度に基づく方法、アクセント句の句頭、句末の単語の品詞に基づく方法、確率に基づく方法などが提案されている。
【０００３】
例えば、特開平５−６１９１号公報（松下電器産業株式会社）では、時間長付与手段により、各音素の時間長を設定し、時間長累計部により会話全体の総時間長を計算し、ポーズ付与手段でその総時間長に応じてポーズ時間長、ポーズ回数、ポーズ位置を決定する音声合成装置が開示されている。
特開平５−１３４６９１号公報（インターナショナル・ビジネス・マシーンズ・コーポレーション）には、文全体の構造を完全に分析することなく、ローカルな文節間の係り受けの情報と、発声等の制限に基づいてポーズ、イントネーション、アクセントを制御する音声合成方法および装置が開示されている。
特開平６−５９６９５号公報（株式会社エイ・ティ・アール自動車翻訳電話研究所）では、音声規則合成装置において、局所的な句の係り受け関係を用いて文内のポーズを設定するポーズ設定手段が開示されている。
【０００４】
特開平６−１４９２８２号公報（日本電信電話株式会社）には、アクセントの末尾の副詞が程度を表す副詞か否か、及び次のアクセント句の先頭の単語の品詞、数詞に分けて、ポーズ設定条件を設定する合成音声ポーズ設定方法が開示されている。
特開平６−１６１４８５号公報（日本電信電話株式会社）には、様々な品詞に対するモーラ数の統計的な分布から、アクセント句境界のポーズ挿入の判定を行う、又は、品詞の影響を無視した無ポーズ区間の分布を用いることにより、ポーズを挿入する平均化無ポーズ区間を推定する合成音声ポーズ設定方式が開示されている。
【０００５】
特開平６−３４２２９７号公報（ソニー株式会社）では、入力されたデータを構文解析し、その解析結果に基づいて合成される音声に挿入するポーズの位置情報を生成する音声合成装置が開示されている。
特開平８−１２３４５６号公報（ソニー株式会社）には、入力文を形態素解析して、複合語、文節、連用修飾文節、連帯修飾文節を同定し、その同定結果に対し、統計的に求められた日本語の文中に挿入されるポーズ位置を規定するポーズ設定規則を適用して、ポーズ位置を設定する自然言語処理方法および音声合成装置が開示されている。
【０００６】
【発明が解決しようとする課題】
しかしながら、従来技術において、例えば、句読点の位置だけではポーズが足りず、聞き取りにくい出力となってしまう。入力テキストの総モーラ数からポーズ位置を算出する方法や品詞に基づく方法は、文章の構造や意味を加味しないため、不自然な位置にポーズが設定される場合がある。また、局所的な文節間の係り受けの結合度を用いる方法は、入力テキスト中の１文が長い場合、処理の単位が２、３文節であるため、その大まかな構文構造は不明なまま意味的にひとまとまりである句の中に不自然にポーズが設定される可能性がある。確率に基づく方法はサンプルとなる読み上げデータを大量に要し、なおかつ読み上げには個人差があるため、実現が困難である。
そこで、入力テキスト中の文の構造や意味のまとまりが考慮され、かつ、容易に実用化を図れるポーズ設定手段が求められている。
【０００７】
【課題を解決するための手段】
請求項１に記載の発明は、入力されたテキストを音声に変換して読み上げを行う音声出力装置において、入力されたテキストを構文解析する手段、構文解析手段により抽出されたテキストの各文節の係り受け文節対間の距離ｄを算出する手段、距離ｄに基づきポーズ挿入位置及びポーズ長を設定するポーズ設定手段を有し、ポーズ設定手段は、距離ｄ＞１のときに用いる第１のポーズ長設定テーブルと、距離ｄ＝１のときに用いる第１のポーズ長設定テーブルとは異なる第２のポーズ長設定テーブルを有し、前記の二つのテーブルを距離ｄによって切り替える音声出力装置である。
【０００８】
請求項２に記載の発明は、請求項１に記載された音声出力装置において、第１のポーズ長設定テーブルは、構文解析処理により抽出された各文節の係り受け文節対間の距離に比例したポーズ長が設定されている音声出力装置である。
【０００９】
請求項３に記載の発明は、請求項１又は２に記載された音声出力装置において、第２のポーズ長設定テーブルは、文節対の係り受け関係に基づきポーズ長が設定されている音声出力装置である。
【００１０】
請求項４に記載の発明は、請求項３に記載された音声出力装置において、ポーズ長が変更可能とされている音声出力装置である。
【００１１】
請求項５に記載の発明は、入力されたテキストを音声に変換し、読み上げを行う音声出力方法において、入力されたテキストを構文解析してテキストの各文節の係り受け文節対を抽出し、抽出された文節対間の距離ｄを算出し、距離ｄに基づきポーズ挿入位置及びポーズ長を設定し、ポーズ長は、距離ｄ＞１のときに構文解析処理により抽出された各文節の係り受け文節対間の距離ｄに比例したポーズ長で設定し、距離ｄ＝１のときに文節対の係り受け関係に基づいたポーズ長で設定する音声出力方法である。
【００１２】
請求項６に記載の発明は、請求項５に記載された音声出力方法を、コンピュータに実行させるためのプログラムを記録したコンピュータ読み取り可能な記録媒体である。
【００１３】
なお、本発明は、請求項１に記載された音声出力装置において、（１）前記構文解析処理が入力文書中の１文単位に行われる音声出力装置及び、（２）ポーズ設定処理が入力文書中の１文単位に行われる音声出力装置、本発明の課題を解決する手段とすることもできる。前記（１）の構成により、１文単位で構文解析を行うことは、読点までの単位の解析や、数文節の解析に比べ、構文解析精度が高まり、ポーズ設定精度の向上に寄与することができる。また、前記（２）の構成により、１文単位でポーズ設定を行うことで、１文全体でバランス良くポーズを設定することができるため、１文の意味が取り易い読み上げが可能となる。
【００１４】
【発明の実施の形態】
以下、図面を参照しながら本発明の構成と実施例を説明する。
図１は、本発明における音声出力装置の構成の一例を示したもので、テキスト入力部１、言語処理部２、韻律処理部３、音響処理部４、音声出力部５、言語データ類６、韻律生成規則７、音素片データ８から成り、テキスト入力部１に入力されたテキスト（一文）は言語処理部において、言語データ類を参照して言語処理されて構文解析される。
図２は、図１に示した言語処理部２の構成を詳細に示したものである。図２に示すように、言語処理部２は更に形態素解析部２１、形態素辞書２２ａ、単語接続表２２ｂ、構文解析部２３、構文解析規則２４、アクセント結合処理部２５、アクセント結合規則２６、ポーズ設定処理部２７、ポーズ設定規則部２８から成っている。
【００１５】
前記言語処理部２において、まず、形態素解析部では入力されたテキストについて、形態素辞書２２ａや単語接続表２２ｂを参照して形態素候補列に変換する。次に、この形態素候補列は次行程の構文解析部２３において品詞連接情報等を参照して文節候補列を作成すると共に、構文解析規則２４を参照してこの文節候補列に基づき、各文節について、それぞれに対応する係り先文節を抽出する。続いてアクセント結合処理部２５ではアクセント結合規則を参照して、アクセント結合処理を行い、更に、ポーズ設定処理部２７ではポーズ設定規則を参照して、ポーズ設定処理を行う。
ポーズ設定処理されたテキストは次の韻律処理部３において韻律生成規則７を参照して韻律が付与され、更に音響処理部４において音素片データ８を参照して音素が付与されて出力部５から音声出力される。
【００１６】
図３は、第１のポーズ長設定テーブルを示し、係り受け文節対間距離ｄとポーズ長の対応テーブルの一例である。図４は、第２のポーズ長設定テーブルを示し、文節対間の係り受け関係とポーズ長の対応テーブルの一例である。
入力文一文における係り受け関係文節対が構文解析処理によって同定されたら、ポーズ設定規則を参照し、規則に従って尤もらしい位置に尤もらしいポーズ長のポーズを設定する。ポーズ設定規則は係り受け文節対パタンとそれに対応するポーズ（長）を記載した係り受け文節対の対応ポーズパタンなどで構成される。予め用意する係り受け文節対の対応ポーズパタンは、辞書、対応テーブル、テンプレートなどの形式で実現することができる。
【００１７】
図５は、本発明におけるポーズ設定処理の一例を流れ図で示したものである。
図５を参照して、本発明におけるポーズ設定処理フローについて説明する。
入力文（テキスト）は図１，図２に示すように、言語処理部に入力されると、前述のように形態素解析部２１で形態素解析が行われかつ、その後構文解析部２３で構文解析が行われ前行程が終了する（Ｓ１０１）。この構文解析により一文中の係り受け文節対が確定したとき（Ｓ１０２）は、その一文中の先頭文節をバッファに格納し（Ｓ１０３）、次に後述するように、その先頭文節の係り先文節との距離ｄを算出する。ここで、一文中の係り受け文節対が確定しないときには再び前行程に戻る（Ｓ１０１）。
【００１８】
次に、先頭文節の係り先文節との距離ｄが１であるか否かを判断し、ｄ＝１であれば、文節間の係り受け関係対ポーズ長対応の第２のポーズ長設定テーブル（図４）を参照して係り元文節（この場合は先頭文節）直後のポーズ長を設定する（Ｓ１０６）。ｄ＝１でないとき、即ち、ｄ＞１のときは、係り受け文節間距離対ポーズ長対応の第１のポーズ長設定テーブル（図３）を参照して、係り元文節直後のポーズ長を設定する（Ｓ１０７）。このようにして先頭文節直後のポーズが設定された後、次文節があるか否かを判断し、あれば次文節をバッファへ格納し（Ｓ１０９）、前記Ｓ１０４の行程からの処理を繰り返す。また、次文節がなければ、その一文についてのポーズ設定処理を終了して次の行程へ進む（Ｓ１１０）。
【００１９】
先頭文節とその先頭文節の係り先文節との距離ｄは、先頭文節からその係先文節が何番目の文節かによるかを算出することによって得られる。例えば、１番目の文節であればｄ＝１、ｎ番目の文節であればｄ＝ｎとして、ｄに比例したポーズ長を予め用意した係り受け文節間距離とポーズ長の第１のポーズ長設定テーブル（図３）を参照してポーズを検出し、ポーズ設定を行う。
【００２０】
その場合、ｄ＝１が連続するとポーズ位置やポーズ長が一定になってしまう場合があるため、前述の処理はｄ＞１の場合に限定し、ｄ＝１の場合には一致した係り受け文節対の係り受け関係パタンとポーズ長の第１のポーズ長設定テーブルとは異なる第２のポーズ長設定テーブル（図４）を参照してポーズ設定を行う。
また、係り受け関係パタンに対するポーズ長を一意に決めず、パタン毎の相対的な値を変更可能に設定しておくことにより、自由度の高いポーズ設定を行うことが可能になる。
【００２１】
（実施例）
以上で説明したポーズ設定処理について、「普通の速度よりは幾分遅いというだけのことだ。」というテキストが入力された場合を例に採って説明する。
入力されたテキストは、形態素解析部２１において形態素解析される。この形態素解析処理では、前記テキストを形態素辞書２２ａや単語接続表２２ｂを参照して形態素候補列に変換する。
形態素候補列は次に構文解析部２３において構文解析される。この構文解析処理においては品詞連接情報等の構文解析規則２４を参照して、以下に示すような文節候補列を生成する。
１（普通，名詞）（の，助詞）
２（速度，名詞）（より，助詞）（は，助詞）
３（幾分，副詞）
４（遅い，形容・終）（と，助詞）
５（いう，動詞・終）（だけ，助詞）（の，助詞）
６（こと，名詞）（だ，助動）（。，句点）
【００２２】
次に、構文解析処理において構文解析規則２４を参照して、以下に示す各文節について対応する係り先文節を抽出する。
【００２３】
【表１】

【００２４】
一文中の係り受け文節対が以上のように確定したら、先頭文節「普通の」の係り先文節「速度よりは」との距離ｄを算出する。本実施例ではｄ＝１なので、図５に示す処理フローに従うとポーズ設定のために文節間の係り受け関係とポーズ長の第２のポーズ長設定テーブル（図４）を参照に行く。ここで、係り受け関係は連体修飾なので、「普通の」の直後のポーズ長はアクセント境界（｜）と設定する。
【００２５】
次に、文節「速度よりは」の係り先文節「遅いと」との距離ｄは、ｄ＝２なので、今度は係り受け文節間距離とポーズ長の第２のポーズ長設定テーブル（図４）を参照する。距離２に対するポーズ長は声建て境界（；）と設定する。
さらに、文節「幾分」の係り先文節「遅い」との距離ｄは、ｄ＝１なので、再び文節間の係り受け関係とポーズ長の第２のポーズ長設定テーブル（図４）を参照する。ここで、係り受け関係は連用修飾なので、「幾分」の直後のポーズは小ポーズ（，）を設定する。
以下、各文節について同様の処理を行い、未処理の文節がなくなったところでポーズ設定処理は一旦終了する。
発音記号列＝フツーノ｜ソ’クドヨリハ；イクブン，オソ’イト，イウダケノ｜コト’ダ．
ポーズ設定処理されたテキストは、最終的に発音記号列に変換され、韻律処理部３、音響処理部４を経て音声として出力される。
なお、上述した音声出力方法を、コンピュータに装着して実行させるために、そのプログラムを記録したコンピュータ読み取り可能な記録媒体で提供することができる。
【００２６】
【発明の効果】
請求項１に対応する効果：構文解析により文法的まとまり抽出することが可能であるため構成要素内に不自然なポーズが入ることなくポーズを設定することができる。
請求項２に対応する効果：構文解析により入力文からすべての係り受け文節対を抽出することができ、その係り受け文節対間の距離を算出することが可能となるため、距離に応じたポーズ長を設定することが可能である。
【００２７】
請求項３に対応する効果：係り受け文節間の距離ｄに比例したポーズ挿入手法でｄ＝１が連続した場合、一定のポーズしか挿入されず、読み上げが単調になる場合があるという弱点に対し、係り受けの関係を加味してポーズ長に変化を付けることが可能となる。
【００２８】
請求項４に対応する効果：係り受けの関係とポーズ長を固定することなく係り受けの種類によって相対的にポーズ長を設定することが可能になるため、読み上げ全体を速くしたりする必要が生じた場合に柔軟に対応することが可能である。
【００２９】
請求項５に対応する効果：構文解析により文法的まとまり抽出をすることが可能であるため構成要素内に不自然なポーズが入ることなくポーズを設定することができる。
【００３０】
請求項６に対応する効果：ポーズ設定処理を行うプログラムを任意のコンピュータにおいて容易に実施することができる。
【図面の簡単な説明】
【図１】本発明の音声出力装置の構成をブロックで示した図である。
【図２】言語処理部の細部の構成をブロックで示した図である。
【図３】ポーズ設定のために参照する文節間距離とポーズ長の第１のポーズ長設定テーブルの一例である。
【図４】ポーズ設定のために参照する文節間の係り受け関係とポーズ長の第２のポーズ長設定テーブルの一例である。
【図５】ポーズ設定処理を説明する流れ図である。
【符号の説明】
１…テキスト入力部、２…言語処理部、３…韻律処理部、４…音響処理部、５…音声出力部、６…言語データ類、７…韻律生成規則、８…音素片データ、２１…形態素解析部、２２ａ…形態素辞書、２２ｂ…単語接続表、２３…構文解析部、２４…構文解析規則、２５…アクセント結合処理部、２６…アクセント結合規則、２７…ポーズ設定処理部、２８…ポーズ設定規則部。
[0001]
[Technical field of the invention]
The present invention relates to an output technique for converting a digitized input document into speech.
[0002]
[Prior art]
One example of an audio output device is a text-to-speech synthesis system.
For input text, such a system refers to a morpheme dictionary and the like, selects the best solution from the candidates by a given algorithm, and assigns phonemes, including readings, to it. It then sets accent positions and pause positions, together with their types and lengths, according to given rules, and converts the result into a control symbol string for speech conversion. This control symbol string is input to a speech synthesizer, which outputs the corresponding speech.
Conventional text-to-speech synthesis systems set read-aloud pause positions at the punctuation marks in the text. Other proposed methods include one based on the total number of morae in the input text, one based on local dependency relations among two or three clauses and their degree of connection, one based on the parts of speech of the words at the beginning and end of an accent phrase, and one based on probability.
[0003]
For example, Japanese Patent Laid-Open No. 5-6191 (Matsushita Electric Industrial Co., Ltd.) discloses a speech synthesizer in which a time-length assigning means sets the duration of each phoneme, a time-length accumulating unit calculates the total duration of the entire conversation, and a pause assigning means determines the pause duration, the number of pauses, and the pause positions according to that total duration.
Japanese Patent Laid-Open No. 5-134691 (International Business Machines Corporation) discloses a speech synthesis method and apparatus that control pauses, intonation, and accent based on local dependency information between clauses and on constraints such as those of utterance, without fully analyzing the structure of the whole sentence.
Japanese Patent Laid-Open No. 6-59695 (ATR Interpreting Telephony Research Laboratories) discloses, for a rule-based speech synthesizer, a pause setting means that sets pauses within a sentence using local dependency relations between phrases.
[0004]
Japanese Patent Laid-Open No. 6-149282 (Nippon Telegraph and Telephone Corporation) discloses a synthesized-speech pause setting method that sets pause setting conditions separately according to whether the adverb at the end of an accent phrase is an adverb of degree, and according to the part of speech of, or a numeral at, the beginning of the next accent phrase.
Japanese Patent Laid-Open No. 6-161485 (Nippon Telegraph and Telephone Corporation) discloses a synthesized-speech pause setting scheme that either decides pause insertion at accent phrase boundaries from the statistical distribution of mora counts for various parts of speech, or estimates an averaged pause-free interval at which to insert a pause by using the distribution of pause-free intervals with the influence of parts of speech ignored.
[0005]
Japanese Patent Laid-Open No. 6-342297 (Sony Corporation) discloses a speech synthesizer that syntactically analyzes input data and, based on the analysis result, generates position information for pauses to be inserted into the synthesized speech.
Japanese Patent Laid-Open No. 8-123456 (Sony Corporation) discloses a natural language processing method and a speech synthesizer that morphologically analyze an input sentence to identify compound words, clauses, adverbial-modifier clauses, and adnominal-modifier clauses, and then set pause positions by applying to the identification result pause setting rules that define statistically derived positions at which pauses are inserted in Japanese sentences.
[0006]
[Problems to be solved by the invention]
However, in the prior art, punctuation positions alone, for example, do not provide enough pauses, and the output is hard to follow. Methods that calculate pause positions from the total number of morae in the input text, and methods based on parts of speech, do not take the structure or meaning of the sentence into account, so pauses may be set at unnatural positions. Methods that use the degree of connection of local dependencies between clauses process only two or three clauses at a time; when a sentence in the input text is long, its overall syntactic structure remains unknown, and a pause may be set unnaturally inside a phrase that forms a single semantic unit. Probability-based methods require a large amount of sample read-aloud data, and because reading styles differ between individuals, they are difficult to realize.
There is therefore a need for a pause setting means that takes into account the structure and semantic groupings of sentences in the input text and that can readily be put into practical use.
[0007]
[Means for Solving the Problems]
The invention described in claim 1 is an audio output device that converts input text into speech and reads it aloud, comprising: means for syntactically analyzing the input text; means for calculating a distance d between each dependency clause pair of the clauses of the text extracted by the syntactic analysis means; and pause setting means for setting a pause insertion position and a pause length based on the distance d. The pause setting means has a first pause length setting table used when the distance d > 1 and a second pause length setting table, different from the first, used when the distance d = 1, and switches between the two tables according to the distance d.
[0008]
The invention described in claim 2 is the audio output device according to claim 1, in which the first pause length setting table sets pause lengths proportional to the distance between the dependency clause pairs of the clauses extracted by the syntactic analysis processing.
[0009]
The invention described in claim 3 is the audio output device according to claim 1 or 2, in which the second pause length setting table sets pause lengths based on the dependency relation of the clause pair.
[0010]
The invention described in claim 4 is the audio output device according to claim 3, in which the pause length is changeable.
[0011]
The invention described in claim 5 is an audio output method that converts input text into speech and reads it aloud, in which the input text is syntactically analyzed to extract the dependency clause pairs of the clauses of the text, the distance d between each extracted clause pair is calculated, and a pause insertion position and a pause length are set based on the distance d. When the distance d > 1, the pause length is set proportional to the distance d between the dependency clause pair extracted by the syntactic analysis processing; when the distance d = 1, the pause length is set based on the dependency relation of the clause pair.
[0012]
The invention described in claim 6 is a computer-readable recording medium on which is recorded a program for causing a computer to execute the audio output method according to claim 5.
[0013]
The problem addressed by the present invention may also be solved by providing, in the audio output device according to claim 1, (1) an audio output device in which the syntactic analysis processing is performed on each sentence of the input document, and (2) an audio output device in which the pause setting processing is performed on each sentence of the input document. With configuration (1), parsing sentence by sentence improves parsing accuracy compared with parsing units that end at a comma or parsing only a few clauses, and thus contributes to more accurate pause setting. With configuration (2), setting pauses sentence by sentence allows pauses to be balanced over the whole sentence, so the sentence can be read aloud in a way that makes its meaning easy to grasp.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
The configuration and examples of the present invention will be described below with reference to the drawings.
FIG. 1 shows an example of the configuration of an audio output device according to the present invention. It consists of a text input unit 1, a language processing unit 2, a prosody processing unit 3, an acoustic processing unit 4, a speech output unit 5, language data 6, prosody generation rules 7, and phoneme segment data 8. Text (one sentence) input to the text input unit 1 undergoes language processing, including syntactic analysis, in the language processing unit 2 with reference to the language data 6.
FIG. 2 shows the configuration of the language processing unit 2 of FIG. 1 in detail. As shown in FIG. 2, the language processing unit 2 consists of a morphological analysis unit 21, a morpheme dictionary 22a, a word connection table 22b, a syntax analysis unit 23, syntax analysis rules 24, an accent combination processing unit 25, accent combination rules 26, a pause setting processing unit 27, and a pause setting rule unit 28.
[0015]
In the language processing unit 2, the morphological analysis unit 21 first converts the input text into a morpheme candidate string with reference to the morpheme dictionary 22a and the word connection table 22b. In the next step, the syntax analysis unit 23 creates a clause candidate string from this morpheme candidate string by referring to part-of-speech connection information and the like, and, based on the clause candidate string and with reference to the syntax analysis rules 24, extracts for each clause the head clause on which it depends. The accent combination processing unit 25 then performs accent combination processing with reference to the accent combination rules, and the pause setting processing unit 27 performs pause setting processing with reference to the pause setting rules.
The text that has undergone pause setting is given prosody in the following prosody processing unit 3 with reference to the prosody generation rules 7, is given phonemes in the acoustic processing unit 4 with reference to the phoneme segment data 8, and is output as speech from the speech output unit 5.
[0016]
FIG. 3 shows the first pause length setting table, an example of a table mapping the distance d between a dependency clause pair to a pause length. FIG. 4 shows the second pause length setting table, an example of a table mapping the dependency relation between a clause pair to a pause length.
Once the dependency clause pairs in an input sentence have been identified by syntactic analysis, the pause setting rules are consulted and, following them, a pause of plausible length is set at each plausible position. The pause setting rules consist of, among other things, pause patterns for dependency clause pairs, each describing a dependency clause pair pattern and its corresponding pause (length). These pause patterns, prepared in advance, can be implemented as a dictionary, a correspondence table, templates, or the like.
[0017]
FIG. 5 is a flowchart showing an example of the pause setting process in the present invention.
The pause setting process flow of the present invention is described with reference to FIG. 5.
As shown in FIGS. 1 and 2, when the input sentence (text) is input to the language processing unit, morphological analysis is performed by the morphological analysis unit 21 as described above, syntactic analysis is then performed by the syntax analysis unit 23, and the preceding step ends (S101). When the dependency clause pairs in the sentence have been determined by this syntactic analysis (S102), the first clause of the sentence is stored in a buffer (S103), and the distance d between that clause and its head clause is calculated as described later. If the dependency clause pairs in the sentence have not been determined, the process returns to the preceding step (S101).
[0018]
Next, it is determined whether the distance d between the first clause and its head clause is 1. If d = 1, the pause length immediately after the modifier clause (here, the first clause) is set by referring to the second pause length setting table (FIG. 4), which maps the dependency relation between clauses to a pause length (S106). If d is not 1, that is, if d > 1, the pause length immediately after the modifier clause is set by referring to the first pause length setting table (FIG. 3), which maps the distance between dependency clauses to a pause length (S107). After the pause immediately after the first clause has been set in this way, it is determined whether there is a next clause; if there is, it is stored in the buffer (S109) and the processing from step S104 onward is repeated. If there is no next clause, pause setting for that sentence ends and the process proceeds to the next step (S110).
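The loop described in this paragraph can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `set_pauses`, the list-based representation of clauses, the relation labels, and the table contents are hypothetical (the actual tables are those of FIGS. 3 and 4, which are not reproduced in this text), and the step labels S104 and S105 in the comments are inferred from the surrounding step numbers.

```python
from typing import Dict, List, Optional


def set_pauses(
    heads: List[Optional[int]],
    relations: List[str],
    distance_table: Dict[int, str],
    relation_table: Dict[str, str],
) -> List[str]:
    """Return the pause chosen after each clause that has a head clause.

    heads[i] is the index of the clause that clause i depends on
    (None for the sentence-final clause); relations[i] labels that dependency.
    """
    pauses: List[str] = []
    for i, head in enumerate(heads):          # S103 / S109: take the next clause
        if head is None:                      # last clause: no pause to set after it
            break
        d = head - i                          # distance to the head clause (assumed S104)
        if d == 1:                            # test on d (assumed S105)
            pauses.append(relation_table[relations[i]])  # S106: FIG. 4 style lookup
        else:
            pauses.append(distance_table[d])             # S107: FIG. 3 style lookup
    return pauses                             # S110: done for this sentence


# Hypothetical tables and a hypothetical four-clause sentence.
distance_table = {2: "long", 3: "longer"}
relation_table = {"adnominal": "short", "adverbial": "medium"}
example = set_pauses([1, 3, 3, None], ["adnominal", "other", "adverbial", ""],
                     distance_table, relation_table)
```

Here the relation label is consulted only when d = 1, so the label given for the second clause (d = 2) is never looked up.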
[0019]
The distance d between the first clause and its head clause is obtained by counting how many clauses after the first clause the head clause occurs. For example, if the head clause is the first clause after it, d = 1; if it is the n-th clause, d = n. A pause length proportional to d is then looked up in the first pause length setting table (FIG. 3), prepared in advance, which maps the distance between dependency clauses to a pause length, and the pause is set.
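A minimal sketch of the distance calculation, assuming clauses are numbered from 0 in sentence order and that the head clause always follows the clause that depends on it (as in the example sentence later in this document). The function names and the 100 ms unit are assumptions chosen for illustration; the source states only that the pause length in the first table is proportional to d.

```python
def dependency_distance(clause_index: int, head_index: int) -> int:
    """Number of clauses from a clause to its head clause (adjacent clause -> 1)."""
    if head_index <= clause_index:
        raise ValueError("the head clause is assumed to follow the dependent clause")
    return head_index - clause_index


def proportional_pause_ms(d: int, unit_ms: int = 100) -> int:
    """Pause length proportional to d; unit_ms is a hypothetical scale factor."""
    return d * unit_ms
```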
[0020]
In that case, when d = 1 occurs consecutively, the pause positions and pause lengths may become uniform. The above processing is therefore limited to d > 1, and when d = 1 the pause is set by referring to the second pause length setting table (FIG. 4), which differs from the first pause length setting table and maps the dependency relation pattern of the matched dependency clause pair to a pause length.
In addition, by not fixing the pause length for each dependency relation pattern to a single value, and instead making the relative value for each pattern changeable, pauses can be set with a high degree of freedom.
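One way to keep the per-pattern pause lengths relative rather than fixed is to store a weight per dependency relation and multiply it by a global base duration. The sketch below assumes this design; the relation names, weights, and millisecond values are hypothetical and do not come from FIG. 4.

```python
from typing import Dict

# Hypothetical relative weights per dependency relation (not the values of FIG. 4).
RELATIVE_WEIGHT: Dict[str, float] = {"adnominal": 1.0, "adverbial": 2.0}


def relation_pause_ms(relation: str, base_ms: float = 50.0) -> float:
    """Pause after a clause whose head is adjacent (d = 1), scaled by base_ms."""
    return RELATIVE_WEIGHT[relation] * base_ms


normal = relation_pause_ms("adverbial")                # default reading speed
faster = relation_pause_ms("adverbial", base_ms=25.0)  # whole reading sped up
```

Lowering `base_ms` shortens every pause while preserving the ratios between relation types.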
[0021]
(Example)
The pause setting process described above is explained using as an example the input text 「普通の速度よりは幾分遅いというだけのことだ。」 ("It is only that it is somewhat slower than normal speed.").
The input text is morphologically analyzed by the morphological analysis unit 21. In this morphological analysis, the text is converted into a morpheme candidate string with reference to the morpheme dictionary 22a and the word connection table 22b.
The morpheme candidate string is then parsed by the syntax analysis unit 23. In this syntactic analysis, the clause candidate string shown below is generated with reference to the syntax analysis rules 24, such as part-of-speech connection information.
1 (普通 "ordinary", noun) (の, particle)
2 (速度 "speed", noun) (より, particle) (は, particle)
3 (幾分 "somewhat", adverb)
4 (遅い "slow", adjective, terminal form) (と, particle)
5 (いう "say", verb, terminal form) (だけ, particle) (の, particle)
6 (こと "thing", noun) (だ, auxiliary verb) (。, period)
[0022]
Next, in the syntactic analysis process, the head clause corresponding to each clause is extracted with reference to the syntax analysis rules 24, as shown below.
[0023]
[Table 1]

[0024]
Once the dependency clause pairs in the sentence have been determined as above, the distance d between the first clause 「普通の」 ("ordinary") and its head clause 「速度よりは」 ("than ... speed") is calculated. In this example d = 1, so, following the process flow of FIG. 5, the second pause length setting table (FIG. 4), which maps dependency relations between clauses to pause lengths, is consulted to set the pause. Since the dependency relation here is adnominal modification, the pause immediately after 「普通の」 is set to an accent boundary (|).
[0025]
Next, the distance d between the clause 「速度よりは」 and its head clause 「遅いと」 ("slow") is d = 2, so this time the first pause length setting table (FIG. 3), which maps the distance between dependency clauses to a pause length, is consulted. The pause for distance 2 is set to a voice-onset boundary (;).
Further, the distance d between the clause 「幾分」 ("somewhat") and its head clause 「遅い」 is d = 1, so the second pause length setting table (FIG. 4), which maps dependency relations between clauses to pause lengths, is consulted again. Since the dependency relation here is adverbial modification, a small pause (,) is set immediately after 「幾分」.
The same processing is then applied to each remaining clause, and the pause setting process ends for the time being once no unprocessed clause remains.
Phonetic symbol string = フツーノ｜ソ’クドヨリハ；イクブン，オソ’イト，イウダケノ｜コト’ダ． (romanized: Futsūno | So'kudoyoriha ; Ikubun , Oso'ito , Iudakeno | Koto'da .)
The text that has undergone pause setting is finally converted into a phonetic symbol string and output as speech via the prosody processing unit 3 and the acoustic processing unit 4.
Note that the audio output method described above can be provided as a computer-readable recording medium on which a program for causing a computer to execute the method is recorded.
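The worked example above can be traced with a short Python sketch. The heads and pause symbols for clauses 1 to 3 are taken from the description; the heads and relations for clauses 4 and 5 are assumptions, since Table 1 is not reproduced in this text, and were chosen so that the result matches the phonetic symbol string given above. The relation labels and variable names are illustrative.

```python
# Clauses of the example sentence in katakana, with accent marks as in the source.
clauses = ["フツーノ", "ソ’クドヨリハ", "イクブン", "オソ’イト", "イウダケノ", "コト’ダ"]

# 0-based index of each clause's head clause. Entries for clauses 4 and 5
# (indices 3 and 4) are assumptions; Table 1 is missing from the text.
heads = [1, 3, 3, 4, 5, None]
relations = ["adnominal", None, "adverbial", "adverbial", "adnominal", None]

distance_table = {2: "；"}                               # d = 2 (from the description)
relation_table = {"adnominal": "｜", "adverbial": "，"}   # d = 1 (from the description)

parts = []
for i, clause in enumerate(clauses):
    parts.append(clause)
    if heads[i] is None:
        parts.append("．")                                # sentence end
        continue
    d = heads[i] - i                                      # distance to the head clause
    # d = 1 uses the relation table; d > 1 uses the distance table.
    parts.append(relation_table[relations[i]] if d == 1 else distance_table[d])

symbol_string = "".join(parts)
print(symbol_string)
```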
[0026]
[Effects of the invention]
Effect corresponding to claim 1: Because grammatical units can be extracted by syntactic analysis, pauses can be set without an unnatural pause falling inside a constituent.
Effect corresponding to claim 2: Syntactic analysis can extract all dependency clause pairs from the input sentence, and the distance between each pair can be calculated, so a pause length can be set according to that distance.
[0027]
Effect corresponding to claim 3: The pause insertion technique proportional to the distance d between dependency clauses has the weakness that, when d = 1 occurs consecutively, only a fixed pause is inserted and the reading may become monotonous. By taking the dependency relation into account, the pause length can be varied.
[0028]
Effect corresponding to claim 4: Because the pause length can be set relatively according to the type of dependency, without fixing the correspondence between dependency relation and pause length, needs such as speeding up the entire reading can be handled flexibly.
[0029]
Effect corresponding to claim 5: Because grammatical units can be extracted by syntactic analysis, pauses can be set without an unnatural pause falling inside a constituent.
[0030]
Effect corresponding to claim 6: The program that performs the pause setting process can easily be run on any computer.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the configuration of an audio output device according to the present invention.
FIG. 2 is a block diagram illustrating a detailed configuration of a language processing unit.
FIG. 3 is an example of the first pause length setting table, which maps inter-clause distance to pause length and is referred to for pause setting.
FIG. 4 is an example of the second pause length setting table, which maps dependency relations between clauses to pause length and is referred to for pause setting.
FIG. 5 is a flowchart for explaining a pause setting process.
[Explanation of symbols]
1 ... text input unit, 2 ... language processing unit, 3 ... prosody processing unit, 4 ... acoustic processing unit, 5 ... speech output unit, 6 ... language data, 7 ... prosody generation rules, 8 ... phoneme segment data, 21 ... morphological analysis unit, 22a ... morpheme dictionary, 22b ... word connection table, 23 ... syntax analysis unit, 24 ... syntax analysis rules, 25 ... accent combination processing unit, 26 ... accent combination rules, 27 ... pause setting processing unit, 28 ... pause setting rule unit.

Claims

An audio output device that converts input text into speech and reads it aloud, comprising: means for syntactically analyzing the input text; means for calculating a distance d between each dependency clause pair of the clauses of the text extracted by the syntactic analysis means; and pause setting means for setting a pause insertion position and a pause length based on the distance d, wherein the pause setting means has a first pause length setting table used when the distance d > 1 and a second pause length setting table, different from the first pause length setting table, used when the distance d = 1, and switches between the two tables according to the distance d.

The audio output device according to claim 1, wherein the first pause length setting table sets pause lengths proportional to the distance between the dependency clause pairs of the clauses extracted by the syntactic analysis processing.

The audio output device according to claim 1 or 2, wherein the second pause length setting table sets pause lengths based on the dependency relation of the clause pair.

The audio output device according to claim 3, wherein the pause length is changeable.

An audio output method that converts input text into speech and reads it aloud, wherein the input text is syntactically analyzed to extract the dependency clause pairs of the clauses of the text, a distance d between each extracted clause pair is calculated, and a pause insertion position and a pause length are set based on the distance d, the pause length being set proportional to the distance d between the dependency clause pair extracted by the syntactic analysis processing when the distance d > 1, and being set based on the dependency relation of the clause pair when the distance d = 1.

A computer-readable recording medium on which is recorded a program for causing a computer to execute the audio output method according to claim 5.