JP2746880B2

JP2746880B2 - Compound word division method

Info

Publication number: JP2746880B2
Application number: JP62178434A
Authority: JP
Inventors: 哲也酒寄
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1987-07-16
Filing date: 1987-07-16
Publication date: 1998-05-06
Anticipated expiration: 2013-05-06
Also published as: JPS6421496A

Description

Detailed Description of the Invention

Technical Field

The present invention relates to a method of dividing compound words, and more particularly to a method of extracting accent units for text-to-speech synthesis.

Prior Art

To add natural prosody in speech synthesis, it is essential to set the position and level of the accent nucleus. To do so, the input text must first be divided into accent units (morpheme sequences that have exactly one accent nucleus). An accent unit normally consists of one or more bunsetsu (phrases), but when a long compound word forms a single bunsetsu, that compound word (bunsetsu) must be divided into two or more accent units. Methods proposed for this purpose include performing dependency analysis within the compound word, and dividing the compound word at the position corresponding to 1/2 of the number of its constituent words. The former method requires dependency analysis without any information from particles or auxiliary verbs, which makes the processing complex. The latter method often produces inappropriate divisions, for example with very long compound words of five or more words, with the word strings that result from halving, or with compound words consisting of an odd number of words.

Object

The present invention has been made in view of the above circumstances. In particular, its object is, in text-to-speech synthesis, to divide a long compound word that would sound unnatural if uttered as a single accent unit into a plurality of accent units at appropriate positions.

Configuration

To achieve the above object, the present invention applies to a text-to-speech synthesis apparatus that extracts morphological analysis information from input text by morphological analysis processing, obtains from that information a phoneme symbol string by phonological rules and a prosodic symbol string by prosodic symbol derivation rules, reads out parameter sequences of phoneme segments prepared in advance in accordance with the phoneme symbol string, and adds prosody in accordance with concatenation rules. The invention is characterized in that, when the input text contains a long compound word with an embedded affix: if the compound word contains a suffix that exists in a suffix dictionary, the position immediately after that suffix is taken as a division position; if the compound word contains a prefix that exists in a prefix dictionary, the position immediately before that prefix is taken as a division position; and the compound word is divided into accent units, so that a long compound word that would sound unnatural if uttered as a single accent unit is uttered more naturally. The invention is described below on the basis of an embodiment.

FIG. 1 is a flowchart for explaining one embodiment of the present invention. As explained below, the invention divides a long compound word into a plurality of accent units by a method that focuses on the affixes within the compound word. In FIG. 1, when an affix exists within a compound word, it marks a major semantic break, so the method first searches for a suffix starting from the end of the compound word and for a prefix starting from the beginning of the compound word. If a suffix is found, the position immediately after it is taken as the division position; if a prefix is found, the position immediately before it is taken as the division position; and the compound word is divided into two compound words. FIG. 2 shows an example in which suffix B is found, and FIG. 3 shows an example in which prefix C is found (where A denotes an accent unit). If a resulting compound word is still long, the process is repeated on it.

Considering the average mora length of the words that make up compound words, it is appropriate for one accent unit to consist of two or three words (one of which, in this case, is an affix). Therefore, when the compound word contains neither a suffix nor a prefix, division positions are set every two words from the beginning of the compound word, as shown in FIG. 4.

Effects

As is clear from the above description, the present invention makes it possible to divide a bunsetsu consisting of a long compound word into a plurality of accent units by a method that focuses on the affixes embedded in the compound word.
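The division procedure of FIG. 1 can be sketched roughly as follows. This is only an illustrative sketch under several assumptions not stated in the patent: the compound word is already segmented into a list of words, the affix dictionaries are plain Python sets, a compound counts as "long" when it has more than three words (`max_words`), suffixes are tried before prefixes, and affixes at the very edge of the compound are ignored because they yield no split. The names `find_split` and `split_compound` and the sample words are hypothetical.

```python
def find_split(words, prefixes, suffixes):
    """Return an index at which to cut `words`, or None if no affix applies."""
    # Search for a suffix from the end of the compound; cut immediately after it.
    # The last word is skipped because cutting after it would not split anything.
    for i in range(len(words) - 2, -1, -1):
        if words[i] in suffixes:
            return i + 1
    # Search for a prefix from the head of the compound; cut immediately before it.
    # The first word is skipped because cutting before it would not split anything.
    for i in range(1, len(words)):
        if words[i] in prefixes:
            return i
    return None


def split_compound(words, prefixes, suffixes, max_words=3):
    """Divide a compound word (list of words) into accent units (lists of words)."""
    # Assumption: two or three words are acceptable as a single accent unit.
    if len(words) <= max_words:
        return [list(words)]
    pos = find_split(words, prefixes, suffixes)
    if pos is None:
        # No affix inside the compound: cut every two words from the head.
        return [list(words[i:i + 2]) for i in range(0, len(words), 2)]
    # Repeat the process on each part while it is still long.
    return (split_compound(words[:pos], prefixes, suffixes, max_words)
            + split_compound(words[pos:], prefixes, suffixes, max_words))


if __name__ == "__main__":
    # Hypothetical dictionaries and compound words, for illustration only.
    suffixes = {"用"}
    prefixes = {"新"}
    print(split_compound(["音声", "合成", "用", "規則", "辞書"], prefixes, suffixes))
    print(split_compound(["情報", "処理", "新", "技術", "開発"], prefixes, suffixes))
    print(split_compound(["w1", "w2", "w3", "w4", "w5"], prefixes, suffixes))
```

With these sample inputs, the first compound is cut right after the suffix, the second right before the prefix, and the third, which has no affix, falls back to two-word groups. A real system would take the word boundaries and affix tags from the morphological analysis stage rather than from set lookups.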

Brief Description of the Drawings

FIG. 1 is a flowchart for explaining one embodiment of the present invention. FIGS. 2 to 4 are diagrams showing examples of compound word division.

A: accent unit, B: suffix, C: prefix.

Claims

(57) [Claims] A compound word division method for a text-to-speech synthesis apparatus that extracts morphological analysis information from input text by morphological analysis processing, obtains, based on the morphological analysis information, a phoneme symbol string by phonological rules and a prosodic symbol string by prosodic symbol derivation rules, reads out parameter sequences of phoneme segments prepared in advance in accordance with the phoneme symbol string, and adds prosody in accordance with concatenation rules, the method being characterized in that, when a long compound word with an embedded affix is present in the input text: if the compound word contains a suffix that exists in a suffix dictionary, the position immediately after that suffix is taken as a division position; if the compound word contains a prefix that exists in a prefix dictionary, the position immediately before that prefix is taken as a division position; and the compound word is divided into accent units, so that a long compound word that would sound unnatural if uttered as a single accent unit is uttered more naturally.