JP3096499B2

JP3096499B2 - Word segmentation processing method

Info

Publication number: JP3096499B2
Application number: JP03230424A
Authority: JP
Inventors: 直之余田
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1991-09-10
Filing date: 1991-09-10
Publication date: 2000-10-10
Anticipated expiration: 2015-10-10
Also published as: JPH0567075A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a document creation method for documents written in a natural language that uses alphabetic characters, and more particularly to a word division processing method for dividing a word and adding a hyphen at a line break in the text.

[0002]

[Prior Art] In recent years, English word processors capable of electronically editing and printing natural-language text written with alphabetic characters, such as English text, have come into practical use.

[0003] In a conventional document creation device such as an English word processor, when text composed of alphabetic words is displayed or printed within a document creation area whose size is determined by a given display or print format, a word located at the line-break position of a display line or print line may need to be divided with a hyphen and shown split across two consecutive lines.

[0004] In conventional document creation devices of this kind, a memory called a word dictionary, which stores a large number of words in association with their division position information, has been used to perform this word division (see Japanese Examined Patent Publication No. 57-32382). Such a word dictionary must store in advance, for every word, hyphen position information indicating the positions at which a hyphen can be inserted in the word.

[0005] However, the number of common words needed for English text, for example, runs to tens of thousands. A word dictionary covering them requires a large-capacity memory, so looking up the dictionary takes a long time. Moreover, division position information naturally cannot be obtained for words that are not in the dictionary.

[0006]

[Problem to Be Solved by the Invention] The present invention provides a word division processing method or word division device, and a document creation processing method or document creation device, that can quickly specify an appropriate division position for any word, without using a large-scale word dictionary holding division position information for each word, when the document creation area is constrained by the display or print format and a word must be divided as part of the line-break processing described above.

[0007]

[Means for Solving the Problem] The word division processing method of the present invention refers to a phoneme pattern table in which division unit information of phoneme code strings, determined in consideration of the dividable positions of a wide variety of words, is registered in advance in association with division possibility information indicating the likelihood of division by each division unit. By referring to this table, the method specifies, for any word, division positions that are appropriate in terms of pronunciation on the basis of the word's phoneme code string.

[0008] The word division device of the present invention comprises a phoneme pattern table in which division unit information of phoneme code strings, determined in consideration of the dividable positions of a wide variety of words, is registered in advance in association with division possibility information indicating the likelihood of division by each division unit. By referring to this table, the device specifies, for any word, division positions that are appropriate in terms of pronunciation on the basis of the word's phoneme code string, and divides the word.

[0009] The document creation processing method of the present invention refers to a phoneme pattern table in which division unit information of phoneme code strings, determined in consideration of the dividable positions of a wide variety of words, is registered in advance in association with division possibility information indicating the likelihood of division by each division unit. For a word in the input document that has been recognized as needing division, the method specifies a plurality of dividable positions that are appropriate in terms of pronunciation on the basis of the word's phoneme code string, sets a range for the number of characters in the first half of the word on the basis of the display format set for the input document, and specifies the division position so that the number of characters in the first half falls within that range.

[0010] The document creation device of the present invention comprises a phoneme pattern table in which division unit information of phoneme code strings, determined in consideration of the dividable positions of a wide variety of words, is registered in advance in association with division possibility information indicating the likelihood of division by each division unit. By referring to this table, for a word in the input document that has been recognized as needing division, the device specifies a plurality of dividable positions that are appropriate in terms of pronunciation on the basis of the word's phoneme code string, sets a range for the number of characters in the first half of the word on the basis of the display format set for the input document, and specifies the division position so that the number of characters in the first half falls within that range.

[0011]

[Operation] When a word must be divided because of constraints on the displayable or printable area, the present invention uses a table in which the phoneme code strings that serve as division units are registered together with the division possibility of each phoneme code string. Division positions that are plausible in terms of pronunciation can therefore be specified quickly for any word.

[0012]

[Embodiment] First, the principle of the word division processing method according to the present invention is described.

[0013] When words are composed of alphabetic characters, each letter of the alphabet has sounds that it tends to represent. Languages written with alphabetic characters in this way include Western European languages such as English, French, and Spanish. Letters can be classified by the sounds they represent, for example into vowels and consonants; in English, 'a', 'e', 'i', 'o', 'u', and 'y' are often used to represent vowels. Rules are therefore prepared for converting each alphabetic character into the sound it tends to represent (a phoneme code). FIG. 2 shows an example of such a conversion table.

[0014] Next, a table (phoneme pattern list) is prepared in which the phoneme-coded forms of the units into which a wide variety of words in the language can be divided are associated with their division possibility information. FIG. 3 shows an example of such a table. The division possibility information here represents the likelihood that a given phoneme pattern is actually used for dividing words, and may be any information, such as a numerical value, that can express a greater or lesser likelihood. It may be set by estimation based on experience, or from statistics on how frequently each phoneme pattern actually occurs as a division unit in a large set of words.

[0015] Using this list, matching is performed so that the word to be divided is composed of the phoneme patterns with the highest division possibility. In the matching, the word to be divided is first converted into a phoneme code string. Next, several combinations of phoneme patterns are generated such that the phoneme code string is made up of phoneme patterns like those shown in FIG. 3. Not every combination of phoneme patterns is possible, however; some combinations cannot actually be pronounced. Conditions such as those shown in FIG. 4 are used to exclude these impossible combinations. Of the combinations that remain after this exclusion, the one whose constituent phoneme patterns have high division possibility is selected as indicating the division positions that are most plausible in terms of pronunciation.

[Configuration of the Embodiment] Next, the configuration of an embodiment that applies the principle of the word division processing method described above is explained.

[0016] FIG. 1 is a block diagram of an embodiment of an English document creation processing apparatus.

[0017] The input device 1 may be a typewriter, a keyboard, a magnetic tape reader, a character recognition device, or any other input device.

[0018] The memory 2 is constituted by a RAM or the like and stores the coded alphabetic characters generated by the input device 1.

[0019] The display device 3 may be a CRT or any other display device. The display device 3 displays the contents of the memory 2 at all times.

[0020] The output device 4 may be a printer, a data communication device, or any other output device.

[0021] The end-of-line word recognizer 5 recognizes the end-of-line word that needs to be divided when the entered characters exceed the display width set for the document.

[0022] The conversion table 7 describes the correspondence between characters and phoneme codes; an example is shown in FIG. 2. With the table of FIG. 2, 'a', 'e', 'i', 'u', and 'o' are replaced by the code 'v', 'y' is replaced by the code 'a', and all other characters are replaced by the code 'c'.

[0023] The character-phoneme converter 6 converts an English word into a phoneme code string using the conversion table 7.
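The conversion performed by the character-phoneme converter 6 can be sketched as follows. This is a minimal illustration that assumes only the three rules stated for FIG. 2 ('a', 'e', 'i', 'u', 'o' to 'v'; 'y' to 'a'; everything else to 'c'). The function name `to_phoneme_codes` and the lowercasing of the input are assumptions made for the example, not details taken from the specification.

```python
# Letter-to-phoneme-code rules as stated for the table of FIG. 2.
VOWEL_LETTERS = set("aeiou")


def to_phoneme_codes(word: str) -> str:
    """Convert an alphabetic word into its phoneme code string.

    Assumes the input contains letters only; case is ignored (an assumption).
    """
    codes = []
    for ch in word.lower():
        if ch in VOWEL_LETTERS:
            codes.append("v")   # a, e, i, u, o
        elif ch == "y":
            codes.append("a")   # y has its own code
        else:
            codes.append("c")   # every other letter
    return "".join(codes)


print(to_phoneme_codes("horrible"))      # cvccvccv
print(to_phoneme_codes("Transylvania"))  # ccvccaccvcvv
```

Both outputs match the phoneme strings that the description gives for these two words.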

[0024] The pattern comparator 8 compares the phoneme code string output by the character-phoneme converter 6 with the phoneme pattern table 9, and generates combinations of phoneme patterns from the table that together make up the phoneme code string. The combinations are generated so that none of them falls under the conditions listed in the comparison condition list 10.
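One way to picture the combination generation done by the pattern comparator 8 is as an enumeration of every way to write the phoneme code string as a sequence of table patterns. The sketch below assumes this reading. The contents of `PATTERN_TABLE` are invented for illustration, because the actual entries of FIG. 3 are not reproduced in the text, and the function name `combinations_for` is likewise hypothetical. The exclusion conditions of the comparison condition list 10 are not applied here.

```python
from typing import Dict, List

# Hypothetical stand-in for the phoneme pattern table 9 (FIG. 3):
# pattern -> division possibility. All values are invented.
PATTERN_TABLE: Dict[str, float] = {
    "cvc": 1.0,
    "cv": 0.8,
    "ccv": 0.5,
    "vc": 0.4,
    "v": 0.2,
}


def combinations_for(codes: str, table: Dict[str, float]) -> List[List[str]]:
    """Return every way to write `codes` as a sequence of table patterns."""
    if not codes:
        return [[]]  # one way to build the empty string: use no patterns
    results = []
    for pattern in table:
        if codes.startswith(pattern):
            for rest in combinations_for(codes[len(pattern):], table):
                results.append([pattern] + rest)
    return results


for combo in combinations_for("cvccvccv", PATTERN_TABLE):
    print(combo)
```

With this made-up table, the code string for "horrible" yields three candidate combinations, one of which is `['cvc', 'cvc', 'cv']`.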

[0025] The phoneme pattern table 9 records representative phoneme patterns in association with the likelihood (division possibility) that each pattern is actually used as a unit for dividing English words; FIG. 3 shows an example. In the example of FIG. 3, the division possibility information for each phoneme pattern is a value normalized so that the pattern with the highest division possibility is 1.0.
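The normalization described for FIG. 3 amounts to dividing each pattern's raw score by the largest score. The sketch below assumes the raw scores are frequency counts of the kind mentioned in paragraph [0014]. The counts themselves and the name `normalize` are made up for the example.

```python
from typing import Dict


def normalize(raw_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw counts so that the most frequent pattern maps to 1.0."""
    top = max(raw_counts.values())
    return {pattern: count / top for pattern, count in raw_counts.items()}


# Hypothetical frequencies of each pattern as a division unit.
raw = {"cvc": 500, "cv": 400, "ccv": 250}
print(normalize(raw))  # {'cvc': 1.0, 'cv': 0.8, 'ccv': 0.5}
```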

[0026] The comparison condition list 10 contains the set of conditions applied when a phoneme code string is built up from combinations of phoneme patterns in the phoneme pattern table 9; the list is shown in FIG. 4. The example of FIG. 4 contains conditions such as "suffixes such as '-ed', '-ing', and '-th' are treated as already dividable and are not compared" (C1) and "phoneme patterns are not combined in a way that produces four or more consecutive consonants" (C2).
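The two quoted conditions could be expressed as small helper functions, as in the sketch below. The text does not spell out exactly how C1 and C2 are applied during matching, so this is only one possible reading: C1 is modeled as splitting a known suffix off the word before comparison, and C2 as a predicate that detects a run of four or more 'c' codes. The names `split_off_suffix` and `has_long_consonant_run` are hypothetical.

```python
from typing import Tuple

# Suffixes named in condition C1; the text implies there may be others.
SUFFIXES = ("ing", "ed", "th")


def split_off_suffix(word: str) -> Tuple[str, str]:
    """C1 (one reading): separate a known suffix so it is not compared."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)], word[-len(suffix):]
    return word, ""


def has_long_consonant_run(codes: str, limit: int = 4) -> bool:
    """C2 (one reading): True if `limit` or more consonant codes are adjacent."""
    return "c" * limit in codes


print(split_off_suffix("walking"))         # ('walk', 'ing')
print(split_off_suffix("horrible"))        # ('horrible', '')
print(has_long_consonant_run("cvccccv"))   # True
print(has_long_consonant_run("cvccvccv"))  # False
```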

[0027] The division position determiner 11 selects, from the combinations of phoneme patterns generated by the phoneme pattern comparator 8, the combination composed of phoneme patterns with high division possibility, and determines the division position of the English word in accordance with the display width of the document.

[Processing Operation 1 of the Embodiment] Next, the specific processing is described for the English text of FIG. 5. FIG. 5 shows English text displayed in a fixed format with a display width of 40 characters and aligned left and right margins.

(1) Divided word recognition processing

The word string entered from the input device 1 is read into the memory 2 and simultaneously displayed on the display device 3. FIG. 5 shows how the text used in this embodiment appears on the display device 3. In this figure, the word "horrible" at the end of display line 12 lies at the line-break position between display lines 12 and 13, and must therefore be displayed split across those two lines. Words at such line-break positions are recognized by the end-of-line word recognizer 5. Display line 14 in FIG. 6 is an example of the output when the text is displayed without regard to the format. As this example shows, the end-of-line word recognizer 5 recognizes that the end-of-line word "horrible" must be divided so that, allowing for the hyphen appended to the divided string, its first half is no longer than three characters ("hor").
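The character limit derived by the end-of-line word recognizer 5 can be illustrated with simple arithmetic: the columns still free on the line, minus one column reserved for the hyphen. The sketch below assumes that reading. The figure of 36 columns already used is an assumption chosen to reproduce the three-character limit stated above, since FIG. 5 is not reproduced in the text, and the function names are hypothetical.

```python
def needs_division(display_width: int, columns_used: int, word: str) -> bool:
    """True if the word does not fit in the columns left on the line."""
    return columns_used + len(word) > display_width


def max_first_half_length(display_width: int, columns_used: int) -> int:
    """Characters of the word that fit on the line, leaving one column for '-'."""
    return max(display_width - columns_used - 1, 0)


# 40-column line with 36 columns (including the trailing space) already used.
print(needs_division(40, 36, "horrible"))  # True
print(max_first_half_length(40, 36))       # 3
```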

[0028] FIG. 7 shows the flow of processing by which the word "horrible" is divided in two by the apparatus of FIG. 1, and the specific processing is described below with reference to that figure.

(2) Phoneme code conversion processing

The input word "horrible" 15 is converted by the character-phoneme converter 6, in accordance with the conversion table 7, into the phoneme string "cvccvccv" 16.

(3) Word division position specifying processing

Next, the phoneme pattern comparator 8 refers to the phoneme pattern table 9 (as in FIG. 3) and the comparison condition list 10 (as in FIG. 4) and generates combinations of phoneme patterns corresponding to the phoneme string 16.

[0029] The phoneme pattern combinations 17 to 20 in FIG. 7 are the four possible combinations whose constituent phoneme patterns have the highest total division possibility information values. The numerical values 21 to 24 given alongside combinations 17 to 20 are the division possibility information values of the phoneme patterns making up each combination, and the numerical values 25 to 28 are the totals of those values for each combination. The division positions are specified with reference to combination 17, which has the highest total; the three phoneme patterns making up combination 17 define two division positions.

(4) Word division processing

Next, the division position determiner 11 determines the position at which the word is divided in two. As noted above, the word must be divided so that its first half is no longer than three characters; this condition is satisfied by taking the first phoneme pattern of combination 17 as the first half. The input word is therefore divided into the character string group 30 at the position indicated by combination 29, and the divided character string 31 is generated by joining the strings with a hyphen (-).
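Steps (3) and (4) can be sketched together: pick the combination with the highest total, turn its pattern boundaries into character offsets, and cut at an offset that respects the first-half limit. Everything concrete in the sketch below is assumed for illustration: the table values, the three candidate combinations, and the choice of the largest offset that fits (the description only says that the first pattern is used as the first half in this example). The names `total_score`, `break_offsets`, and `split_word` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical division possibility values; FIG. 3 is not reproduced in the text.
PATTERN_TABLE: Dict[str, float] = {"cvc": 1.0, "cv": 0.8, "ccv": 0.5}


def total_score(combo: List[str], table: Dict[str, float]) -> float:
    """Sum of the division possibility values of the patterns in a combination."""
    return sum(table[p] for p in combo)


def break_offsets(combo: List[str]) -> List[int]:
    """Character offsets of the boundaries between consecutive patterns."""
    offsets, pos = [], 0
    for pattern in combo[:-1]:
        pos += len(pattern)
        offsets.append(pos)
    return offsets


def split_word(word: str, combos: List[List[str]], table: Dict[str, float],
               max_prefix: int) -> Optional[Tuple[str, str]]:
    """Split `word` using the best-scoring combination, or None if nothing fits."""
    best = max(combos, key=lambda c: total_score(c, table))
    fitting = [o for o in break_offsets(best) if o <= max_prefix]
    if not fitting:
        return None
    cut = max(fitting)  # assumption: use as much of the line as allowed
    return word[:cut] + "-", word[cut:]


# Assumed candidate combinations for the code string "cvccvccv".
candidates = [["cvc", "cvc", "cv"], ["cvc", "cv", "ccv"], ["cv", "ccv", "ccv"]]
print(split_word("horrible", candidates, PATTERN_TABLE, 3))  # ('hor-', 'rible')
```

With these assumed values the best combination has three patterns and two boundaries (offsets 3 and 6), and only the first boundary satisfies the three-character limit.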

[0030] In the embodiment described above, the combination with the highest division possibility information value is selected. If only a single division position needs to be specified, however, the division position shared by the largest number of the possible combinations may be selected instead.

[Processing Operation 2 of the Embodiment] Next, another embodiment is described, using the table format of FIG. 8. In this example, the format requires text to be displayed within a table cell of a specified width (18 characters).

(1) Divided word recognition processing

The word string entered from the input device 1 is read into the memory 2 and simultaneously displayed on the display device 3. FIG. 8 shows how the table used in this embodiment appears on the display device 3. In this figure, the word "Transylvania" at the end of display line 32 lies at the line-break position between display lines 32 and 33, and must therefore be displayed split across those two lines. Words at such line-break positions are recognized by the end-of-line word recognizer 5. Display line 34 in FIG. 9 is an example of the output when the text is displayed without regard to the format. As this example shows, the end-of-line word recognizer 5 recognizes that the end-of-line word "Transylvania" must be divided so that, allowing for the hyphen appended to the divided string, its first half is no longer than five characters ("Trans").
And the display line 33 must be displayed in two lines. The recognition of the word existing at such a line feed position is performed by the end-of-line word recognizer 5. The display line 34 in FIG. 9 is an output example in the case where the display is performed regardless of the format. As shown in this example, the end-of-line word recognizer 5 converts the word “Transylvania” at the end of the line into a character string after division. Considering that a hyphen is added, it is recognized that it is necessary to divide the word so that the first half of the word is within 5 characters (“Trans”).

[0031] FIG. 10 shows the flow of processing by which the word "Transylvania" is divided in two by the apparatus of FIG. 1, and the specific processing is described below with reference to that figure.

(2) Phoneme code conversion processing

The input word "Transylvania" 35 is converted by the character-phoneme converter 6, in accordance with the conversion table 7, into the phoneme string "ccvccaccvcvv" 36.

(3) Word division position specifying processing

Next, the phoneme pattern comparator 8 refers to the phoneme pattern table 9 (as in FIG. 3) and the comparison condition list 10 (as in FIG. 4) and generates combinations of phoneme patterns corresponding to the phoneme string 36.

[0032] The phoneme pattern combinations 37 to 39 in FIG. 10 are the three possible combinations whose constituent phoneme patterns have the highest total division possibility information values. The numerical values 40 to 42 given alongside combinations 37 to 39 are the division possibility information values of the phoneme patterns making up each combination, and the numerical values 43 to 45 are the totals of those values for each combination. The division positions are specified with reference to combination 37, which has the highest total; the four phoneme patterns making up combination 37 define three division positions.

(4) Word division processing

Next, the division position determiner 11 determines the position at which the word is divided in two. As noted above, the word must be divided so that its first half is no longer than five characters; this condition is satisfied by taking the first phoneme pattern of combination 37 as the first half. The input word is therefore divided into the character string group 47 at the position indicated by combination 46, and the divided character string 48 is generated by joining the strings with a hyphen.

[0033]

[Effect of the Invention] As described above, the conventional technique determined word division positions by using a word dictionary holding large-scale division position information. The present invention instead uses phoneme information for the units into which words are divided, and can therefore quickly specify division positions that are plausible in terms of pronunciation for any word without such a dictionary.

[Brief description of the drawings]

[FIG. 1] A block diagram of the document creation device according to the present invention.

[FIG. 2] A diagram showing the character-phoneme conversion table in FIG. 1.

[FIG. 3] A diagram showing the phoneme pattern table in FIG. 1.

[FIG. 4] A diagram showing the comparison condition list in FIG. 1.

[FIG. 5] A diagram showing the process of dividing a word by the present word division processing.

[FIG. 6] A diagram showing the process of dividing a word by the present word division processing.

[FIG. 7] A diagram showing the process of dividing a word by the present word division processing.

[FIG. 8] A diagram showing the process of dividing a word by the present word division processing.

[FIG. 9] A diagram showing the process of dividing a word by the present word division processing.

[FIG. 10] A diagram showing the process of dividing a word by the present word division processing.

[Explanation of symbols]

1 Input device
2 Memory
3 Display device
4 Output device
5 End-of-line word recognizer
6 Character-phoneme converter
7 Conversion table
8 Pattern comparator
9 Phoneme pattern table
10 Comparison condition list
11 Division position determiner
12-14 Display lines
15 Input word
16 Phoneme string
17-20 Phoneme pattern combinations
21-24 Division possibility information values of each phoneme pattern
25-28 Total of the division possibility information values in each combination
29 Combination for the input word
30 Character string group
31 Divided character string
32-34 Display lines
35 Input word
36 Phoneme string
37-39 Phoneme pattern combinations
40-42 Division possibility information values of each phoneme pattern
43-45 Total of the division possibility information values in each combination
46 Combination
47 Character string group
48 Divided character string

Claims

(57) [Claims]

1. A word division processing method comprising: a table reference process of referring to a phoneme pattern table in which division unit information of phoneme code strings, determined in consideration of the dividable positions of various words, is registered in advance in association with division possibility information indicating the possibility of division at those division positions; a phoneme code conversion process of converting each alphabetic character constituting a word into a phoneme code to generate a phoneme code string in word units; and a word division position specifying process of specifying the dividable positions of an input word by using the table reference process to compose the phoneme code string obtained for that word in the phoneme code conversion process out of the most likely combination of division units.

2. A word division device comprising: phoneme code conversion means for converting each alphabetic character constituting a word into a phoneme code to generate a phoneme code string in word units; a phoneme pattern table in which division unit information of phoneme code strings, determined in consideration of the dividable positions of various words, is registered in advance in association with division possibility information indicating the possibility of division at those division positions; and word division position specifying means for specifying the dividable positions of an input word by composing the phoneme code string obtained for that word from the phoneme code conversion means out of the most likely combination of division units, on the basis of the division unit information and the division possibility information in the phoneme pattern table.

3. A document creation processing method that refers to a phoneme pattern table in which division unit information of phoneme code strings, determined in consideration of the dividable positions of various words, is registered in advance in association with division possibility information indicating the possibility of division at those division positions, the method comprising: a word recognition process of recognizing whether a word in an input document is a word that needs to be divided; a phoneme code conversion process of converting each alphabetic character constituting the word into a phoneme code to generate a phoneme code string in word units; a word division position specifying process of specifying the dividable positions of the word recognized in the word recognition process by composing the phoneme code string obtained for that word in the phoneme code conversion process out of the most likely combination of division units, on the basis of the division unit information and the division possibility information in the phoneme pattern table; and a word division process of setting a range for the number of characters in the first half of the word on the basis of the display format set for the input document, and determining the division position of the word recognized in the word recognition process so that the number of characters in the first half falls within that range.

4. A document creation device that uses a phoneme pattern table in which division unit information of phoneme code strings, determined in consideration of the dividable positions of various words, is registered in advance in association with division possibility information indicating the possibility of division at those division positions, the device comprising: document input means for supplying a document as input; word recognition means for recognizing whether a word in the input document is a word that needs to be divided; phoneme code conversion means for converting each alphabetic character constituting the word into a phoneme code to generate a phoneme code string in word units; word division position specifying means for specifying the dividable positions of the word recognized by the word recognition means by composing the phoneme code string obtained for that word from the phoneme code conversion means out of the most likely combination of division units, on the basis of the division unit information and the division possibility information in the phoneme pattern table; and word dividing means for setting a range for the number of characters in the first half of the word on the basis of the display format set for the input document, and determining the division position of the word recognized by the word recognition means so that the number of characters in the first half falls within that range.