JPS60128559A

JPS60128559A - Kana (Japanese syllabary) kanji (Chinese character) converter

Info

Publication number: JPS60128559A
Application number: JP58236836A
Authority: JP
Inventors: Hiromi Saito; 裕美斎藤; Kimito Takeda; 武田　公人; Tsutomu Kawada; 河田　勉
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1983-12-15
Filing date: 1983-12-15
Publication date: 1985-07-09
Also published as: JPH0221021B2

Abstract

PURPOSE: To improve the efficiency of document creation by displaying, in an easily understandable way, the results for plural conversion candidate units obtained from a long input character string such as a so-called solid sentence (unsegmented kana text). CONSTITUTION: A first character string representing the reading, input via an input device 1 comprising a kana character reader or the like, is converted into, e.g., kana character codes and then given to a kana-kanji converting section 2 comprising a block extracting section 2a, a total bunsetsu (phrase) series extracting section 2b, a bunsetsu extracting section 2c and a conversion dictionary 3. The kana-kanji converting section 2 transfers the second character string, in mixed kana-kanji notation, obtained for each bunsetsu series to an output data memory 5 of an output control section 4, which converts the data into a prescribed display output form and outputs the result to a display device 7 via a document display memory 6. Thus, the intended kana-kanji conversion result can be selected simply from among conversion candidates that are homonyms or alternative bunsetsu structures with the same reading, so that Japanese sentences are created efficiently.

Description

[Technical field of the invention]

The present invention relates to a kana-kanji conversion device that efficiently creates Japanese text in mixed kanji and kana by converting a long continuous kana character sequence, such as one input a whole sentence at a time, while dividing it into bunsetsu (phrase) units as appropriate.

[Technical background of the invention and its problems]

In conventional Japanese word processors and similar devices, the unit of kana-kanji input is generally limited to a single bunsetsu, and even for compound nouns most devices accept only a few units at most. An operator entering Japanese text on such a device must therefore stay aware of word or bunsetsu boundaries at all times, which is a heavy burden. Recently, various studies have therefore attempted to accept a sentence-length string of kana readings, so-called solid text, without restricting the input unit, and to apply kana-kanji conversion to it. Specifically, these approaches achieve the goal by, for example, performing bunsetsu analysis recursively. This, however, requires considerable processing time and consumes a large amount of buffer memory. Simplifying the bunsetsu analysis algorithm to limit processing time and memory has also been considered, but the conversion accuracy inevitably deteriorates. Moreover, a major open problem has been how to display the conversion results so that the operator's homophone selection instructions are easy to handle.

For example, when the character string 「ざんだかをもとめる」 (zandaka wo motomeru) is entered in kana, it can be expected to be divided mechanically into bunsetsu as 「ざんだかを／もとめる」, but the division 「ざんだかをも／とめる」 is also grammatically possible.

In this case, the so-called longest match is generally considered the most probable on empirical grounds. However, if bunsetsu analysis always proceeds from the front of the input string on the basis of this rule of thumb alone and extracts only, for example, 「残高をも／止める」, the conversion accuracy deteriorates markedly. In the end, it is necessary to extract multiple conversion candidates, such as 「残高をも／止める」 and 「残高を／求める」, and to leave the choice to the operator's judgment.

Likewise, for the input 「けいさんしき」 (keisanshiki), if the word [計算式] is not registered in the device's dictionary, multiple conversion results such as 「計算／式」 and 「計算し／木」 arise in the same way, and even results such as 「毛／遺産／式」 can occur. Furthermore, if the word [るけい: 流刑] (rukei, "exile") is registered in the dictionary, the candidate 「残高を求め／流刑／算式」 also appears.

As an evaluation step for outputting the most probable of these varied conversion results first, one known approach assigns priority in ascending order of the number of bunsetsu or words that make up the whole. For example, for the input 「こうがくしょとく」 (kougaku shotoku), 「高額／所得」 is judged more probable than 「項が／句／所得」 or 「項が／区処と／句」. In this case it is natural and preferable to output homophones in descending order of frequency of use. However, segmentations with the same number of bunsetsu may still differ in where the boundaries fall, and when analysis results with more constituents are also output in order to reduce missed conversions, the data structure to be handled becomes complicated. In addition, there remains the problem of how to display these multiple conversion results so that the operator can select among them easily.
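The fewest-bunsetsu priority described above can be sketched as follows. This is a minimal illustration, not the device's implementation: `UNITS` is a hypothetical toy dictionary in which each key is already a whole bunsetsu reading, and all grammatical checks are omitted.

```python
# Toy dictionary (hypothetical): bunsetsu reading -> one written form.
UNITS = {
    "こうがく": "高額",
    "しょとく": "所得",
    "こうが": "項が",
    "く": "句",
    "くしょと": "区処と",
}


def segmentations(reading, units):
    """Yield every way to split `reading` into readings found in `units`."""
    if not reading:
        yield []
        return
    for end in range(1, len(reading) + 1):
        head = reading[:end]
        if head in units:
            for rest in segmentations(reading[end:], units):
                yield [head] + rest


def ranked(reading, units):
    """Return all segmentations, those with the fewest bunsetsu first."""
    return sorted(segmentations(reading, units), key=len)


for seg in ranked("こうがくしょとく", UNITS):
    print("／".join(UNITS[r] for r in seg))
```

With this toy table the two-bunsetsu reading 「高額／所得」 is listed before the two three-bunsetsu alternatives. Ties between segmentations of equal length keep their discovery order here; a frequency-based tie-break would be a further assumption.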

For example, merely marking the homophone portions within each bunsetsu with an attribute such as a brightness change does not reveal whether other bunsetsu sequences with different boundaries exist. Furthermore, besides selecting among homonyms, the operator sometimes has to select among alternative bunsetsu structures with the same reading, which posed many problems.

[Object of the invention]

The present invention has been made in view of these circumstances. Its object is to provide a kana-kanji conversion device that improves the efficiency of document creation by displaying, in an easily understandable way, the results for multiple conversion candidate units obtained from a long input character string such as so-called solid text.

[Summary of the invention]

The present invention uses a dictionary search unit in which a plurality of words are registered to extract sequences of bunsetsu units from an input character string, converts each bunsetsu into the mixed kana-kanji notation corresponding to its reading, and outputs the result. In doing so, among the bunsetsu boundary positions at which the multiple bunsetsu sequences for the input string differ from one another, the bunsetsu start point nearest the front is taken as a reference point. The independent-word part of the next bunsetsu after this reference point, together with the adjunct (attached-word) part of the preceding bunsetsu cut off at the reference point, is treated as one conversion candidate unit for kana-kanji conversion, and the conversion candidates are displayed in order.
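The choice of reference point can be illustrated with a short sketch. It assumes each candidate is given as a list of bunsetsu readings covering the same input string; the function names are hypothetical, and the sketch covers only the position computation, not the subsequent conversion.

```python
from itertools import accumulate


def boundaries(sequence):
    """Return the set of interior cut positions of a bunsetsu sequence."""
    ends = list(accumulate(len(b) for b in sequence))
    return set(ends[:-1])  # drop the end of the whole string


def reference_point(sequences):
    """Earliest cut position that is not shared by every sequence."""
    cut_sets = [boundaries(s) for s in sequences]
    differing = set.union(*cut_sets) - set.intersection(*cut_sets)
    return min(differing) if differing else None


candidates = [
    ["ざんだかをも", "とめる"],   # 残高をも／止める
    ["ざんだかを", "もとめる"],   # 残高を／求める
]
point = reference_point(candidates)
reading = "".join(candidates[0])
print(reading[:point], "|", reading[point:])
```

For these two candidates the reference point falls after 「ざんだかを」, so the material that follows it (the adjunct 「も」 of one candidate and the word stem of the other) is what gets grouped into a single candidate unit.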

[Effect of the invention]

Thus, according to the present invention, for the input 「ざんだかをもとめるけいさんしき」 from the example above, bunsetsu sequence candidates such as 「残高をも／止める」 and 「残高を／求める」 are created, and the conversion candidate results are obtained in forms such as 「[残高]を[も止]める」 and 「[残高]を[求]める」 and displayed in order, which greatly simplifies the selection of the next homophone candidate. For example, although no word 「も止」 actually exists, marking the [も止] portion shows that other homophone words or alternative notations exist for its reading 「もと」, that is, that other candidates correspond to the same reading. The operator can therefore select among differently structured bunsetsu sequences simply and efficiently with the same next-homophone-candidate instruction used to select homophones within a bunsetsu. Consequently, no special key is needed to select another candidate among differently structured bunsetsu sequences; the conventional next-homophone-candidate key performs that selection efficiently, which is a substantial practical benefit.

[Embodiments of the invention]

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a schematic configuration diagram of the device of the embodiment. The input device 1 comprises a keyboard, a speech recognition device, a kana character reader or the like. A character string representing a reading (the first character string) input via the input device 1 is converted into, for example, kana character codes and then given to the kana-kanji conversion unit 2. The first character string representing the reading is expressed in, for example, hiragana, katakana or romaji.

The kana-kanji conversion unit 2 comprises, for example, a block extraction unit 2a, a total bunsetsu sequence extraction unit 2b, a bunsetsu extraction unit 2c and a conversion dictionary 3. For the first character string transferred from the input device 1, the kana-kanji conversion unit 2 obtains a second character string, a display character string in mixed kanji notation corresponding to it, and gives this to the output control unit 4. The block extraction unit 2a is provided to analyze particularly long input strings efficiently. With a preset value N, for example N = 4, it obtains bunsetsu analysis results that map the first character string onto sequences of at most N bunsetsu; when no corresponding bunsetsu analysis result exists, it divides the first character string into several block sections. It then sends the kana reading string of each block to the total bunsetsu sequence extraction unit 2b, and sequentially sends the kana-kanji conversion result obtained by unit 2b for each block, that is, the second character string, to the output control unit 4.

The total bunsetsu sequence extraction unit 2b uses the bunsetsu extraction unit 2c to divide the first character string into the bunsetsu sequences into which it can be divided, and outputs the mixed kana-kanji conversion results obtained for each of these sequences to the block extraction unit 2a. Bunsetsu sequences are obtained for every combination in which the input string can be divided into bunsetsu, ranked from the most probable candidate by, for example, a priority evaluation, and output starting from the first-ranked one. As the priority evaluation, sequences with fewer bunsetsu are, for example, output first, since a smaller number of bunsetsu generally tends to correspond to the intended mixed-kanji text.

The bunsetsu extraction unit 2c performs a matching search between the input character code string and the character strings (words) registered in advance in the conversion dictionary 3, and obtains the second character string, written in mixed kanji notation, that corresponds to the first character string. As the memory layout example in FIG. 2 shows, the conversion dictionary 3 has an input heading table area 3a, an output heading table area 3b and a part-of-speech area 3c. The first character string representing a reading is stored in the input heading table area 3a, and the corresponding second character string in mixed kanji notation is stored in the output heading table area 3b. The part-of-speech area 3c stores part-of-speech information for the first and second character strings.
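The three areas of the conversion dictionary can be pictured as the fields of a record keyed by reading. The sketch below is only an illustration under that assumption; the class names, the part-of-speech labels and the sample entries are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    reading: str   # input heading (area 3a)
    surface: str   # output heading (area 3b)
    pos: str       # part of speech (area 3c)


class ConversionDictionary:
    def __init__(self, entries):
        self._by_reading = defaultdict(list)
        for entry in entries:
            self._by_reading[entry.reading].append(entry)

    def lookup(self, reading):
        """Return all entries (homophones) registered under `reading`."""
        return list(self._by_reading.get(reading, []))


dictionary = ConversionDictionary([
    Entry("とうし", "投資", "noun"),
    Entry("とうし", "闘志", "noun"),
    Entry("とうし", "透視", "noun"),
    Entry("うし", "牛", "noun"),
])
print([e.surface for e in dictionary.lookup("とうし")])
```

Keeping every homophone under one reading key is what later allows the next-candidate key to step through alternatives for the same kana position.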

For a given input string, the bunsetsu extraction unit 2c searches the character strings (words) registered in advance in the input heading table area 3a of the conversion dictionary 3 by, for example, the well-known longest forward match method, analyzes inflectional endings and adjuncts, and obtains as the bunsetsu extraction result the portion of the input string that gives the longest match from its head. The inflectional endings are analyzed on the basis of the part-of-speech items stored in the part-of-speech area 3c. The second character string in mixed kanji notation corresponding to the bunsetsu extraction result found by this analysis is then read from the output heading table area 3b and output. At this time, the total bunsetsu sequence extraction unit 2b varies the input unit given to the bunsetsu extraction unit 2c (the bunsetsu extraction result) over the combinations into which the block section defined for the input string can be divided, and obtains the most probable bunsetsu sequence among them.
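A minimal sketch of longest forward matching follows. It assumes the registered readings are already complete bunsetsu (word plus adjunct), so the inflection and adjunct analysis mentioned in the text is left out; `READINGS` is a hypothetical sample set.

```python
def longest_match(text, start, readings):
    """Return the longest registered reading that begins at text[start]."""
    for end in range(len(text), start, -1):
        if text[start:end] in readings:
            return text[start:end]
    return None


def greedy_split(text, readings):
    """Split `text` front to back, always taking the longest match."""
    result, pos = [], 0
    while pos < len(text):
        match = longest_match(text, pos, readings)
        if match is None:
            raise ValueError(f"no registered reading at position {pos}")
        result.append(match)
        pos += len(match)
    return result


READINGS = {"そして", "こんごの", "こんごのと", "うし", "うしは", "とうしは", "は"}
print(greedy_split("そしてこんごのとうしは", READINGS))
```

Because the greedy pass prefers 「こんごのと」 over 「こんごの」, it yields only the 「…／うしは」 reading, which is why the device also has to explore shorter matches to reach 「こんごの／とうしは」.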

The kana-kanji conversion unit 2 transfers the second character string, in mixed kana-kanji notation, obtained for such a bunsetsu sequence to the output data memory 5 of the output control unit 4. The output control unit 4 converts these data into a predetermined display output format and outputs them to the display device 7 via the document display memory 6.

As shown in FIG. 3(a), the output data memory 5 consists of a combination table 5a, a mapping table 5b and a heading word table 5c, and stores the heading words converted by the dictionary search unit 2 together with the structure of each combination. The example represents the data storage structure for the Japanese sentence shown in FIG. 4(a). The combination table 5a describes the arrangement of the numbered bunsetsu corresponding to the bunsetsu structure of the input string: its rows give the alternative interpretations of the bunsetsu structure, and its columns give, in order, the chain of bunsetsu within each structure. In the example, the first block has one kind of candidate sequence, the second block two kinds and the third block three kinds. Each numeric value is an element number of the mapping table 5b. The heading words of the homonyms that exist for each bunsetsu unit are grouped and stored in the heading word table 5c, and each element of the mapping table 5b serves as a pointer to one group in the heading word table 5c. The correspondence with positions in the input kana reading is recorded at the same time.
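One way to picture the three tables is sketched below. The concrete contents and the list-based layout are assumptions made for illustration; the text fixes only the roles of tables 5a, 5b and 5c, not their encoding.

```python
# Heading word table 5c (assumed layout): groups of homophones per bunsetsu.
heading_groups = [
    ["今後のと"],                   # group 0
    ["牛は"],                       # group 1
    ["今後の"],                     # group 2
    ["投資は", "闘志は", "透視は"],  # group 3
]

# Mapping table 5b: element number -> (start position in reading, group index).
mapping = [(3, 0), (8, 1), (3, 2), (7, 3)]

# Combination table 5a: one row per interpretation, listing mapping elements.
combination = [
    [0, 1],  # こんごのと／うしは
    [2, 3],  # こんごの／とうしは
]


def first_candidates(row):
    """Follow 5a -> 5b -> 5c and return the first heading word of each bunsetsu."""
    return [heading_groups[mapping[element][1]][0] for element in row]


for row in combination:
    print("／".join(first_candidates(row)))
```

The indirection through the mapping table lets two interpretations share a heading-word group while still recording where each bunsetsu starts in the kana input.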

FIG. 3(b) shows the table structure of the document display memory 6, which stores the character display information used to output the contents of the output data memory 5 to the display device 7. On the basis of the contents of the output data memory 5, the memory 6 compares the homophones and differently structured bunsetsu sequences located at the same kana reading position. It first divides the input string at the bunsetsu boundaries common to all bunsetsu sequence candidates, and then records the common adjunct characters and the portions whose conversion result is uniquely determined separately from the other portions. In addition, when a conversion candidate has, in its adjunct part, the same kana reading as the leading heading-word characters of a bunsetsu in another candidate, the candidate is cut at the position of that bunsetsu head, and the adjunct characters after the cut are joined to the following bunsetsu. In other words, even a character that is an adjunct within a bunsetsu of one sequence is joined to the conversion candidate of the following bunsetsu if the same reading forms the beginning of an independent word in another bunsetsu sequence. As described later, these homophones are displayed with a display attribute different from the normal one. FIGS. 5(a) to 5(d) show display examples, in which the hatched portions indicate the different display attribute. The display attribute is changed by, for example, reversing the displayed characters, blinking, changing the brightness or underlining.

Next, the kana-kanji conversion process in the kana-kanji conversion unit 2 is described using the kana input example shown in FIG. 3.

The block extraction unit 2a obtains as many sequences of at most N bunsetsu as possible from the head of the input character sequence. With N = 4, for example, in the example shown in FIG. 4(a) it first passes the entire input sequence to the bunsetsu extraction unit 2c and obtains 「そして」 as the first bunsetsu by the longest match method. Taking the position after this bunsetsu boundary as the starting point (the start character position of the next bunsetsu), it obtains the longest match in the same way and gets the bunsetsu 「こんごのと」. Repeating this process in order yields the first bunsetsu sequence candidate 「そして／こんごのと／うしは／かいていし」, shown as item 「ア」 in FIG. 4(b). Next, to obtain a bunsetsu sequence different from that of item 「ア」, the last character of the third bunsetsu 「うしは」, namely the 「は」 analyzed as an adjunct in that bunsetsu, is removed and the remainder is sent to the bunsetsu extraction unit 2c; this yields the longest match 「うし」, and 「は」 is then obtained as the following bunsetsu beginning with 「は」. In the same way, each time a shorter bunsetsu is obtained for the third, second or first bunsetsu, the further bunsetsu strings that follow it are obtained in turn. In this way all four-bunsetsu sequences into which the input string can be divided are obtained, as shown in FIG. 4(b). The corresponding heading word candidates in mixed kanji notation (second character strings) are obtained at the same time.

Next, attention is restricted to the candidates (bunsetsu sequences) whose total length is the greatest. This rests on the observation noted earlier that the sequence with the fewest bunsetsu for a given input tends to contain the intended conversion result. It suffices that the number of bunsetsu in a block be minimal, and for the same number of bunsetsu this means that the block covers a greater length.

Among the results shown in FIG. 4(b), the longest bunsetsu sequences are those of items 「ア」 and 「ウ」. The positions at which these sequences share a bunsetsu boundary are then found. In this example, only [そして／…] and […は／…] are found as common bunsetsu boundaries. The block extraction unit 2a treats these two positions as block boundaries, taking 「そして」 as the first block section and 「こんごのとうしは」 as the second. It then has the total bunsetsu sequence extraction unit 2b analyze the character strings of these sections in turn and sends the conversion results to the output control unit 4.
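The block division at shared boundaries can be sketched as follows. The first sequence is the item 「ア」 quoted in the text; the second is an assumed reconstruction of the other longest sequence, since the figure itself is not reproduced here. Function names are hypothetical.

```python
from itertools import accumulate


def interior_cuts(sequence):
    """Cut positions between bunsetsu, excluding the end of the sequence."""
    return set(list(accumulate(len(b) for b in sequence))[:-1])


def split_into_blocks(sequences):
    """Cut the reading at the boundaries shared by all candidate sequences.

    Returns (blocks, remainder); the remainder after the last shared
    boundary is left for the next round of block extraction.
    """
    reading = "".join(sequences[0])
    common = sorted(set.intersection(*(interior_cuts(s) for s in sequences)))
    blocks, start = [], 0
    for cut in common:
        blocks.append(reading[start:cut])
        start = cut
    return blocks, reading[start:]


item_a = ["そして", "こんごのと", "うしは", "かいていし"]
item_u = ["そして", "こんごの", "とうしは", "かいていし"]  # assumed second candidate
blocks, remainder = split_into_blocks([item_a, item_u])
print(blocks, remainder)
```

The two shared cuts give the blocks 「そして」 and 「こんごのとうしは」, and the trailing 「かいていし」 is carried over, matching the later step in which the undecided remainder is processed in the same way.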

As a result, the first block section has 「そして」 as its only candidate, and this information is sent first to the output control unit 4. Since there are no other homophones in this case, it is displayed in the normal form as character data (the conversion result) in the document. The second block section is then analyzed.

The total bunsetsu sequence extraction unit 2b in principle obtains the character sequences corresponding to the reading of a given block section exhaustively, but because the bunsetsu sequences have already been obtained as shown in FIG. 4(b), it is sufficient to select those that fall within the specified section. If, as the priority evaluation, only the candidates with the minimum number of bunsetsu are selected, the analysis result is as shown in item 「II」 of FIG. 4(d). Other bunsetsu candidate sequences may of course also be given to the output data memory 5; for example, 「今後の／問う／誌は」 could be added to the output.

The block extraction unit 2a then determines block units in the same way for the remainder of the input character sequence whose blocks are not yet decided, that is, the string 「かいていしげ……」, and obtains the conversion result shown in item 「III」 of FIG. 4(d). It then obtains the conversion result 「限って」 as in item 「IV」 of FIG. 3(d), which completes the conversion of the entire input sequence.

These conversion results are sent to the output control unit 4 block by block. The output control unit 4 converts the data for each block stored in the output data memory 5, stacks the results sequentially in the document display memory 6 and outputs them to the display device 7. That is, the output control unit 4 creates the document display data as shown in FIG. 3(b) and first outputs it to the display device 7 as shown in FIG. 4(a). In the example above, the bunsetsu candidate 「今後のと」 is separated into the independent-word part 「今後」 and the adjunct part 「のと」. The bunsetsu candidate 「牛は」 is likewise separated into 「牛」 and 「は」.

In the second bunsetsu sequence candidate, the independent-word part 「今後」 and the adjunct part 「の」 are separated, and 「投資は」 is separated into 「投資」 and 「は」. In this case 「の」 and 「は」 are adjunct characters common to both candidates, and 「今後」 has no other homophones, so these characters are displayed in the normal form. The remaining characters have multiple conversion candidates (homophones with different characters) and are therefore displayed with emphasis, for example at high brightness. In other words, the 「と」 in the bunsetsu 「今後のと」 is an adjunct, but it is handled together with the word 「牛」 of the next bunsetsu.

Accordingly, the display device 7 shows 「そして今後の[と牛]は[改定し限界発に]限って」, where the characters inside [ ] are those displayed at high brightness.

To select the intended heading word, the input device 1 is provided with a selection key, shown for example as 1a in FIG. 1. When the cursor 1b is placed on the [と牛] portion as shown in FIG. 5, for example, and the selection key 1a is operated in that state, the key changes the display to the next candidate. In this case, in place of the structure in the first row of table 5a shown in FIG. 4(a), the bunsetsu sequence structure in the second row, that is, 「今後の[投資]は」, is displayed as shown in FIG. 5(b).

As the selection key 1a is operated further, [投資] is changed in turn to other homophones such as [闘志] and [透視], and the display eventually returns to the original [と牛]. These operations are performed on the basis of the data stored in the document display memory 6 shown in FIG. 3(b).

The document correspondence table 6a shown in FIG. 3(b) represents the correspondence between the coordinate values i1 to i7 on the display screen of the display device 7 and the output data (conversion results).

The display word table 6b holds the contents of the output data. To display the conversion candidates, data i3 of the document correspondence table 6a first designates the entry 「■と牛」 in the display word table 6b, and this entry is displayed. When the selection key 1a is then operated, the pointer is advanced via data i3 to the next candidate in that block, 「■投資」, and the display is switched. The pointer is advanced in this way, and once the display has been switched as far as 「■凍死」, control returns to 「■と牛」.
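The pointer behaviour described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `CandidateBlock`, the dictionary keyed by `"i3"`, and the exact order of the candidates between 「投資」 and 「凍死」 are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CandidateBlock:
    """Alternatives for one span of the display (one block of a display word table)."""
    candidates: List[str]
    pointer: int = 0  # index of the candidate currently displayed

    def current(self) -> str:
        return self.candidates[self.pointer]

    def select_next(self) -> str:
        # Advance the pointer; after the last candidate, wrap back to the first.
        self.pointer = (self.pointer + 1) % len(self.candidates)
        return self.current()


# Hypothetical stand-in for the document correspondence table:
# a screen coordinate label maps to the block shown at that position.
document_table: Dict[str, CandidateBlock] = {
    "i3": CandidateBlock(["と牛", "投資", "闘志", "透視", "凍死"]),
}

block = document_table["i3"]
# Initial display, then five presses of the selection key.
shown = [block.current()] + [block.select_next() for _ in range(5)]
```

After the fifth press the list `shown` ends with the first candidate again, which mirrors the wrap-around from 「凍死」 back to 「と牛」.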

As the display word table 6b shows, the plural conversion candidate results obtained by the clause-based analysis described above are organized according to their homophone relationships, and the unit of segmentation is partially changed.

Similarly, for the character string 「かいていしげ…」, the independent word part is treated as 「改定」 and the adjunct part as 「し」. Because another clause word, 「資源」, also exists for the 「し」 portion, this 「し」 is output in combination with the next clause 「限界」. Up to the following 「発に」 there is no word boundary shared by all candidates, so these are output together. For the purposes of machine processing, each independent word is treated as a clause.
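The grouping of candidates into shared and differing spans can be sketched as below. This is a simplified illustration under stated assumptions: each candidate is given as a list of hypothetical `(reading, surface)` pairs, spans are cut only at reading offsets that every candidate shares, and the dictionary lookup and the reference-point rule of the device are not modelled.

```python
from itertools import accumulate


def boundaries(segments):
    """Reading offsets at which the segments of one candidate end."""
    return set(accumulate(len(reading) for reading, _ in segments))


def surface_between(segments, start, end):
    """Concatenate the surfaces of segments lying inside reading span [start, end)."""
    out, pos = [], 0
    for reading, surface in segments:
        if start <= pos and pos + len(reading) <= end:
            out.append(surface)
        pos += len(reading)
    return "".join(out)


def merge_candidates(candidates):
    """Cut at boundaries common to all candidates; keep alternatives where they differ."""
    common = sorted(set.intersection(*(boundaries(c) for c in candidates)))
    blocks, start = [], 0
    for end in common:
        surfaces = []
        for cand in candidates:
            s = surface_between(cand, start, end)
            if s not in surfaces:
                surfaces.append(s)
        if len(surfaces) == 1 and blocks and len(blocks[-1]) == 1:
            # Extend the preceding unambiguous run instead of opening a new block.
            blocks[-1] = [blocks[-1][0] + surfaces[0]]
        else:
            blocks.append(surfaces)
        start = end
    return blocks


def render(blocks):
    """Mark blocks with several alternatives using brackets (stand-in for high brightness)."""
    return "".join(b[0] if len(b) == 1 else "［" + b[0] + "］" for b in blocks)


first = [("そして", "そして"), ("こんご", "今後"), ("の", "の"),
         ("と", "と"), ("うし", "牛"), ("は", "は")]
second = [("そして", "そして"), ("こんご", "今後"), ("の", "の"),
          ("とうし", "投資"), ("は", "は")]

blocks = merge_candidates([first, second])
line = render(blocks)
```

With these two candidates, the boundary after 「と」 exists only in the first one, so 「と」 and 「牛」 end up in the same block as 「投資」, while 「そして今後の」 and 「は」 remain unambiguous.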

FIGS. 6 to 9 show the flow of the processing described above. In the control flow shown in FIG. 6, the input key codes obtained from the input device 1 are checked continuously; if an input code is a kana character code corresponding to the reading of a Japanese sentence, it is stored on a stack in sequence. If the input code indicates a conversion request, the kana-kanji conversion processing shown in FIG. 7 is performed. When the input device has a conversion request key, the conversion request is generated by the operator entering a character string of suitable length and then pressing that key. Whether or not the input device has a conversion request key, it is desirable to generate the conversion request automatically when an input character is detected to be, for example, a code indicating a punctuation mark.

If the input code corresponds to the selection key 1a, the homophone selection processing shown in FIG. 8 is performed. For other codes, for example codes for correction, insertion, or deletion, the corresponding processing is performed on the text already displayed. FIG. 9 shows the editing and output processing of the conversion candidates that appears in FIG. 7.
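The dispatch on input key codes described for FIG. 6 can be sketched as follows. This is only an illustration: the key names `"CONVERT"`, `"SELECT"`, `"CORRECT"`, `"INSERT"`, and `"DELETE"`, the `convert` callback, and the punctuation set are assumptions, and the actual conversion, selection, and editing steps are replaced by log entries.

```python
PUNCTUATION = {"、", "。"}                      # assumed punctuation codes
EDIT_KEYS = {"CORRECT", "INSERT", "DELETE"}    # assumed names for editing codes


def process_keys(keys, convert):
    """Dispatch key codes: stack kana, convert on request, select, or edit."""
    stack, log = [], []

    def flush():
        # Convert whatever reading has accumulated, then empty the stack.
        if stack:
            log.append(("convert", convert("".join(stack))))
            stack.clear()

    for key in keys:
        if key == "CONVERT":
            flush()                            # explicit conversion request key
        elif key == "SELECT":
            log.append(("select_next", None))  # homophone selection step
        elif key in EDIT_KEYS:
            log.append(("edit", key))          # correction, insertion, deletion
        else:
            stack.append(key)                  # kana (or punctuation) character code
            if key in PUNCTUATION:
                flush()                        # automatic conversion request
    return log, stack


keys = list("こんご") + ["CONVERT", "SELECT"] + list("は、") + ["DELETE"]
log, remaining = process_keys(keys, convert=lambda reading: "<" + reading + ">")
```

The stub `convert` only wraps the reading in angle brackets so that the order of events in `log` is easy to inspect; a real converter would return the kana-kanji result.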

Word processors are generally known in two forms: those in which homophone selection is carried out successively for each conversion result, and those in which selection is carried out collectively after, for example, one page of character strings has been input. The device of the present invention may use either method.

As described above, when a relatively long input kana character string is analyzed and converted into a character string of mixed kana and kanji to create Japanese text, this device can display the resulting large number of clause string candidates simply and clearly, and can raise the efficiency of the operator's homophone selection. That is, homophones can be selected not only within a single clause candidate but also, collectively as described above, across candidates of different clause series whose clause boundaries differ. The desired kana-kanji conversion characters can therefore be selected easily from the conversion candidates for homophones and for homophonic clauses with different boundaries, and Japanese text can be created very efficiently. The burden on the operator is also greatly reduced, which is a considerable practical advantage.

The present invention is not limited to the embodiment described above. For example, instead of displaying the homophones one after another within the document, the words of a homophone group may be displayed below the text so that the operator selects from them. Also, as long as the desired characters have not been selected from a homophone group, the subsequent conversion results may be withheld from display in order to prompt that selection. In short, the present invention can be implemented with various modifications without departing from its gist.

[Brief explanation of drawings]

FIG. 1 is a schematic configuration diagram of a device showing one embodiment of the present invention; FIG. 2 is a diagram showing the memory configuration of the conversion dictionary; FIG. 3 is a diagram showing the configuration of the output data memory and the document display memory; FIG. 4 is a diagram showing an input character string and its conversion processing; FIG. 5 is a diagram showing an example of how the conversion is displayed; and FIGS. 6 to 9 are diagrams showing an example of the control flow of the conversion processing.

1: input device, 2: kana-kanji conversion section, 3: conversion dictionary, 4: output control section, 5: output data memory, 6: control table, 7: display device, 1a: selection key, 2a: block extraction section, 2b: total clause series extraction section, 2c: clause extraction section, 3a: input headword table area, 3b: output headword table area, 6a: document correspondence table, 6b: display word table.

Applicant's agent: Patent attorney Takehiko Suzue

Claims

[Claims]

(1) A kana-kanji conversion device comprising: an input device for obtaining a series of input character strings; a dictionary search section having a dictionary in which a plurality of words are registered; a clause series extraction section that extracts a series of clause units from the input character string using the dictionary search section; and a result output section that converts each clause found by the clause series extraction section into the kana-kanji mixed notation corresponding to the reading of that clause and outputs it, characterized in that, when the clause series extraction section obtains a plurality of clause series for an input character string, the earliest clause start point among the mutually differing break positions of these clause series is taken as a reference point, and, for the clauses following this reference point, the independent word part of the next clause is combined with the adjunct part of the preceding clause delimited by the reference point, and kana-kanji conversion is performed with the combination as one conversion candidate unit.

(2) The kana-kanji conversion device according to claim 1, wherein the result output section displays and outputs the conversion candidate units following the reference point with a changed display attribute, such as reversed characters, blinking, changed brightness, or added underlining.

(3) The kana-kanji conversion device according to claim 1, wherein, for the display of the conversion candidate units following the reference point, the kana-kanji conversion results of the plurality of conversion candidate units sharing a common reference point are output in sequence in response to a request to switch to the next homophone candidate.