JPH0336660A

JPH0336660A - Document preparing device

Info

Publication number: JPH0336660A
Application number: JP1171074A
Authority: JP
Inventors: Yukihiro Karasaki; 幸弘唐崎
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1989-07-04
Filing date: 1989-07-04
Publication date: 1991-02-18

Abstract

PURPOSE: To improve document preparation efficiency by learning and storing pairs of words, holding one word of a pair as a determined word, and referring to that determined word when preparing the document from the reading input of a character string. CONSTITUTION: When the reading of a character string is input from a keyboard 4, a CPU 1 executes the programs stored in a ROM 2, including a kana-kanji conversion program and an input/output control program. It is then decided whether the converted kanji of the former or latter word of a pair exists in the determined word buffer of a RAM 3. If it does, the word order table holding the learned word pairs is referred to, the pair whose connection also corresponds is selected, and the document is prepared. The document can thus be prepared quickly and accurately.

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) The present invention relates to a document preparation device that converts readings into words and, when the conversion yields homophones, can preferentially output the appropriate word among them.

(Prior Art) A Japanese document preparation device works as follows: the "reading" of a character string is input by keyboard, voice, or the like; a dictionary memory storing pairs of readings and kanji-mixed character strings (which may also be hiragana or katakana strings) is searched with the input reading; and the character string obtained from the dictionary memory is output to a display device or the like, yielding a kanji-mixed character string.

However, it is characteristic of Japanese that a single reading corresponds to multiple homophones, so a conversion outputs multiple homophones. Japanese document preparation devices therefore provide a function (the homophone selection function) that lets the operator select the desired word from among the homophones. For example, one representative homophone is displayed within the converted character string in a format different from the other characters (reverse video, overline, high brightness, etc.).

While this homophone is indicated by the cursor or the like, pressing a key (the [Next Candidate] key) displays the homophones one after another. When the desired homophone is displayed, pressing another key (the [Select] key) stores that word in the document character string and confirms the selection.

In this homophone selection, the more often the operator's desired word is output as the first conversion result, the fewer [Next Candidate] key presses are needed and the higher the document preparation efficiency. For example, the reading しよう has the homophones 仕様 (specification), 使用 (use), 試用 (trial use), 私用 (private use), 子葉 (cotyledon), 枝葉 (branches and leaves), and しよう (plain hiragana). If the homophones are always displayed in this order, an operator who wants 私用 must press the [Next Candidate] key three times every time.

To solve this problem, Japanese document preparation devices have adopted two methods.

The first method (the short-term learning function) displays the most recently selected word ahead of the other homophones. For example, if 私用 is selected in the preceding example, then for the next input of the reading しよう the candidates are displayed in the order 私用, 仕様, 使用, 試用, 子葉, 枝葉, しよう. This exploits the characteristic that the same word is likely to be selected repeatedly within a text.
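The short-term learning rule can be read as a move-to-front operation on the candidate list. The following is a minimal sketch under that reading; the function name `promote_recent` and the list representation are illustrative assumptions, not taken from the patent.

```python
def promote_recent(candidates, selected):
    """Return a new candidate order with the just-selected word first.

    Hypothetical helper: the patent only states that the most recently
    selected word is displayed ahead of the other homophones.
    """
    if selected not in candidates:
        raise ValueError(f"{selected!r} is not a candidate")
    # Keep the remaining candidates in their previous relative order.
    return [selected] + [w for w in candidates if w != selected]


# Homophones of the reading しよう in the default order given in the text.
shiyou = ["仕様", "使用", "試用", "私用", "子葉", "枝葉", "しよう"]
after_selection = promote_recent(shiyou, "私用")
print(after_selection)
```

With the default order above, selecting 私用 yields the order quoted in the text: 私用, 仕様, 使用, 試用, 子葉, 枝葉, しよう.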

The second method (the usage-frequency learning function) counts how often each word is selected and preferentially outputs the most frequently selected word. For example, with 仕様 (3), 使用 (5), 試用 (2), 私用 (4), 子葉 (0), 枝葉 (1), しよう (0) (selection counts in parentheses), the candidates are displayed in the order 使用, 私用, 仕様, 試用, 枝葉, 子葉, しよう. If 私用 is then selected two more times, subsequent inputs of the reading しよう display the homophones in the order 私用, 使用, 仕様, 試用, 枝葉, 子葉, しよう.
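The usage-frequency rule amounts to sorting the candidates by selection count. The sketch below reproduces the two orderings given in the text. The function name `order_by_frequency` is hypothetical, and breaking ties by the original dictionary order is an assumption (the text does not say how ties are resolved, although its example is consistent with it).

```python
def order_by_frequency(candidates, counts):
    """Sort candidates by descending selection count.

    Python's sort is stable, so words with equal counts keep their
    original relative order (an assumed tie-breaking rule).
    """
    return sorted(candidates, key=lambda w: -counts.get(w, 0))


shiyou = ["仕様", "使用", "試用", "私用", "子葉", "枝葉", "しよう"]
counts = {"仕様": 3, "使用": 5, "試用": 2, "私用": 4,
          "子葉": 0, "枝葉": 1, "しよう": 0}

before = order_by_frequency(shiyou, counts)
counts["私用"] += 2  # 私用 is selected two more times
after = order_by_frequency(shiyou, counts)
print(before)
print(after)
```

Because the ranking depends only on per-word counts, every occurrence of the same reading receives the same first candidate, which is the limitation the invention addresses.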

This exploits the characteristic that word usage is biased by the field of the document being prepared (letters, patent specifications, etc.) or by the environment in which the device is used (law offices, hospitals, etc.). Some devices combine the two methods.

(Problem to Be Solved by the Invention) However, these characteristics of document preparation cannot always be exploited.

For example, when the character string 私用で使用 ("use for private purposes") is wanted and the short-term learning function is active, the following sequence occurs:

(1) The reading しようでしよう is input.

(2) Conversion candidates such as 使用で使用 are displayed (an underline indicates that a word is not yet selected).

(3) When the first 使用 is stepped to the next candidate and 私用 is selected, the result becomes 私用で私用: the short-term learning function also acts on the second 使用 and changes it to 私用.

(4) The [Next Candidate] key must be pressed again on the second 私用 to display and select 使用.

This phenomenon actually worsens operating efficiency.

With the usage-frequency learning function, the most frequent word always takes priority. Therefore, if 私用 is used more frequently than 使用, the conversion result is always 私用で私用; and if 使用 is used more frequently than 私用, the result is always 使用で使用. Either way, the desired result 私用で使用 can never be obtained on the first conversion.

In view of the above problems, the object of the present invention is to provide a Japanese document preparation device that learns and stores the relationship between a preceding word and a following word (called a "co-occurrence relationship") and uses the learned co-occurrence relationships so that the desired word is obtained preferentially from among the homophones.

[Constitution of the Invention] (Means for Solving the Problem) To achieve the above object, the document preparation device of the present invention comprises: storage means for storing pairs of two words; and a processing unit that (a) when a word is selected from among homophones by homophone selection input means, registers in the storage means a pair consisting of that word and a word located before or after it, and (b) during conversion of readings into words, after a first word has been output as a conversion result and multiple homophones are produced for a reading input after that word, checks whether a pair consisting of the first word and any one of the homophones is registered in the storage means, and preferentially outputs, as the conversion result, the homophone that is registered in the storage means.

(Operation) With the above configuration, each time a word is selected, the co-occurrence relationship between that word and other words (which may precede or follow it) is learned. Subsequent conversions use this learning result to output homophones in priority order, which improves document preparation efficiency.

(Embodiment) FIG. 1 shows the system configuration of a Japanese document preparation device embodying the present invention.

A ROM 2, a RAM 3, a keyboard 4, a display device 5, and an external storage device 6 (for example, a floppy disk drive) are connected under the control of a CPU 1.

The keyboard 4, the display device 5, and the external storage device 6 are each connected to a controller and a buffer memory (for example, a video RAM that stores the display image for the display device 5), which store data and control data input and output. These are well known and are omitted from the drawing. The keyboard 4 provides [Character] keys for inputting a reading, a [Convert] key that starts conversion of the reading, a [Next Candidate] key that steps through the conversion candidates (homophones) in sequence, a [Select] key that confirms the selection of a conversion candidate, and [Cursor] keys that move the cursor on the display screen.

The ROM 2 stores the programs that make the CPU 1 and the other terminal devices 4, 5, and 6 perform the functions required of a Japanese document preparation device: the kana-kanji conversion program and the co-occurrence learning program that relate to the present invention, as well as data input/output control programs, a screen editor program, and so on. The parts of the programs in the ROM 2 that particularly relate to the present invention are explained later using flowcharts. The ROM 2 also stores a dictionary holding pairs of a reading and the words corresponding to that reading (kanji-mixed character strings, katakana words, etc.).

Under the control of the kana-kanji conversion program, the CPU 1 searches this dictionary by reading and reads out the words corresponding to that reading. Each word is also assigned a number (called a "dictionary number"), so the CPU 1 can read a word from the dictionary by dictionary number as well as by reading. The ROM 2 further stores the character patterns used to display the prepared document and the converted character strings on the display device 5.

The RAM 3 provides the various buffers used for kana-kanji conversion (an input buffer that stores the input reading, a stack onto which homophones are pushed, a clause buffer that stores confirmed clauses, etc.), a document buffer that stores the character codes of the characters making up the prepared document (including codes for control characters such as line breaks), and a co-occurrence table that stores co-occurrence relationships.

FIG. 2 shows the internal structure of the co-occurrence table used in this embodiment. The co-occurrence table stores affinity relationships between words (for example, that the connection between 漢字 (kanji) and 変換 (conversion) is to be preferred). Each entry of the table stores a pair consisting of a "preceding word" and a "following word", together with a code indicating the type of link between the two words (the kind of particle, direct concatenation as in 漢字変換, the conjugational ending when the preceding word is an inflected word, etc.). The "code indicating the type of link (the grammatical connection between words)" is set, for example, as follows: [0 = concatenation (connections such as 共同 + 募金 or 走り + 込み)], [1 = の], [2 = が], [3 = に], [4 = を], ..., [8 = attributive form], [9 = irrealis form]. The "preceding word" and the "following word" are each stored as dictionary numbers.
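One way to picture a co-occurrence table entry is as a small record of two dictionary numbers plus a link code. The sketch below is illustrative only: the class names, the use of a Python set as the table, and the dictionary numbers 1201 and 3405 are all assumptions. Link codes 5 to 7 are not given in the text and are deliberately left undefined.

```python
from dataclasses import dataclass
from enum import IntEnum


class Link(IntEnum):
    """Link-type codes listed in the text (codes 5-7 are not specified there)."""
    CONCATENATION = 0  # direct connection, as in 漢字変換
    NO = 1             # particle の
    GA = 2             # particle が
    NI = 3             # particle に
    WO = 4             # particle を
    ATTRIBUTIVE = 8    # attributive form
    IRREALIS = 9       # irrealis form


@dataclass(frozen=True)
class CooccurrenceEntry:
    preceding: int  # dictionary number of the preceding word
    link: Link      # grammatical connection between the two words
    following: int  # dictionary number of the following word


# Hypothetical dictionary numbers standing in for 漢字 and 変換.
KANJI, HENKAN = 1201, 3405

table = set()
table.add(CooccurrenceEntry(KANJI, Link.NO, HENKAN))
print(CooccurrenceEntry(KANJI, Link.NO, HENKAN) in table)
```

Making the entry hashable (a frozen dataclass) lets the table reject duplicate registrations for free; the patent does not say how duplicates are handled.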

Separately from the document buffer, the RAM 3 provides a confirmed word buffer for words that have been selected and confirmed (including words that are uniquely determined because they have no homophones). FIG. 3 shows the internal structure of this confirmed word buffer. In this embodiment, the confirmed word buffer has the capacity to store about one page of confirmed words.

As reading input and conversion proceed, a "code indicating unselected" and a "code indicating the type of clause link" (described above) are registered for each clause obtained by conversion. For the last clause of a sentence (the clause preceding a period or a control code such as a line break), a "code indicating a sentence break" is stored in place of the "code indicating the type of clause link".

When a candidate word is selected and confirmed, the "code indicating unselected" for that word is replaced by the dictionary number of the selected candidate. For a word that is uniquely determined by conversion from the reading, the dictionary number of the word is stored from the outset instead of the "code indicating unselected".

The co-occurrence learning program is described below with reference to FIGS. 4 and 5.

Suppose that, as shown in FIG. 5(a), the unselected sentence この装置では感じの適格な返還をする。 is displayed on the screen of the display device 5. When the operator uses the [Cursor] keys to move the cursor to 感じ, the word 感じ becomes the selection target (step 10). In addition, the position of each unselected word in the document buffer stores a "code indicating unselected" and the "address of the buffer in which its homophones are stacked". When the [Select] key is pressed, the displayed character string is searched from its beginning for unselected words; if 感じ is the first unselected word, pressing the [Select] key makes 感じ the selection target and the cursor jumps to its column position.

Each time the [Next Candidate] key is pressed (step 12), the stacked homophones are read out and the displayed conversion candidate is switched, for example in the order 感じ → 漢字 → 幹事 → 監事 → 完治 → 寛二 → ... → かんじ → 感じ → ... (step 14).

Suppose 漢字 is the desired word. With 漢字 displayed (the display state of FIG. 5(b)), the operator presses the [Select] key to confirm the word 漢字 (step 16). The character code of 漢字 is then stored at the same column position in the document buffer, replacing the "code indicating unselected" and the "address of the buffer in which the homophones are stacked" (step 18). Next, the confirmed word buffer is accessed, and the confirmed words within four clauses before and after the clause containing the word just selected are extracted (step 20). Extraction is confined to a single sentence (the range delimited by clauses carrying the "code indicating a sentence break"). Each pair consisting of the word just selected and an extracted confirmed word is then registered in the co-occurrence table, together with the link code attached to whichever word of the pair comes first (step 22). The document buffer is then searched to determine whether any unselected clause remains after the word just selected (step 24). If one remains, the word in that clause becomes the new selection target (step 26) and the above processing is repeated. If none remains, the device returns to the normal reading input mode.

When 漢字 is selected in the state shown in FIG. 5(b), no pair is registered with the following clauses because they are not yet confirmed. The preceding clauses ([この] [装置では]) are confirmed, however, so [the pair of dictionary numbers of こ and 漢字, with the code for the particle の] and [the pair of dictionary numbers of 装置 and 漢字, with the code for the particle では] are registered in the co-occurrence table. Similar co-occurrence relationships are registered thereafter during selection at 適格 and 返還.

If 返還 is changed to 変換 and selected, then in particular [the pair of dictionary codes of 漢字 and 変換, with the code for the particle の attached to 漢字] is registered. From the next conversion onward, the pair 漢字の - 返還 ("return of kanji"), which cannot occur in Japanese, therefore no longer appears.
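The learning step (steps 20 and 22) can be sketched as a scan over the confirmed word buffer. This is an illustration under stated assumptions, not the patented program: the buffer is modeled as a list of `(word, link)` tuples, words are shown as strings rather than dictionary numbers for readability, and the names `learn`, `UNSELECTED`, and `SENTENCE_END` are hypothetical.

```python
UNSELECTED = None      # stands in for the "code indicating unselected"
SENTENCE_END = "END"   # stands in for the "code indicating a sentence break"


def learn(buffer, index, table, window=4):
    """Register pairs of buffer[index] with nearby confirmed words.

    Only clauses in the same sentence and within `window` clauses of the
    selected one are considered. Each entry is (preceding word, link code
    of the preceding word, following word).
    """
    # Find the sentence containing the selected clause.
    start = index
    while start > 0 and buffer[start - 1][1] != SENTENCE_END:
        start -= 1
    end = index
    while end < len(buffer) - 1 and buffer[end][1] != SENTENCE_END:
        end += 1

    for j in range(max(start, index - window), min(end, index + window) + 1):
        if j == index or buffer[j][0] is UNSELECTED:
            continue  # skip the word itself and unconfirmed clauses
        first, second = (j, index) if j < index else (index, j)
        table.add((buffer[first][0], buffer[first][1], buffer[second][0]))


# この / 装置では / 漢字の / (unselected) / (unselected, end of sentence)
buffer = [("こ", "の"), ("装置", "では"), ("漢字", "の"),
          (UNSELECTED, "な"), (UNSELECTED, SENTENCE_END)]
table = set()
learn(buffer, 2, table)  # the operator has just selected 漢字
print(sorted(table))
```

As in the text, selecting 漢字 registers pairs only with the already confirmed preceding clauses; the pair with 変換 is added later, when that clause is itself confirmed and `learn` is called for it.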

How the co-occurrence table is used in kana-kanji conversion is described next with reference to FIG. 6.

A reading (for example, かんじのこうりつてきでそうさせいのよいへんかん) is input from the keyboard 4 and accumulated in the input buffer (step 30). When the [Convert] key is pressed, the kana-kanji conversion program is started and conversion of the reading stored in the input buffer is commanded (step 32). The conversion program is also started when it becomes certain that a clause has ended, for example on input of punctuation, a line break, or a tab.

First, the kana-kanji conversion program extracts clauses from the reading and outputs conversion candidates in sequence. For example, it extracts words by matching the reading string against the words registered in the word dictionary, extracts particles using an attached-word dictionary that stores particles and the like, attempts clause segmentation, and pushes the homophones of each word onto the stack. While the stack is being loaded, the short-term learning function and the selection-frequency learning function described above are used to prioritize the homophones. Methods are also known that use a grammar dictionary and the word dictionary, which store grammatical connection relationships, to examine the connection to the following character string, discard grammatically incorrect conversion candidates, and redo clause segmentation. Here the clauses are extracted as [かんじ + particle の] [こうりつてき + attributive form of the auxiliary verb だ = で] [そうさせい + particle の] [よい] [へんかん] (step 34).

Suppose that, for the word かんじ of the first clause, the short-term learning function or the selection-frequency learning function has caused 漢字 to be displayed as the first candidate (step 36). In this case, the "preceding word" field of the co-occurrence table is searched with the dictionary number of 漢字 to check whether the same word is registered (step 38). Since 漢字 is registered here, it is next checked whether the entry in the "link type" field stored for the dictionary number of 漢字 matches the link portion in the reading (step 40). In this example の matches, so processing proceeds to step 42: the dictionary number of the "following word" stored for 漢字 in the co-occurrence table (変換) is read out and checked against the homophone groups stacked as conversion candidates for the subsequent clauses ([こうりつてき + attributive form of the auxiliary verb だ = で] [そうさせい + particle の] [よい] [へんかん]). A match is found at 変換 among the candidates for the clause [へんかん] ((1) 返還, (2) 変換, (3) へんかん), so 変換 is made the priority candidate (the one displayed first for this clause) (step 44), and the conversion result candidates are displayed in step 46. If the condition fails at any of steps 38, 40, or 42, processing proceeds to step 46 and, as in ordinary kana-kanji conversion, the highest-priority word among the stacked homophones is displayed as the conversion result candidate.
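The lookup in steps 38 to 44 can be sketched as a reordering of the candidate lists of the later clauses. This is a simplified illustration: the table is modeled as a set of `(preceding, link, following)` tuples holding words as strings rather than dictionary numbers, and the function name `prioritize` and the single-candidate lists for the intermediate clauses are assumptions.

```python
def prioritize(first_word, link, later_clauses, table):
    """Move candidates that co-occur with `first_word` to the front.

    later_clauses: one candidate list per subsequent clause, each already
    ordered by the conventional learning functions.
    """
    # Steps 38 and 40: entries whose preceding word and link type both match.
    followers = {f for (p, l, f) in table if p == first_word and l == link}

    reordered = []
    for candidates in later_clauses:
        # Steps 42 and 44: a registered following word becomes the priority candidate.
        hits = [w for w in candidates if w in followers]
        rest = [w for w in candidates if w not in followers]
        reordered.append(hits + rest)
    return reordered


table = {("漢字", "の", "変換")}
later = [["こうりつてき"], ["そうさせい"], ["よい"], ["返還", "変換", "へんかん"]]

with_kanji = prioritize("漢字", "の", later, table)
with_kanji_other_link = prioritize("漢字", "で", later, table)
print(with_kanji[-1])
print(with_kanji_other_link[-1])
```

When any of the three checks fails, `followers` or `hits` is empty and the candidate order is left as it was, which corresponds to falling through to step 46.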

On the other hand, suppose that in step 36 the short-term learning function or the selection-frequency learning function caused, for example, 感じ to be displayed as the first candidate for the word かんじ of the first clause.

In this case 感じ is not stored in the co-occurrence table, so the same candidates as in conventional conversion are displayed. However, if after step 46, in homophone selection mode, the [Next Candidate] key is pressed (step 48) and 感じ is changed to another homophone and displayed (step 50), processing returns to step 38, the co-occurrence table is consulted again, and the conversion candidates of the subsequent clauses are also changed according to the co-occurrence relationships. When 感じ is changed to 漢字, 変換 is displayed as the candidate for the reading へんかん.

The reason for storing "link type" information in the co-occurrence table is as follows. Suppose only the word-level co-occurrence 記者 (reporter) - 帰(る) (return) were stored (note: inflected words are registered by their stems). The correct result would be obtained for きしゃがかえる → 記者が帰る ("the reporter returns"), but the co-occurrence would also fire for きしゃでかえる, which would be converted to the semantically odd 記者で帰る ("return by reporter"). If the link information が is attached to the pair 記者 - 帰(る), then for きしゃでかえる the presence of the particle で prevents the co-occurrence relationship from being consulted, so useless co-occurrences such as 記者で帰る do not arise.
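The effect of the link type on the lookup can be shown with a small predicate. This is a hedged sketch: the tuple layout `(preceding, link, following)`, the function name `cooccurs`, and the `use_link` switch (which simulates a table without link information) are illustrative assumptions.

```python
def cooccurs(table, preceding, link, following, use_link=True):
    """Return True if the pair is registered, optionally ignoring the link type."""
    for p, l, f in table:
        if p == preceding and f == following and (not use_link or l == link):
            return True
    return False


# Learned from 記者が帰る; the inflected word is registered by its stem 帰.
table = [("記者", "が", "帰")]

word_only = cooccurs(table, "記者", "で", "帰", use_link=False)
with_link_de = cooccurs(table, "記者", "で", "帰")
with_link_ga = cooccurs(table, "記者", "が", "帰")
print(word_only, with_link_de, with_link_ga)
```

Without the link check the stored pair also fires for きしゃでかえる; with it, the pair is consulted only when the particle is が.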

In co-occurrence registration (co-occurrence learning), a co-occurrence relationship is registered in the co-occurrence table each time a conversion candidate is selected, but the table cannot be given unlimited capacity. The contents of the co-occurrence table must therefore be updated by some method. The techniques used for evicting address pairs from a cache directory can be applied to this updating, for example:

(1) Evicting the oldest pair and registering the new one, on the first-in, first-out (FIFO) principle.

(2) Providing each entry of the co-occurrence table with a usage-frequency field, updating it each time the pair in the entry is used, and evicting entries starting from the least frequently used.

(3) Providing each entry of the co-occurrence table with a use flag, setting the flag when the pair in the entry is used, periodically clearing the flags of all entries, and evicting entries whose flags are off at a given point in time.
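The first two eviction policies can be sketched with a bounded table built on `collections.OrderedDict`. This is one possible illustration, not the device's implementation: the class names, the `use` method, and the choice to evict the oldest entry among equally infrequent ones are assumptions. A capacity of at least 1 is assumed.

```python
from collections import OrderedDict


class FifoTable:
    """Bounded co-occurrence table that evicts the oldest pair (policy 1)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # pair -> usage count, in insertion order

    def register(self, pair):
        if pair in self.entries:
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # drop the oldest entry
        self.entries[pair] = 0


class FrequencyTable(FifoTable):
    """Bounded table that evicts the least frequently used pair (policy 2)."""

    def use(self, pair):
        self.entries[pair] += 1  # called whenever a pair influences a conversion

    def register(self, pair):
        if pair in self.entries:
            return
        if len(self.entries) >= self.capacity:
            # min() returns the first minimal key, so ties go to the oldest entry.
            victim = min(self.entries, key=self.entries.get)
            del self.entries[victim]
        self.entries[pair] = 0


fifo = FifoTable(2)
for pair in ["a", "b", "c"]:
    fifo.register(pair)

freq = FrequencyTable(2)
freq.register("a")
freq.register("b")
freq.use("a")
freq.register("c")
print(list(fifo.entries), list(freq.entries))
```

The flag-based third policy could be layered on the same structure by replacing the count with a boolean that is cleared periodically.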

The co-occurrence table may be saved to external storage together with the prepared document when the document is saved, so that it can be reused when the document is added to or changed. Alternatively, the co-occurrence table alone may be saved to external storage as a single file and loaded into another device to set up the same environment there.

Furthermore, the above embodiment uses a batch selection method, in which homophones at multiple places in the document can be selected with the [Select] key independently of reading input. Besides batch selection, there is also a sequential selection method. In sequential selection, after a reading has been input and converted, the next candidates for that conversion result can be displayed and selected until the next reading is input; when the next reading is input, the previous conversion result is automatically confirmed. To apply the present invention to sequential selection, the selection results (the dictionary numbers of the confirmed words and their link information) are buffered until a sentence break arrives (input of a punctuation mark or of a control code such as a line break), and each time a word is confirmed, its pairs with the earlier buffered words are learned and stored in the co-occurrence table. Likewise, at conversion time the conversion results are held until the sentence break; when a reading is input and multiple homophones are output, the pairs with the buffered, already confirmed words are checked against the co-occurrence table, and the registered homophone is output preferentially.

尚、上記実施例において、同音異義語は１箇所に順番に
表示するものとしたが、同音異義語の各々に番号を付け
て複数個をまとめて表示するものとし、番号入力又はカ
ーソルでの指示によりこの中から同音異義語を選択する
ものでも良い。この場合には、最も先頭に共起により優
先度が高くされた単語を表示すれば良い。In the above embodiment, the homophones are displayed one at a time in a single position. Alternatively, each homophone may be numbered and several displayed together, with the desired homophone selected by entering its number or by pointing with the cursor. In this case, the word whose priority has been raised by co-occurrence should be displayed first.
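A numbered candidate list of this kind might be built as in the short sketch below. The function names and the idea of passing the co-occurrence-preferred words as a set are assumptions made for illustration only.

```python
def numbered_candidates(candidates, preferred):
    """Return (number, word) rows, with co-occurrence-preferred words first."""
    # Stable sort: preferred words move to the front, the rest keep their order.
    ordered = sorted(candidates, key=lambda word: word not in preferred)
    return list(enumerate(ordered, start=1))

def pick(rows, number):
    """Return the word the user selected by entering its number."""
    return dict(rows)[number]
```

With the homophones 仕様, 使用, and 私用 for the reading しよう and 使用 preferred by co-occurrence, 使用 would be listed as number 1.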

［発明の効果］本発明によれば、先に例として取り上げた「しようでし
よう」の場合にも、１回の選択で「私用」と「使用」と
の共起関係を学習するため、望まれない変換候補の出
力はなされず、「私用で使用」という変換結果（候補）
が適確に得られる。従って、無用な同音異義語の選択操
作（［次候補］キーの度重なる押下）がなくなり、操作
性の向上が計られる。[Effects of the Invention] According to the present invention, even in the case of the reading "shiyō de shiyō" (しようでしよう) taken up earlier as an example, the co-occurrence relationship between "私用" (private) and "使用" (use) is learned from a single selection. Unwanted conversion candidates are therefore no longer output, and the conversion result (candidate) "私用で使用" (use for private purposes) is obtained accurately. As a result, unnecessary homophone selection operations (repeated presses of the [Next Candidate] key) are eliminated, and operability is improved.

[Brief explanation of drawings]

第１図は本発明を実施した日本語文書作成装置のシステ
ムブロック図、第２図はＲＡＭに設けられた共起テーブ
ルの内部構成を示す図、第３図はＲＡＭに設けられた確
定単語バッファの内部構成を示す図、第４図は共起関係
学習プログラムの動きを示すフローチャート、第５図は
同音異義語の選択例を示す図、第６図は学習された共起
関係を利用した変換動作を示すフローチャートである。
２・・・共起関係学習プログラムを記憶したＲＯＭ
３・・・共起テーブルが設けられたＲＡＭ
（７３１７）代理人　弁理士　則近　憲佑　同　山下
Fig. 1 is a system block diagram of a Japanese document creation device embodying the present invention; Fig. 2 shows the internal structure of the co-occurrence table provided in the RAM; Fig. 3 shows the internal structure of the confirmed-word buffer provided in the RAM; Fig. 4 is a flowchart showing the operation of the co-occurrence relationship learning program; Fig. 5 shows an example of homophone selection; and Fig. 6 is a flowchart showing the conversion operation that uses the learned co-occurrence relationships.
2: ROM storing the co-occurrence relationship learning program
3: RAM in which the co-occurrence table is provided
(7317) Agents: Patent attorneys Norichika Kensuke and Yamashita
[Drawings: Fig. 2; Fig. 3 (confirmed-word buffer); Fig. 4; Fig. 5(a); Fig. 5(b)]

Claims

[Claims]

(1) A document creation device comprising: an input unit for inputting reading information of a character string; a dictionary storage unit storing, in correspondence with reading information, a plurality of homophones having that reading; a processing unit that reads words from the dictionary storage unit and performs conversion; and selection input means for giving a selection instruction, the device creating a document while uniquely determining each word, characterized in that the device further comprises storage means for storing pairs of two words, and the processing unit (a) when a word is selected by the selection input means, registers in the storage means a pair consisting of this word and a word existing before or after it; and (b) when converting a reading into a word, after a first word has been output as a conversion result and multiple homophones are obtained for the reading input after that word, checks whether a pair consisting of the first word and any of the homophones is registered in the storage means, and preferentially outputs, as the conversion result, the homophone registered in the storage means.

(2) The document creation device according to claim 1, wherein the storage means stores, together with each pair of words, information on the grammatical connection between the two words; and the processing unit, when registering a word pair at the time of word selection, also stores in the storage means the information on the grammatical connection between the two words, and, at the time of conversion, when a pair consisting of the first word and any of the homophones is registered in the storage means together with the grammatical connection between the two words, preferentially outputs that homophone as the conversion result.