JPH0630100B2

JPH0630100B2 - Kana-Kanji conversion method

Info

Publication number: JPH0630100B2
Application number: JP59195698A
Authority: JP
Inventors: 佐敏山内
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-09-20
Filing date: 1984-09-20
Publication date: 1994-04-20
Anticipated expiration: 2009-04-20
Also published as: JPS6175467A

Description

[Detailed Description of the Invention]

TECHNICAL FIELD The present invention relates to a kana-kanji conversion method. More particularly, it relates to a kana-kanji conversion method that uses natural language processing techniques, as in a Japanese word processor.

PRIOR ART Japanese text processing devices such as Japanese word processors have conventionally converted the kana string that the operator types on an input device into mixed kana-kanji text. Within such kana-kanji conversion methods, the handling of compound words has become an increasingly serious problem. These are words formed by joining one kanji word to another.

Two known approaches address this. The first, disclosed in Japanese Examined Patent Publication No. Sho 58-4378, searches a word dictionary for word combinations and has means for judging whether two words can be combined. It confirms or rejects the combination according to that judgment. The second, disclosed in Japanese Unexamined Patent Publication No. Sho 56-38665, tests the strength of the link between words to judge whether they can form a compound word.

However, in these methods words can combine with a great deal of freedom, especially among homonyms. Unless an extremely strict criterion is set, the methods make wrong judgments that lead to misanalysis. In practice there is a limit to how strict the criterion can be made. The misanalysis rate therefore remains high, and the methods are ill-suited to practical use.

OBJECT An object of the present invention is to overcome these drawbacks of the prior art. It aims to provide a kana-kanji conversion method that can discriminate homonyms both easily and reliably.

CONSTITUTION To achieve this object, the present invention provides a kana-kanji conversion method that takes a kana character string entered through an input device and searches a word dictionary storage device for it. This extracts word-level conversion candidates and converts the string into mixed kanji-kana text. The method is characterized in that combinations of adjacent words are extracted as compound words and their readings are tested in order to discriminate homonyms.

The invention is described in detail below with reference to an embodiment.

FIG. 1 is an overall functional block diagram of the invention applied to a Japanese word processor. As shown in FIG. 1, this embodiment comprises the following units:

- a keyboard 1;
- an input character buffer 2;
- a kana-kanji conversion control unit 3;
- a word dictionary storage device 4;
- a compound word dictionary storage device 5;
- a part-of-speech inflection table unit 6;
- a connection weight table unit 7;
- an evaluator 8;
- an output character buffer 9;
- a cathode ray tube (CRT) 10;
- a document file unit 11;
- a printer 12.

The keyboard 1 is an example of an input device. It has phonetic character keys and function keys. The phonetic character keys enter hiragana, katakana, alphabetic letters, numerals, symbols and so on. The function keys include a conversion key that requests kana-kanji conversion.

The input character buffer 2 temporarily stores the kana string as it is typed on the keyboard 1, character by character. It then erases, portion by portion, the parts of the string for which kana-kanji conversion has finished.

The kana-kanji conversion control unit 3 contains the following:

- a memory that stores the program for the kana-kanji conversion procedure;
- buffers used when it fetches data from the input character buffer 2;
- buffers used when it uses that data to search the word dictionary storage device 4 and the compound word dictionary storage device 5 (described later) and fetches the results;
- buffers used when it fetches column (uke, receiving side) and row (kakari, modifying side) positions from the part-of-speech inflection table unit 6;
- buffers used when it fetches the corresponding connection weights from the connection weight table unit 7;
- a memory that holds the evaluation values computed by the evaluator 8, together with the first-ranked candidate (the candidate with the highest evaluation) and the second-ranked candidate;
- a work area.

As described later, it controls the other functional units.

The word dictionary storage device 4 holds the words that supply the written forms needed to transcribe the input kana, which are phonetic characters. These include independent words, function words, affixes and counters. Every inflected form of a verb or adjective is treated as a separate word.

For each word it stores the following items:

- its reading;
- its written form;
- its part of speech;
- its frequency rank;
- its learned output rank;
- a pointer giving an address or sequence number in the compound word dictionary storage device 5.
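The following Python sketch models one record of the word dictionary storage device 4 with the fields listed above. The field names and the dict keyed by address are assumptions for illustration. Where the worked example later in the description gives concrete data, the sketch uses it: addresses 11379, 17634 and 23115, pointer 7533, the data for 農家, and pointer 65535 (0xFFFF) meaning "no compound entry". All other values are hypothetical placeholders.

```python
from dataclasses import dataclass

NO_COMPOUND = 0xFFFF  # pointer value 65535: the word has no compound-dictionary entries


@dataclass(frozen=True)
class WordEntry:
    reading: str      # kana reading
    notation: str     # written (kanji/kana) form
    pos: str          # part of speech
    freq_rank: int    # frequency rank
    output_rank: int  # learned output rank
    pointer: int      # address in compound word dictionary 5, or NO_COMPOUND

    def has_compound(self) -> bool:
        """True when the compound word dictionary must be searched (step S25)."""
        return self.pointer != NO_COMPOUND


# Word dictionary 4 keyed by address. Addresses, pointers and the data for
# 農家 follow the worked example; the other values are hypothetical.
WORD_DICT = {
    11379: WordEntry("けんぎょう", "兼業", "サ変名詞", 3, 1, 7533),
    17634: WordEntry("しゃ", "者", "接尾語", 2, 1, NO_COMPOUND),
    23115: WordEntry("のうか", "農家", "一般名詞", 4, 1, NO_COMPOUND),
}


def lookup(reading: str) -> list:
    """Return every entry whose reading equals the given kana string."""
    return [e for e in WORD_DICT.values() if e.reading == reading]


print([e.notation for e in lookup("けんぎょう")])  # ['兼業']
```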

The compound word dictionary storage device 5 describes the relations between adjacent words in compound words. A compound word is formed when several words stored in the word dictionary storage device 4 are joined to express a single concept. For example, the dictionary expresses that the compound 超高速飛行機 ("ultra-high-speed aircraft") is built from the adjacent-word pairs 超-高速, 高速-飛行 and 飛行-機.

This dictionary can be represented in several ways.

In the first method, the addresses or sequential numbers of the words in the word dictionary storage device 4 are stored in pairs that form compound words.

In the second method, the pointer field of the word dictionary storage device 4 records the starting address of the corresponding entries in the compound word dictionary storage device 5. The compound word dictionary storage device 5 then records the addresses or sequential numbers, in the word dictionary storage device 4, of the following words that form compounds with the word. When a word has several following words, as in 超高速飛行機, they are recorded consecutively. The boundary with the following words of the next preceding word is marked by the leading bit of each entry. This bit alternates between 0 and 1 from one preceding word to the next.

Addresses and sequential numbers are collectively referred to as address information.
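The second method can be sketched as a flat list of 16-bit words. The top bit is assumed to be the boundary flag and the low 15 bits the address of a following word. The 16-bit layout and the function names are assumptions. The description only says that "the leading 1 bit" alternates between 0 and 1 from one preceding word to the next. A preceding word's pointer gives the index where its group starts, and decoding reads entries until the flag changes.

```python
FLAG_BIT = 1 << 15       # assumed: top bit of a 16-bit word is the boundary flag
ADDR_MASK = FLAG_BIT - 1  # low 15 bits hold a word-dictionary address


def pack_groups(groups):
    """Pack one group of following-word addresses per preceding word.

    Returns (words, starts), where starts[k] is the pointer (start index)
    for the k-th preceding word. The flag alternates 0, 1, 0, ... per group.
    """
    words, starts = [], []
    for n, group in enumerate(groups):
        if not group:
            raise ValueError("words without compounds use pointer 0xFFFF instead")
        starts.append(len(words))
        flag = FLAG_BIT if n % 2 else 0
        for addr in group:
            if addr > ADDR_MASK:
                raise ValueError("address does not fit in 15 bits")
            words.append(flag | addr)
    return words, starts


def following_words(words, start):
    """Read following-word addresses from start until the flag bit changes."""
    flag = words[start] & FLAG_BIT
    out = []
    i = start
    while i < len(words) and (words[i] & FLAG_BIT) == flag:
        out.append(words[i] & ADDR_MASK)
        i += 1
    return out


# 17634 and 23115 follow the worked example; the other addresses are hypothetical.
words, starts = pack_groups([[17634, 23115], [500], [600, 601]])
print(starts)                              # [0, 2, 3]
print(following_words(words, starts[0]))   # [17634, 23115]
```

Because adjacent groups always carry different flag values, no separate length field is needed. The price is one address bit per entry.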

The part-of-speech inflection table unit 6 stores an index table. The table is used to locate the row (kakari) and column (uke) in the connection weight table of the connection weight table unit 7 (described later). The location is found from two inputs: the part of speech of a word extracted from the word dictionary storage device 4, and the parts of speech of the words that connect before and after it.

For nominals (taigen), only the kakari row number and the uke column number are recorded. For inflecting words (yōgen), the inflectional endings are recorded as well.

In the connection weight table unit 7, parts of speech are arranged along the rows and the columns. Each intersection holds a numerical value, forming a matrix that shows how strongly words connect. The row number and column number are specified through the part-of-speech inflection table unit 6. The value at their intersection gives the connection strength between words having those parts of speech. The values are set in four ranks:

0: cannot connect
1: can connect, but very rarely
2: normally connects
3: connects particularly strongly
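A minimal sketch of the two-step lookup follows. The part-of-speech inflection table supplies the preceding word's kakari row and the following word's uke column. The connection weight matrix is then indexed with them. All row and column numbers here are hypothetical, because the description does not list them. Only the weight of 2 between the case particle の and a following sa-hen noun comes from the worked example.

```python
# Part-of-speech inflection table 6: part of speech -> (kakari row, uke column).
# The numbers are hypothetical; the description does not give them.
POS_TABLE = {
    "格助詞の": (4, 1),
    "サ変名詞": (2, 3),
    "一般名詞": (1, 3),
}

# The four connection ranks described above.
RANKS = {0: "cannot connect", 1: "very rare", 2: "normal", 3: "particularly strong"}

# Connection weight table 7 as a matrix indexed [kakari row][uke column].
# Only the cell for の followed by a sa-hen noun (weight 2) is from the example.
SIZE = 5
WEIGHTS = [[0] * SIZE for _ in range(SIZE)]
WEIGHTS[4][3] = 2


def connection_weight(prev_pos: str, next_pos: str) -> int:
    """Weight for a word of next_pos that follows a word of prev_pos."""
    row, _ = POS_TABLE[prev_pos]   # kakari side of the preceding word
    _, col = POS_TABLE[next_pos]   # uke side of the following word
    weight = WEIGHTS[row][col]
    if weight not in RANKS:
        raise ValueError("weights are limited to the four ranks 0-3")
    return weight


print(connection_weight("格助詞の", "サ変名詞"))  # 2
```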

The evaluator 8 estimates how likely it is that the word under consideration occurs at its position. It uses parameters such as the reading length and frequency rank of the word, taken from the word dictionary storage device 4, and the connection weight between words, taken from the connection weight table unit 7. One example of the evaluation formula is:

reading length × 3 + frequency rank + (connection weight)² = evaluation value

The output character buffer 9 temporarily stores the words evaluated by the evaluator 8 in order of evaluation value. It also stores confirmed words in the order in which they are confirmed.
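The example formula fits in a one-line function. The first call uses the values the worked example later gives for 兼業農家: reading length 8, frequency rank 3 and connection weight 2. The second call assumes that 兼業 on its own has the same frequency rank and connection weight, with reading length 5. That assumption reproduces the score of 22 reported there.

```python
def evaluate(reading_length: int, freq_rank: int, connection_weight: int) -> int:
    """Example evaluation formula from the description:
    reading length x 3 + frequency rank + (connection weight) squared."""
    return reading_length * 3 + freq_rank + connection_weight ** 2


print(evaluate(8, 3, 2))  # 31  (兼業農家)
print(evaluate(5, 3, 2))  # 22  (兼業 alone)
```

Squaring the connection weight makes the grammatical fit between adjacent words count more heavily than a one-step change in frequency rank.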

The CRT 10 is an example of a display device. Among the unconfirmed word strings held temporarily in the output character buffer 9, it displays the word string with the highest evaluation value. This lets the operator confirm whether the string is correct.

The document file unit 11 is a storage device. It files and stores, in document form, the word strings that were checked and corrected on the CRT 10.

The printer 12 prints out the contents of the document file unit 11.

FIG. 2 schematically shows part of the contents of the word dictionary storage device 4.

FIG. 3 shows two embodiments of the contents of the compound word dictionary storage device 5.

FIG. 3(a) shows the first method described above. It has three columns:

- the left column holds the address in the compound word dictionary storage device 5 that corresponds to the word's pointer;
- the middle column holds the address of the word itself in the word dictionary storage device 4;
- the right column holds the address, in the word dictionary storage device 4, of the word that follows it.

FIG. 3(b) shows the second method. The 1-bit boundary identification flag is recorded on the left. The address of the following word in the word dictionary storage device 4 is recorded on the right.

FIG. 4 schematically shows part of the contents of the part-of-speech inflection table unit 6. Column numbers are recorded in the uke field and row numbers in the kakari field.

FIG. 5 is a schematic diagram of the connection weight table in the connection weight table unit 7. The rows correspond to the kakari numbers (row numbers) of the part-of-speech inflection table. The columns correspond to its uke numbers (column numbers). Each row-column cell holds a connection weight in one of the four ranks.

FIG. 6 is a flowchart showing an example of this embodiment.

The following example shows how the converted sentence 「最近は多くの兼業農家がいます。」 ("Recently there are many part-time farmers.") is produced. Assume that analysis has progressed up to 「最近は多くの」, and that the final の has been segmented as the case particle の. At this point, the row (kakari) number of the connection weight table unit 7 has already been specified. It was taken from the kakari field of the case-particle の entry in the table of the part-of-speech inflection table unit 6.

Next, 「けんぎょうのうか…」 (kengyō nōka…) is typed on the keyboard 1 character by character. The characters are stored temporarily in the input character buffer 2.

Each time a character is entered from the keyboard 1 (S20), the control unit judges whether it is a symbol or a character (S21). If it is a character, the control unit judges whether n characters have accumulated (S22). A symbol lets processing move on to the next step at once. For characters, processing waits until the predetermined n characters are held in the input character buffer 2. The kana-kanji conversion control unit 3 makes these judgments.

When n characters (for example, n = 6) have accumulated in the input character buffer, as in 「けんぎょうのうか…」, the kana-kanji conversion control unit 3 creates a set of kana strings for the word dictionary search (S23). These strings include 「け」, 「けん」, 「けんぎょう」 and 「けんぎょうの」.

The control unit then searches the word dictionary storage device 4 by the readings of these strings (S24). It lists as candidate words the words written 毛 (hair), 気 (spirit), 券 (ticket), 県 (prefecture) … 兼業 (side business) and 検校 (kengyō, a historical official title). At the same time, it retrieves the data shown in FIG. 2 for each of these words, such as part of speech, frequency rank, output rank and pointer.
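Steps S20 to S24 can be sketched as shown below. The set of symbols that ends buffering early and the small reading index are assumptions for illustration. This sketch simply generates every prefix of the first n characters and stops at the first symbol. The description lists 「け」, 「けん」, 「けんぎょう」 and 「けんぎょうの」 only as examples of such search strings.

```python
PUNCTUATION = set("、。，．！？")  # assumed set of symbols that end buffering early
N_CHARS = 6                       # n = 6, as in the example


def ready_for_search(buffer: str, n: int = N_CHARS) -> bool:
    """S21/S22: proceed once n characters are buffered or a symbol has arrived."""
    return len(buffer) >= n or any(ch in PUNCTUATION for ch in buffer)


def search_keys(buffer: str, n: int = N_CHARS) -> list:
    """S23: prefixes of the first n characters, cut off at the first symbol."""
    head = buffer[:n]
    for i, ch in enumerate(head):
        if ch in PUNCTUATION:
            head = head[:i]
            break
    return [head[:i] for i in range(1, len(head) + 1)]


# Hypothetical reading index for S24 (reading -> written forms).
READINGS = {
    "け": ["毛", "気"],
    "けん": ["券", "県"],
    "けんぎょう": ["兼業", "検校"],
    "が": ["が"],
    "がい": ["概", "害", "該"],
}


def candidates(buffer: str) -> list:
    """S24: look every search key up in the reading index."""
    return [word for key in search_keys(buffer) for word in READINGS.get(key, [])]


print(search_keys("がいます。"))  # ['が', 'がい', 'がいま', 'がいます']
```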

Next, the kana-kanji conversion control unit 3 judges whether the word has a pointer to an address in the compound word dictionary storage device 5 (S25). For example, the pointer of the word written 検校 is 65535, which is FFFF in hexadecimal. In this case there is no address in the compound word dictionary storage device 5. That device therefore need not be searched, and processing proceeds to evaluation.

The pointer of the word written 兼業 is 7533, which differs from 65535, so compound words exist. The kana-kanji conversion control unit 3 therefore searches the compound word dictionary storage device 5 at that address (S26). As shown in FIG. 3(a), the storage location at address 7533 holds compound words expressed as address combinations.

One of them is 11379-17634. Searching the word dictionary storage device 4 with these addresses gives 兼業-者 ("part-time worker"), read 「けんぎょうしゃ」. The kana-kanji conversion control unit 3 compares this reading with the kana string 「けんぎょうのう」 in the input character buffer 2, and the two clearly do not match.

The word dictionary storage device 4 is therefore searched with the next address combination, 11379-23115. This gives 兼業-農家 ("part-time farmer"), read 「けんぎょうのうか」. The kana-kanji conversion control unit 3 compares this reading with the kana string 「けんぎょうのうか」 in the input character buffer, and the two clearly match.
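Steps S25 to S27 amount to reading verification. For each stored pair, the sketch concatenates the two readings and checks whether the input kana begins with that reading. It uses the FIG. 3(a) layout and the addresses from the example. One assumption is that the rows belonging to one word sit at consecutive compound-dictionary addresses starting at its pointer, so the second row is placed at 7534. The dictionary shapes and the function name are also illustrative assumptions.

```python
from typing import Optional

NO_COMPOUND = 0xFFFF

# Word dictionary 4 keyed by address: (reading, written form, compound pointer).
WORD_DICT = {
    11379: ("けんぎょう", "兼業", 7533),
    17634: ("しゃ", "者", NO_COMPOUND),
    23115: ("のうか", "農家", NO_COMPOUND),
}

# Compound word dictionary 5 in the FIG. 3(a) layout:
# compound address -> (address of this word, address of the following word).
# Placing the second row at 7534 is an assumption.
COMPOUND_DICT = {
    7533: (11379, 17634),
    7534: (11379, 23115),
}


def find_compound(word_addr: int, input_kana: str) -> Optional[int]:
    """Return the address of a following word whose compound reading matches
    the start of input_kana, or None if no stored compound matches."""
    reading, _, pointer = WORD_DICT[word_addr]
    if pointer == NO_COMPOUND:          # S25: nothing to search
        return None
    addr = pointer
    # S26: scan the consecutive rows that belong to this word
    while addr in COMPOUND_DICT and COMPOUND_DICT[addr][0] == word_addr:
        next_addr = COMPOUND_DICT[addr][1]
        compound_reading = reading + WORD_DICT[next_addr][0]
        if input_kana.startswith(compound_reading):  # reading test
            return next_addr
        addr += 1
    return None


found = find_compound(11379, "けんぎょうのうかがいます。")
print(WORD_DICT[found][1])  # 農家
```

In this sketch, 兼業-者 (read けんぎょうしゃ) is rejected because the input does not start with that reading. 兼業-農家 is then accepted.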

Because the readings match, the kana-kanji conversion control unit 3 retrieves the data of the word 農家 from the word dictionary storage device and buffers it (S28). This data is part of speech 64 (general noun), frequency rank 4, output rank 1 and pointer 65535.

The word 農家, extracted as a compound-word candidate, is then evaluated by the evaluator 8 as a new single word, 兼業農家, as follows:

- The reading length is 8, the sum of the reading lengths of the preceding word 兼業 and the following word 農家.
- The frequency rank is the lower of the two words' ranks, namely the rank 3 of 兼業.
- The connection weight is tested between the compound 兼業農家 and the preceding word の.

The word that connects to の is 兼業, and the kana-kanji conversion control unit 3 knows that it is a sa-hen noun (a noun that forms a verb with する). The control unit therefore takes the column (uke) number from the table in the part-of-speech inflection table unit 6 and passes it to the connection weight table unit 7. The row (kakari) number of the case particle の has already been passed to the connection weight table unit 7. From that row and column, the control unit retrieves the connection weight, which is 2 in this embodiment.
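The rules above for treating an accepted compound as a single word can be sketched as follows:

- readings are concatenated, so reading lengths add up;
- the lower frequency rank is kept, read here as the smaller number, since 3 is chosen over 4;
- the left-hand connection comes from the first word's part of speech.

Taking the next kakari row from the second word follows the next step of the description, where 農家 sets the row. The `Candidate` fields are assumed names, not the patent's data layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    notation: str
    reading: str
    freq_rank: int
    uke_pos: str     # part of speech used for the connection from the left
    kakari_pos: str  # part of speech used for the connection to the right


def merge(first: Candidate, second: Candidate) -> Candidate:
    """Treat an accepted compound (first + second) as one new word."""
    return Candidate(
        notation=first.notation + second.notation,
        reading=first.reading + second.reading,             # lengths add up
        freq_rank=min(first.freq_rank, second.freq_rank),   # the lower rank
        uke_pos=first.uke_pos,          # 兼業 connects to the preceding の
        kakari_pos=second.kakari_pos,   # 農家 sets the next kakari row
    )


kengyo = Candidate("兼業", "けんぎょう", 3, "サ変名詞", "サ変名詞")
nouka = Candidate("農家", "のうか", 4, "一般名詞", "一般名詞")
compound = merge(kengyo, nouka)
weight = 2  # の followed by a sa-hen noun, from the connection weight table
score = len(compound.reading) * 3 + compound.freq_rank + weight ** 2
print(compound.notation, len(compound.reading), compound.freq_rank, score)
# 兼業農家 8 3 31
```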

The evaluator 8 applies the evaluation formula to these values and obtains an evaluation value of 31 for 兼業農家 (8 × 3 + 3 + 2² = 31). The other words, such as 兼業 and 検校, are evaluated in the same way and score 22 and 20 respectively (S29). The output character buffer 9 therefore stores them in the order 兼業農家, 兼業, 検校.

The highest-scoring word, 兼業農家, is then tentatively fixed as the word to display. The next row (kakari) number is set from the table of the part-of-speech inflection table unit 6, using the part of speech of the following element 農家 (general noun).

The kana-kanji conversion control unit 3 then adds up the evaluation value of the top-priority word at each position, that is, the maximum evaluation value among the candidates. When the running total exceeds a certain threshold, the control unit issues a trigger signal (S40). When the output character buffer 9 receives this signal, it confirms the unconfirmed word string up to the word whose value was last added (S41). The document file unit 11 then stores the confirmed string at a predetermined location.
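The trigger logic in S40 and S41 can be sketched as a running total over the top evaluation value at each analyzed position. A punctuation mark also forces confirmation, as the example later shows. The threshold of 50 and the scores for が and い are hypothetical, because the description gives no values for them. Resetting the total after each confirmation is also an assumption.

```python
def confirm_stream(positions, threshold):
    """Add up the top evaluation value at each position (S40) and confirm
    the pending words once the total exceeds the threshold or a punctuation
    mark arrives (S41).

    positions: iterable of (top_word, top_score, is_punctuation) tuples.
    Returns (confirmed_words, still_pending_words).
    """
    confirmed, pending, total = [], [], 0
    for word, score, is_punct in positions:
        pending.append(word)
        total += score
        if total > threshold or is_punct:  # trigger signal
            confirmed.extend(pending)      # confirm the pending word string
            pending, total = [], 0         # assumption: the total restarts
    return confirmed, pending


# 31 is the score of 兼業農家 from the example; the other scores are hypothetical.
stream = [("兼業農家", 31, False), ("が", 10, False), ("い", 8, False)]
print(confirm_stream(stream, threshold=50))  # ([], ['兼業農家', 'が', 'い'])
```

If no trigger fires, the words simply stay pending, which matches the behavior described next.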

If no trigger signal is generated, the words remain unconfirmed and processing moves on to analyze the next kana string.

The string 「けんぎょうのうか」 is deleted from the input character buffer 2, leaving 「がいます。」. This string contains a punctuation mark, so the process moves on to creating search strings (S23) even though fewer than n characters are present. As before, the kana-kanji conversion control unit 3 creates the dictionary search strings 「が」, 「がい」, 「がいま」 and 「がいます」. Following the flowchart of FIG. 6, it searches the word dictionary storage device 4 in the same way. It lists candidate words such as the case particle が, 概, 害 and 該.

The same procedure then extracts and analyzes the words in order:

1. the case particle が;
2. the auxiliary verb い;
3. the polite auxiliary ます.

The punctuation mark triggers the signal, and these converted words are confirmed. The kana-kanji converted sentence 「最近は多くの兼業農家がいます。」 is stored in the document file 11. To print it out, it is transferred to the printer 12.

EFFECT According to the present invention, homonyms can be discriminated easily and reliably, even when they occur in the input. This is done by checking whether the following word forms a compound word, and it improves the discrimination rate.

[Brief description of drawings]

FIG. 1 is an overall functional block diagram of an embodiment in which the present invention is applied to a Japanese word processor.
FIG. 2 schematically shows part of the contents of the word dictionary storage device of FIG. 1.
FIG. 3 shows, for each method, part of the contents of the compound word dictionary storage device of FIG. 1.
FIG. 4 schematically shows part of the contents of the part-of-speech inflection table unit of FIG. 1.
FIG. 5 schematically shows part of the contents of the connection weight table unit of FIG. 1.
FIG. 6 is a flowchart of the processing steps of the embodiment of FIG. 1.

Reference numerals of the main parts:
3 ... kana-kanji conversion control unit
4 ... word dictionary storage device
5 ... compound word dictionary storage device
7 ... connection weight table unit
8 ... evaluator
9 ... output character buffer

Claims

[Claims]

1. A kana-kanji conversion method using:

- an input device for entering a kana character string;
- a word dictionary storage device that stores information on the words used for writing;
- a compound word dictionary storage device that stores combinations of words stored in the word dictionary storage device;
- a kana-kanji conversion control unit that searches the word dictionary storage device and the compound word dictionary storage device on the basis of the kana character string entered through the input device, and converts the kana character string into a mixed kanji-kana character string;

the method being characterized in that:

(a) the word dictionary storage device stores, in association with each word stored therein, discrimination information indicating whether the compound word dictionary storage device must be searched for that word, and search information for searching the compound word dictionary storage device when such a search is needed;

(b) the compound word dictionary storage device stores, for each word combination, the address information in the word dictionary storage device of at least the word located later in the combination;

(c) the kana-kanji conversion control unit searches the word dictionary storage device on the basis of the kana character string entered through the input device to extract words that are homonyms, and judges from the discrimination information whether the compound word dictionary storage device must be searched for each extracted word;

(d) when such a search is needed, the control unit searches the compound word dictionary storage device according to the search information for that word;

(e) the control unit obtains from the word dictionary storage device the information of at least the later word of the stored combination, according to the address information held in the compound word dictionary storage device, and thereby extracts a compound word;

(f) the control unit tests whether the reading of the extracted compound word matches the input kana character string, and thereby discriminates the homonyms.