JPH0388066A

JPH0388066A - Automatic detecting/correcting device for error in Japanese document

Info

Publication number: JPH0388066A
Application number: JP1225268A
Authority: JP
Inventors: Masahiro Oku; 雅博奥
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1989-08-31
Filing date: 1989-08-31
Publication date: 1991-04-12

Abstract

PURPOSE: To detect a homonym error in an independent word by extracting an inspection case indicated in an inspection case determining rule out of a clause modifying a word to be inspected and inspecting meaning connecting relation between the meaning category of the word included in the inspection case and the word to be inspected.

CONSTITUTION: When a homonym to be inspected exists in an inputted Japanese sentence, the meaning connecting relation between the appearance of the word to be inspected and the meaning category of the word included in the case indicated by the inspection case determining rule 12 out of the clause modifying the word to be inspected is obtained by retrieving meaning connection dictionaries 13, 14 by means of the appearance of the word to be inspected. When the meaning connecting relation does not exist, the existence of a homonym error is decided. A word having the same reading and the same part of speech for the homonym error is extracted as a correction candidate, the correction candidate is substituted with the word inspected as the homonym error, the substituted candidate is inspected by the dictionaries 13, 14 and the correction candidate to be connected is decided as the correct candidate of the homonym error.

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a Japanese document processing device, and in particular to an automatic Japanese text error detection/correction device that automatically detects homophone usage errors in an input Japanese document and presents correction candidates for them.

[Conventional technology]

In general, a Japanese word processor takes a kana string or a romaji string as input and performs kana-kanji conversion word by word or phrase by phrase, so word conversion errors and conversion-range errors can cause the wrong homophone to be selected. Avoiding homophone errors requires thorough knowledge of how homophones differ in meaning and usage, so such errors arise frequently not only from kana-kanji conversion but also from the author's own assumptions or misunderstandings while drafting the original document.

Conventionally, homophone errors caused by input mistakes or by wrong word selection during kana-kanji conversion were detected automatically by marking error-prone homophones in a dictionary in advance and then extracting, as homophone errors, every location in the input document that matched the character string of a marked homophone.
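The conventional approach described above amounts to plain string matching. The following Python sketch illustrates it under stated assumptions: the function name `flag_marked_homophones` and the contents of the marked list are hypothetical and are not taken from the patent.

```python
def flag_marked_homophones(text, marked_words):
    """Return (position, word) for every occurrence of a marked homophone.

    Sketch of the conventional method: every string match is reported as a
    possible error, whether or not the word is used correctly.
    """
    hits = []
    for word in marked_words:
        start = text.find(word)
        while start != -1:
            hits.append((start, word))
            start = text.find(word, start + 1)
    return sorted(hits)


# Hypothetical marked list: two homophones that share a reading.
MARKED = ["帰省", "規制"]

# The correctly used 規制 is flagged in exactly the same way as the misused 帰省.
correct_usage = flag_marked_homophones("排気ガスを規制する", MARKED)
wrong_usage = flag_marked_homophones("排気ガスを帰省する", MARKED)
```

Because the matcher has no notion of context, it cannot tell the two sentences apart, which is the first problem the patent goes on to list.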

[Problem to be solved by the invention]

Because the conventional technique extracts every location in the input Japanese text that matches the character string of a homophone marked in the dictionary, it has the following problems.

■ Words that are used correctly are also extracted as homophone errors.

■ It is difficult to extract correction candidates.

■ Homophone errors in standalone words cannot be detected. A standalone-word homophone error is a homophone error that is not contained in a compound word. For example, in 「排出ガスを帰省する」 (literally "return home the exhaust gas", where 規制する, "regulate", was intended), 「帰省する」 is a standalone-word homophone error.

Examples of methods and devices that automatically detect and correct homophone errors contained in compound words include the present applicant's Japanese Patent Application Nos. Sho 63-149448 and Hei 1-115576, but these address only homophone errors contained in compound words.

An object of the present invention is to solve the above problems and to provide an automatic Japanese text error detection/correction device that accurately detects homophone errors, including those in standalone words, in input Japanese text and presents correction candidates.

[Means to solve the problem]

The automatic Japanese text error detection/correction device of the present invention comprises a semantic connection dictionary, which records whether the surface form of each homophone can connect with the semantic category of the word in the case element specified by the test-case determination rule, and a test-case determination rule, which specifies, for each homophone, the case element to which the connection information in the semantic connection dictionary refers. The device further comprises the following seven means.

First means: performs morphological analysis on the input Japanese text and divides it into words.

Second means: groups the word-divided input text into phrases (bunsetsu).

Third means: structures the input sentence by analyzing the dependency relationships between phrases.

Fourth means: determines whether the input Japanese text contains a homophone and, if so, makes that word a test target.

Fifth means: for each test target extracted by the fourth means, searches the test-case determination rule with the surface form of the test target and thereby decides the case element whose semantic connection with the test target is to be tested (this case element is called the test case).

Sixth means: tests the semantic connection between the test target and the semantic category of the word in the test case decided by the fifth means by referring to the semantic connection dictionary, and judges the test target to be a homophone error only when no semantic connection exists between the two.

Seventh means: extracts, as correction candidates, words having the same reading and part of speech as the word judged to be a homophone error by the sixth means (that is, words in a homophone relationship with it); substitutes each correction candidate for the erroneous word; looks up in the semantic connection dictionary whether the candidate can connect with the semantic category of the word in the candidate's own test case; outputs the candidate as a correct-answer candidate if the connection is possible; and outputs all extracted correction candidates as correct-answer candidates if none of them can connect.

[Operation]

When the input Japanese text contains a homophone subject to testing (hereinafter, the word under test), homophone error testing begins. The semantic connection relationship between the surface form of the word under test and the semantic category of the word in the case specified by the test-case determination rule, among the phrases modifying the word under test, is obtained by searching the semantic connection dictionary, which records such connectability information, with the surface form of the word under test. (Semantic categories classify nouns by meaning and follow a semantic category system prepared in advance.) If no semantic connection relationship exists, the word is judged to be a homophone error.

For a homophone error, words with the same reading and part of speech are extracted as correction candidates by searching the semantic connection dictionary with the reading of the erroneous word. Each correction candidate is substituted for the word judged to be a homophone error and tested with the semantic connection dictionary in the same way as above, and the candidates that can connect become the correct-answer candidates for the homophone error.

This makes it possible to test whether homophones appearing in Japanese text, including standalone words, are used correctly, and to extract correction candidates for those judged incorrect.

[Embodiment]

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 shows the basic configuration of an embodiment of the present invention.

Reference numeral 16 denotes the main body of the automatic Japanese text error detection/correction device. In hardware it consists of a CPU, memory, and so on; functionally it consists of a morphological analysis unit 1, a phrase recognition unit 2, a dependency analysis unit 3, a homophone extraction unit 4, a test-case extraction unit 5, a homophone test unit 6, a homophone correction candidate extraction unit 7, and a semantic connection dictionary search unit 8.

The morphological analysis unit 1 divides the Japanese text input to the device 16 into words using the Japanese word dictionary 9 and the grammar dictionary 10, and assigns a part of speech, a semantic category, and other information to each word. The phrase recognition unit 2 groups the word-divided input sentence into phrases (bunsetsu) based on the parts of speech of the words. The dependency analysis unit 3 determines the dependency relationships between phrases using the dependency rules 11 and structures the input sentence.

The homophone extraction unit 4 determines whether the input sentence contains a homophone subject to testing and, if so, extracts it as a word under test. The test-case extraction unit 5 uses the test-case determination rule 12 to decide, for each homophone extracted by the homophone extraction unit 4, which case element's word has its semantic category tested for semantic connection.

The homophone test unit 6 uses the semantic connection dictionaries 13 and 14 to test, for each homophone extracted by the homophone extraction unit 4, whether it can connect with the semantic category of the word in the case element decided by the test-case extraction unit 5, and thereby decides whether the homophone is an error.

The homophone correction candidate extraction unit 7 extracts correction candidates by searching the semantic connection dictionaries 13 and 14 with the reading of the word judged to be a homophone error. Only words with the same part of speech as the erroneous word are extracted as correction candidates. Next, it searches the test-case determination rule 12 with the surface form of each correction candidate to decide the test case for that candidate. It then substitutes the correction candidate for the erroneous word and uses the semantic connection dictionaries 13 and 14 to test whether the candidate can connect with the semantic category of the word in its test case. All candidates that can connect are written to the output file 15 as correct-answer candidates; if no candidate can connect, all correction candidates are written to the output file.

The semantic connection dictionary search unit 8 searches the semantic connection dictionaries 13 and 14 on instruction from the homophone test unit 6 and the homophone correction candidate extraction unit 7.

The Japanese word dictionary 9 records morphological information for Japanese words, such as part of speech and semantic category. The grammar dictionary 10 records grammatical information, such as which parts of speech can follow one another. The dependency rules 11 record dependency relationships between phrases (for example, a phrase ending in the particle が modifies a later predicate phrase) and heuristic rules (for example, dependency relationships do not cross, and two or more phrases with the same case particle do not modify a single predicate). The test-case determination rule 12 is keyed by the surface form of a homophone and records which case element (for example, the を case or the に case) has a strong semantic connection with that homophone. The forward semantic connection dictionary 13 is keyed by both the surface form and the reading of a homophone, and records whether the homophone can connect with the semantic category of the word in its test case (the case recorded in the test-case determination rule 12).
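The two heuristic dependency rules mentioned above (no crossing dependencies, and no two phrases with the same case particle modifying one predicate) can be checked mechanically. The Python sketch below is an illustration only: representing the parse as a list `heads`, where `heads[i]` is the index of the phrase that phrase `i` modifies, is an assumption, as are the function names. The heads given for 現在の and 半分の in the usage lines are also assumed; the text only states which phrases modify the final predicate.

```python
def has_crossing(heads):
    """True if any two dependency arcs cross.

    heads[i] is the index of the phrase modified by phrase i, or None for
    the root phrase.
    """
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h is not None]
    return any(a < c < b < d for a, b in arcs for c, d in arcs)


def has_duplicate_case(heads, particles):
    """True if two phrases with the same particle modify the same head."""
    seen = set()
    for i, head in enumerate(heads):
        particle = particles[i]
        if head is None or particle is None:
            continue
        if (head, particle) in seen:
            return True
        seen.add((head, particle))
    return False


# 排気ガスを / 現在の / 半分の / 量に / 帰省する
heads = [4, 2, 3, 4, None]
particles = ["を", "の", "の", "に", None]
```

A parse such as `heads` above passes both checks, whereas `[2, 3, 3, None]` contains the crossing arcs 0→2 and 1→3.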

The backward semantic connection dictionary 14 records, for cases in which a homophone modifies a noun adnominally, whether the homophone can connect with the semantic category of a word that can follow it.

FIG. 2 shows the processing flow of the automatic Japanese text error detection/correction device. The operation of the device of FIG. 1 is described below according to this flow.

Step S1: For the Japanese text input to the device main body 16, the morphological analysis unit 1 performs morphological analysis, such as word candidate extraction and part-of-speech connection checking, using the Japanese word dictionary 9 and the grammar dictionary 10. It divides the text into words, assigns part-of-speech information, a semantic category, and so on to each word, and sends the analysis result to the phrase recognition unit 2.

Step S2: Based on the morphological analysis result, the phrase recognition unit 2 groups the input sentence into phrases and sends the result to the dependency analysis unit 3.

Step S3: The dependency analysis unit 3 determines the dependency relationships between phrases using the dependency rules 11 and structures the input sentence. It then sends the morphological analysis result and the dependency analysis result to the homophone extraction unit 4.

Step S4: When a homophone subject to testing appears in the input sentence as a standalone word, the homophone extraction unit 4 extracts it from the morphological analysis result as a word under test. Possible extraction methods include setting, in the Japanese word dictionary 9, a flag on the surface form of each homophone subject to testing, or searching the semantic connection dictionaries 13 and 14 with the surface form of each word in the input sentence; no particular method is prescribed here.

Step S5: If a word under test was extracted in step S4, the process proceeds to step S6; otherwise, the process ends.

Step S6: If the word under test is a predicate, processing branches according to whether its conjugated form is the adnominal form.

Step S7: If the conjugated form of the word under test is the adnominal form, the backward semantic connection dictionary 14 is selected as the semantic connection dictionary used for the test. In FIG. 2, this selection is made by assigning the backward semantic connection dictionary to the variable "test target dictionary" (検定対象辞書).

Step S8: If the conjugated form of the word under test is not the adnominal form, or if the word under test is not a predicate, the forward semantic connection dictionary 13 is selected as the semantic connection dictionary used for the test. In FIG. 2, this selection is made by assigning the forward semantic connection dictionary to the variable "test target dictionary".
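The branch in steps S6 to S8 reduces to a single condition on the part of speech and conjugated form. A minimal Python sketch follows; the function name, the boolean `is_predicate` argument, and the string labels for conjugated forms are assumptions made for illustration.

```python
FORWARD = "forward semantic connection dictionary 13"
BACKWARD = "backward semantic connection dictionary 14"


def select_test_dictionary(is_predicate, conjugation=None):
    """Choose the dictionary assigned to the "test target dictionary" variable.

    Step S7: a predicate in the adnominal form uses the backward dictionary.
    Step S8: any other conjugated form, or a non-predicate, uses the forward one.
    """
    if is_predicate and conjugation == "adnominal":
        return BACKWARD
    return FORWARD
```

For instance, a predicate in the terminal form, or a noun, yields the forward dictionary, while a predicate directly modifying a noun yields the backward dictionary.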

Step S9: The test-case extraction unit 5 searches the test-case determination rule 12 with the surface form of the word under test and decides which case element's word will have its semantic category checked for connection. This case element is called the test case.

Step S10: Based on the dependency relationships obtained by the dependency analysis unit 3, the homophone test unit 6 first extracts, from the phrases modifying the word under test, the semantic category of the word in the test case.
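Step S10 can be pictured as a filter over the phrases that modify the word under test. The Python sketch below assumes a small `Phrase` record and a function `category_for_case`; both names are hypothetical. Returning `None` when no modifying phrase carries the test-case particle is also an assumption, since the text does not describe that situation. The category numbers 43 and 154 are the ones the patent's worked example gives for 排気ガス and 量.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Phrase:
    word: str       # content word of the phrase
    category: int   # semantic category number of that word
    particle: str   # case particle that ends the phrase


def category_for_case(modifiers: Sequence[Phrase], case_particle: str) -> Optional[int]:
    """Return the semantic category of the modifier marked with case_particle."""
    for phrase in modifiers:
        if phrase.particle == case_particle:
            return phrase.category
    return None


# Phrases that modify the word under test in the patent's example sentence.
modifiers = [Phrase("排気ガス", 43, "を"), Phrase("量", 154, "に")]
```

With these modifiers, the に case yields category 154 and the を case yields category 43.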

Step S11: Next, the semantic connection dictionaries 13 and 14 are searched via the semantic connection dictionary search unit 8 with the surface form of the word under test, and the record for the word under test is retrieved.

Step S12: Next, from the record for the word under test obtained in step S11 from the semantic connection dictionaries 13 and 14, the connection information is read for the semantic category number (numbers correspond one-to-one to semantic categories) of the semantic category of the test-case word obtained in step S10. If the word under test can connect with that semantic category number, the process proceeds to step S13; otherwise, it proceeds to step S14.

Step S13: The word under test is judged connectable (OK), and the process ends.

Step S14: The word under test is judged to be a homophone error, and all information is sent to the homophone correction candidate extraction unit 7.

Step S15: The homophone correction candidate extraction unit 7 searches the semantic connection dictionaries 13 and 14, via the semantic connection dictionary search unit 8, with the reading of the homophone error, and extracts as correction candidates all words with the same reading and part of speech as the erroneous word. Let m be the number of candidates.

Step S16: Processing proceeds on the i-th correction candidate. Initially, i = 1.

Step S17: The test-case determination rule 12 is searched with the surface form of the correction candidate as the key, and the test case is decided.

Step S18: From the phrases modifying the word under test, the case element with the same particle as the test case obtained in step S17 is extracted, and the semantic category of its word is obtained.

Step S19: From the record for the i-th correction candidate obtained in step S15 from the semantic connection dictionaries 13 and 14, the connection information is read for the semantic category number (numbers correspond one-to-one to semantic categories) of the semantic category of the word in the test case obtained in step S17. If the two can connect, the process proceeds to step S20; otherwise, it proceeds to step S21.

Step S20: The correction candidate is written to the output file 15 as a correct-answer candidate, and the process proceeds to step S21.

Step S21: To process the next correction candidate, i is set to i + 1.

Step S22: Processing branches according to whether all m correction candidates have been processed. If they have, the process proceeds to step S23; otherwise, it returns to step S17.

Step S23: If at least one correct-answer candidate has been written to the output file 15, the process ends; otherwise, it proceeds to step S24.

Step S24: All m correction candidates are written to the output file 15 as correct-answer candidates.
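The candidate loop of steps S16 to S24, including the fallback that outputs every candidate when none can connect, can be sketched as follows. This is an illustration, not the patent's implementation: the function name is hypothetical, and the per-candidate check of steps S17 to S19 is abstracted into a caller-supplied predicate `can_connect`.

```python
def select_correct_candidates(candidates, can_connect):
    """Return the correct-answer candidates for one homophone error.

    candidates  -- the m correction candidates from step S15
    can_connect -- callable(candidate) -> bool standing in for steps S17 to S19
    """
    accepted = []
    i = 0  # step S16 (0-based here; the text counts from 1)
    while i < len(candidates):  # step S22
        if can_connect(candidates[i]):
            accepted.append(candidates[i])  # step S20
        i += 1  # step S21
    if accepted:  # step S23
        return accepted
    return list(candidates)  # step S24: fall back to all candidates
```

The fallback means the user is always shown at least the full candidate list whenever any candidate was extracted, even if the dictionary supports none of them.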

If multiple homophones are extracted in step S4, the processing from step S5 onward is performed for each homophone.

FIG. 3 shows an example of the field layout of the forward semantic connection dictionary 13; the backward semantic connection dictionary 14 has the same layout. In FIG. 3, 17 is the surface form of the homophone, which is the key when searching the semantic connection dictionary; 18 is the reading of the homophone, which is the key when the homophone correction candidate extraction unit 7 searches the semantic connection dictionary; and 19 is the part of speech of the homophone. 20 is the connectability information section, indexed by the numbers assigned to the semantic categories of a semantic category system prepared in advance (N semantic categories in FIG. 3), which records whether the surface form 17 can connect with the semantic category of the test-case word. 21 is the field holding the connectability information for each semantic category number in 20: #n (1 ≤ n ≤ N) is "○" if connection is possible, "×" if connection is not possible, and "△" if the connection is indeterminate (that is, the surface form can connect, and a differently written word in a homophone relationship with it can also connect). 22 denotes one record of the forward semantic connection dictionary 13.
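The record layout of FIG. 3 maps naturally onto a small data structure with two lookup indexes, one by surface form (used for detection) and one by reading and part of speech (used for candidate extraction). The Python sketch below is an assumption-laden illustration: the class and function names are hypothetical, the connectability section is stored as a sparse mapping rather than N fixed fields, and only the two entries quoted in the patent's worked example are filled in.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ConnectionRecord:
    surface: str   # field 17: search key for detection
    reading: str   # field 18: search key for candidate extraction
    pos: str       # field 19: part of speech
    # field 20: semantic category number -> "○", "×", or "△" (each entry is a field 21)
    connectivity: Dict[int, str] = field(default_factory=dict)


def build_indexes(records: List[ConnectionRecord]):
    """Index records by surface form and by (reading, part of speech)."""
    by_surface = {r.surface: r for r in records}
    by_reading: Dict[Tuple[str, str], List[ConnectionRecord]] = {}
    for r in records:
        by_reading.setdefault((r.reading, r.pos), []).append(r)
    return by_surface, by_reading


records = [
    ConnectionRecord("帰省", "きせい", "サ変動詞", {154: "×"}),
    ConnectionRecord("規制", "きせい", "サ変動詞", {43: "○"}),
    ConnectionRecord("規正", "きせい", "サ変動詞"),  # entries not quoted in the text
]
by_surface, by_reading = build_indexes(records)
```

Looking up `("きせい", "サ変動詞")` in `by_reading` returns all three words, which is the set from which correction candidates are drawn.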

FIG. 4 shows an example of the structure of the test-case determination rule 12.

In FIG. 4, 23 is the surface form of the homophone, which is the key when searching the test-case determination rule; 24 is the surface form of the particle indicating the test case for the homophone 23; and 25 denotes one record of the test-case determination rule 12.

Next, a concrete example is described. The example sentence is 「排気ガスを現在の半分の量に帰省する。」, intended to mean "Regulate exhaust gas to half the current amount." The correct word is 規制する ("regulate") in place of 帰省する ("return home").

FIGS. 5 and 6 show example contents of the forward semantic connection dictionary 13 and the test-case determination rule 12, respectively. In the forward semantic connection dictionary of FIG. 5, "○" means connectable, "×" means not connectable, and "△" means indeterminate.

FIG. 7 shows the processing of the example sentence 「排気ガスを現在の半分の量に帰省する。」.

First, the morphological analysis unit 1 divides the sentence into words and assigns a reading, part of speech, semantic category, and so on to each word. Next, the phrase recognition unit 2 groups the words into phrases (for the example sentence: 排気ガスを／現在の／半分の／量に／帰省する／), and the dependency analysis unit 3 identifies the dependency relationships between phrases using the dependency rules 11 and structures the input sentence (steps S1 to S3). FIG. 7(a) shows the morphological analysis result for the example sentence, and FIG. 7(b) shows the dependency analysis result.

Next, the homophone extraction unit 4 extracts the homophones that become words under test. In the example sentence, 「帰省する」 is the word under test (steps S4 and S5). In this embodiment, a flag indicating that a word is subject to testing is placed in the record of the Japanese word dictionary 9 and is available at the time of morphological analysis; alternatively, the homophone extraction unit 4 could extract words under test by searching the semantic connection dictionaries 13 and 14 with the surface form of each word as the key. The homophone extraction unit 4 also uses the part of speech and conjugated form of the word under test to select either the forward semantic connection dictionary 13 or the backward semantic connection dictionary 14 as the test target dictionary. In the example sentence, 「帰省する」 is a predicate but is in the terminal form, so the forward semantic connection dictionary 13 is selected (steps S6 to S8).

The test-case extraction unit 5 decides the test case using the test-case determination rule 12. In the example sentence, the test-case determination rule 12 is searched with 「帰省」, the stem of the word under test 「帰省する」. Since the test-case determination rule 12 is as shown in FIG. 6, the test case is decided to be the に case (step S9).
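The rule lookup in step S9 uses the stem of the word under test as the key. The Python sketch below illustrates this with the two FIG. 6 entries that the text quotes (帰省 → に, 規制 → を). The function names are hypothetical, and deriving the stem by stripping a trailing する is a simplifying assumption; in the device the stem would come from morphological analysis.

```python
# FIG. 6 entries quoted in the text: homophone surface form -> test-case particle.
RULES = {"帰省": "に", "規制": "を"}


def stem_of(surface):
    """Strip a trailing する from a sa-column irregular verb (simplification)."""
    return surface[:-2] if surface.endswith("する") else surface


def lookup_case(surface, rules=RULES):
    """Return the test-case particle for a word, or None if no rule exists."""
    return rules.get(stem_of(surface))
```

Here `lookup_case("帰省する")` yields the に case, matching the result stated for step S9.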

Next, the homophone test unit 6 obtains the semantic category of the word in the test case (step S10). The example sentence is structured as shown in FIG. 7(b), so two phrases modify the word under test 「帰省する」: 「排気ガスを」 and 「量に」. Of these, the phrase in the に case is 「量に」, so the semantic category "quantity" (量; semantic category number 154) of its word 「量」 is obtained. Next, the homophone test unit 6 searches the test target dictionary (the forward semantic connection dictionary 13 for the example sentence) with 「帰省」, the stem of the word under test 「帰省する」 (step S11). Since the forward semantic connection dictionary 13 is as shown in FIG. 5, the connectability of 「帰省」 with words of semantic category number 154 is "×" (not connectable). Therefore 「帰省する」 is judged to be a homophone error (steps S12 and S14).
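The detection decision for the example can be reproduced in a few lines. In this Python sketch the dictionary holds only the single FIG. 5 entry quoted in the text, and the function name is hypothetical. Treating only "×" as an error is an assumption: the passage does not spell out how a "△" entry or a missing entry is handled at step S12.

```python
CONNECTABLE, NOT_CONNECTABLE = "○", "×"

# Only the FIG. 5 entry quoted in the text; all other entries are unknown here.
FORWARD_DICT = {"帰省": {154: NOT_CONNECTABLE}}


def is_homophone_error(stem, category_number, dictionary):
    """Steps S11 and S12: look up the record and read the connection symbol."""
    symbol = dictionary[stem].get(category_number)
    return symbol == NOT_CONNECTABLE


# 量 (category 154) is the に-case modifier of 帰省する in the example sentence.
result = is_homophone_error("帰省", 154, FORWARD_DICT)
```

The lookup returns "×", so `result` is true and the word is passed on to candidate extraction.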

FIG. 7(c) shows the processing of steps S4 to S14 above.

The homophone correction candidate extraction unit 7 searches the forward semantic connection dictionary 13 for words whose reading is きせい (kisei), the reading of the stem of the homophone error 帰省する, and whose part of speech is the same as that of 帰省する, namely a sa-row irregular verb (サ変動詞), and extracts them as correction candidates. Since the forward semantic connection dictionary 13 is as shown in FIG. 5, two correction candidates (m = 2) are obtained: 規制 ("regulation") and 規正 ("rectification") (step S15). FIG. 7(d) shows this processing.
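Candidate extraction in step S15 amounts to filtering dictionary records by reading and part of speech. The sketch below assumes a hypothetical `Entry` record with surface form, reading, and part of speech (the fields numbered 17 to 19 in the reference list) and contains only the three words named in the text.

```python
from typing import List, NamedTuple

class Entry(NamedTuple):
    surface: str  # surface form of the homophone
    reading: str  # reading of the homophone
    pos: str      # part of speech

# Assumed subset of the forward semantic connection dictionary 13 (FIG. 5).
ENTRIES = [
    Entry("帰省", "きせい", "サ変動詞"),
    Entry("規制", "きせい", "サ変動詞"),
    Entry("規正", "きせい", "サ変動詞"),
]

def extract_candidates(error_surface: str, entries: List[Entry]) -> List[str]:
    """Return words sharing the reading and part of speech of the erroneous word."""
    target = next(e for e in entries if e.surface == error_surface)
    return [
        e.surface
        for e in entries
        if e.reading == target.reading
        and e.pos == target.pos
        and e.surface != error_surface
    ]

print(extract_candidates("帰省", ENTRIES))  # ['規制', '規正']
```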

Next, the homophone correction candidate extraction unit 7 first processes the correction candidate 規制 (step S16). The test case determination rules 12 are searched with the surface form 規制 to determine its test case. Since the test case determination rules 12 are as shown in FIG. 6, the test case for 規制 is determined to be the を (o) case (step S17). The example sentence is structured as shown in FIG. 7(b), so the two clauses modifying the test-target word 帰省する are 排気ガスを and 量に. Of these, the clause in the を case, which is the test case for the correction candidate 規制, is 排気ガスを, so the semantic category "gas" (semantic category number = 43) of its word 排気ガス is acquired (step S18). According to FIG. 5, connection between 規制 and semantic category number 43 is allowed ("○") (step S19), so 規制 is output to the output file 15 as a correct-answer candidate for 帰省 (step S20).

FIG. 7(e) shows the processing for this correction candidate 規制.

Next, the correction candidate 規正 is processed (step S21). As with 規制, the test case for 規正 is determined to be the を case. Connection between 規正 and semantic category number 43 is not allowed ("×"), so 規正 does not become a correct-answer candidate.

FIG. 7(f) shows the processing for this correction candidate 規正.

When all correction candidates have been processed (step S22), 規制 has already been output as a correct-answer candidate, so step S24 is skipped and the process ends (step S23).
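The candidate loop of steps S16 to S23 can be sketched as follows. The function name `select_corrections`, the table layouts, and the treatment of missing dictionary entries as "×" are assumptions for illustration. The fallback in the last line follows the behaviour stated in claim (1): when no candidate can connect, all extracted candidates are output.

```python
# Assumed subsets of the rules 12 (FIG. 6) and the forward dictionary 13 (FIG. 5).
TEST_CASE_RULES = {"規制": "を", "規正": "を"}
FORWARD_DICT = {("規制", 43): True, ("規正", 43): False}  # True = "○", False = "×"

# Clauses modifying the test target: case particle -> semantic category number
# of the clause's word (排気ガスを -> 43, 量に -> 154).
MODIFIER_CATEGORIES = {"を": 43, "に": 154}

def select_corrections(candidates, rules, dictionary, modifier_categories):
    """Keep the candidates that can connect; fall back to all if none can."""
    accepted = []
    for candidate in candidates:
        test_case = rules.get(candidate)
        category = modifier_categories.get(test_case)
        # Assumption: a pair missing from the dictionary counts as "×".
        if dictionary.get((candidate, category), False):
            accepted.append(candidate)
    return accepted if accepted else list(candidates)

print(select_corrections(["規制", "規正"], TEST_CASE_RULES, FORWARD_DICT, MODIFIER_CATEGORIES))
# ['規制']
```

Note that each candidate is tested against its own test case, which is why 規制 is checked against 排気ガスを (を case) even though the original word 帰省 was checked against 量に (に case).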

In this way, 規制する ("to regulate") is obtained in the output file 15 as the correct-answer candidate for the homophone error 帰省する contained in the sentence 排気ガスを現在の半分の量に帰省する (intended meaning: "regulate exhaust gas to half its current amount").

[Effects of the Invention]

As is clear from the above description, the automatic Japanese sentence error detection and correction device of the present invention provides the following effects.

■ Whether a word is a homophone error is judged by testing semantic connection relationships, so only genuine homophone errors are detected.

■ Only words that satisfy the semantic connection relationship are extracted as correct-answer candidates for a homophone error, so only plausible candidates are presented.

■ The test case indicated by the test case determination rules is extracted from the clauses that modify the test-target word, and the semantic connection between the semantic category of the word in that test case and the test-target word is tested, so homophone errors in single words can be detected.

[Brief Description of the Drawings]

FIG. 1 is a basic configuration diagram of one embodiment of the present invention. FIG. 2 is a schematic flowchart of the processing, explaining the operation of FIG. 1. FIG. 3 shows an example of the field layout of the semantic connection dictionary. FIG. 4 shows an example of the field layout of the test case determination rules. FIG. 5 shows an example of the contents of the forward semantic connection dictionary. FIG. 6 shows an example of the contents of the test case determination rules. FIG. 7 shows the processing steps for the example sentence 排気ガスを現在の半分の量に帰省する.

1: morphological analysis unit
2: clause recognition processing unit
3: dependency analysis unit
4: homophone extraction unit
5: test case determination unit
6: homophone test processing unit
7: homophone correction candidate extraction unit
8: semantic connection dictionary search unit
9: Japanese word dictionary
10: grammar dictionary
11: dependency rules
12: test case determination rules
13: forward semantic connection dictionary
14: backward semantic connection dictionary
15: output file
16: main body of the automatic Japanese sentence error detection and correction device
17: surface form of a homophone
18: reading of a homophone
19: part of speech of a homophone
20: connection possibility information corresponding to the semantic category numbers
21: field holding the connection possibility information for each semantic category number
22: record of the semantic connection dictionary
23: surface form of a homophone
24: surface form of the particle indicating the test case
25: record of the test case determination rules

Claims

[Claims]

(1) An automatic Japanese sentence error detection and correction device that automatically detects and corrects homophone errors contained in Japanese sentences created with a Japanese input device such as a Japanese word processor, characterized by comprising: a semantic connection dictionary that describes whether the surface form of each homophone can connect with the semantic category of the word in the case element indicated by the test case determination rules; test case determination rules that indicate, for each homophone, the case element for which connection information is described; a first means for morphologically analyzing the input Japanese sentence and dividing it into words; a second means for recognizing the clauses of the input Japanese sentence; a third means for structuring the input sentence by analyzing the dependency relationships between clauses; a fourth means for determining whether the input Japanese sentence contains homophones and, if it does, extracting them as test targets; a fifth means for searching the test case determination rules, for each test target extracted by the fourth means, with the surface form of the test target, and determining the case element (called the test case) whose semantic connection with the test target is to be tested; a sixth means for testing the semantic connection between the semantic category of the word in the test case determined by the fifth means and the test target by referring to the semantic connection dictionary, and judging the test target to be a homophone error only when no semantic connection relationship exists between the two; and a seventh means for extracting, as correction candidates for the homophone error, words that have the same reading and the same part of speech as the word judged to be a homophone error by the sixth means (words in a homophone relationship), substituting each correction candidate for the corresponding homophone error word, searching the semantic connection dictionary for whether the correction candidate can connect with the semantic category of the word in the test case for that correction candidate, outputting the correction candidate as a correct-answer candidate if semantic connection is possible, and outputting all extracted correction candidates as correct-answer candidates if semantic connection is not possible for any of the correction candidates.