JPS6394364A

JPS6394364A - Automatic correction device for wrong character in japanese sentence

Info

Publication number: JPS6394364A
Application number: JP61238059A
Authority: JP
Inventors: Shinichiro Takagi; 伸一郎高木
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1986-10-08
Filing date: 1986-10-08
Publication date: 1988-04-25
Anticipated expiration: 2010-01-30
Also published as: JPH077414B2

Abstract

PURPOSE: To correct wrong characters automatically by building, in advance, a Japanese sentence correction candidate character dictionary and a character concatenation probability dictionary from documents of the same kind that contain no wrong characters.

CONSTITUTION: The input Japanese sentence database 3 holds the reading results of the input device 1. A wrong-character detection unit 6 detects a region containing a wrong character by morphological analysis, such as word candidate extraction and part-of-speech connection checking. The position of the wrong character is then detected with the character concatenation probability dictionary 8. Next, the Japanese sentence correction candidate dictionary 10 is looked up, keyed on the surrounding characters (all except the one at the wrong-character position), to extract correction candidate characters. The candidates are inserted one by one at the wrong-character position. For each resulting provisional character string, the correct-string probability is computed with the character concatenation probability dictionary 8, and the candidates are narrowed down by ranking and cut-off. Finally, each remaining candidate is inserted into the original sentence, and the resulting bunsetsu-level (phrase-level) provisional string is morphologically analyzed again and checked grammatically. In this way, a grammatically correct candidate can be selected.

Description

[Detailed Description of the Invention]
(Technical Field) The present invention relates to an automatic correction device for wrong characters in Japanese sentences. When a Japanese document database is built, mixed kanji-kana Japanese text read from an input device may contain wrong characters. To correct them automatically, the device selects the correct candidate from a group of extracted candidate characters and applies the correction.

(Prior Art) A Japanese document database may be built by converting a large volume of Japanese documents, such as newspaper articles, publishing manuscripts, and scientific and technical papers, into electronic files. The reading results then contain rejected characters, misread characters, and wrong characters. These are detected by morphological analysis using a word dictionary and a grammar dictionary, or by a human corrector's check. Correcting them, manually or automatically, requires two things: candidate extraction that yields a high proportion of correct candidates, and a strict check of the Japanese text based on morphological analysis.

Examples of conventional candidate extraction and automatic correction are as follows.

J. L. Peterson, "Computer Programs for Spelling Correction," Lecture Notes in Computer Science, Vol. 96, Springer-Verlag, 1980, discusses spelling checking and correction in English. It describes the following methods:
■ A method that stores each correct spelling in a dictionary paired with its likely misspellings, and replaces a misspelling with the correct spelling when the misspelling is detected.

■ A method that generates, in advance, a list of misspellings from the correct spelling of each English word. The list is produced by statistically applying four rules: two-character transposition, one-character insertion, one-character deletion, and one-character substitution. When a word matches one of these misspellings, the corresponding correct spelling is retrieved and substituted.
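The second method can be sketched in Python as follows. This is an illustrative sketch only, not code from the cited reference. It generates every variant one edit away under the four rules (adjacent transposition, insertion, deletion, substitution). It omits the statistical weighting mentioned above, and all function names are hypothetical.

```python
import string


def single_edit_variants(word, alphabet=string.ascii_lowercase):
    """Return every string exactly one edit away from `word` under the four rules:
    adjacent transposition, insertion, deletion, and substitution."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    deletes = {left + right[1:] for left, right in splits if right}
    substitutes = {left + c + right[1:]
                   for left, right in splits if right
                   for c in alphabet if c != right[0]}
    # Transposing two identical letters reproduces the word itself, so drop it.
    return (transposes | inserts | deletes | substitutes) - {word}


def build_misspelling_table(lexicon):
    """Map each generated misspelling back to the correct word(s) it came from."""
    table = {}
    for word in lexicon:
        for variant in single_edit_variants(word):
            table.setdefault(variant, set()).add(word)
    return table


def correct(token, lexicon, table):
    """Return the correct spelling(s) for `token`, or an empty set if unknown."""
    if token in lexicon:
        return {token}
    return table.get(token, set())
```

For example, with `lexicon = {"spell", "check"}`, the token `"sepll"` maps back to `"spell"` and `"chek"` maps back to `"check"`. Even a five-letter word yields a few hundred variants over a 26-letter alphabet. This illustrates the first problem listed below for Japanese, whose character set runs to thousands of characters.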

A method that detects and automatically corrects wrong characters by storing both correct and incorrect patterns in a dictionary is described in Japanese Patent Laid-Open No. 61-1787, "Sentence Abnormality Inspection and Correction Apparatus."

However, these conventional methods have the following problems.

■ Japanese documents use a very large number of character types and are written without spaces between words. As a result, the set of erroneous character-string patterns needed to detect wrong characters becomes enormous. Moreover, when a wrong character occurs at a bunsetsu (phrase) boundary, it is difficult to extract candidates that cross word and bunsetsu boundaries.

■ If the error characteristics of the input device are unknown, it is difficult to extract candidates and make corrections.

■ When correction candidates are selected by morphological analysis, extracting a large number of candidates makes the processing load heavy. Manual correction is also burdened, because the candidates include many that are obviously grammatically wrong.

(Object of the Invention) An object of the present invention is to provide an automatic correction device for wrong characters in Japanese sentences. The device performs candidate extraction that does not depend on the number of character types, bunsetsu boundaries, the number of wrong characters, or the error characteristics of the input device, and it corrects automatically with high processing performance. To this end, a Japanese sentence correction candidate character dictionary and a character concatenation probability dictionary are built in advance from a large number of error-free documents of the same kind. When a wrong character is detected, candidates are first extracted with the correction candidate character dictionary. Each candidate forms a provisional character string, and the character concatenation probability dictionary is used to calculate that string's correct-string probability, which narrows down the candidates.
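The patent does not state how the concatenation probability or the correct-string probability is computed. A minimal sketch follows, assuming N = 3, a conditional probability estimated from counts, and a correct-string score equal to the sum of log probabilities over the sliding N-character windows of a string. The function names and the `floor` value for unseen strings are hypothetical.

```python
import math
from collections import Counter


def build_concatenation_probabilities(corpus_lines, n=3):
    """Assumed model: P(last char | preceding n-1 chars) = count(n-gram) / count(prefix),
    where the prefix is counted only where it begins an n-gram, so the
    probabilities for one prefix sum to 1."""
    grams, prefixes = Counter(), Counter()
    for line in corpus_lines:
        for start in range(len(line) - n + 1):
            gram = line[start:start + n]
            grams[gram] += 1
            prefixes[gram[:-1]] += 1
    return {gram: count / prefixes[gram[:-1]] for gram, count in grams.items()}


def correct_string_log_probability(text, probs, n=3, floor=1e-6):
    """Score a (provisional) string as the sum of log probabilities of its n-grams.
    N-character strings absent from the dictionary get the small `floor` probability."""
    return sum(math.log(probs.get(text[s:s + n], floor))
               for s in range(len(text) - n + 1))
```

Trained on `["日本文の誤字", "日本語の誤字"]`, the string `"日本文の誤字"` scores far higher than the misread `"日本丈の誤字"`, because the latter contains three unseen trigrams. Summing logs rather than multiplying probabilities simply avoids floating-point underflow on long strings.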

(Constitution of the Invention) (Features of the Invention and Differences from the Prior Art) Two dictionaries are created in advance from a large number of documents of the same kind as the documents to be corrected but containing no wrong characters.

The first is a Japanese sentence correction candidate character dictionary. N-character strings are extracted from these documents, or specific N-, N-1-, ..., 2-character string patterns are selected from them. Whenever string patterns are identical except for their i-th character (i = 1, ..., N), the i-th characters are collected as correction candidate characters for one another.

The second is a character concatenation probability dictionary. It holds the concatenation probability of each N-character string, calculated in advance from the occurrence frequencies of the same extracted N-character patterns.

Morphological analysis using a word dictionary and a grammar dictionary detects wrong characters in the input Japanese sentence database entered from the input device. When a wrong character is found, the character-string pattern surrounding it is used as a key to look up correction candidate characters in the correction candidate character dictionary. Each candidate is inserted at the wrong-character position in the original text. The correct-string probability of the resulting provisional string is calculated with the character concatenation probability dictionary, and the candidates are ranked and narrowed down by a cut-off. The narrowed candidates are then passed, from the highest rank down, through wrong-character detection by morphological analysis. The correct candidate is selected and the correction is made automatically.
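The construction of the correction candidate character dictionary described above can be sketched as follows. This sketch assumes N = 3 and uses only full N-character windows, not the selected N-1, ..., 2-character patterns. Each key is the pair (characters before position i, characters after position i), so the length of the left part encodes i. The function name is hypothetical.

```python
from collections import defaultdict


def build_candidate_dictionary(corpus_lines, n=3):
    """For every N-character window of error-free text and every position i in it,
    key = (characters before i, characters after i); value = set of characters
    observed at i. Windows sharing a key supply mutual correction candidates."""
    dictionary = defaultdict(set)
    for line in corpus_lines:
        for start in range(len(line) - n + 1):
            gram = line[start:start + n]
            for i in range(n):
                dictionary[(gram[:i], gram[i + 1:])].add(gram[i])
    # Return a plain dict so that later lookups do not silently create keys.
    return dict(dictionary)
```

Built from `["日本文の誤字", "日本語の誤字"]`, the key `("日本", "")` yields `{"文", "語"}`. A wrong character following 日本 would therefore receive both as correction candidates, whether or not the input device is known to confuse them.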

The invention differs from the prior art in the following respects:
■ Candidates are extracted with the Japanese sentence correction candidate character dictionary. Extraction therefore does not depend on the number of character types in the document, on whether words are separated by spaces, or on the error characteristics of the input device.

■ The extracted candidates are narrowed down with the character concatenation probability dictionary, so candidate extraction is highly accurate.

■ The narrowed candidates are checked grammatically, so wrong characters can be corrected automatically.

■ Candidates that are obviously grammatically wrong are eliminated automatically, so the burden of manual correction is small.

(Embodiment) FIG. 1 shows an example of the basic configuration of the present invention. The reference numerals are as follows:
1 is an input device such as a kanji OCR, pen-touch input, tablet, or keyboard;
2 is an input processing unit that performs input or reading;
3 is the input Japanese sentence database, which holds the reading results captured by the input device 1 and recorded as character codes on a magnetic storage device;
4 is a word dictionary;
5 is a grammar dictionary;
6 is a wrong-character detection unit that extracts a bunsetsu-level wrong-character region by morphological analysis using the word dictionary 4 and the grammar dictionary 5;
7 is a wrong-character position detection unit that detects, within the region extracted by unit 6, the character position regarded as wrong;
8 is a character concatenation probability dictionary;
9 is a correction candidate character extraction unit that indexes the dictionary 10 and extracts correction candidate characters for the wrong character;
10 is the Japanese sentence correction candidate character dictionary;
11 is a correction candidate narrowing unit that indexes the character concatenation probability dictionary 8 to rank the correction candidates and cut off low-ranked ones;
12 is a correction candidate check unit that inserts the candidates, from the highest rank down, at the wrong-character position and checks the resulting provisional strings by morphological analysis;
13 is a correction candidate selection unit with which a corrector selects from the candidates that passed the grammatical check of unit 12;
14 is a correction terminal;
15 is the Japanese document database in which wrong characters have been corrected;
16 is a processing device consisting of a CPU and memory.

In this scheme, the input Japanese sentence database 3 holds the reading results captured by the input device 1. The wrong-character detection unit 6 performs morphological analysis on it, such as word candidate extraction and part-of-speech connection checking, and detects a wrong-character region. The wrong-character position is then detected with the character concatenation probability dictionary 8. After that, the Japanese sentence correction candidate character dictionary 10 is looked up to extract correction candidate characters. The lookup key is the surrounding characters, that is, all characters except the one at the wrong-character position.
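The position detection and candidate lookup in the preceding paragraph can be sketched as follows. Several assumptions are involved. The probability dictionary is a plain dict from 3-character strings to probabilities. A window counts as "low" below a hypothetical threshold. A position is scored by how many low windows cover it. The candidate dictionary maps (left context, right context) pairs to character sets. Candidates from all windows covering the position are pooled. Ties are returned as several position candidates, much as FIG. 2 shows more than one wrong-character position candidate. All function names are hypothetical.

```python
def candidate_error_positions(region, probs, n=3, threshold=1e-3):
    """Return the position(s) covered by the largest number of low-probability
    n-grams. Several positions may tie; all of them are returned."""
    low = [s for s in range(len(region) - n + 1)
           if probs.get(region[s:s + n], 0.0) < threshold]
    counts = [sum(1 for s in low if s <= pos < s + n) for pos in range(len(region))]
    best = max(counts, default=0)
    return [pos for pos, c in enumerate(counts) if best > 0 and c == best]


def lookup_candidates(region, pos, candidate_dict, n=3):
    """Pool the candidates found under every n-character window covering `pos`,
    keyed on the characters around `pos`, and drop the current (wrong) character."""
    found = set()
    first = max(0, pos - n + 1)
    last = min(pos, len(region) - n)
    for start in range(first, last + 1):
        gram = region[start:start + n]
        i = pos - start
        found |= candidate_dict.get((gram[:i], gram[i + 1:]), set())
    found.discard(region[pos])
    return found
```

In the misread `"日本丈の誤字"`, the three windows containing 丈 are all unseen, so position 2 is the only one covered by all of them. Looking up its surroundings returns `{"文", "語"}`. An error at the very start of a string is covered by only one window, so it ties with its neighbors.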

Next, the correction candidates are inserted one by one at the wrong-character position, and the character concatenation probability dictionary 8 is used to compute the correct-string probability of each resulting provisional character string. The candidates are ranked, and those below the cut-off are discarded.
Each remaining candidate is then inserted into the original sentence, and the bunsetsu-level provisional string is morphologically analyzed again and checked grammatically. The grammatically correct candidates are selected.
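The ranking and cut-off step can be sketched as follows. Each provisional string is scored by the sum of its trigram log probabilities, with a floor for unseen strings. The cut-off rule is hypothetical: keep candidates within a fixed log-probability `margin` of the best one. The patent specifies neither the scoring formula nor the cut-off rule, and the function name is also hypothetical.

```python
import math


def rank_and_cut(text, pos, candidates, probs, n=3, floor=1e-6, margin=5.0):
    """Insert each candidate at `pos`, score the provisional string, sort best-first,
    and drop candidates scoring more than `margin` below the best one.
    Returns a list of (candidate, log score) pairs."""
    scored = []
    for ch in candidates:
        provisional = text[:pos] + ch + text[pos + 1:]
        score = sum(math.log(probs.get(provisional[s:s + n], floor))
                    for s in range(len(provisional) - n + 1))
        scored.append((score, ch))
    if not scored:
        return []
    # Highest score first; ties broken by character for a deterministic order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    best = scored[0][0]
    return [(ch, score) for score, ch in scored if score >= best - margin]
```

A relative margin, rather than an absolute probability threshold, keeps the cut-off independent of string length. The trade-off is that it always keeps at least one candidate, even when every candidate is poor.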

If the correction candidates have been narrowed to one at this point, that candidate replaces the wrong character and the automatic correction succeeds. If several candidates still remain, the corrector selects one from the correction terminal 14.
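The grammar filter and the single-survivor rule in the preceding paragraph can be sketched as follows. `is_grammatical` is a hypothetical stand-in for the morphological analysis with the word dictionary 4 and grammar dictionary 5. It takes the provisional string and returns True or False. A real analyzer is far more involved and is not shown.

```python
def filter_and_apply(text, pos, ranked_chars, is_grammatical):
    """Keep the ranked candidates whose provisional string passes the grammar check.
    Exactly one survivor: return (corrected text, []).
    Otherwise: return (None, survivors) for the corrector to resolve at the terminal."""
    survivors = []
    for ch in ranked_chars:
        provisional = text[:pos] + ch + text[pos + 1:]
        if is_grammatical(provisional):
            survivors.append(ch)
    if len(survivors) == 1:
        return text[:pos] + survivors[0] + text[pos + 1:], []
    return None, survivors
```

With a toy checker such as `lambda s: s.startswith("日本文")`, the ranked candidates `["文", "語"]` for `"日本丈の誤字"` leave a single survivor, and the text is corrected to `"日本文の誤字"`.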

FIG. 2 shows an embodiment for the case in which there are two wrong characters. It covers the correction candidate extraction and automatic correction performed after wrong characters are detected in the basic configuration of FIG. 1.

In this example, the reference numerals are as follows:
17 is the wrong-character region in which an error was detected;
18 is a wrong character;
19 is the correct character;
20 is a wrong-character position detected by the wrong-character position detection unit 7 using character concatenation probabilities;
21 is the set of correction candidates for wrong-character position candidate 1;
22 is the set of candidates narrowed down from 21 by character concatenation probability;
23 and 24 are, respectively, the correction candidates for wrong-character position candidate 2 and the candidates narrowed down from them;
25 is the correction candidate sequence ranked by the correction candidate narrowing unit 11;
26 is the candidate order;
27 is the top two candidates of sequence 25;
28 is the character string automatically corrected after the correction candidate check.

In this example, wrong-character positions are detected within the region 17 using the character concatenation probability dictionary 8, and several wrong-character candidates are extracted. Correction candidates are then extracted for each of them.

Next, each correction candidate for each wrong-character position is inserted in turn into the original string, and the character concatenation probability of the string is calculated. Low-probability candidates are cut off, which gives the correction candidates 22 and 24. These are then ranked to form the correction candidate sequence 25.
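The per-position cut-off and merge into one ranked sequence can be sketched as follows. The input is assumed to map each wrong-character position to (candidate, log score) pairs, each scored with only that position replaced. The `margin` cut-off rule and the function name are hypothetical. The output corresponds to the correction candidate sequence 25, listed best-first as (position, character) pairs.

```python
def build_candidate_sequence(per_position, margin=5.0):
    """Cut low scorers separately at each position (as 22 is cut from 21 and 24
    from 23), then merge the survivors of all positions into one best-first list."""
    merged = []
    for pos, scored in per_position.items():
        if not scored:
            continue
        best = max(score for _, score in scored)
        merged.extend((score, pos, ch) for ch, score in scored if score >= best - margin)
    # Highest score first; ties broken by position, then by character.
    merged.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(pos, ch) for _, pos, ch in merged]
```

Cutting per position before merging keeps one badly scored position from pushing every candidate of another position out of the list. Each position therefore keeps at least its best candidate.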

Morphological analysis is then applied again to these candidates to check each one grammatically. If only one candidate remains, the correction is made automatically.

Two or more candidates may remain. If the top two in the candidate order belong to the same wrong-character position, the top-ranked candidate is inserted at that position. If the top two belong to different positions, a two-character error is assumed, and each of the two candidates is inserted at its own position.

However, three or more candidates may remain after the grammar check of the correction candidate sequence 25. In that case, the top-ranked candidate is not selected automatically. Instead, the corrector selects from these candidates at the correction terminal 14 through the correction candidate selection unit. The device also provides means for the corrector to make the correction when automatic extraction of correction candidates fails.
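The decision rule in the two preceding paragraphs can be sketched as follows. Read together, the paragraphs are interpreted here as three cases. One survivor of the grammar check is applied directly. Exactly two survivors trigger the top-two rule. Three or more are left to the corrector. Zero survivors also go to the corrector, treated here (as an assumption) as a failed automatic extraction. The input is the best-first list of (position, character) survivors, and the function name is hypothetical.

```python
def apply_top_two_rule(text, survivors):
    """survivors: best-first list of (position, char) that passed the grammar check.
    Returns (corrected_text, None) on automatic correction, or (None, survivors)
    when the corrector must choose at the correction terminal."""
    def embed(s, edits):
        chars = list(s)
        for pos, ch in edits:
            chars[pos] = ch
        return "".join(chars)

    if len(survivors) == 1:
        return embed(text, survivors), None
    if len(survivors) == 2:
        (p1, c1), (p2, c2) = survivors
        if p1 == p2:
            # Both compete for one position: take the top-ranked candidate.
            return embed(text, [(p1, c1)]), None
        # Different positions: assume a two-character error and fill both.
        return embed(text, survivors), None
    # No survivors, or three or more: leave the choice to the corrector.
    return None, survivors
```

For example, `[(2, "文"), (5, "字")]` applied to `"日本丈の誤宇"` fills both positions and gives `"日本文の誤字"`. By contrast, `[(2, "文"), (2, "語")]` fills only position 2 with 文.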

With this configuration and operation, candidate extraction, unlike in the prior art, does not depend on the number of character types, on whether words are separated by spaces, on the number of wrong characters, or on the error characteristics of the input device. In addition, the extracted candidates are narrowed by character concatenation probability before the grammar check is applied to them. The proportion of correct candidates is therefore high, automatic correction becomes possible, and processing time is reduced.

Furthermore, because the grammar check narrows the correction candidates, the load is reduced even when candidates are selected manually.

(Effects of the Invention) As described above, two dictionaries are built in advance from a large number of error-free documents of the same kind as the Japanese documents targeted for automatic correction. The first is a Japanese sentence correction candidate character dictionary. It extracts correction candidate characters using the extracted N-character strings, or specific N-, N-1-, ..., 2-character strings selected from them. The second is a character concatenation probability dictionary for N-character strings, calculated from their occurrence frequencies.

Morphological analysis detects wrong characters in the input Japanese sentence database read by the input device. For each one detected, correction candidates are extracted with the candidate dictionary, and they are ranked and cut off with the probability dictionary. The remaining candidates are then checked grammatically by morphological analysis before the automatic correction. This yields the following advantages:
■ Candidate extraction does not depend on the number of character types, on whether words are separated by spaces, on the number of wrong characters, or on the error characteristics of the input device. The narrowing step also makes the extracted candidates highly accurate.

■ Wrong characters can be corrected automatically by checking the grammar of the small number of narrowed candidates.

■ Candidates that are grammatically wrong are eliminated automatically, so the processing load is reduced even when correction is done by hand.

[Brief Description of the Drawings]

FIG. 1 shows an example of the basic configuration of the present invention. FIG. 2 shows an embodiment of correction candidate extraction and automatic correction for wrong characters.
1 ... Input device
2 ... Input processing unit
3 ... Input Japanese sentence database
4 ... Word dictionary
5 ... Grammar dictionary
6 ... Wrong-character detection unit
7 ... Wrong-character position detection unit
8 ... Character concatenation probability dictionary
9 ... Correction candidate character extraction unit
10 ... Japanese sentence correction candidate character dictionary
11 ... Correction candidate narrowing unit
12 ... Correction candidate check unit
13 ... Correction candidate selection unit
14 ... Correction terminal
15 ... Japanese document database with wrong characters corrected
16 ... Processing device
17 ... Wrong-character region
18 ... Wrong character
19 ... Correct character
20 ... Wrong-character position
21 ... Correction candidates for wrong-character position candidate 1
22 ... Correction candidates narrowed down from 21
23 ... Correction candidates for wrong-character position candidate 2
24 ... Correction candidates narrowed down from 23
25 ... Ranked correction candidate sequence
26 ... Candidate order
27 ... Top two candidates in 25
28 ... Character string automatically corrected after the correction candidate check

Claims

[Claims] An automatic correction device for wrong characters in Japanese sentences. The device handles rejected characters and wrong characters caused by input errors or character recognition errors in the Japanese sentences of a Japanese document database entered from a character input device. The device comprises:
a wrong-character detection unit that extracts, by morphological analysis using a word dictionary and a grammar dictionary, a bunsetsu-level wrong-character region containing a wrong character;
a wrong-character position detection unit that extracts from this region, on the basis of inter-character concatenation probabilities, the character position regarded as a wrong character;
a Japanese sentence correction candidate character dictionary built from N-character strings extracted from documents containing no wrong characters, or from specific N-, N-1-, ..., 2-character strings selected from them, in which the i-th character (i = 1, ..., N) of such strings is collected as a correction candidate character whenever the patterns other than the i-th character are identical;
a correction candidate character extraction unit that indexes the Japanese sentence correction candidate character dictionary, keyed on the characters surrounding the wrong-character position extracted by the wrong-character position detection unit, and extracts correction candidate characters for the wrong character;
a character concatenation probability dictionary that holds, keyed by each N-character string, the concatenation probability of that string, calculated in advance from occurrence frequency information on the N-character patterns extracted in advance;
a correction candidate narrowing unit that uses the character concatenation probability dictionary to rank and narrow down the correction candidates extracted by the correction candidate character extraction unit;
a correction candidate check unit that inserts the narrowed correction candidates into the original character string and checks them grammatically by morphological analysis; and
a correction candidate selection unit with which a corrector selects the correct character from the correction candidates that have passed the grammatical check.
In this device, correction candidates for a detected wrong character are extracted with the candidate dictionary, narrowed down with the probability dictionary, and further checked grammatically by morphological analysis, whereby the wrong character is corrected automatically.