JPS6394364A - Automatic correction device for wrong character in japanese sentence - Google Patents

Automatic correction device for wrong character in japanese sentence

Info

Publication number
JPS6394364A
JPS6394364A JP61238059A JP23805986A JPS6394364A JP S6394364 A JPS6394364 A JP S6394364A JP 61238059 A JP61238059 A JP 61238059A JP 23805986 A JP23805986 A JP 23805986A JP S6394364 A JPS6394364 A JP S6394364A
Authority
JP
Japan
Prior art keywords
character
correction
dictionary
candidate
correction candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP61238059A
Other languages
Japanese (ja)
Other versions
JPH077414B2 (en)
Inventor
Shinichiro Takagi
伸一郎 高木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP61238059A
Publication of JPS6394364A
Publication of JPH077414B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Character Discrimination (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

PURPOSE: To enable automatic correction of wrong characters by creating, in advance, a Japanese sentence correction candidate character dictionary and a character concatenation probability dictionary from documents of the same kind that contain no wrong characters. CONSTITUTION: For an input Japanese sentence database 3, which holds the reading result of an input device 1, a wrong character detecting part 6 detects an area containing a wrong character by morphological analysis such as word candidate extraction and part-of-speech connection testing. The position of the wrong character is then detected using the character concatenation probability dictionary 8, after which the Japanese sentence correction candidate character dictionary 10 is looked up, using the characters surrounding the wrong character position as a key, to extract correction candidate characters. Next, the correct-string probability of each tentative character string, formed by inserting a correction candidate at the wrong character position, is computed using the character concatenation probability dictionary 8, and the candidates are narrowed down by ranking and cutoff. Finally, morphological analysis is applied again to the clause-level tentative character string obtained by inserting each remaining candidate into the original sentence, and a grammatical check is performed, so that a grammatically correct correction candidate can be selected.

Description

[Detailed Description of the Invention] (Technical field to which the invention pertains) The present invention relates to an automatic correction device for typos in Japanese text. To support the creation of Japanese document databases, the device automatically corrects typos contained in Japanese character strings of mixed kanji and kana read in from an input device, by selecting the correct candidate from a group of extracted candidate characters.

(Prior art) When a Japanese document database is created by converting large volumes of Japanese documents, such as newspaper articles, manuscripts for publication, and scientific and technical papers, into electronic files, the rejected characters, misread characters, and typos that creep into the reading results are detected by morphological analysis using a word dictionary and a grammar dictionary, or by a human corrector's check. Correcting them, whether manually or automatically, additionally requires candidate extraction that contains the correct candidate with high probability, together with a strict check of the Japanese text based on morphological analysis.

Examples of conventional candidate extraction and automatic correction are as follows.

As described in J. L. Peterson, "Computer Programs for Spelling Correction", Lecture Notes in Computer Science, Vol. 96, Springer-Verlag, 1980, which discusses spelling checking and correction for English, there are the following methods. (1) A method of holding pairs of correct spellings and easily mistaken spellings in a dictionary, and replacing a misspelling with the correct spelling when it is detected.

(2) A method of generating in advance, for English words, a list of misspellings by applying to the correct spellings statistically derived rules for two-character transposition, one-character insertion, one-character omission, and one-character substitution, and, when a word matching one of these misspellings is detected, extracting the correct spelling and correcting the word.
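
The misspelling-list approach can be illustrated with a minimal Python sketch. It is not taken from the cited reference: it applies the four rule types exhaustively rather than statistically, and the names `error_variants` and `build_error_list`, as well as the lowercase ASCII alphabet, are assumptions made for illustration.

```python
import string

def error_variants(word, alphabet=string.ascii_lowercase):
    """Misspellings reachable by one transposition, insertion, omission, or substitution."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    transposed = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    inserted = {a + c + b for a, b in splits for c in alphabet}
    omitted = {a + b[1:] for a, b in splits if b}
    substituted = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    variants = transposed | inserted | omitted | substituted
    variants.discard(word)  # the correct spelling itself is not a misspelling
    return variants

def build_error_list(correct_words):
    """Map each generated misspelling to the correct words it may stand for."""
    table = {}
    for word in correct_words:
        for variant in error_variants(word):
            table.setdefault(variant, set()).add(word)
    return table

# Hypothetical usage: look up a detected misspelling in the pre-built list.
error_list = build_error_list(["the", "then"])
print(error_list.get("teh"))
```

Even for a 26-letter alphabet the list grows quickly with word length, which hints at why the same approach becomes unwieldy for the several thousand characters used in Japanese text.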

A method of detecting and automatically correcting typos by compiling such correct and incorrect patterns into a dictionary is described in the "Sentence Abnormality Inspection and Correction Apparatus" disclosed in Japanese Patent Laid-Open No. Sho 61-1787.

However, these conventional methods have the following problems.

(1) In Japanese documents, which use a large number of distinct characters and are written without spaces between words, the set of erroneous character-string patterns needed to detect typos becomes enormous. In addition, for typos occurring at clause (bunsetsu) boundaries, it is difficult to extract candidates across word and clause boundaries.

(2) If the error characteristics of the input device are unknown, candidate extraction and correction are difficult.

(3) When correction candidates are selected by morphological analysis, the processing load becomes heavy if a large number of correction candidates are extracted. Even with manual correction, the many candidates that are clearly ungrammatical add to the workload.

(Object of the invention) The object of the present invention is to provide an automatic correction device for typos in Japanese text that performs candidate extraction independent of the number of character types, clause boundaries, the number of typos, and the error characteristics of the input device, and that performs automatic correction with high processing performance. To this end, a Japanese sentence correction candidate character dictionary and a character concatenation probability dictionary are created in advance from a large volume of documents of the same kind that contain no typos. When a typo is detected, candidates are extracted using the Japanese sentence correction candidate character dictionary, and the correct-string probability of the tentative character string formed with each correction candidate is calculated using the character concatenation probability dictionary to narrow down the candidates.

(Structure of the invention) (Features of the invention and differences from the prior art) In the present invention, two dictionaries are created in advance from a large volume of typo-free documents of the same kind as the documents targeted for automatic typo correction. The first is a Japanese sentence correction candidate character dictionary: character-string patterns are extracted as N-character strings, or as specific N-character, (N-1)-character, ..., 2-character strings selected from them, and whenever the patterns of these strings agree except for the i-th character (i = 1, ..., N), the i-th character is collected as a correction candidate character. The second is a character concatenation probability dictionary, which holds the concatenation probability of each N-character string, calculated in advance from the occurrence frequencies of the extracted N-character patterns. When a typo in the input Japanese sentence database entered from the input device is detected by morphological analysis using a word dictionary and a grammar dictionary, the Japanese sentence correction candidate character dictionary is looked up using the character-string pattern surrounding the typo as a key, and correction candidate characters are extracted. The correct-string probability of each tentative character string, formed by inserting a correction candidate character at the typo position in the original text, is calculated from the character concatenation probability dictionary, and the correction candidate characters are ranked and narrowed down by cutoff. The narrowed-down correction candidates are then examined from the top rank downward by the typo detection process based on morphological analysis, the correct candidate is selected, and the text is corrected automatically.
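
As a rough illustration of how the two dictionaries might be built, the Python sketch below uses N = 3 and a two-sentence toy corpus. Several points are assumptions not fixed by the text: the key layout `(i, context)` with a 0-based position `i`, the use of plain relative frequency as the "concatenation probability", and the function names.

```python
from collections import Counter, defaultdict

def ngrams(text, n):
    """All contiguous n-character substrings of text."""
    return [text[k:k + n] for k in range(len(text) - n + 1)]

def build_candidate_dict(corpus, n=3):
    """For each N-gram, mask position i and record the character seen there.

    Key: (i, the remaining n-1 characters); value: set of candidate characters.
    """
    candidates = defaultdict(set)
    for sentence in corpus:
        for gram in ngrams(sentence, n):
            for i in range(n):
                context = gram[:i] + gram[i + 1:]
                candidates[(i, context)].add(gram[i])
    return dict(candidates)

def build_probability_dict(corpus, n=3):
    """Relative frequency of each N-gram in the typo-free corpus (an assumed estimator)."""
    counts = Counter(g for s in corpus for g in ngrams(s, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

# Toy typo-free corpus (hypothetical data).
corpus = ["東京都に行く", "京都府に行く"]
candidates = build_candidate_dict(corpus)
probabilities = build_probability_dict(corpus)
print(candidates[(2, "京都")])   # characters observed after "京都"
print(probabilities["に行く"])
```

Because the key contains only the surrounding characters, a lookup does not need to know what the wrong character was, which is consistent with the stated independence from the input device's error characteristics.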

The invention differs from the prior art in the following respects.

(1) Because candidates are extracted using the Japanese sentence correction candidate character dictionary, candidate extraction does not depend on the number of character types in the document, the presence or absence of word spacing, or the error characteristics of the input device.

(2) Because the extracted candidates are narrowed down by applying the character concatenation probability dictionary, highly accurate candidate extraction is possible.

(3) Because the narrowed-down candidates are checked grammatically, typos can be corrected automatically.

(4) Because correction candidates that are clearly ungrammatical are eliminated automatically, the workload of manual correction is small.

(Embodiment) Fig. 1 shows an example of the basic configuration of the present invention. 1 is an input device such as a kanji OCR, pen-touch device, tablet, or keyboard; 2 is an input processing section that performs input or reading; 3 is an input Japanese sentence database holding the reading results read in by the input device 1 and recorded on a magnetic storage device in character-code form; 4 is a word dictionary; 5 is a grammar dictionary; 6 is a typo detection section that extracts a clause-level typo-containing area by morphological analysis using the word dictionary 4 and the grammar dictionary 5; 7 is a typo position detection section that detects, within the typo-containing area extracted by the typo detection section 6, the character position regarded as a typo; 8 is a character concatenation probability dictionary; 9 is a correction candidate character extraction section that looks up dictionary 10 and extracts correction candidate characters for the typo; 10 is a Japanese sentence correction candidate character dictionary; 11 is a correction candidate narrowing section that looks up dictionary 10 to rank the correction candidates and cut off low-ranked ones; 12 is a correction candidate checking section that checks, by morphological analysis, the tentative character strings formed by inserting the correction candidates at the typo position in rank order; 13 is a correction candidate selection section in which a human corrector selects among the correction candidates accepted by the grammatical check of the correction candidate checking section 12; 14 is a correction terminal; 15 is a Japanese document database in which the typos have been corrected; and 16 is a processing unit consisting of a CPU and memory.

In this scheme, for the input Japanese sentence database 3, which holds the reading results from the input device 1, the typo detection section 6 detects a typo-containing area by morphological analysis such as word candidate extraction and part-of-speech connection testing. The typo position is then detected using the character concatenation probability dictionary 8, after which correction candidate characters are extracted by looking up the Japanese sentence correction candidate character dictionary 10 using the surrounding characters, other than the one at the typo position, as a key.
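
The two steps that follow detection of the typo-containing area can be sketched as below, using character bigrams (N = 2) for brevity. The scoring rule (lowest mean probability of the N-grams covering a position) and the dictionary key layout `(i, context)` are assumptions of this sketch; the text does not specify them. The morphological analysis that finds the area is outside the scope of the example.

```python
def locate_typo(area, prob, n=2):
    """Index in `area` whose covering N-grams have the lowest mean probability."""
    if len(area) < n:
        raise ValueError("area is shorter than N")

    def score(pos):
        first = max(0, pos - n + 1)
        last = min(pos, len(area) - n)
        grams = [area[k:k + n] for k in range(first, last + 1)]
        return sum(prob.get(g, 0.0) for g in grams) / len(grams)

    return min(range(len(area)), key=score)

def lookup_candidates(area, pos, candidates, n=2):
    """Look up candidate characters keyed by the characters around `pos`."""
    found = set()
    for i in range(n):  # i = offset of the typo inside the N-character window
        start = pos - i
        if start < 0 or start + n > len(area):
            continue
        context = area[start:pos] + area[pos + 1:start + n]
        found |= candidates.get((i, context), set())
    return found

# Hypothetical dictionaries, as if built from the clean sentence "東京都に行く".
prob = {"東京": 0.3, "京都": 0.3, "都に": 0.2, "に行": 0.1, "行く": 0.1}
cands = {(0, "京"): {"東"}, (1, "東"): {"京"}, (0, "都"): {"京"}, (1, "京"): {"都"},
         (0, "に"): {"都"}, (1, "都"): {"に"}, (0, "行"): {"に"}, (1, "に"): {"行"},
         (0, "く"): {"行"}, (1, "行"): {"く"}}

area = "東京郡に行く"                       # "郡" misread for "都"
pos = locate_typo(area, prob)              # -> 2
print(pos, lookup_candidates(area, pos, cands))
```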

Next, using the character concatenation probability dictionary 8, the correct-string probability is computed for each tentative character string formed by inserting a correction candidate at the typo position, and the correction candidates are narrowed down by ranking and cutoff. For each remaining correction candidate, the clause-level tentative character string inserted into the original text is then morphologically analyzed again, a grammatical check is performed, and the grammatically correct correction candidates are selected.
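
The ranking and cutoff step might look like the following sketch. How the correct-string probability is derived from the dictionary is not stated in the text; here it is assumed to be the product of the N-gram probabilities, computed as a sum of logarithms, with a small floor value for unseen N-grams. The cutoff is assumed to be a fixed number of candidates (`keep`); a probability threshold would be an equally plausible reading.

```python
import math

def string_score(text, prob, n=2, floor=1e-6):
    """Log of the product of N-gram probabilities; unseen N-grams get `floor`."""
    return sum(math.log(prob.get(text[k:k + n], floor))
               for k in range(len(text) - n + 1))

def rank_and_cut(area, pos, candidates, prob, n=2, keep=3):
    """Insert each candidate at `pos`, rank by string score, keep the best `keep`."""
    scored = []
    for ch in candidates:
        trial = area[:pos] + ch + area[pos + 1:]
        scored.append((string_score(trial, prob, n), ch, trial))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(ch, trial) for _, ch, trial in scored[:keep]]

# Hypothetical bigram probabilities and candidate set.
prob = {"東京": 0.3, "京都": 0.3, "都に": 0.2, "に行": 0.1, "行く": 0.1}
ranked = rank_and_cut("東京郡に行く", 2, ["府", "都", "部"], prob, keep=2)
print(ranked[0])
```

Only the strings that survive this step need the more expensive morphological re-analysis, which is where the claimed reduction in processing load comes from.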

If the correction candidates have been narrowed down to one at this point, the typo is replaced with that candidate and automatic correction succeeds. If several candidates still remain, the corrector selects a candidate from the correction terminal 14.

Fig. 2 shows an embodiment of the correction candidate extraction and automatic correction performed after typo detection in the basic configuration of Fig. 1, for the case where there are two typos.

In this example, 17 is a typo-containing area in which typos were detected; 18 is a typo; 19 is the correct character; 20 is a typo position detected by the typo position detection section 7 using the character concatenation probability; 21 is the set of correction candidates for typo position candidate (1); 22 is the set of correction candidates narrowed down from correction candidates 21 by the character concatenation probability; 23 and 24 are, respectively, the correction candidates and the narrowed-down correction candidates for typo position candidate (2); 25 is the correction candidate list ranked by the correction candidate narrowing section 11; 26 is the candidate rank; 27 is the top two candidates by rank in the correction candidate list 25; and 28 is the character string automatically corrected after the correction candidate check.

In this example, typo positions are detected in the typo-containing area 17 using the character concatenation probability dictionary 8, yielding multiple typo candidates, and correction candidates are then extracted for each typo.

Next, each correction candidate for each typo position is inserted in turn into the original character string, the character concatenation probability of the resulting string is calculated, and low-probability candidates are cut off, giving correction candidates 22 and 24. These are then ranked to produce the correction candidate list 25.

Morphological analysis is then performed again on these correction candidates to check each one grammatically. If exactly one candidate remains, the correction is made automatically.

If two or more candidates remain and the top two by rank belong to the same typo position, the top-ranked correction candidate is inserted at that position. If the top two belong to different typo positions, a two-character error is assumed and both candidates are inserted at their respective typo positions.

However, if three or more candidates remain after the correction candidate list 25 has been grammatically checked, the top-ranked correction candidate is not selected automatically. For this case the correction candidate selection section provides a means for the human corrector to select among these candidates from the correction terminal 14, as well as a means for the corrector to make the correction when automatic extraction of correction candidates fails.
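
One possible reading of these decision rules is sketched below: no surviving candidate means manual correction, one candidate is applied automatically, exactly two candidates follow the same-position or different-position rule, and three or more are handed to the corrector. The function name `decide`, the return tags, and the `(position, character)` representation are hypothetical.

```python
def decide(area, ranked):
    """Choose the action for candidates that passed the grammatical check.

    `ranked` is a list of (position, character) pairs, best candidate first.
    Returns a (tag, payload) pair, where tag is "auto", "select", or "manual".
    """
    def apply(fixes):
        chars = list(area)
        for pos, ch in fixes:
            chars[pos] = ch
        return "".join(chars)

    if not ranked:                      # automatic extraction failed
        return ("manual", None)
    if len(ranked) == 1:                # a single candidate: correct automatically
        return ("auto", apply(ranked[:1]))
    if len(ranked) >= 3:                # leave the choice to the human corrector
        return ("select", [apply([fix]) for fix in ranked])
    (pos1, _), (pos2, _) = ranked
    if pos1 == pos2:                    # both candidates compete for one position
        return ("auto", apply(ranked[:1]))
    return ("auto", apply(ranked))      # two positions: assume a two-character error

# Hypothetical usage: two typos at different positions in "束京郡".
print(decide("束京郡", [(0, "東"), (2, "都")]))
```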

With this configuration and operation, candidate extraction is possible that, unlike the prior art, does not depend on the number of character types, the presence or absence of word spacing, the number of typos, or the error characteristics of the input device. In addition, because the extracted correction candidates are narrowed down by the character concatenation probability and only these are grammatically checked, the rate of correct candidates is high, automatic correction becomes possible, and the processing time is reduced.

Furthermore, because the correction candidates are narrowed down by the grammatical check, the workload is reduced even when candidates are selected manually.

(Effects of the invention) As explained above, two dictionaries are created in advance from a large volume of typo-free documents of the same kind as the Japanese documents targeted for automatic typo correction: a Japanese sentence correction candidate character dictionary, built from extracted N-character strings or from specific N-character, (N-1)-character, ..., 2-character strings selected from them and used to extract correction candidate characters, and a character concatenation probability dictionary for N-character strings, calculated from the occurrence frequencies of those N-character strings. When a typo in the input Japanese sentence database read in from the input device is detected by morphological analysis, correction candidates are extracted with the correction candidate character dictionary, ranked and narrowed down by cutoff with the character concatenation probability dictionary, and then subjected to a grammatical check by morphological analysis before the automatic correction is made. This gives the following advantages.

(1) Candidate extraction does not depend on the number of character types, the presence or absence of word spacing, the number of typos, or the error characteristics of the input device, and narrowing down the candidates yields highly accurate candidate extraction.

(2) Typos can be corrected automatically by applying a grammatical check to the small number of narrowed-down candidates.

(3) Because grammatically incorrect correction candidates are eliminated automatically, the workload is reduced even for manual correction.

[Brief description of the drawings]

Fig. 1 shows an example of the basic configuration of the present invention, and Fig. 2 shows an embodiment of correction candidate extraction and automatic correction for typos.
1 ... input device,
2 ... input processing section,
3 ... input Japanese sentence database,
4 ... word dictionary,
5 ... grammar dictionary,
6 ... typo detection section,
7 ... typo position detection section,
8 ... character concatenation probability dictionary,
9 ... correction candidate character extraction section,
10 ... Japanese sentence correction candidate character dictionary,
11 ... correction candidate narrowing section,
12 ... correction candidate checking section,
13 ... correction candidate selection section,
14 ... correction terminal,
15 ... Japanese document database in which typos have been corrected,
16 ... processing unit,
17 ... typo-containing area,
18 ... typo,
19 ... correct character,
20 ... typo position,
21 ... correction candidates for typo position candidate (1),
22 ... correction candidates narrowed down from 21,
23 ... correction candidates for typo position candidate (2),
24 ... correction candidates narrowed down from 23,
25 ... ranked correction candidate list,
26 ... candidate rank,
27 ... top two candidates in 25,
28 ... character string automatically corrected after the correction candidate check.

Claims (1)

[Claims] An automatic correction device for typos in Japanese text, comprising:
a typo detection section that, for rejected characters or typos caused by input errors or character recognition errors in the Japanese text of a Japanese document database entered from a character input device, extracts a clause-level typo-containing area by morphological analysis using a word dictionary and a grammar dictionary;
a typo position detection section that extracts, from this typo-containing area, the character position regarded as a typo on the basis of the concatenation probability between characters;
a Japanese sentence correction candidate character dictionary in which, for N-character strings extracted in advance from typo-free documents of the same kind as these Japanese documents, or for specific N-character, (N-1)-character, ..., 2-character strings selected from them, the i-th character (i = 1, ..., N) is collected as a correction candidate character whenever the patterns other than the i-th character are identical;
a correction candidate character extraction section that looks up the Japanese sentence correction candidate character dictionary using, as a key, the surrounding characters other than the typo position extracted by the typo position detection section, and extracts correction candidate characters for the typo;
a character concatenation probability dictionary that holds, keyed by each N-character string, character concatenation probability information calculated in advance from occurrence frequency information on the previously extracted N-character patterns;
a correction candidate narrowing section that uses the character concatenation probability dictionary to rank and narrow down the correction candidates extracted by the correction candidate character extraction section;
a correction candidate checking section that inserts the narrowed-down correction candidates into the original character string and performs a grammatical check by morphological analysis; and
a correction candidate selection section in which a corrector selects the correct character from the correction candidates that have passed the grammatical check;
the device being characterized by comprising means for extracting correction candidates for a detected typo using the candidate dictionary, narrowing them down using the probability dictionary, further performing a grammatical check by morphological analysis, and automatically applying the correction candidate.
JP61238059A 1986-10-08 1986-10-08 Japanese typographical error correction device Expired - Lifetime JPH077414B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP61238059A JPH077414B2 (en) 1986-10-08 1986-10-08 Japanese typographical error correction device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP61238059A JPH077414B2 (en) 1986-10-08 1986-10-08 Japanese typographical error correction device

Publications (2)

Publication Number Publication Date
JPS6394364A true JPS6394364A (en) 1988-04-25
JPH077414B2 JPH077414B2 (en) 1995-01-30

Family

ID=17024546

Family Applications (1)

Application Number Title Priority Date Filing Date
JP61238059A Expired - Lifetime JPH077414B2 (en) 1986-10-08 1986-10-08 Japanese typographical error correction device

Country Status (1)

Country Link
JP (1) JPH077414B2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011065384A (en) * 2009-09-16 2011-03-31 Nippon Telegr & Teleph Corp <Ntt> Text analysis device, method, and program coping with wrong letter and omitted letter
CN111259654A (en) * 2018-11-30 2020-06-09 北京嘀嘀无限科技发展有限公司 Text error detection method and device
CN111259654B (en) * 2018-11-30 2023-09-15 北京嘀嘀无限科技发展有限公司 Text error detection method and device
CN111368918A (en) * 2020-03-04 2020-07-03 拉扎斯网络科技(上海)有限公司 Text error correction method and device, electronic equipment and storage medium
CN111368918B (en) * 2020-03-04 2024-01-05 拉扎斯网络科技(上海)有限公司 Text error correction method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
JPH077414B2 (en) 1995-01-30

Similar Documents

Publication Publication Date Title
US7983903B2 (en) Mining bilingual dictionaries from monolingual web pages
JP2693780B2 (en) Text processing systems and methods for checking in text processing systems whether units or chemical formulas are used correctly and consistently
Oh et al. An English-Korean transliteration model using pronunciation and contextual rules
CN109460552B (en) Method and equipment for automatically detecting Chinese language diseases based on rules and corpus
Volk et al. Strategies for reducing and correcting OCR errors
CN110110334B (en) Remote consultation record text error correction method based on natural language processing
Chang A new approach for automatic Chinese spelling correction
Hossain et al. Development of bangla spell and grammar checkers: Resource creation and evaluation
Ahamed et al. Spell corrector for Bangla language using Norvig’s algorithm and Jaro-Winkler distance
Liyanapathirana et al. Sinspell: A comprehensive spelling checker for sinhala
Chaudhuri Reversed word dictionary and phonetically similar word grouping based spell-checker to Bangla text
Zukarnain et al. Spelling checker algorithm methods for many languages
JPS6394364A (en) Automatic correction device for wrong character in japanese sentence
Mridha et al. An approach for detection and correction of missing word in Bengali sentence
Debnath et al. A Hybrid Approach to Design Automatic Spelling Corrector and Converter for Transliterated Bangla Words
Ren et al. A hybrid approach to automatic Chinese text checking and error correction
JP2599973B2 (en) Japanese sentence correction candidate character extraction device
JP2774495B2 (en) Natural language processor
JPS6382542A (en) Extraction device for japanese sentence correction candidate character
JPH0244459A (en) Japanese text correction candidate extracting device
Fahrudin et al. Analysis and Development of KEBI 1.0 Checker Framework as an Application of Indonesian Spelling Error Detection
Sheykholeslam et al. A Framework for Spelling Correction in Persian Language Using Noisy Channel Model.
Mon et al. Myanmar spell checker
JPH0362260A (en) Detecting/correcting device for katakana word error
JP2595047B2 (en) Japanese sentence automatic verification and correction device

Legal Events

Date Code Title Description
EXPY Cancellation because of completion of term