JPH0248938B2

Info

Publication number: JPH0248938B2
Application number: JP56198941A
Authority: JP
Inventors: Tamaki Saito; Toshiaki Sugimura
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1981-12-10
Filing date: 1981-12-10
Publication date: 1990-10-26
Also published as: JPS5899829A

Description

[Detailed description of the invention]

The present invention relates to a device that supports the detection and correction of erroneous characters contained in character string data in a Japanese information processing system.

Conventionally, language processing systems such as those commercialized as word processors use a dictionary that stores words as spelled character strings. The character string of the input data (sentences, words, etc.) is compared and collated with the character strings of the words stored in the dictionary to detect errors in the input string, and when the input string contains an error, the error is corrected by replacing it with the character or characters of a word stored in the dictionary. In an English word processor, for example, a character string that is input delimited word by word is collated character by character with the words in the dictionary; the dictionary word with the largest number of matching characters, taking character order into account, is regarded as the word that was input, and the non-matching characters are treated as errors and replaced with the characters of the corresponding dictionary word. For Japanese text, however, several problems inherent to the Japanese language itself (from the standpoint of computer processing) make it difficult to detect and correct errors in the data automatically: there are many kinds of kanji; many words consist of two kanji, which makes it hard to identify the erroneous character (in a two-character word it cannot be determined which of the two is wrong); and sentences are not written with spaces between words, which makes it hard to extract words. Checking therefore had to be done manually.

To eliminate the above drawbacks of the prior art, the present invention detects candidate erroneous characters from whether adjacent characters in the character string of a Japanese sentence can be connected, and outputs, as candidate correction characters, those characters similar to a candidate erroneous character that can be connected to the character before and/or after it. Its object is to improve the efficiency of the work of detecting and correcting erroneous characters in Japanese sentences. The invention is described in detail below with reference to the drawings.

FIG. 1 shows an embodiment of the present invention. In the figure, 1 is an input unit; 2 is a register that temporarily stores two characters of Japanese data; 3 is a control unit having a buffer 3a that temporarily stores two characters of data; 4 is a collation circuit; 5 is a storage unit that stores a character concatenation dictionary whose content is whether two consecutive characters of a Japanese sentence can be connected (hereinafter simply called the character concatenation dictionary for brevity); 6 is an output unit; 7 is a code generation circuit consisting of an error code generation circuit 7-1, a candidate start code generation circuit 7-2, and a candidate end code generation circuit 7-3; 8 is a search circuit; and 9 is a storage unit that stores a character classification dictionary in which, for each character, the characters similar to it are classified according to their features (hereinafter simply called the character classification dictionary for brevity). The register 2 is a shift register with a data storage area for two characters. Under the control of the control unit 3, it temporarily stores the character data input from the input unit 1 and sends it to the output unit 6 one character at a time.

The content of the character concatenation dictionary 5 is, for example, as shown in FIG. 2. Of two consecutive characters, the first character corresponds to the characters arranged along the rows and the second character to the characters arranged along the columns. A value of "1" is given at the intersection of two characters that form a sequence existing in Japanese words, and a value of "0" at the intersection of two characters that form a sequence not existing in Japanese words. A value representing the frequency of use of the sequence may also be used in place of the value "1". Although the drawing shows only combinations of kanji with kanji, the dictionary actually also covers the connectability of kana with kanji, kana with kana, and so on.
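The matrix of FIG. 2 could be held in software as a sparse table keyed by character pair. The following is a minimal Python sketch under that assumption; the names CONNECTION_COUNTS, connection_value, and connection_frequency are hypothetical, and the counts are made-up illustrative values rather than data from the patent.

```python
# Sparse stand-in for the FIG. 2 matrix: only pairs with a non-zero entry are stored.
# Keys are (first character, second character); values are illustrative usage counts.
CONNECTION_COUNTS = {
    ("日", "本"): 12,
    ("本", "語"): 7,
    ("本", "日"): 3,
}


def connection_value(first, second, table=CONNECTION_COUNTS):
    """Return the 0/1 matrix entry: 1 if the pair is attested, otherwise 0."""
    return 1 if table.get((first, second), 0) > 0 else 0


def connection_frequency(first, second, table=CONNECTION_COUNTS):
    """Return the frequency variant of the entry (0 when the pair is unattested)."""
    return table.get((first, second), 0)
```

Storing only the attested pairs keeps the table small, since most of the possible kanji pairs would carry the value "0".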

The error code generation circuit 7-1, the candidate start code generation circuit 7-2, and the candidate end code generation circuit 7-3 are driven by the control unit 3 and output to the output unit 6 the error code "/" (slash), the candidate start code "[" (opening square bracket), and the candidate end code "]" (closing square bracket), respectively. The error code may instead be a two-byte code defined by the system itself in an unassigned area of the JIS-C6226 code table.

The content of the character classification dictionary 9 is, for example, one of the following combinations. In the first, a table as shown in FIG. 3, in which headword kanji are arranged in JIS-C6226 code order and, for each kanji, the kanji sharing its "reading" are classified and listed, is combined with a table as shown in FIG. 4, in which headword hiragana are arranged in gojuon (a-i-u-e-o) order and, for each hiragana, the hiragana similar to it in "shape" are classified and listed, and further with tables (not shown) listing katakana, alphabetic letters, and the like that are similar in "shape". In the second, a table as shown in FIG. 5, in which, for each headword kanji, the kanji similar to it in "shape" are classified and listed, is combined with the table of FIG. 4 and the like. Alternatively, all of these tables are combined.
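As an illustration of how such a classification dictionary might be combined from several tables, here is a minimal Python sketch. The table names SAME_READING and SIMILAR_SHAPE, the function similar_characters, and every entry are assumptions chosen for illustration; they are not the contents of FIGS. 3 to 5.

```python
# Toy "reading" table in the spirit of FIG. 3: kanji that share the reading "ki".
SAME_READING = {
    "機": ["器", "期", "記"],
    "器": ["機", "期", "記"],
}

# Toy "shape" table in the spirit of FIGS. 4 and 5: visually similar characters.
SIMILAR_SHAPE = {
    "木": ["本", "末", "未"],
    "は": ["ほ", "け"],
    "ソ": ["ン"],
}


def similar_characters(ch, *tables):
    """Merge the entries for ch from each table, in first-seen order, without duplicates."""
    merged = []
    for table in tables:
        for candidate in table.get(ch, []):
            if candidate != ch and candidate not in merged:
                merged.append(candidate)
    return merged
```

Passing one table or several to similar_characters corresponds to using a single classification or a combination of them.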

Next, the operation is described. First, of the Japanese character string data input from the input unit 1, in which kanji, kana, punctuation marks, and so on are mixed, the first and second items are taken into the register 2 as the first character and the second character, respectively. The control unit 3 writes the two characters of data in the register 2 into the buffer 3a as they are and also sends them to the collation circuit 4. The collation circuit 4 uses the first and second characters to access the corresponding row and column of the character concatenation dictionary 5 and sends the value at their intersection to the control unit 3. If the value from the collation circuit 4 is "1" (that is, the two characters in the register 2 connect correctly as Japanese), the control unit 3 sends the first character in the register 2 to the output unit 6 as it is, moves the second character into the area that held the first character, and takes the next character from the input unit 1 into the register 2. The new first and second characters in the register 2 are then written into the buffer 3a, and the character concatenation is collated in the same way as before.

If the value from the collation circuit 4 is "0" (that is, the two characters in the register 2 do not connect as Japanese), the control unit 3 causes the error code generation circuit 7-1 to output the error code "/" to the output unit 6, sends the first character in the register 2 to the output unit 6, and takes the next character in from the input unit 1. The contents of the buffer 3a, however, are left unchanged at this point. Next, the control unit 3 causes the candidate start code generation circuit 7-2 to output the candidate start code "[" to the output unit 6 and sends the first character in the buffer 3a to the search circuit 8. The search circuit 8 searches the character classification dictionary 9 using this first character as the headword and sends the corresponding similar characters to the control unit 3. The control unit 3 takes each of these similar characters in turn as the first character, combines it with the second character in the buffer 3a, and sends the resulting two characters to the collation circuit 4. The collation circuit 4 accesses the character concatenation dictionary 5, checks whether each pair can be connected, and outputs a value of "0" or "1" to the control unit 3. According to the values from the collation circuit 4, the control unit 3 sends to the output unit 6, one after another, only those similar characters that can occur immediately before the second character in the buffer 3a, and then causes the candidate end code generation circuit 7-3 to send the candidate end code "]" to the output unit 6.

Next, the control unit 3 causes the candidate start code generation circuit 7-2 to send the candidate start code "[" to the output unit 6, sends the second character in the buffer 3a to the search circuit 8, and has it search the character classification dictionary 9 using the second character as the headword. The search circuit 8 sends the characters in the character classification dictionary 9 that are similar to the second character in the buffer 3a to the control unit 3. The control unit 3 takes each of these characters in turn as the second character, combines it with the first character in the buffer 3a, and sends the resulting two characters to the collation circuit 4, which checks against the character concatenation dictionary 5 whether they can be connected. According to the result of the check, the control unit 3 sends to the output unit 6, one after another, only those characters that can follow immediately after the first character in the buffer 3a, and then causes the candidate end code generation circuit 7-3 to send the candidate end code "]" to the output unit 6.

Next, the control unit 3 writes the next two characters of data in the register 2 into the buffer 3a, collates the character concatenation in the same way as before, and repeats this thereafter.
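The operation described above can be sketched in software as follows. This is a minimal Python sketch, not the circuit of FIG. 1: the function name annotate, the toy tables CONNECTABLE and SIMILAR, and the handling of the final character (emitted unchanged, since the description does not say what happens when no next character is available) are all assumptions made for illustration.

```python
# Toy stand-ins for the character concatenation dictionary 5 and the
# character classification dictionary 9 (illustrative data, not from the patent).
CONNECTABLE = {("日", "本"), ("本", "語")}
SIMILAR = {"木": ["本"], "本": ["木"], "日": ["目"], "目": ["日"]}


def annotate(text, connectable=CONNECTABLE, similar=SIMILAR):
    """Mark unconnectable pairs with "/" and list candidate corrections in brackets."""
    out = []
    for i in range(len(text) - 1):
        first, second = text[i], text[i + 1]  # the pair held in the buffer
        if (first, second) in connectable:
            out.append(first)  # value "1": pass the first character through
            continue
        # Value "0": emit the error code followed by the first character itself.
        out.append("/" + first)
        # Characters similar to the first one that can precede the second one.
        before = [c for c in similar.get(first, []) if (c, second) in connectable]
        out.append("[" + "".join(before) + "]")
        # Characters similar to the second one that can follow the first one.
        after = [c for c in similar.get(second, []) if (first, c) in connectable]
        out.append("[" + "".join(after) + "]")
    if text:
        out.append(text[-1])  # assumption: the last character is emitted as is
    return "".join(out)


print(annotate("日木語"))
```

With these toy tables, the input 日木語 (木 entered in place of 本) yields /日[][本]/木[本][]語. Both pairs containing 木 are flagged, and 本 appears as a candidate in each flagged pair, while a correct input such as 日本語 passes through unchanged.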

FIG. 6 is a flowchart of the processing described in the above embodiment.

FIG. 7 shows an example of an input character string and an output character string in the above embodiment, where aᵢ and aᵢⱼ (i, j ≥ 1) denote individual characters of a Japanese sentence. The figure shows the case in which the second character a₂ of the input string is an erroneous character (that is, the character sequences a₁a₂ and a₂a₃ do not exist in Japanese). (The underline under a₂ is added to indicate that it is the erroneous character; it does not appear in the actual string.) In the output string, [a₁₁a₁₂...] is the set of characters similar to a₁ after which a₂ can be connected, that is, the set of candidate correction characters that could replace a₁ if a₂ is taken to be correctly input and a₁ to be erroneous. Likewise, [a₂₁a₂₂...] is the set of characters similar to a₂ before which a₁ can be connected, that is, the set of candidate correction characters that could replace a₂ if a₁ is taken to be correctly input and a₂ to be erroneous. The same holds for [a′₂₁a′₂₂...] and [a₃₁a₃₂...], except that [a′₂₁a′₂₂...] is the set of characters to which a₃ can connect and therefore generally differs from [a₂₁a₂₂...]. Although FIG. 7 shows the case in which neither a₁a₂ nor a₂a₃ exists in Japanese, it may also be that only one of a₁a₂ and a₂a₃ does not exist in Japanese.

The candidate erroneous characters detected as described above, and the candidate correction characters to replace them, are generally judged correct or incorrect by a human, who then corrects the erroneous characters. The display device, correct/incorrect indicating device, character designation device, character input device, and the like used for this purpose (not shown) are placed after the device of the present invention (in the embodiment, after the output unit 6).

The content of the character concatenation dictionary 5 is as described above. If, for example, the "words" listed in a Japanese-language dictionary are used as the Japanese text from which the dictionary is built, it becomes a dictionary of the conditions under which characters can be connected within a "word". To obtain a dictionary of the conditions under which characters can be connected across ordinary sentences as a whole, it can be built from the input character string data to which the device of the present invention is to be applied for detecting and correcting erroneous characters. However, if the range of vocabulary in the input data used to build the character concatenation dictionary 5 differs from the range of vocabulary in the input data on which detection and correction support is actually performed (for example, different fields, such as legal terminology and medical terminology), not everything detected as an error is necessarily an error.
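The caveat about mismatched vocabulary can be made concrete with a small Python sketch. The helper names pairs_in and flagged_pairs and the two-word "legal" vocabulary are hypothetical, chosen only to show how a correctly written word from another field ends up flagged.

```python
def pairs_in(strings):
    """Collect every adjacent character pair occurring inside the given strings."""
    return {(s[i], s[i + 1]) for s in strings for i in range(len(s) - 1)}


def flagged_pairs(text, known):
    """Return the adjacent pairs of text that are absent from the known set."""
    return [(a, b) for a, b in zip(text, text[1:]) if (a, b) not in known]


# Dictionary built from a toy legal vocabulary (contract, law).
legal_pairs = pairs_in(["契約", "法律"])

print(flagged_pairs("契約", legal_pairs))  # in-domain word, nothing flagged
print(flagged_pairs("診断", legal_pairs))  # correct medical word (diagnosis), flagged anyway
```

The second call reports a pair even though 診断 is spelled correctly, which is why the flagged characters are treated only as candidates.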

When the part of the character classification dictionary 9 that corresponds to kanji is classified by "reading" as shown in FIG. 3, it is effective for detecting and correcting errors in character string data entered with an input device in which each kanji is entered using its "reading" as the cue, such as a tablet-type input device with a gojuon (50-sound) layout. When it is classified by "shape" as shown in FIG. 5, it is effective for detecting and correcting errors in character string data entered with an input device in which each kanji is entered using its "shape" as the cue, such as kanji OCR.

The error code, candidate start code, candidate end code, and so on are not all strictly necessary, and their number can be reduced. (For example, the error code "/" is there to make the explanation easier and may be omitted.)

The accuracy of erroneous character detection and correction can be improved by checking concatenation not only over two characters but over longer ranges of three characters, four characters, and so on. This is effective for checking the connection of function words (fuzokugo), pronouns, formal nouns, and the like with function words.
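A longer checking range can be pictured as replacing character pairs with n-character windows. The following minimal Python sketch assumes the known sequences are simply collected from sample strings; the names ngrams, build_ngram_set, and unattested_ngrams and the two sample strings are hypothetical.

```python
def ngrams(text, n):
    """Return every window of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_ngram_set(samples, n):
    """Collect the n-character windows seen in the sample strings."""
    return {gram for sample in samples for gram in ngrams(sample, n)}


def unattested_ngrams(text, known, n):
    """Return (position, window) for each window of text absent from known."""
    return [(i, gram) for i, gram in enumerate(ngrams(text, n)) if gram not in known]


samples = ["これは", "はがき"]
known2 = build_ngram_set(samples, 2)
known3 = build_ngram_set(samples, 3)

print(unattested_ngrams("これはが", known2, 2))  # every pair is attested
print(unattested_ngrams("これはが", known3, 3))  # the window れはが is not
```

Here the two-character check accepts これはが because each pair occurs somewhere in the samples, while the three-character check flags the window starting at position 1.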

As described above, the present invention detects candidate erroneous characters contained in the character string data of Japanese sentences and also outputs candidate correction characters to replace them. It can therefore take over part of the data checking (so-called verification) of Japanese text, which until now has relied entirely on manual work, and has the advantage of raising both the efficiency of that work and the accuracy of erroneous character detection and correction.

The present invention can be applied to all so-called Japanese input devices, such as Japanese word processors, kanji OCR, and Japanese input devices based on speech recognition, and also to the detection and correction of erroneous characters in data that has already been entered and completed (for example, data stored in a database).

[Brief explanation of the drawing]

The drawings serve to explain the present invention. FIG. 1 is a block diagram of an erroneous character detection/correction support device showing an embodiment of the present invention; FIG. 2 is an explanatory diagram showing an example of the content of the character concatenation dictionary; FIGS. 3, 4, and 5 are explanatory diagrams showing examples of the content of the character classification dictionary; FIG. 6 is a flowchart of the processing in the device of FIG. 1; and FIG. 7 is an explanatory diagram showing an example of an input character string and an output character string.

1: input unit; 2: register; 3: control unit; 4: collation circuit; 5: character concatenation dictionary; 6: output unit; 7: code generation circuit; 8: search circuit; 9: character classification dictionary.

Claims

[Claims]

1. An erroneous character detection/correction support device comprising: a first means for temporarily storing two or more consecutive characters of the character string of a Japanese sentence; a second means for storing a character concatenation dictionary whose content is whether at least two consecutive characters of a Japanese sentence can be connected; a third means for storing a character classification dictionary in which, for each character, the characters similar to it are classified according to their features; a fourth means for comparing the contents of the first means with the contents of the second means to detect a candidate erroneous character; and a fifth means for searching the third means for characters similar to the candidate erroneous character, comparing them with the contents of the second means, and selecting and outputting only those characters that can be connected to the other characters of the first means.