JPH05233619A

JPH05233619A - Method for correcting error of Japanese language sentence and device therefor

Info

Publication number: JPH05233619A
Application number: JP4030902A
Authority: JP
Inventors: Hirobumi Tamagawa; 博文玉川; Junichi Kubota; 淳市久保田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1992-02-18
Filing date: 1992-02-18
Publication date: 1993-09-10

Abstract

PURPOSE: To provide a Japanese sentence error correction method, and a device for carrying it out, that can extract correction candidates for any character left unanalyzable by morphological analysis of an input Japanese sentence, whether that character is a kanji (Chinese character), a kana (Japanese syllabary), or any other character type. CONSTITUTION: The device comprises a morphological analysis part 3 that decomposes a sentence into word units; a word dictionary part 5 that holds, in association with each other, the notation and part of speech (and optionally a semantic attribute) of the words used by the morphological analysis part 3; an unanalyzable character detecting part 7 that detects each character the morphological analysis part 3 could not analyze, together with its position in the sentence; a dictionary retrieving part 8 that retrieves from the word dictionary part 5 the words whose notation contains a designated character; and a word connection testing part 9 that, with the unanalyzable character in the sentence replaced by a retrieved word or by an empty character string, tests the consistency of the words on the basis of part-of-speech information. Errors in the Japanese sentence are corrected in this way.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to an error correction method and device, used mainly in Japanese word processors, that extract correction candidate words for errors caused by characters missing from a sentence or by unnecessary characters mistakenly mixed into it, and correct those errors.

【０００２】[0002]

[Prior Art] Japanese text written on a Japanese word processor or similar device may contain errors such as wrong or missing characters, caused by kana-kanji conversion mistakes during input or by editing mistakes. As one method for correcting such errors, Japanese Patent Laid-Open No. 1-281561 discloses a method for extracting correction candidate characters for Japanese sentences. FIG. 7 is a block diagram showing the configuration of a Japanese sentence correction device to which that method is applied.

[0003] In this figure, the "unknown word detection unit" performs morphological analysis of the Japanese sentence using a Japanese word dictionary and a grammar dictionary, and detects as unknown words the characters at places where the words are positionally or grammatically discontinuous. The "kanji word table" classifies frequently occurring two-kanji words into sets of words that share the same leading or trailing character, and stores each word together with its frequency of occurrence, part of speech, and semantic attribute, in order of frequency. The table can be searched using the leading character or the trailing character as the key.

[0004] The "correction candidate character extraction unit" uses this kanji word table to extract, in the form of multi-character words, the characters that are correction candidates for a detected unknown word. The "correction candidate selection unit" then selects a correction candidate from the multi-character words containing the extracted candidate characters, using the grammatical connection with the preceding and following words, the semantic dependency relationship, or the frequency of occurrence of the words. The portion enclosed by the broken line in the figure is a processing device consisting of a CPU and memory.

[0005] The flow of operations in this processing device is indicated by arrows in the figure. First, a Japanese sentence entered through an "input device" such as a keyboard is processed by the "input processing unit" and then stored in character-code form as the "input Japanese sentence database". Next, the "unknown word detection unit" performs morphological analysis of the input Japanese sentence using the "Japanese word dictionary" and the "grammar dictionary" and, based on the result, detects as unknown words the characters at places where the words are positionally or grammatically discontinuous.

[0006] The "correction candidate character extraction unit" then uses the "kanji word table" to extract correction candidate characters for each detected unknown word. Next, the "correction candidate selection unit" uses the "Japanese word dictionary" and the "grammar dictionary" to select the appropriate correction candidate character from among those extracted. In this way an error-corrected "Japanese document database" is created.

【０００７】[0007]

[Problems to Be Solved by the Invention] With the processing device described above, however, the correction candidates that can be extracted are limited to words consisting of two kanji characters; hiragana words and words of two or more characters cannot be extracted as correction candidates. Nor can an unnecessary character mistakenly mixed into a sentence be deleted. Furthermore, a kanji word table must be prepared in advance for the frequently occurring two-kanji words, and holding it requires a large storage area.

[0008] The present invention was made in view of this situation. Its object is to provide a Japanese sentence error correction method that can extract correction candidates for a character left unanalyzable by morphological analysis of an input Japanese sentence, whether that character is a kanji, a hiragana, or any other character type, together with a device for carrying out the method.

【０００９】[0009]

[Means for Solving the Problems] To solve the above problems, the present invention provides a Japanese sentence error correction device comprising: a morphological analysis unit that decomposes a sentence into words; a word dictionary unit that holds, in association with each other, the notation and part of speech, and optionally also the semantic attribute, of the words used by the morphological analysis unit; an unanalyzable character detection unit that detects a character the morphological analysis unit could not analyze and that character's position in the sentence; a dictionary search unit that searches the word dictionary unit for words whose notation contains a designated character; and a word connection verification unit that, with the unanalyzable character in the sentence replaced by a word found by the dictionary search unit or by an empty character string, tests whether grammatical connection is possible on the basis of the words' part-of-speech information, and optionally also tests the consistency of the words on the basis of their semantic attributes.

【００１０】[0010]

[Operation] With the above configuration, the morphological analysis unit decomposes a Japanese sentence into words using the word dictionary unit, which holds the notation and part of speech, and optionally the semantic attribute, of each word in association with each other. The unanalyzable character detection unit then detects, from the result of the morphological analysis, each character that could not be analyzed and its position in the sentence.

[0011] The dictionary search unit searches the word dictionary unit for words whose notation contains the unanalyzable character detected by the unanalyzable character detection unit. The word connection verification unit then, with the unanalyzable character in the sentence replaced by a word found by the dictionary search unit or by the empty character string, tests whether grammatical connection is possible on the basis of the words' part-of-speech information, and optionally also tests the consistency of the words on the basis of their semantic attributes.

[0012] As a result, for an unanalyzable character in a Japanese sentence, candidate correction words are extracted that contain the unanalyzable character and that are grammatically connectable to, or semantically consistent with, the words before and after it. The unanalyzable character is then replaced with a correction candidate word, or it is replaced with the empty character string and thereby removed from the sentence.

【００１３】[0013]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the Japanese sentence error correction device according to the present invention. The device comprises: an input unit 1 for entering instructions to the device and Japanese sentences; a control unit 2 that activates the relevant function in response to input from the input unit 1; a morphological analysis unit 3 that decomposes a Japanese sentence into words; a text storage unit 4 that stores Japanese sentences; a word dictionary unit 5 that holds, in association with each other, the notation and part of speech of the words the morphological analysis unit 3 uses when analyzing a Japanese sentence; a morphological analysis result storage unit 6 that stores the information on the Japanese sentence as decomposed into words by the morphological analysis, together with information on the unanalyzable portions; an unanalyzable character detection unit 7 that detects, from the morphological analysis result storage unit 6, each character that could not be analyzed and its position in the sentence; a dictionary search unit 8 that searches the word dictionary unit 5 for words whose notation contains a designated character; a word connection verification unit 9 that, with the unanalyzable character in the sentence replaced by a word found by the dictionary search unit 8, tests whether grammatical connection is possible on the basis of the words' part-of-speech information; a correction candidate storage unit 10 that stores those words found by the dictionary search unit 8 that the word connection verification unit 9 has judged to be grammatically connectable; a correction candidate display unit 11 that displays the words stored in the correction candidate storage unit 10 in association with the unanalyzable character in the sentence; a correction candidate selection unit 12 that selects one of the words displayed by the correction candidate display unit 11 according to the operator's instruction; a character string replacement unit 13 that replaces the unanalyzable character in the sentence with the correction candidate word selected by the correction candidate selection unit 12; and an output unit 14 that displays the input from the input unit 1, the Japanese sentences stored in the text storage unit 4, and the processing results of the correction candidate display unit 11.

[0014] The operation of the Japanese sentence error correction device configured as above is as follows. First, when the operator enters through the input unit 1 an instruction to perform error correction on a Japanese sentence stored in the text storage unit 4, the control unit 2 activates the morphological analysis unit 3. The morphological analysis unit 3 uses the word dictionary unit 5 to decompose the Japanese sentence stored in the text storage unit 4 into words, and stores the result in the morphological analysis result storage unit 6.

[0015] Next, the control unit 2 activates the unanalyzable character detection unit 7. The unanalyzable character detection unit 7 refers to the information stored in the morphological analysis result storage unit 6, detects the unanalyzable character and, at the same time, its position in the sentence, and returns this information to the control unit 2. The control unit 2 then sets the unanalyzable character returned by the unanalyzable character detection unit 7 as the designated character and activates the dictionary search unit 8. The dictionary search unit 8 examines the notation of every word registered in the word dictionary unit 5, word by word, to determine whether the notation contains the designated character, and stores only the words that do in the correction candidate storage unit 10.
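The exhaustive dictionary scan described in this paragraph can be sketched in Python as follows. This is a minimal illustration and not the patent's implementation: the `Entry` type, the `search_dictionary` function, the toy dictionary contents, and the part-of-speech labels are all assumptions made for the example.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    """One dictionary record: notation plus part of speech (hypothetical layout)."""
    surface: str
    pos: str


# Toy stand-in for word dictionary unit 5; the contents are illustrative only.
WORD_DICTIONARY = [
    Entry("あいまい", "adjectival noun"),
    Entry("あからさま", "adjectival noun"),
    Entry("ますます", "adverb"),
    Entry("ます", "auxiliary verb"),
    Entry("まい", "auxiliary verb"),
    Entry("走り", "verb (continuative form)"),
    Entry("が", "case particle"),
]


def search_dictionary(designated_char: str, dictionary: List[Entry]) -> List[Entry]:
    """Check every entry and keep those whose notation contains the character."""
    return [entry for entry in dictionary if designated_char in entry.surface]


candidates = search_dictionary("ま", WORD_DICTIONARY)
print([entry.surface for entry in candidates])
```

The scan is linear in the size of the dictionary, which matches the word-by-word check in the text; no table beyond the dictionary itself is needed.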

[0016] The control unit 2 then activates the word connection verification unit 9. The word connection verification unit 9 adds the empty-character-string word to the correction candidate storage unit 10 and then, referring to the information stored in the morphological analysis result storage unit 6, deletes from the correction candidate storage unit 10 every candidate that is not grammatically connectable when it is substituted for the unanalyzable character.

[0017] Next, the control unit 2 activates the correction candidate display unit 11 and, at the same time, passes it the position of the unanalyzable character detected by the unanalyzable character detection unit 7. The correction candidate display unit 11 refers to the text storage unit 4 and the correction candidate storage unit 10, displays the correction candidate words on the output unit 14 in association with the position of the unanalyzable character in the sentence, and at the same time activates the correction candidate selection unit 12.

[0018] The operator then enters through the input unit 1 an instruction selecting one of the correction candidate words displayed by the correction candidate display unit 11. On receiving this input, the correction candidate selection unit 12 activates the character string replacement unit 13 and passes it the selected correction candidate word. The character string replacement unit 13 then replaces the unanalyzable character with the selected correction candidate word.

[0019] FIG. 2 is a schematic diagram showing an example of the morphological analysis result for a Japanese sentence containing an unanalyzable character. It shows the result produced by the morphological analysis unit 3 when, after a Japanese sentence has been written and stored in the text storage unit 4, the operator enters through the input unit 1 an instruction to correct its errors. The analysis result is stored in the morphological analysis result storage unit 6.

[0020] Specifically, as shown in the figure, the Japanese sentence 「私は走りまが」 (watashi wa hashiri ma ga) is stored as information decomposed into words at the positions marked by slashes. As for the decomposed words, 「私」 is a noun, 「は」 is a binding particle, 「走り」 is either the continuative form of the verb 「走る」 or a noun, the following 「ま」 is a character that cannot be analyzed in this sentence, and the final 「が」 is a case particle.
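One way to picture the stored analysis result and the detection step is the Python sketch below. The `Morpheme` record, the use of `None` to mark an unanalyzable character, and the 0-based character offsets are assumptions for illustration; the patent does not specify the storage format.

```python
from typing import List, NamedTuple, Optional, Tuple


class Morpheme(NamedTuple):
    surface: str
    pos: Optional[str]  # None marks a character the analyzer could not analyze
    start: int          # 0-based character offset in the sentence (assumed convention)


# Analysis result for 「私は走りまが」 as described for FIG. 2.
ANALYSIS = [
    Morpheme("私", "noun", 0),
    Morpheme("は", "binding particle", 1),
    Morpheme("走り", "verb (continuative form) or noun", 2),
    Morpheme("ま", None, 4),
    Morpheme("が", "case particle", 5),
]


def detect_unanalyzable(result: List[Morpheme]) -> List[Tuple[str, int]]:
    """Return (character, position) for every morpheme left unanalyzed."""
    return [(m.surface, m.start) for m in result if m.pos is None]


print(detect_unanalyzable(ANALYSIS))  # [('ま', 4)]
```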

[0021] FIG. 3 is a schematic diagram showing the list of correction candidate words, which consists of the words whose notation contains the unanalyzable character plus an empty character string (shown as NULL). This list is created as follows. The unanalyzable character detection unit 7 refers to the morphological analysis result described above, detects the unanalyzable character 「ま」, and at the same time returns to the control unit 2 the position of 「ま」 in the sentence. The control unit 2 sets 「ま」 as the designated character and activates the dictionary search unit 8. The dictionary search unit 8 examines the notation of every word registered in the word dictionary unit 5 and retrieves the words whose notation contains the designated character 「ま」. The retrieved correction candidate words are stored in the correction candidate storage unit 10.

[0022] As shown in the figure, the retrieved and stored correction candidate word list consists of words containing the unanalyzable character 「ま」 (underlined in the figure): 「あいまい」, 「あからさま」, ..., 「ますます」, 「ませ」, 「ましょ」, 「まし」, 「ます」, 「ますれ」, 「まい」, and so on. FIG. 4 is a schematic diagram showing the replacement text strings, in which the unanalyzable character 「ま」 has been replaced by each of these correction candidate words, together with the result of judging whether each is grammatically connectable. The replacement text strings are created as follows. When the control unit 2 activates the word connection verification unit 9, the word connection verification unit 9 adds the empty-character-string word (NULL) to the correction candidate storage unit 10, as shown in FIG. 3. Then, referring to the morphological analysis result storage unit 6, it substitutes each correction candidate word stored in the correction candidate storage unit 10 for the unanalyzable character 「ま」.

[0023] Specifically, as shown in the figure, the unanalyzable character 「ま」 is replaced by each of the correction candidate words of FIG. 3, producing replacement text strings such as 「走り-あいまい-が」, 「走り-あからさま-が」, ..., 「走り-ますます-が」, 「走り-ませ-が」, 「走り-ましょ-が」, 「走り-まし-が」, 「走り-ます-が」, 「走り-ますれ-が」, 「走り-まい-が」, ..., 「走り-()-が」.
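The replacement text strings of FIG. 4 can be generated mechanically from the preceding word, each candidate, and the following word. The Python sketch below is illustrative only; the function name `build_replacement_rows`, the ASCII hyphen separator, and the `()` rendering of the empty string are assumptions chosen to mirror the notation in the text.

```python
from typing import List


def build_replacement_rows(prev_word: str, candidates: List[str],
                           next_word: str, sep: str = "-") -> List[str]:
    """Substitute each candidate for the unanalyzable character and render the row."""
    rows = []
    for candidate in candidates:
        shown = candidate if candidate else "()"  # the empty string is displayed as ()
        rows.append(sep.join([prev_word, shown, next_word]))
    return rows


# Candidates named in the text, with the empty string appended last.
CANDIDATES = ["あいまい", "あからさま", "ますます", "ませ", "ましょ",
              "まし", "ます", "ますれ", "まい", ""]

for row in build_replacement_rows("走り", CANDIDATES, "が"):
    print(row)
```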

[0024] For each replacement text string created in this way, the word connection verification unit 9 judges whether the correction candidate word substituted for the unanalyzable character 「ま」 is grammatically connectable to the preceding word 「走り」 (continuative form of a verb, or a noun) and to the following word 「が」 (case particle). The judgment finds two strings grammatically connectable, 「走り-ます-が」 and 「走り-()-が」 (marked ○ in the figure). The others are not connectable (marked × in the figure) and are therefore deleted from the list of correction candidate words.
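The connection judgment can be sketched as a lookup in a table of part-of-speech pairs that are allowed to be adjacent. Everything in the Python example below is an assumption for illustration: the patent does not give its connection rules, so the `CONNECTABLE` table, the part-of-speech tags, and the tags assigned to each candidate are hypothetical and were chosen so that the toy rules reproduce the outcome stated in the text.

```python
from typing import Optional, Set

# Hypothetical (left POS, right POS) pairs that may be adjacent.
CONNECTABLE = {
    ("verb-continuative", "aux-masu-terminal"),
    ("aux-masu-terminal", "case-particle"),
    ("noun", "case-particle"),
}


def connects(left_pos: Set[str], right_pos: Set[str]) -> bool:
    """True if some reading of the left word may precede some reading of the right."""
    return any((l, r) in CONNECTABLE for l in left_pos for r in right_pos)


def passes(prev_pos: Set[str], cand_pos: Optional[Set[str]], next_pos: Set[str]) -> bool:
    """Test one candidate; None stands for the empty character string."""
    if cand_pos is None:
        # Deleting the character: the neighbours must connect to each other directly.
        return connects(prev_pos, next_pos)
    # A single reading of the candidate has to fit both neighbours.
    return any(connects(prev_pos, {p}) and connects({p}, next_pos) for p in cand_pos)


PREV = {"verb-continuative", "noun"}  # 「走り」
NEXT = {"case-particle"}              # 「が」
CANDIDATES = {                        # hypothetical tags for a few candidates
    "あいまい": {"adjectival-noun"},
    "ますます": {"adverb"},
    "ませ": {"aux-masu-imperative"},
    "ます": {"aux-masu-terminal"},
    "まい": {"aux-mai"},
    "": None,
}

survivors = [word for word, pos in CANDIDATES.items() if passes(PREV, pos, NEXT)]
print(survivors)  # ['ます', '']
```

Treating the empty string as a direct test between the two neighbours is what lets the same check cover both kinds of error: a surviving word repairs a missing character, and a surviving empty string removes a stray one.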

[0025] FIG. 5 is a schematic diagram showing correction candidate words displayed in association with the unanalyzable character in the Japanese sentence. The control unit 2 activates the correction candidate display unit 11 and, at the same time, passes it the position of the unanalyzable character 「ま」 detected by the unanalyzable character detection unit 7. Referring to the Japanese sentence stored in the text storage unit 4 and the correction candidate words stored in the correction candidate storage unit 10, the correction candidate display unit 11 sets off the unanalyzable character 「ま」 with slashes, as shown in the figure, and displays on the output unit 14 the correction candidates "1. ます" and "2. (NULL)" in association with it. At the same time, the correction candidate display unit 11 activates the correction candidate selection unit 12.

[0026] FIG. 6 is a schematic diagram showing the unanalyzable character in the Japanese sentence replaced with the selected correction candidate word. When the operator enters the number 1 through the input unit 1 to select the word 「ます」 from the correction candidate words displayed by the correction candidate display unit 11, the correction candidate selection unit 12 accepts the input, activates the character string replacement unit 13, and at the same time passes it the number 1 of the selected candidate. As a result, as shown in the figure, the character string replacement unit 13 replaces the unanalyzable character 「ま」 with the selected correction candidate word 「ます」, correcting the text to 「私は走りますが、」.
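The numbered display and the final replacement can be sketched in Python as below. The helper names `format_menu` and `replace_at`, the 0-based offset, and the hard-coded operator choice are assumptions for illustration; the sample text omits the trailing 「、」 shown in the corrected sentence of FIG. 6.

```python
from typing import List


def format_menu(candidates: List[str]) -> List[str]:
    """Number the candidates from 1, showing the empty string as (NULL)."""
    return [f"{i}. {c if c else '(NULL)'}" for i, c in enumerate(candidates, start=1)]


def replace_at(text: str, position: int, replacement: str, length: int = 1) -> str:
    """Replace `length` characters of `text` starting at `position`."""
    return text[:position] + replacement + text[position + length:]


TEXT = "私は走りまが"
POSITION = 4                 # offset of the unanalyzable character 「ま」
CANDIDATES = ["ます", ""]    # survivors of the connection test

print(format_menu(CANDIDATES))  # ['1. ます', '2. (NULL)']

choice = 1  # the operator enters 1
print(replace_at(TEXT, POSITION, CANDIDATES[choice - 1]))  # 私は走りますが
```

Choosing candidate 2 instead would substitute the empty string and simply delete the character, giving 「私は走りが」.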

[0027] As described above, in this embodiment the word dictionary used for morphological analysis is searched for words whose notation contains the unanalyzable character in the Japanese sentence. An empty character string (NULL) is added to the retrieved correction candidate words, and each candidate is substituted for the unanalyzable character in the sentence. In that substituted state, correction candidate words that are not grammatically connectable are deleted on the basis of part-of-speech information, and the remaining candidates are displayed in association with the unanalyzable character in the sentence. Replacing the character string with a correction candidate according to the operator's instruction then corrects errors caused by a missing character or by an unnecessary character mistakenly mixed into the sentence.

[0028] In this embodiment the unanalyzable character detection unit 7 is activated after morphological analysis of the whole Japanese sentence, but it may instead be activated as soon as the morphological analysis has identified the word immediately following an unanalyzable character. Also, the dictionary search unit 8 examines the notation of every word registered in the word dictionary unit 5, word by word, to determine whether it contains the designated character, and retrieves only the words that do; other means of retrieval may be used instead. The correction candidate words are displayed on the output unit 14 in the format shown in FIG. 5, but other formats may of course be used. Finally, a correction candidate word is selected by the number attached to it, but selection by a pointing device such as a mouse may be used instead.
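The text leaves the "other means" of retrieval open. One possibility, offered here purely as an assumption and not something the patent describes, is a character-to-words index built once from the dictionary notations. It trades extra memory for faster lookup, so it gives up part of the storage advantage the exhaustive scan has. The function name and word list in this Python sketch are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_char_index(words: List[str]) -> Dict[str, List[str]]:
    """Map each character to the words whose notation contains it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        # dict.fromkeys keeps each distinct character once, in first-seen order,
        # so a word such as 「ますます」 is listed only once under 「ま」.
        for ch in dict.fromkeys(word):
            index[ch].append(word)
    return index


WORDS = ["あいまい", "あからさま", "ますます", "ます", "走り", "が"]
index = build_char_index(WORDS)
print(index.get("ま", []))
```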

【００２９】[0029]

[Effects of the Invention] As described above, the Japanese sentence error correction device of the present invention searches the word dictionary used for morphological analysis, which can be searched with the word notation as the key, for words containing the character that could not be analyzed, and tests whether grammatical connection is possible with each retrieved word substituted for the unanalyzable character in the sentence. Correction candidate words can therefore be extracted for an unanalyzable character of any character type, such as kanji or hiragana, with no limit on the number of characters in the candidate. Correction candidates can also be extracted without omission for any error caused by a missing character or by an unnecessary character mistakenly mixed in, and the error can be corrected accurately. Moreover, no special table for retrieving correction candidate words is used and no large storage area is required, so the practical effect is extremely large.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the Japanese sentence error correction device according to the present invention.

FIG. 2 is a schematic diagram showing an example of the morphological analysis result for a Japanese sentence containing an unanalyzable character.

FIG. 3 is a schematic diagram showing the list of correction candidate words, consisting of the words whose notation contains the unanalyzable character plus an empty character string.

FIG. 4 is a schematic diagram showing the replacement text strings, in which the unanalyzable character is replaced by each correction candidate word, together with the result of judging grammatical connectability.

FIG. 5 is a schematic diagram showing correction candidate words displayed in association with the unanalyzable character in a Japanese sentence.

FIG. 6 is a schematic diagram showing the unanalyzable character in a Japanese sentence replaced with the selected correction candidate word.

FIG. 7 is a block diagram showing the configuration of a conventional Japanese sentence correction device.

[Explanation of symbols]

1 Input unit
2 Control unit
3 Morphological analysis unit
4 Text storage unit
5 Word dictionary unit
6 Morphological analysis result storage unit
7 Unanalyzable character detection unit
8 Dictionary search unit
9 Word connection verification unit
10 Correction candidate storage unit
11 Correction candidate display unit
12 Correction candidate selection unit
13 Character string replacement unit
14 Output unit

Claims

[Claims]

1. A Japanese sentence error correction method for correcting an error in a Japanese sentence, comprising: a morphological analysis step of decomposing the sentence into words; an unanalyzable character detection step of detecting a character that could not be analyzed in the morphological analysis step and the position of that unanalyzable character in the sentence; a dictionary search step of searching a word dictionary, which holds the notation and the part of speech used in the morphological analysis step in association with each other, for words whose notation contains a specified character; and a word connection verification step of testing, based on the part-of-speech information of the words, whether grammatical connection is possible in a state in which a word retrieved in the dictionary search step, or an empty character string, is substituted for the unanalyzable character in the sentence; wherein, for an unanalyzable character caused by a missing character in the sentence or by an erroneously mixed-in unnecessary character, correction word candidates that contain the unanalyzable character and are grammatically connectable with the words before and after it are extracted.

2. A Japanese sentence error correction device comprising: a morphological analysis unit that decomposes a sentence into words; a word dictionary unit that holds the notation and the part of speech of the words used by the morphological analysis unit in association with each other; an unanalyzable character detection unit that detects a character that the morphological analysis unit could not analyze and the position of that unanalyzable character in the sentence; a dictionary search unit that searches the word dictionary unit for words whose notation contains a specified character; and a word connection verification unit that tests, based on the part-of-speech information of the words, whether grammatical connection is possible in a state in which a word retrieved by the dictionary search unit, or an empty character string, is substituted for the unanalyzable character in the sentence; wherein, for an unanalyzable character caused by an erroneously mixed-in unnecessary character, correction word candidates that contain the unanalyzable character and are grammatically connectable with the words before and after it are extracted.

3. The Japanese sentence error correction method according to claim 1, wherein the morphological analysis step uses a word dictionary that holds notations, parts of speech, and semantic attributes in association with one another, and the word connection verification step, in addition to testing whether grammatical connection is possible based on the part-of-speech information of the words in a state in which a word retrieved in the dictionary search step, or an empty character string, is substituted for the unanalyzable character in the sentence, also tests the consistency of the words based on the semantic attributes.

4. The Japanese sentence error correction device according to claim 2, wherein the word dictionary unit holds notations, parts of speech, and semantic attributes in association with one another, and the word connection verification unit, in addition to testing whether grammatical connection is possible based on the part-of-speech information of the words in a state in which a word retrieved by the dictionary search unit, or an empty character string, is substituted for the unanalyzable character in the sentence, also tests the consistency of the words based on the semantic attributes.
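The combined test described in claims 3 and 4 (part of speech plus semantic attribute) might be sketched as below. The claims do not say how semantic consistency is decided, so everything here is an assumption made for illustration: the `Entry` record, the attribute labels, the tables `POS_OK` and `ATTR_OK`, and the function `passes` are hypothetical, and particles between the words are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    notation: str  # surface form held in the word dictionary
    pos: str       # part of speech
    attr: str      # semantic attribute (hypothetical label)


# Hypothetical grammatical table: adjacent part-of-speech pairs that connect.
POS_OK = {("noun", "verb")}

# Hypothetical semantic table: attribute pairs that are mutually consistent.
ATTR_OK = {("food", "eating"), ("place", "motion")}


def passes(left: Entry, right: Entry) -> bool:
    """Accept a pair of adjacent words only if it passes both the
    part-of-speech connection test and the semantic consistency test."""
    grammatical = (left.pos, right.pos) in POS_OK
    consistent = (left.attr, right.attr) in ATTR_OK
    return grammatical and consistent


rice = Entry("ご飯", "noun", "food")
school = Entry("学校", "noun", "place")
eat = Entry("食べる", "verb", "eating")
go = Entry("行く", "verb", "motion")
```

Under these assumed tables, `passes(rice, eat)` and `passes(school, go)` hold, while `passes(school, eat)` is rejected even though it is grammatically connectable, which is the kind of extra filtering the semantic attribute is meant to provide.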