JPH05135096A

JPH05135096A - Morpheme analyzing system

Info

Publication number: JPH05135096A
Application number: JP3297517A
Authority: JP
Inventors: Yoshimichi Okuno; 義道奥野; Jiyousuke Hiraoka; 丈介平岡
Original assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Current assignee: Meidensha Corp; Meidensha Electric Manufacturing Co Ltd
Priority date: 1991-11-14
Filing date: 1991-11-14
Publication date: 1993-06-01
Anticipated expiration: 2015-11-13
Also published as: JP3109187B2

Abstract

PURPOSE: To enable extraction of wrong or missing characters in a sentence while achieving high-speed retrieval and reduced memory capacity. CONSTITUTION: A character index is generated by sorting, one character at a time, all the characters contained in the words of a word dictionary, with the storage positions of each character within the words of the word dictionary recorded as its index list. For morphological analysis of an input text, the index list of each character in a character string of the input text is extracted from the character index. When the storage positions in the index lists are consecutive, the character string is judged to be one word. For a character string that is not judged to be a word, it is judged that a wrong character, a missing character, or an extra character exists at the discontinuous part.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a morphological analysis method for natural language processing.

[0002]

[Prior Art] Natural language processing has been applied to word processors, machine translation, and the like. Processing begins with morphological analysis, which divides the input text into words and assigns part-of-speech and semantic information. This is followed by syntactic processing (parsing), and then by semantic processing and context processing to remove the ambiguity and vagueness that remain after these steps.

[0003] Conventional morphological analysis methods include a method using a finite automaton, a method using associative memory, and a method using neuro technology.

[0004] The finite automaton method generates an automaton from a state transition diagram for character matching and recognizes the words in a character string by software or wired logic.

[0005] The associative memory method gives the memory itself a search function and performs full-text search by comparing character strings cell by cell.

[0006] The method using neuro technology performs full-text search while learning on a neurocomputer.

[0007]

[Problems to Be Solved by the Invention] Among the conventional morphological analysis methods, the finite automaton method can handle missing characters in a character string and can perform fuzzy search using wildcards, but it has difficulty coping when the first character of the character string is ambiguous. In addition, a hardware implementation cannot keep up with the varied search requests of application software, and a software implementation may be slower than the other methods.

[0008] Next, the associative memory method requires a large-capacity memory body, which makes it costly. Using a mass storage device such as a disk as the memory body is technically very difficult, and at present the memory body holds less than 1 MB. It is also difficult to couple with software, particularly applications, so searching complex text makes the application-side program complicated as well.

[0009] Next, the neurocomputer method is slow to search until learning is complete, but after learning it can search at high speed largely independent of the volume of documents, and fuzzy search is also possible. However, while search over plain, unstructured text is fast, the method has difficulty handling a structured database such as an electronic dictionary.

[0010] An object of the present invention is to provide a morphological analysis method that can extract wrong and missing characters in text while achieving high-speed search with a small-capacity memory configuration.

[0011]

[Means for Solving the Problems] To solve the above problems, the present invention comprises: a word dictionary that stores Japanese words as structures each consisting of a notation (written form) and an attribute list; a character index of structures, each holding one character, obtained by sorting character by character all the characters contained in the notations in the word dictionary, together with an index list giving the storage positions of that character in the notations; and a processing device that analyzes the morphemes of an input text subject to morphological analysis. The processing device extracts, for each character of the input text, the matching character and index list from the character index, and extracts a character string whose storage positions are consecutive as one word. When, in a character string for which this extraction has failed, the storage positions are discontinuous for just one character or a predetermined number of characters, the processing device determines that a wrong character, a missing character, or an extra character exists at that one character or predetermined number of characters.

[0012]

[Operation] According to the present invention, a character index is generated in advance for each character stored in the word dictionary. For each character of a character string in the input text, whether the storage positions are consecutive is determined from the index lists of the character index, and in this way it is analyzed whether the character string is a single word. In addition, when the string contains one character or a predetermined number of characters that break the continuity, that character or those characters are determined to be a wrong character or the like.

[0013]

[Embodiment] FIG. 1 is a configuration diagram showing an embodiment of the present invention. The word dictionary 1, organized as a file, stores structures each consisting of the notation of a word (hiragana, katakana, alphanumerics, single kanji, or kanji compounds) and its attribute list. For example, at structure pointer K, as shown by structure 1k, the notation holds the compound "秋雨" (autumn rain), and the attribute list holds "noun" as the part of speech, "アキサメ" (akisame) as the reading, and so on.
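As a rough illustration of the structure described above, a dictionary entry can be modeled as a record holding a notation and an attribute list. The following Python sketch is an assumption made for illustration only; the names `WordEntry`, `notation`, and `attributes` do not appear in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class WordEntry:
    """One word dictionary structure: a notation plus its attribute list (hypothetical layout)."""
    notation: str                                    # written form, e.g. "秋雨"
    attributes: dict = field(default_factory=dict)   # part of speech, reading, ...


# The example entry from the text: notation "秋雨", noun, reading "アキサメ".
word_dictionary = [
    WordEntry("秋雨", {"part_of_speech": "noun", "reading": "アキサメ"}),
]

print(word_dictionary[0].notation, word_dictionary[0].attributes["reading"])
```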

[0014] The character index 2, organized as a file, is generated by the processing device 3 using the word dictionary 1 as a database, and stores a per-character index list for every character appearing in the notations of the word dictionary 1. For example, at pointer H, as shown by structure 2n, the member character is "秋" (autumn), and its index list holds the addresses of the character "秋" within the words that contain it. That is, the index list gives the storage positions of the character "秋" in the words of the word dictionary 1 whose notation includes it. For example, if the words "秋雨" (autumn rain), "中秋" (mid-autumn), and "秋月" (autumn moon) are stored in the word dictionary 1, each contains the character "秋", so the addresses "A₁", "A₈", and "A₂₁" of the character "秋" in the respective words are generated and stored as storage position data.

[0015] The character index 2 is generated as follows. For every character in the notations in the word dictionary 1, its position (address), measured from the beginning of each word in the dictionary file, is extracted; this information is collected and sorted by character; and the resulting pairs of a character and an index of its addresses are stored in the character index file. To distribute the stored entries, a hash function that avoids collisions between storage positions is used, for example, so that the index also provides a high-speed search environment.
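The generation procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the dictionary is reduced to a list of notations, addresses are consecutive integers in one flat address space, and one unused address is left between words so that the last character of one word and the first character of the next are never contiguous. The function name `build_char_index` is hypothetical. Python's built-in `dict` is a hash table, which stands in here for the hash-based storage mentioned in the text.

```python
from collections import defaultdict


def build_char_index(notations):
    """Map each character to the sorted list of addresses where it is stored."""
    index = defaultdict(list)
    address = 0
    for notation in notations:
        for ch in notation:
            index[ch].append(address)
            address += 1
        address += 1  # assumed gap between words (not specified in the text)
    # Sort by character, and sort each address list.
    return {ch: sorted(addrs) for ch, addrs in sorted(index.items())}


char_index = build_char_index(["秋雨", "中秋", "秋月"])
print(char_index["秋"])  # addresses of "秋" in the three words
```

With this layout the character "秋" receives three addresses, one per word, which plays the role of "A₁", "A₈", and "A₂₁" in the example above.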

[0016] After generating the character index 2 from the word dictionary 1, the processing device 3 performs morphological analysis on input text (sentences in mixed kana and kanji) supplied through the interface 4.

[0017] This processing is executed according to the procedure shown in FIG. 2. First, morphological analysis by the longest-match method or the like is applied to the input text (step S1). The character index 2 is used for matching against the dictionary: starting from the first character of the input text, each character of the consecutive string is looked up in the character index 2 and its index list is retrieved in order. If there is a sequence of index lists in which the distance between every pair of adjacent characters is 1, and it is the longest such sequence, the character string is determined to exist in the word dictionary 1 and is adopted as an entry in the morphological analysis list.
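Step S1 can be sketched roughly as a longest-match scan over address continuity. The Python below is an illustrative assumption, not the patent's implementation: `longest_run` and `segment` are hypothetical names, the index is a plain `dict` from character to address list, and the sketch does not check word boundaries, so a prefix or interior part of a dictionary word also counts as a run.

```python
def longest_run(text, start, char_index):
    """Length of the longest prefix of text[start:] stored at consecutive addresses."""
    candidates = set(char_index.get(text[start], []))  # addresses of the last matched char
    length = 0
    while candidates:
        length += 1
        if start + length >= len(text):
            break
        next_addrs = set(char_index.get(text[start + length], []))
        # Keep only chains whose next address is exactly one further on.
        candidates = {a + 1 for a in candidates if a + 1 in next_addrs}
    return length


def segment(text, char_index):
    """Split text into matched runs (morpheme list) and characters that failed."""
    morphemes, failed = [], []
    pos = 0
    while pos < len(text):
        n = longest_run(text, pos, char_index)
        if n == 0:
            failed.append(text[pos])
            pos += 1
        else:
            morphemes.append(text[pos:pos + n])
            pos += n
    return morphemes, failed


# Hypothetical addresses: "東南アジア" at 100..104, "の" at 200, "国" at 300.
demo_index = {"東": [100], "南": [101], "ア": [102, 104], "ジ": [103],
              "の": [200], "国": [300]}
print(segment("東南アジアの国", demo_index))
```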

[0018] This processing is explained using the example in FIG. 3. In the figure, (a) shows the case where the notation of the word "東南アジア" (Southeast Asia) is stored in the word dictionary 1 at addresses "a" through "a+4". For this word, as shown in (b), the index list for the character "ア" in the character index 2 contains the addresses "a+2" and "a+4" along with other addresses; the index list for the character "ジ" contains the address "a+3" along with other addresses; the character "東" has the address "a"; and the character "南" has the address "a+1".

[0019] In morphological analysis, when the input text contains the character string "東南アジア", the index lists for the characters of the string are read from the character index 2. From the addresses they contain, "a+2", "a+4", "a+3", "a", and "a+1", it is recognized that the distance between every pair of adjacent characters is 1. For example, the distance between the characters "東" and "南" is (a+1) - a = 1. The character string "東南アジア" is therefore determined to exist in the word dictionary 1 and is added to the morpheme list as one word.
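The FIG. 3 example can be reproduced with a small Python sketch. The base address a = 100 and the function name `address_chain` are assumptions chosen for illustration. The function looks for one address per character such that each address is exactly 1 greater than the previous one.

```python
def address_chain(text, char_index):
    """Return consecutive addresses for the characters of text, or None if none exist."""
    if not text:
        return None
    for start in char_index.get(text[0], []):
        chain = [start]
        for ch in text[1:]:
            if chain[-1] + 1 in char_index.get(ch, []):
                chain.append(chain[-1] + 1)
            else:
                break
        if len(chain) == len(text):
            return chain
    return None


a = 100  # assumed base address of "東南アジア"
fig3_index = {"東": [a], "南": [a + 1], "ア": [a + 2, a + 4], "ジ": [a + 3]}

chain = address_chain("東南アジア", fig3_index)
distances = [later - earlier for earlier, later in zip(chain, chain[1:])]
print(chain, distances)
```

All four adjacent distances come out as 1, so the string is treated as one word. A string such as "東軟アジア" yields no chain because "軟" has no address following "東".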

[0020] Returning to FIG. 2, step S1 extracts the input text as a morpheme list of individual words, but some characters may remain for which this morphological analysis fails. The character strings and characters for which the analysis failed are extracted as a single-kanji character string list (step S2).

[0021] The extracted character strings and characters are checked for wrong characters, missing characters, and extra characters, and corrected (step S3). To detect a wrong character, the character index 2 is consulted for each character of the target string and its index list is read, and the address distance between each character and the one following it is computed. When a distance other than 1 is found, the offending character is skipped and the address distance from the preceding character to the next character is computed; if that distance is 2, the skipped character is determined to be a wrong character.

[0022] For example, if the character string that failed morphological analysis is "東軟アジア", the address distance between the characters "東" and "軟" in their index lists is not 1. The character "軟" is then skipped and the distance between the next character "ア" and the preceding character "東" is checked; since it is 2, the character "軟" is determined to be a wrong character.
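A minimal Python sketch of this wrong-character check follows. The name `find_wrong_char` and the addresses (base 100) are assumptions; the sketch reports the first position whose character does not follow its predecessor at distance 1 while the character after it sits at distance 2 from that predecessor.

```python
def find_wrong_char(text, char_index):
    """Return the position of a suspected wrong character, or None."""
    for i in range(1, len(text) - 1):
        prev_addrs = char_index.get(text[i - 1], [])
        cur_addrs = set(char_index.get(text[i], []))
        next_addrs = set(char_index.get(text[i + 1], []))
        for addr in prev_addrs:
            # text[i] breaks continuity, but skipping it gives distance 2.
            if addr + 1 not in cur_addrs and addr + 2 in next_addrs:
                return i
    return None


wrong_index = {"東": [100], "南": [101], "ア": [102, 104], "ジ": [103]}
pos = find_wrong_char("東軟アジア", wrong_index)
print(pos, "東軟アジア"[pos])
```

Because a character can have several addresses, a check of this kind can produce false positives on real dictionaries; the sketch makes no attempt to rank competing candidates.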

[0023] To detect a missing character, the character index 2 is consulted for each character of the target string, its index list is read, and the address distance from the preceding character is computed. When a distance of 2 is found, it is determined that a character is missing between the two characters.

[0024] For example, if the character string that failed morphological analysis is "東南アア", the address distance between the third character "ア" and the fourth character "ア" is 2, so it is determined that a character is missing between the two "ア" characters.
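The missing-character check can be sketched in Python as below. `find_missing_gap` is a hypothetical name and the addresses are assumed (base 100). The function returns the position before which a character appears to be missing, that is, the first character that cannot follow its predecessor at distance 1 but can at distance 2.

```python
def find_missing_gap(text, char_index):
    """Return position i if a character seems to be missing before text[i], else None."""
    for i in range(1, len(text)):
        prev_addrs = char_index.get(text[i - 1], [])
        cur_addrs = set(char_index.get(text[i], []))
        if any(addr + 1 in cur_addrs for addr in prev_addrs):
            continue  # continuous, nothing missing here
        if any(addr + 2 in cur_addrs for addr in prev_addrs):
            return i  # distance 2: one character missing in between
    return None


missing_index = {"東": [100], "南": [101], "ア": [102, 104], "ジ": [103]}
print(find_missing_gap("東南アア", missing_index))
```

For "東南アア" the third "ア" matches address 102 and the fourth matches 104, so the gap is reported before the fourth character.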

[0025] To detect an extra character, the index list of each character of the target string is read. When the address distance from the preceding character is not 1, the character is skipped and the distance from the preceding character to the next character is computed; if this distance is 1, the skipped character is determined to be an extra character.

[0026] For example, if the character string that failed morphological analysis is "東南軟アジア", the distance between the characters "南" and "軟" is not 1, so the character "軟" is skipped and the distance between the characters "ア" and "南" is computed. Since this distance is 1, the character "軟" is determined to be an extra character.
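The extra-character check differs from the wrong-character check only in the expected distance after skipping: 1 instead of 2. A Python sketch under the same assumptions (hypothetical name `find_extra_char`, base address 100):

```python
def find_extra_char(text, char_index):
    """Return the position of a suspected extra (inserted) character, or None."""
    for i in range(1, len(text) - 1):
        prev_addrs = char_index.get(text[i - 1], [])
        cur_addrs = set(char_index.get(text[i], []))
        next_addrs = set(char_index.get(text[i + 1], []))
        for addr in prev_addrs:
            # text[i] breaks continuity, but skipping it restores distance 1.
            if addr + 1 not in cur_addrs and addr + 1 in next_addrs:
                return i
    return None


extra_index = {"東": [100], "南": [101], "ア": [102, 104], "ジ": [103]}
pos = find_extra_char("東南軟アジア", extra_index)
print(pos, "東南軟アジア"[pos])
```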

[0027] Returning again to FIG. 2, the character strings in which wrong, missing, or extra characters were detected and corrected in step S3 are merged back into the morpheme list obtained by the analysis in step S1 and output as a candidate list of correctly analyzed character strings. This candidate list undergoes a connection check against the other character strings, which completes the morphological analysis (step S4). The connection check tests, for example, against the part of speech of the preceding word.

[0028] As described above, this embodiment generates a character index from the word dictionary 1 in advance, and performs morphological analysis, together with detection of wrong, missing, and extra characters, based on the continuity of address distances in the character string being analyzed.

[0029] Consequently, compared with matching character strings against the word dictionary, words containing a given character can be looked up directly from the character index, giving high-speed analysis. It also makes it easy to check for wrong, missing, and extra characters, including cases where the first character of the text is ambiguous.

[0030] In addition, the memory need only be large enough to hold the character index, so a relatively small memory such as the computer's internal memory suffices, and the application-side program is not made more complicated.

[0031] Furthermore, the method can readily handle analysis of databases such as electronic dictionaries.

[0032] Although the embodiment describes detection of a single wrong, missing, or extra character, the method can also be applied to detection of n wrong, missing, or extra characters (for example, two or three characters).

[0033] For example, to detect n wrong characters, when a character whose address distance is not 1 is found, n characters are skipped and the address distance is computed; if it is n+1, the n skipped characters are determined to be wrong characters.

[0034] Similarly, to detect n extra characters, when a character whose distance is not 1 is found, n characters are skipped and the distance is computed; if it is 1, the n characters in between are determined to be extra characters. To detect n missing characters, when there is a character whose distance is not 1 and the distance is n+1, it is determined that characters are missing between the two characters.
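The three n-character rules reduce to comparing the address distance between two anchor characters with the number of input characters skipped between them. The Python sketch below is an assumption for illustration: `classify_gap`, its parameters, and the `max_n` bound (standing in for the "predetermined number" of characters) are hypothetical. When both the wrong-character and extra-character rules could apply, the sketch arbitrarily prefers "wrong".

```python
def classify_gap(prev_addrs, next_addrs, skipped, max_n=3):
    """Classify the anomaly between two anchor characters.

    prev_addrs / next_addrs: address lists of the anchors from the character index.
    skipped: number of input characters lying between the anchors.
    Returns (kind, n) or None.
    """
    distances = {b - a for a in prev_addrs for b in next_addrs}
    if skipped == 0:
        # Adjacent in the input but n+1 apart in the dictionary: n missing.
        missing = sorted(d - 1 for d in distances if 1 < d <= max_n + 1)
        return ("missing", missing[0]) if missing else None
    if skipped + 1 in distances:
        return ("wrong", skipped)   # n skipped chars replace n dictionary chars
    if 1 in distances:
        return ("extra", skipped)   # anchors are adjacent in the dictionary
    return None


print(classify_gap([100], [103], skipped=2))  # two wrong characters
print(classify_gap([102], [103], skipped=2))  # two extra characters
print(classify_gap([100], [103], skipped=0))  # two missing characters
```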

[0035] For the detection of n wrong or missing characters described above, adding a check on the total number of characters makes it possible to detect wrong or missing characters at the end of a word. To do this, as shown in FIG. 4, the number of characters in the word containing the character and the character's position within that word are added as members of the index list alongside the address data, and the determination additionally checks whether the n-th character agrees with the dictionary word's character count and character position.
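One way to picture the extended index entry is a record of address, word length, and position within the word. The Python sketch below is an assumption for illustration (the names `IndexEntry`, `build_index`, and `chars_missing_at_end` are hypothetical, positions are taken as 1-based, and one unused address is left between words). It reports how many characters are missing at the end when the input matches only the beginning of a dictionary word.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    address: int      # storage position in the dictionary
    word_length: int  # number of characters in the containing word
    position: int     # 1-based position of the character within that word


def build_index(notations):
    """Build a character index whose entries carry word length and position."""
    index, address = {}, 0
    for notation in notations:
        for pos, ch in enumerate(notation, start=1):
            index.setdefault(ch, []).append(IndexEntry(address, len(notation), pos))
            address += 1
        address += 1  # assumed gap between words
    return index


def chars_missing_at_end(text, index):
    """If text matches the start of a word, return the count of missing trailing chars."""
    if not text:
        return None
    for first in index.get(text[0], []):
        if first.position != 1:
            continue  # must begin at the start of a word
        entry = first
        for offset, ch in enumerate(text[1:], start=1):
            match = [e for e in index.get(ch, []) if e.address == first.address + offset]
            if not match:
                break
            entry = match[0]
        else:
            return entry.word_length - entry.position
    return None


length_index = build_index(["東南アジア"])
print(chars_missing_at_end("東南アジ", length_index))
```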

[0036]

[Effects of the Invention] As described above, according to the present invention, morphological analysis and detection of wrong, missing, and extra characters are performed based on the continuity of the word dictionary storage positions extracted from the character index for the character string of the input text. Search is therefore faster than analysis that matches character strings against the word dictionary, only a small amount of memory is needed for the character index, and wrong, missing, and extra characters in text can be verified.

[Brief description of drawings]

FIG. 1 is a configuration diagram showing an embodiment of the present invention.

FIG. 2 is a diagram of the analysis processing procedure of the embodiment.

FIG. 3 is a diagram illustrating morphological analysis in the embodiment.

FIG. 4 is a diagram illustrating another embodiment.

[Explanation of symbols]

1: word dictionary, 2: character index, 3: processing device.

Claims

[Claims]

1. A morphological analysis method comprising: a word dictionary that stores Japanese words as structures each consisting of a notation and an attribute list; a character index of structures each having a character, obtained by sorting character by character all the characters contained in the notations in the word dictionary, and an index list giving the storage positions of that character in the notations; and a processing device that analyzes the morphemes of an input text subject to morphological analysis, wherein the processing device extracts the character and index list of the character index matching each character of the input text, extracts a character string whose storage positions are consecutive as one word, and, when the storage positions are discontinuous for just one character or a predetermined number of characters in a character string for which this extraction has failed, determines that a wrong character, a missing character, or an extra character exists at that one character or predetermined number of characters.