JP3091540B2

JP3091540B2 - Morphological analysis method for Japanese sentences

Info

Publication number: JP3091540B2
Application number: JP03279735A
Authority: JP
Inventors: 耕三大井
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1991-10-25
Filing date: 1991-10-25
Publication date: 2000-09-25
Anticipated expiration: 2015-09-25
Also published as: JPH05120261A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a morphological analysis method for Japanese sentences, and more particularly to a morphological analysis method for Japanese sentences that uses the longest-match method.

[0002]

[Prior Art] Conventionally, a morphological analysis method for Japanese sentences that uses the longest-match method (a method that selects, from the candidate words found by dictionary lookup, the word with the longest written form) searches a word dictionary for the character string under analysis to extract candidate words, and selects the longest of the connectable candidate words. When the sentence 「歩くようにもみえる」 ("it also looks like walking") is analyzed this way, for example, the result is [歩く／ように／もみ／える], where 「もみ」 is the continuative form of the verb 「もむ」. The drawback is that 「もみえる」 is not segmented correctly (as 「も」 followed by 「みえる」) and is instead split into 「もみ」 and 「える」.
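The failure described above can be reproduced with a minimal sketch of greedy longest-match segmentation. The toy dictionary below is an assumption made for illustration (it is not the patent's word dictionary), and the connection check between adjacent words is left out for brevity.

```python
# Toy dictionary: an assumption for illustration only.
TOY_DICTIONARY = {"歩く", "ように", "も", "もみ", "みえる", "える"}

def longest_match_segment(text, dictionary):
    """Greedy longest-match segmentation with no connection check."""
    words = []
    pos = 0
    while pos < len(text):
        # All dictionary words that start at the current position.
        candidates = [w for w in dictionary if text.startswith(w, pos)]
        if not candidates:
            raise ValueError(f"no dictionary word at position {pos}")
        best = max(candidates, key=len)  # the longest written form wins
        words.append(best)
        pos += len(best)
    return words

print(longest_match_segment("歩くようにもみえる", TOY_DICTIONARY))
```

With this dictionary, 「もみ」 beats 「も」 at the fifth character, which leaves only 「える」 for the rest of the sentence.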

[0003] In addition, when several words share the longest written form, the element that appears earlier in the candidate word list takes precedence. The selected word therefore depends on the order in which words were extracted during dictionary lookup, so the correct word is not always selected.

[0004] For example, suppose that dictionary lookup on the sentence 「接続可能だ」 ("it is connectable") produces the candidate word list (1) 「接続」 (noun), (2) 「接続」 (stem of the verb 「接続する」). The earlier element of the list, (1) 「接続」 (noun), is then selected.

[0005] Next, dictionary lookup is performed on the remaining string 「可能だ」, and the list of connectable candidate words becomes (1) 「可能」 (noun). Since there is only one candidate, (1) 「可能」 (noun) is selected. The adjectival verb 「可能だ」 and the auxiliary verb 「可能だ」 are also found in the dictionary, but they are judged unable to connect to the immediately preceding word 「接続」 (noun) and are not added to the candidate word list.

[0006] Next, dictionary lookup is performed on the remaining string 「だ」, and the list of connectable candidate words becomes (1) 「だ」 (auxiliary verb). Since there is only one candidate, (1) 「だ」 (auxiliary verb) is selected.

[0007] As a result, the sentence is segmented incorrectly as follows.

[0008] 接続 (noun) ／ 可能 (noun) ／ だ (auxiliary verb).
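The order dependence described in paragraphs [0003] and [0004] can be shown with a small sketch. The function name and the tuple layout are assumptions for illustration; the point is only that a first-in-list tie-break returns a different word when the lookup order changes.

```python
def select_first_longest(candidates):
    """Conventional tie-break: among the longest candidates, the earliest one wins.

    candidates: list of (written_form, part_of_speech) tuples in lookup order.
    """
    longest = max(len(surface) for surface, _ in candidates)
    for candidate in candidates:
        if len(candidate[0]) == longest:
            return candidate

noun_first = [("接続", "noun"), ("接続", "verb stem")]
verb_first = [("接続", "verb stem"), ("接続", "noun")]
print(select_first_longest(noun_first))
print(select_first_longest(verb_first))
```

The two calls return different parts of speech for the same written form, which is the behavior the invention sets out to remove.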

[0009]

[Problem to Be Solved by the Invention] The present invention has been made to eliminate the above drawbacks. Its object is to provide a morphological analysis method that, in word selection, reduces the segmentation-position errors caused by selecting the word with the longest written form and that, when several candidate words share the longest written form, selects the correct word more often than the conventional method.

[0010]

[Means for Solving the Problem] The morphological analysis method for Japanese sentences of the present invention performs morphological analysis using the longest-match method, which selects the word with the longest written form from the candidate words found by dictionary lookup. It comprises a candidate word deletion condition table that describes erroneous combinations of the written string of a candidate word and the character string that follows that candidate word, and candidate word deletion processing means that deletes the word with the longest written form from the candidate words when that word matches a condition in the candidate word deletion condition table.

[0011] The morphological analysis method for Japanese sentences of the present invention, using the longest-match method, also comprises a conjugation type priority table in which each conjugation type of a word is paired with a corresponding priority, and conjugated word selection processing means that, when several of the words found by dictionary lookup share the longest written form, refers to the conjugation type priority table and selects, from among those longest words, the word whose conjugation type has the highest priority.

[0012]

[0013]

[Operation] According to the present invention, in morphological analysis of a Japanese sentence using the longest-match method, a candidate word deletion condition table describes erroneous combinations of the written string of a candidate word and the character string that follows that candidate word, and when the word with the longest written form matches a condition in the table, that word is deleted from the candidate words.

[0014] Also according to the present invention, in morphological analysis of a Japanese sentence using the longest-match method, a conjugation type priority table pairs each conjugation type of a word with a corresponding priority, and when several of the words found by dictionary lookup share the longest written form, the table is consulted and the word whose conjugation type has the highest priority is selected from among those longest words.

[0015]

[0016]

[Embodiments] Embodiments using three example Japanese sentences are described concretely below.

[0017] FIG. 1 is a block diagram of morphological analysis processing embodying the present invention. (1) is a dictionary lookup unit, (2) is a word selection unit, and (3) is a word list creation unit.

[0018] (4) is a word dictionary that stores the information needed for morphological analysis, such as headwords and connection information.

[0019] (5) is a connection table that describes whether adjacent words can connect. The dictionary lookup unit (1) takes the input Japanese sentence, written in mixed kanji and kana, removes the part up to the boundary of the word selected by the word selection unit (2), and searches the word dictionary (4) for the remaining character string. It also refers to the connection table (5) to create a "candidate word list" of connectable words.
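The role of the connection table in building the candidate word list can be sketched as follows. The text does not specify how the word dictionary (4) or the connection table (5) is encoded, so the `Entry` fields, the connection class names, and the table contents below are all assumptions chosen to reproduce the behavior described for the string 「可能だ」.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    surface: str  # written form (headword)
    pos: str      # part of speech
    conn: str     # hypothetical connection class used by the connection table

# Toy word dictionary (4): entries and connection classes are illustrative assumptions.
WORD_DICTIONARY = [
    Entry("可能", "noun", "noun"),
    Entry("可能だ", "auxiliary verb", "aux_kanouda"),
    Entry("だ", "auxiliary verb", "aux_da"),
]

# Toy connection table (5): (class of preceding word, class of candidate) pairs
# that are allowed to be adjacent.
CONNECTION_TABLE = {
    ("noun", "noun"),
    ("noun", "aux_da"),
    ("verb_stem", "aux_kanouda"),
}

def build_candidate_list(remaining, prev_conn):
    """Return entries that start `remaining` and can follow the previous word."""
    return [
        entry
        for entry in WORD_DICTIONARY
        if remaining.startswith(entry.surface)
        and (prev_conn, entry.conn) in CONNECTION_TABLE
    ]

print(build_candidate_list("可能だ", "noun"))
print(build_candidate_list("可能だ", "verb_stem"))
```

After a noun, only 「可能」 (noun) survives the connection check; after a verb stem, only 「可能だ」 (auxiliary verb) does.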

[0020] (6) is a candidate word deletion condition table that describes erroneous combinations of the written string of a candidate word and the character string that follows that candidate word.

[0021] (7) is a conjugation type priority table in which each conjugation type of a word is paired with the priority corresponding to that conjugation type.

[0022] (8) is a part-of-speech priority table in which each part of speech of a word is paired with the priority corresponding to that part of speech.

[0023] The word selection unit (2) performs the processing to which the present invention relates. Referring to the candidate word deletion condition table (6), the conjugation type priority table (7), and the part-of-speech priority table (8), it selects a word from the "candidate word list" created by the dictionary lookup unit (1).

[0024] The word list creation unit (3) adds the word selected by the word selection unit (2) to the "word list".

[0025] FIG. 2 is a flowchart showing the operation of the word selection unit (2) according to the present invention. Each step of word selection is described below.

[0026] [Step A] If the candidate word list is empty, go to Step B; otherwise go to Step C.

[0027] [Step B] Either backtracking redoes the word selection for the immediately preceding word, or unknown-word processing determines the extent of an unknown word and makes that unknown word the selected word; word selection then ends.

[0028] [Step C] If the word with the longest written form in the candidate word list matches a condition in the candidate word deletion condition table (6), go to Step D; otherwise go to Step E.

[0029] [Step D] Delete the word with the longest written form from the candidate word list and return to Step A.

[0030] [Step E] If several words in the candidate word list share the longest written form, go to Step G; otherwise go to Step F.

[0031] [Step F] Make the word with the longest written form in the candidate word list the selected word, then end word selection.

[0032] [Step G] Refer to the conjugation type priority table. If, among the words with the longest written form, more than one word has the conjugation type with the highest priority, go to Step I; otherwise go to Step H.

[0033] [Step H] Make the word whose conjugation type has the highest priority the selected word, then end word selection.

[0034] [Step I] Refer to the part-of-speech priority table. Among the words whose conjugation type has the highest priority, make the word whose part of speech has the highest priority the selected word, then end word selection.
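Steps A to I above can be sketched as a single function. This is a minimal sketch, not the patent's implementation: the `Candidate` type and the table encodings are assumptions, the tables contain only the values that the embodiments state (the full contents of FIG. 3 to FIG. 5 are not reproduced in the text), unlisted types default to priority 0, and Step B simply returns `None` instead of performing backtracking or unknown-word processing.

```python
from typing import NamedTuple, Optional, Sequence

class Candidate(NamedTuple):
    surface: str    # written form
    pos: str        # part of speech
    conj_type: str  # conjugation type

# Only values stated in the embodiments; everything else is unknown here.
DELETION_CONDITIONS = [("もみ", "える")]
CONJ_PRIORITY = {"no conjugation": 1, "continuative form": 2, "stem": 3}
POS_PRIORITY = {"noun": 1, "prefix": 2, "counter": 3}

def select_word(candidates: Sequence[Candidate], remaining: str) -> Optional[Candidate]:
    """Select one word for the start of `remaining`; every candidate is a prefix of it."""
    pool = list(candidates)
    while True:
        if not pool:  # Step A -> Step B (backtracking / unknown words omitted)
            return None
        max_len = max(len(c.surface) for c in pool)
        longest = [c for c in pool if len(c.surface) == max_len]
        surface = longest[0].surface  # equal-length prefixes share one written form
        following = remaining[max_len:]
        # Step C: written form matches an entry and the text after it starts
        # with the paired following string.
        if any(surface == s and following.startswith(f) for s, f in DELETION_CONDITIONS):
            # Step D: drop the longest word(s), then return to Step A.
            pool = [c for c in pool if c.surface != surface]
            continue
        if len(longest) == 1:  # Step E -> Step F
            return longest[0]
        # Step G: keep the words whose conjugation type has the top priority.
        top = max(CONJ_PRIORITY.get(c.conj_type, 0) for c in longest)
        best = [c for c in longest if CONJ_PRIORITY.get(c.conj_type, 0) == top]
        if len(best) == 1:  # Step H
            return best[0]
        # Step I: break the remaining tie by part-of-speech priority.
        return max(best, key=lambda c: POS_PRIORITY.get(c.pos, 0))
```

Deleting every candidate with the matched written form in Step D gives the same result as deleting one at a time and looping, because each remaining copy would match the same condition on the next pass.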

[0035] FIG. 3 shows an example of the candidate word deletion condition table (6) in this embodiment. The table lists multiple pairs, each consisting of the written string of a candidate word and a character string that follows that candidate word.

[0036] FIG. 4 shows the conjugation type priority table of this embodiment, in which each conjugation type is paired with a priority. The priorities are numeric, and a larger value means a higher priority.

[0037] FIG. 5 shows the part-of-speech priority table of this embodiment, in which each part of speech is paired with a priority. The priorities are numeric, and a larger value means a higher priority.
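Because both tables use "larger number wins", applying the conjugation type priority first and the part-of-speech priority second amounts to a lexicographic comparison of the pair (conjugation priority, part-of-speech priority). The sketch below assumes a simple dictionary encoding and contains only the priority values that the embodiments quote from FIG. 4 and FIG. 5; the remaining rows of the figures are not given in the text, so unlisted types default to 0 here as an assumption.

```python
# Values quoted in the embodiments; other rows of FIG. 4 and FIG. 5 are unknown.
CONJUGATION_PRIORITY = {"no conjugation": 1, "continuative form": 2, "stem": 3}
POS_PRIORITY = {"noun": 1, "prefix": 2, "counter": 3}

def pick_by_priority(words):
    """Pick among equally long words.

    words: list of (written_form, part_of_speech, conjugation_type) tuples.
    The conjugation type priority is compared first; the part-of-speech
    priority only breaks ties between equal conjugation type priorities.
    """
    return max(
        words,
        key=lambda w: (CONJUGATION_PRIORITY.get(w[2], 0), POS_PRIORITY.get(w[1], 0)),
    )

print(pick_by_priority([
    ("本", "noun", "no conjugation"),
    ("本", "prefix", "no conjugation"),
    ("本", "counter", "no conjugation"),
]))
```

For the three 「本」 candidates the conjugation priorities tie at 1, so the part-of-speech priority decides in favor of the counter.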

[0038] The morphological analysis of the following three Japanese sentences is described below in order, focusing on the operation of the word selection unit (2): (a) 「歩くようにもみえる」 ("it also looks like walking"), (b) 「接続可能だ」 ("it is connectable"), (c) 「３本取り扱います」 ("we handle three items").

[0039] [Embodiment 1] First, the case of Japanese sentence (a) is described.

[0040] After word selection has been completed up to 「歩くように」 and 「歩く」 and 「ように」 have been added to the "word list", the dictionary lookup unit (1) processes the remaining string 「もみえる」. As shown in FIG. 6, this yields a "candidate word list" containing 「もみ」 (continuative form of the verb 「もむ」) and 「も」 (adverbial particle), and processing moves to the word selection unit (2).

[0041] Since the candidate word list is not empty (Step A), the word with the longest written form, 「もみ」 (from the verb 「もむ」), is checked against the conditions of the candidate word deletion condition table (FIG. 3) (Step C). Step C checks two things: whether the written form of the longest candidate word equals one of the written strings in the candidate word deletion condition table (FIG. 3), and whether the following string paired with that entry matches the beginning of the text that follows the longest candidate word. In this example, the written form 「もみ」 of the longest candidate word equals the first written string 「もみ」 in the table (FIG. 3), and the paired following string 「える」 matches the 「える」 that follows 「もみ」 in example sentence (a) (Step C), so 「もみ」 is deleted from the candidate word list (Step D).
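The two-part check of Step C can be written as a short predicate. This is a sketch under stated assumptions: the table is encoded as a list of (written string, following string) pairs, and it holds only the first entry of FIG. 3, since that is the only entry the text spells out.

```python
# Only the first entry of FIG. 3 is stated in the text; the rest is unknown here.
DELETION_CONDITIONS = [("もみ", "える")]

def matches_deletion_condition(surface, following_text, conditions=DELETION_CONDITIONS):
    """Step C: True if `surface` equals a table entry's written string and
    `following_text` begins with that entry's following string."""
    return any(
        surface == cond_surface and following_text.startswith(cond_following)
        for cond_surface, cond_following in conditions
    )

remaining = "もみえる"
candidate = "もみ"
print(matches_deletion_condition(candidate, remaining[len(candidate):]))
```

The following string only has to match the front of the text after the candidate, so a longer sentence that continues past 「える」 would still trigger the deletion.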

[0042] Returning to Step A, the candidate word list is not empty, so the word with the longest written form, 「も」, is checked against the conditions of the candidate word deletion condition table (FIG. 3); in this case it does not match (Step C). Next, 「も」 is the only word with the longest written form in the candidate word list (Step E), so 「も」 becomes the selected word and the word selection unit (2) terminates (Step F). The word list creation unit (3) then adds the selected word 「も」 (adverbial particle) to the word list, and processing returns to the dictionary lookup unit (1).

[0043] Next, the dictionary lookup unit (1) processes the remaining string 「みえる」. As shown in FIG. 7, this yields a "candidate word list" containing 「みえる」 (attributive/terminal form of the verb 「みえる」), 「みえ」 (noun), and 「みえ」 (continuative form of the verb 「みえる」), and processing moves to the word selection unit (2).

[0044] Since the candidate word list is not empty (Step A), the word with the longest written form, 「みえる」, is checked against the conditions of the candidate word deletion condition table (FIG. 3); in this case it does not match (Step C). Next, 「みえる」 is the only word with the longest written form in the candidate word list (Step E), so 「みえる」 becomes the selected word and the word selection unit (2) terminates (Step F).

[0045] The word list creation unit (3) then adds the selected word 「みえる」 to the word list. Since the end of the sentence has been reached, morphological analysis ends.

[0046] [Embodiment 2] Next, the processing for example sentence (b), 「接続可能だ」, is described.

[0047] First, the dictionary lookup unit (1) processes the string 「接続可能だ」. As shown in FIG. 8, this yields a "candidate word list" containing 「接続」 (noun) and 「接続」 (stem of the verb 「接続する」), and processing moves to the word selection unit (2).

[0048] Since the candidate word list is not empty (Step A), the word with the longest written form, 「接続」, is checked against the conditions of the candidate word deletion condition table (FIG. 3); in this case it does not match (Step C). Next, because several words share the longest written form in the candidate word list (「接続」 (noun) and 「接続する」 (verb)) (Step E), processing moves to Step G.

[0049] The conjugation types of the longest words in the candidate word list are "no conjugation" for 「接続」 (noun) and "stem" for 「接続する」 (verb). According to the conjugation type priority table (FIG. 4), the priority of "no conjugation" is 1 and the priority of "stem" is 3, so "stem" has the higher priority. In this example, only one of the longest words has the conjugation type with the highest priority (Step G), so that word, 「接続する」, becomes the selected word and the word selection unit (2) terminates (Step H). The word list creation unit (3) then adds the selected word 「接続する」 to the word list, and processing returns to the dictionary lookup unit (1).

[0050] Next, the dictionary lookup unit (1) processes the remaining string 「可能だ」. As shown in FIG. 9, this yields a "candidate word list" containing 「可能だ」 (auxiliary verb), and processing moves to the word selection unit (2). The dictionary lookup unit (1) also finds the noun 「可能」 here, but it is judged unable to connect to the immediately preceding word 「接続」 (stem of the verb 「接続する」) and is not added to the candidate word list.

[0051] Since the candidate word list is not empty (Step A), the word with the longest written form, 「可能だ」, is checked against the conditions of the candidate word deletion condition table (FIG. 3); in this case it does not match (Step C). Next, 「可能だ」 is the only word with the longest written form in the candidate word list (Step E), so 「可能だ」 becomes the selected word and the word selection unit (2) terminates (Step F). The word list creation unit (3) then adds the selected word 「可能だ」 to the word list. Since the end of the sentence has been reached, morphological analysis ends.
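The difference between the conventional result of paragraph [0008] and the result of Embodiment 2 can be traced end to end with a small driver loop. Everything below is a sketch under assumptions: the dictionary entries, the connection classes (`noun`, `verb_stem`, `aux_kanouda`, `aux_da`, `start`), and the `unspecified` conjugation type are invented for this illustration, the deletion check is left out because it never fires for sentence (b), and backtracking is omitted.

```python
from typing import NamedTuple

class Entry(NamedTuple):
    surface: str    # written form
    label: str      # part of speech, as named in the text
    conj_type: str  # conjugation type
    conn: str       # hypothetical connection class

DICTIONARY = [
    Entry("接続", "noun", "no conjugation", "noun"),
    Entry("接続", "stem of verb 接続する", "stem", "verb_stem"),
    Entry("可能", "noun", "no conjugation", "noun"),
    Entry("可能だ", "auxiliary verb", "unspecified", "aux_kanouda"),
    Entry("だ", "auxiliary verb", "unspecified", "aux_da"),
]
# (class of previous word, class of candidate) pairs allowed to be adjacent.
CONNECTIONS = {
    ("start", "noun"), ("start", "verb_stem"),
    ("noun", "noun"), ("noun", "aux_da"),
    ("verb_stem", "aux_kanouda"),
}
CONJ_PRIORITY = {"no conjugation": 1, "continuative form": 2, "stem": 3}

def lookup(remaining, prev_conn):
    """Dictionary lookup unit: connectable entries that start `remaining`."""
    return [e for e in DICTIONARY
            if remaining.startswith(e.surface) and (prev_conn, e.conn) in CONNECTIONS]

def select(candidates, use_priority):
    """Word selection unit, reduced to the tie-break among longest words."""
    longest_len = max(len(e.surface) for e in candidates)
    longest = [e for e in candidates if len(e.surface) == longest_len]
    if use_priority:
        return max(longest, key=lambda e: CONJ_PRIORITY.get(e.conj_type, 0))
    return longest[0]  # conventional behavior: first in the list wins

def analyze(sentence, use_priority):
    words, pos, prev = [], 0, "start"
    while pos < len(sentence):
        candidates = lookup(sentence[pos:], prev)
        if not candidates:
            raise ValueError("no connectable candidate (backtracking omitted)")
        chosen = select(candidates, use_priority)
        words.append((chosen.surface, chosen.label))  # word list creation unit
        pos += len(chosen.surface)
        prev = chosen.conn
    return words

print(analyze("接続可能だ", use_priority=False))
print(analyze("接続可能だ", use_priority=True))
```

Choosing the verb stem for 「接続」 changes which candidates pass the connection check for the rest of the sentence, which is why the whole segmentation changes and not only the first word.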
The word “possible” having the longest description in the candidate word list is selected as a selected word, and the word selecting unit (2) is terminated (step F). Next, the selected word “possible” is added to the word list by the word list creation unit (3), and the ending of the sentence is reached.

[0052] [Embodiment 3] Next, the processing for example sentence (c), 「３本取り扱います」, is described.

[0053] After word selection has been completed up to 「３」, the dictionary lookup unit (1) processes the remaining string 「本取り扱います」. As shown in FIG. 10, this yields a "candidate word list" containing 「本」 (noun), 「本」 (prefix), and 「本」 (counter), and processing moves to the word selection unit (2).

[0054] Since the candidate word list is not empty (Step A), the word with the longest written form, 「本」, is checked against the conditions of the candidate word deletion condition table (FIG. 3); in this case it does not match (Step C). Next, because several words share the longest written form in the candidate word list (「本」 (noun), 「本」 (prefix), and 「本」 (counter)) (Step E), processing moves to Step G.

[0055] All of the 「本」 candidates have the conjugation type "no conjugation", so their conjugation type priorities are equal. Several of the longest words therefore have the conjugation type with the highest priority (Step G), and processing moves to Step I. Step I refers to the part-of-speech priority table (FIG. 5) and, among the words whose conjugation type has the highest priority, makes the word whose part of speech has the highest priority the selected word. According to the part-of-speech priority table (FIG. 5), the priority of noun is 1, the priority of prefix is 2, and the priority of counter is 3, so counter has the highest priority. Step I therefore makes the counter 「本」 the selected word, and the word selection unit (2) terminates. The word list creation unit (3) then adds the selected word 「本」 (counter) to the word list, and processing returns to the dictionary lookup unit (1).

[0056] Next, the dictionary lookup unit (1) processes the remaining string 「取り扱います」. As shown in FIG. 11, this yields a "candidate word list" containing 「取り扱い」 (noun), 「取り扱い」 (continuative form of the verb 「取り扱う」), and 「取り」 (continuative form of the verb 「取る」), and processing moves to the word selection unit (2).

[0057] Since the candidate word list is not empty (Step A), the word with the longest written form, 「取り扱い」, is checked against the conditions of the candidate word deletion condition table (FIG. 3); in this case it does not match (Step C). Next, because several words share the longest written form in the candidate word list (「取り扱い」 (noun) and 「取り扱う」 (verb)) (Step E), processing moves to Step G. The conjugation types of the longest words are "no conjugation" for 「取り扱い」 (noun) and "continuative form" for 「取り扱う」 (verb). According to the conjugation type priority table (FIG. 4), the priority of "no conjugation" is 1 and the priority of "continuative form" is 2, so "continuative form" has the higher priority. In this example, only one of the longest words has the conjugation type with the highest priority (Step G), so that word, 「取り扱う」, becomes the selected word and the word selection unit (2) terminates. The word list creation unit (3) then adds the selected word 「取り扱う」 to the word list, and processing returns to the dictionary lookup unit (1).

[0058] Next, the dictionary lookup unit (1) processes the remaining string 「ます」. As shown in FIG. 12, this yields a "candidate word list" containing 「ます」 (auxiliary verb), and processing moves to the word selection unit (2).

[0059] Since the candidate word list is not empty (Step A), the word with the longest written form, 「ます」, is checked against the conditions of the candidate word deletion condition table (FIG. 3); in this case it does not match (Step C). Next, 「ます」 is the only word with the longest written form in the candidate word list (Step E), so 「ます」 becomes the selected word and the word selection unit (2) terminates (Step F). The word list creation unit (3) then adds the selected word 「ます」 to the word list. Since the end of the sentence has been reached, morphological analysis ends.

[0060]

[Effect of the Invention] As described above, in the word selection processing of longest-match morphological analysis of Japanese sentences, the present invention reduces the segmentation-position errors caused by selecting the word with the longest written form. In addition, when several candidate words share the longest written form, it selects the correct word more often than the conventional method.

[Brief description of the drawings]

FIG. 1 is a block diagram of morphological analysis processing embodying the present invention.

FIG. 2 is a flowchart of the operation of the word selection unit.

FIG. 3 is a diagram showing the candidate word deletion condition table.

FIG. 4 is a diagram showing the conjugation type priority table.

FIG. 5 is a diagram showing the part-of-speech priority table.

FIG. 6 is a diagram showing the contents of a candidate word list.

FIG. 7 is a diagram showing the contents of a candidate word list.

FIG. 8 is a diagram showing the contents of a candidate word list.

FIG. 9 is a diagram showing the contents of a candidate word list.

FIG. 10 is a diagram showing the contents of a candidate word list.

FIG. 11 is a diagram showing the contents of a candidate word list.

FIG. 12 is a diagram showing the contents of a candidate word list.

[Explanation of symbols]

1 ... Dictionary lookup unit
2 ... Word selection unit
3 ... Word list creation unit
4 ... Word dictionary
5 ... Connection table
6 ... Candidate word deletion condition table
7 ... Conjugation type priority table
8 ... Part-of-speech priority table

Claims

(57) [Claims]

1. A morphological analysis method for Japanese sentences using the longest-match method, which selects the word with the longest written form from candidate words found by dictionary lookup, the method comprising: a candidate word deletion condition table that describes erroneous combinations of the written string of a candidate word and the character string that follows the candidate word; and candidate word deletion processing means for deleting the word with the longest written form from the candidate words when that word matches a condition in the candidate word deletion condition table.

2. A morphological analysis method for Japanese sentences using the longest-match method, comprising: a conjugation type priority table in which each conjugation type of a word is paired with a priority corresponding to that conjugation type; and conjugated word selection processing means for, when a plurality of words with the longest written form exist among the words found by dictionary lookup, referring to the conjugation type priority table and selecting, from among the words with the longest written form, the word having the conjugation type with the highest priority.