JP3389285B2 - Proper noun identification method

Info

Publication number: JP3389285B2
Application number: JP14396393A
Authority: JP
Inventors: Tsuyoshi Kitani (木谷 強)
Original assignee: NTT Data Corp
Current assignee: NTT Data Corp
Priority date: 1993-06-15
Filing date: 1993-06-15
Publication date: 2003-03-24
Anticipated expiration: 2018-03-24
Also published as: JPH0721196A

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a processing method for identifying nouns in a database or the like, and more particularly to a proper noun identification method for identifying proper nouns, such as company names, personal names, and place names, that appear in Japanese text.

[0002]

[Prior Art] A database or the like requires a search key to retrieve the desired data. The search key is often a proper noun, so if the proper nouns present in a database could be identified without manual operation, the operator's workload would be reduced, which would be of great benefit to database systems.

[0003] Conventionally, proper nouns have generally been identified by matching text against a dictionary in which proper nouns are registered. Proper nouns not registered in the dictionary therefore could not be identified. Besides dictionary matching, however, a method has been put into practical use that identifies proper nouns by using the prefixes and suffixes that frequently appear before and after proper nouns as keys for pattern matching.

[0004] FIG. 7 is a flowchart of a conventional process for estimating proper nouns. When Japanese text is read in from an input device or the like by the input process of step S1, the proper noun pattern matching process of step S2 matches the input character string against the proper noun modifier dictionary D1 and the proper noun pattern dictionary D2 to find proper nouns such as company names, personal names, and place names. The proper noun modifier dictionary D1 stores information such as the prefixes, suffixes, and synonyms that frequently appear before and after proper nouns. The proper noun pattern dictionary D2 stores information defining the forms in which proper nouns appear together with their accompanying prefixes, suffixes, appositives, and the like.

[0005] After step S2, the overlapping pattern selection process of step S3 takes the results of matching this information against the input character string and, where proper noun patterns overlap, selects the most plausible pattern based on the degree of match, the length of the matched pattern, and its character position. The omitted proper noun search process of step S4 then finds proper nouns even where the prefix or suffix has been omitted. Finally, the results obtained by the above processes are output to an output device by the output process of step S5.

[0006] Proper nouns were identified by the processing described above, but some proper nouns could not be identified even by this method.

[0007]

[Problem to Be Solved by the Invention] As described above, conventional proper noun identification techniques have the problem that some proper nouns cannot be identified either by dictionary lookup or by pattern matching using the prefixes and suffixes that frequently appear before and after proper nouns.

[0008] An object of the present invention is to provide a proper noun identification processing method that can identify, with high accuracy, even proper nouns that are not registered in a dictionary and cannot be identified from the prefixes and suffixes that frequently appear before and after proper nouns.

[0009]

[Means for Solving the Problem and Operation] The present invention concerns a method of estimating whether a character string in Japanese text is a proper noun, for use in a proper noun identification processing system that extracts proper nouns from Japanese text supplied from an input device. In the proper noun identification method according to the present invention, proper noun pattern data corresponding to the character string, which defines, for each pattern of sentence in which proper nouns frequently appear, the likelihood that a proper noun is present in that pattern, is read from a proper noun sentence pattern dictionary stored in storage means; statistical data defining the likelihood of being a proper noun, with at least one of the number of characters and the character type of the proper noun as a parameter, is read from a likelihood table stored in storage means; and, with reference to the proper noun pattern data and the statistical data, whether the character string is a proper noun is estimated by probabilistic inference and the result is output from an output device.

[0010] According to the present invention, proper nouns such as company names, personal names, and place names that could not be identified by the prior art can be identified by probabilistic inference from the likelihood obtained by pattern matching and the likelihood obtained from the characteristics of the proper noun itself.

[0011] Proper nouns identified in this way can be used by a variety of application programs, for example as information to be registered in a database or as search keys for database retrieval.

[0012]

[Embodiment] The present invention is described in detail below with reference to the drawings. FIG. 1 is a flowchart of the proper noun identification process for Japanese text according to an embodiment of the present invention. The process starts when Japanese text is supplied from an input device (not shown). First, in step S11, a morphological analysis process divides the input character string, which is input sentence by sentence, into words and assigns a part of speech to each word. Then, in step S12, a proper noun pattern matching process matches the word-segmented input character string against the proper noun sentence pattern dictionary D11 to find character strings that contain proper nouns such as company names, personal names, and place names.

[0013] The proper noun sentence pattern dictionary D11 stores patterns of sentences in which proper nouns frequently appear, together with the likelihood that a proper noun is present in each pattern. That is, a pattern defined in the proper noun sentence pattern dictionary D11 is retrieved, and if the character string fields in the pattern occur in the input sentence in the defined order, the corresponding character strings are assigned to the arbitrary fields.

[0014] FIG. 2 shows part of the contents of the proper noun sentence pattern dictionary D11. It gives an example, drawn from part of the stored information, of patterns of sentences in which proper nouns frequently appear and of the likelihood that a proper noun is present in each pattern. A field beginning with the symbol "@", that is, an arbitrary field, matches any character string; for any other field, that is, a character string field, the defined character string itself is the target of pattern matching. The arbitrary field "@CNAME-SUBJ" indicates that, as pattern attributes, it contains a company name, which is a proper noun, and that it is syntactically the subject. The statistical likelihood that a proper noun is present in the matched character string is defined immediately after the symbol ":". The "likelihood" referred to here is a "negative likelihood", which expresses the degree to which something is implausible. For example, a negative likelihood of 1.00 means that it is not plausible at all.

[0015] After the proper noun pattern matching process of step S12, step S13 performs a proper noun range determination process, which searches the character string found to contain a proper noun for consecutive words that can form a proper noun and determines the range of the proper noun candidate. Specifically, in step S13, for the character string assigned to an arbitrary field that has a proper noun attribute, the part of speech of each word is examined starting from the last word. As long as a word is not a dependent word and its part of speech is not a verb, adjective, adjectival verb, adnominal, adverb, symbol, or conjunction, the proper noun candidate is extended forward (toward the beginning of the string). When a word that does not satisfy these conditions is encountered, the words from the one immediately after it through the last word are taken as the proper noun candidate.

[0016] Next, in step S14, a proper noun estimation process refers to the likelihood, defined in the proper noun sentence pattern dictionary D11, that the matched character string contains a proper noun, and to the likelihood that the proper noun candidate is a proper noun, obtained from the proper noun likelihood table D12 using the number of characters and the character type of the candidate as parameters. It then estimates whether the candidate is a proper noun by the probabilistic inference used in the expert system MYCIN (Shortliffe, E. H., Computer-Based Medical Consultations: MYCIN, New York, NY: Elsevier, 1976). Specifically, step S14 performs the following processing.

[0017] The likelihood that the matched character string contains a proper noun is obtained from the proper noun sentence pattern dictionary D11 (this value is denoted MB). The negative likelihood of the proper noun candidate being a proper noun, determined with the number of characters of the candidate as the parameter, is obtained from the proper noun likelihood table D12 (this value is denoted MD1). Similarly, the negative likelihood determined with the type of the first character of the candidate as the parameter is obtained from the proper noun likelihood table D12 (this value is denoted MD2). Using the probabilistic inference scheme of the expert system MYCIN, the negative likelihood MD that combines MD1 and MD2 is obtained as

MD = MD1 + MD2 - MD1 × MD2 ... (1)

Finally, the overall likelihood CF of being a proper noun is obtained as

CF = MB - MD ... (2)

If this overall likelihood (CF) is greater than a predetermined value, the proper noun candidate is estimated to be a proper noun, and the proper noun is output from an output device (not shown).
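Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names are invented here, the threshold is passed in as a parameter because this paragraph only calls it "a predetermined value", and the sample inputs are hypothetical values chosen for easy arithmetic.

```python
def combine_negative(md1: float, md2: float) -> float:
    """Equation (1): MD = MD1 + MD2 - MD1 * MD2."""
    return md1 + md2 - md1 * md2


def overall_likelihood(mb: float, md1: float, md2: float) -> float:
    """Equation (2): CF = MB - MD, with MD taken from equation (1)."""
    return mb - combine_negative(md1, md2)


def is_proper_noun(mb: float, md1: float, md2: float, threshold: float) -> bool:
    """A candidate is accepted only when CF is strictly greater than the threshold."""
    return overall_likelihood(mb, md1, md2) > threshold


# Hypothetical inputs, not taken from FIG. 2 or FIG. 6.
md = combine_negative(0.5, 0.5)          # 0.5 + 0.5 - 0.25
cf = overall_likelihood(0.9, 0.5, 0.5)   # 0.9 - md
```

Because equation (1) has the form a + b - ab, the combined negative likelihood stays within [0, 1] whenever MD1 and MD2 do, and it is never smaller than either input.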

[0018] The processing of steps S11 to S14 described above handles one sentence. After step S14, step S15 determines whether all sentences have been processed. If not (NO), processing is executed again from step S11 for the next input sentence. If all sentences have been processed, the output process of step S16 outputs the determined results to an output device (not shown).

[0019] The processing of steps S12, S13, and S14 is now described in more detail. FIG. 3 is a detailed flowchart of the proper noun sentence pattern matching process. This process checks every pattern defined in the proper noun sentence pattern dictionary D11 to see whether it matches.

[0020] When this process starts, step S21 first determines whether all the patterns defined in the proper noun sentence pattern dictionary D11 have been examined. If all patterns have been examined (YES), the process ends. If not (NO), step S22 retrieves one pattern, and step S23 determines whether the character strings in the pattern occur in the input sentence in the defined order, that is, whether the pattern matches the input character string. If the pattern does not match, processing is executed again from step S21. If it matches, step S24 assigns the corresponding character strings to the arbitrary fields. As a result of this assignment, one character string in the sentence under examination is settled as the proper noun.

[0021] This completes the proper noun sentence pattern matching process of step S12. The proper noun range determination process of step S13 then begins.

[0022] FIG. 4 is an operation flowchart of the proper noun range determination process. When this process starts, step S31 first sets both the start position and the end position of the proper noun candidate to the last word of the pattern field expected to contain the proper noun.

[0023] In general, when an arbitrary field ends immediately before a particle, the proper noun, for example a company name, is likely to be at the very end of that field. The proper noun range determination process therefore sets the initial start position of the proper noun to the last word of the arbitrary field and extends it forward as far as possible. Step S31 is performed for this purpose, and it limits the range in which the proper noun is sought.

[0024] Next, step S32 determines whether the word at the start position is not a dependent word and has a part of speech other than verb, adjective, adjectival verb, adnominal, adverb, symbol, or conjunction. If so (YES), step S33 determines whether the examination has reached the first word of the pattern field containing the proper noun. If it has not, step S34 moves the start position of the proper noun candidate one word forward.

[0025] Processing is then executed again from step S32. If step S33 finds that the first word of the pattern field containing the proper noun has been reached, step S35 outputs the words between the start and end positions as the proper noun candidate, and the process ends.

[0026] On the other hand, if step S32 finds that the word at the start position is a dependent word, or that its part of speech is one of verb, adjective, adjectival verb, adnominal, adverb, symbol, or conjunction (NO), step S36 determines whether the start position of the proper noun candidate is ahead of the end position. If it is not, the process ends. If it is (YES), step S37 moves the start position of the proper noun candidate back by one word, and step S35 outputs the words between the start and end positions as the proper noun candidate, after which the process ends. The range of the proper noun is determined by the above operations.

[0027] The range of the proper noun is determined by the processing of step S13 described above. FIG. 5 is a detailed flowchart of the proper noun estimation process of step S14, which is executed after the range of the proper noun has been determined by step S13. This process estimates proper nouns using the proper noun sentence pattern dictionary D11 and the proper noun likelihood table D12.

[0028] When this process starts, step S41 first determines whether all arbitrary fields that have a proper noun attribute have been processed. In this embodiment, estimation is performed on every input sentence, so when all arbitrary fields with a proper noun attribute have been processed (YES), the proper noun estimation process ends. Otherwise (NO), step S42 obtains from the proper noun sentence pattern dictionary D11 the likelihood (MB) that the matched character string contains a proper noun.

[0029] FIG. 6(a) is a table showing an example of statistical information giving the likelihood that a proper noun candidate is a proper noun, with the number of characters of the candidate as the parameter. FIG. 6(b) is a table showing an example of such statistical information with the type of the first character of the candidate as the parameter. In this embodiment, the probability that a candidate is not a proper noun is defined as the negative likelihood.
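The two lookups against table D12 might be sketched as follows. The contents of FIG. 6 are not reproduced in the text, so the dictionaries below hold only the four values quoted in the worked example of paragraphs [0036] and [0037]. Keying the first table by exact character count, the category names "kanji" and "alphabet", and the use of Unicode character names to classify the first character are all assumptions made for this illustration.

```python
import unicodedata

# Only values quoted in the worked example; the real FIG. 6 tables hold more rows.
MD_BY_LENGTH = {2: 0.85, 5: 0.59}                            # FIG. 6(a), assumed keyed by count
MD_BY_FIRST_CHAR_TYPE = {"kanji": 0.77, "alphabet": 0.45}    # FIG. 6(b), assumed labels


def char_type(ch: str) -> str:
    """Classify one character by its Unicode name (an illustrative heuristic)."""
    name = unicodedata.name(ch, "")
    if "CJK UNIFIED IDEOGRAPH" in name:
        return "kanji"
    if "HIRAGANA" in name:
        return "hiragana"
    if "KATAKANA" in name:
        return "katakana"
    if "LATIN" in name:
        return "alphabet"   # covers both ASCII and full-width letters
    if "DIGIT" in name:
        return "digit"
    return "other"


def lookup_negative_likelihoods(candidate: str) -> tuple[float, float]:
    """Return (MD1, MD2) for a candidate; raises KeyError for rows not listed above."""
    md1 = MD_BY_LENGTH[len(candidate)]
    md2 = MD_BY_FIRST_CHAR_TYPE[char_type(candidate[0])]
    return md1, md2


company = lookup_negative_likelihoods("同社")
partner = lookup_negative_likelihoods("ＸＹＺ信託")
```

A real table would also need a policy for lengths and character types that have no row, which the description does not specify.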

[0030] Returning to FIG. 5, after step S42, step S43 obtains the negative likelihood (MD1) of the proper noun candidate from the proper noun likelihood table D12 using the number of characters of the candidate as the parameter. Step S44 then obtains the negative likelihood (MD2) of the proper noun candidate from the proper noun likelihood table D12 using the type of the first character of the candidate as the parameter.

[0031] Steps S43 and S44 thus yield the negative likelihoods (MD1, MD2) of the proper noun candidate. Using these results, step S45 calculates the combined negative likelihood (MD) of being a proper noun with equation (1). Step S46 then uses equation (2) to obtain the overall likelihood (CF) from the likelihood (MB) that a proper noun is contained, obtained in step S42, and the combined negative likelihood (MD).

[0032] In this embodiment, a candidate is judged to be a proper noun when CF is greater than 0.10. Step S47 determines whether the calculated CF is greater than the threshold of 0.10. If it is not (NO), the candidate is judged not to be a proper noun and processing is executed again from step S41. If it is (YES), the candidate is judged to be a proper noun and is output as such in step S48, after which processing is executed again from step S41. By repeating the above, the proper nouns in the input sentence are obtained.

[0033] The following describes, as a further example, how the proper nouns are found when the input character string is 「同社は２２日、ＸＹＺ信託と日本国内での販売に関して提携したと発表した」 ("The company announced on the 22nd that it had formed a partnership with XYZ Trust concerning sales in Japan").

[0034] The proper noun sentence pattern dictionary contains the pattern of FIG. 2, whose character string fields are 「は」, 「と」, and 「提携」. Because these occur in the input character string in that order, pattern matching succeeds, and the arbitrary fields are assigned as @COMPANY-SUBJ = 「同社」, @COMPANY-PARTNER = 「２２日、ＸＹＺ信託」, and @SKIP = 「日本国内での販売に関して」. This is done by the proper noun sentence pattern matching process of step S12.
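The match described in this paragraph can be reproduced with a small sketch. It assumes a pattern is a flat list of fields, that each character string field is located at its leftmost occurrence after the previous one, and that the input has already been segmented into words. The segmentation shown and the function name `match_pattern` are assumptions for illustration; FIG. 2 may encode patterns differently.

```python
from typing import Optional

# Fields starting with "@" are arbitrary fields; the rest are character string fields.
PATTERN = ["@COMPANY-SUBJ", "は", "@COMPANY-PARTNER", "と", "@SKIP", "提携"]

# Assumed word segmentation of the example sentence (output of step S11).
WORDS = ["同社", "は", "２２", "日", "、", "ＸＹＺ", "信託", "と", "日本", "国内",
         "で", "の", "販売", "に", "関して", "提携", "し", "た", "と", "発表", "し", "た"]


def match_pattern(pattern: list[str], words: list[str]) -> Optional[dict[str, str]]:
    """Bind arbitrary fields if the string fields occur in order; otherwise None."""
    bindings: dict[str, str] = {}
    pos = 0          # index of the next unread word
    pending = None   # arbitrary field waiting for the string field to its right
    for field in pattern:
        if field.startswith("@"):
            pending = field
            continue
        try:
            hit = words.index(field, pos)   # leftmost occurrence at or after pos
        except ValueError:
            return None                     # a string field is missing: no match
        if pending is not None:
            bindings[pending] = "".join(words[pos:hit])
            pending = None
        pos = hit + 1
    if pending is not None:                 # pattern ends with an arbitrary field
        bindings[pending] = "".join(words[pos:])
    return bindings


bindings = match_pattern(PATTERN, WORDS)
```

Leftmost matching is the simplest policy that yields the bindings stated above. The sketch does not handle two adjacent arbitrary fields or backtracking over alternative occurrences.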

[0035] The arbitrary fields that have a proper noun attribute are @COMPANY-SUBJ and @COMPANY-PARTNER, so the range of the proper noun candidate is determined for each separately. The former consists of the single word 「同社」 ("the company"), which is a noun, and so becomes a proper noun candidate as it is. For the latter, the morphological analysis result, with "/" marking word boundaries and the part of speech given in "[ ]", is 「２２[numeral]／日[suffix]／、[symbol]／ＸＹＺ[noun or unregistered word]／信託[noun]」. The parts of speech are examined word by word from the end. Because 「、」 is a symbol and cannot be part of a proper noun, the words after it, 「ＸＹＺ信託」, become the company name candidate. This processing is performed by the proper noun range determination process of step S13.
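The backward scan of step S13 can be sketched as below. Each word is represented as a (surface, part of speech, is-dependent-word) triple. That representation, the English part-of-speech labels, and the function name `candidate_range` are assumptions for this illustration, not the data format of the embodiment.

```python
# Parts of speech that stop the backward extension, as listed for step S32.
EXCLUDED_POS = {"verb", "adjective", "adjectival verb", "adnominal",
                "adverb", "symbol", "conjunction"}

Word = tuple[str, str, bool]   # (surface, part of speech, is dependent word)


def candidate_range(field_words: list[Word]) -> str:
    """Scan from the last word toward the first and return the candidate string.

    Returns an empty string when even the last word cannot be part of a proper noun.
    """
    start = len(field_words)
    for i in range(len(field_words) - 1, -1, -1):
        _, pos, is_dependent = field_words[i]
        if is_dependent or pos in EXCLUDED_POS:
            break              # this word fails the test, so the candidate starts after it
        start = i
    return "".join(surface for surface, _, _ in field_words[start:])


# Morphological analysis result quoted above for @COMPANY-PARTNER.
PARTNER_FIELD = [("２２", "numeral", False), ("日", "suffix", False),
                 ("、", "symbol", False), ("ＸＹＺ", "noun", False),
                 ("信託", "noun", False)]

partner_candidate = candidate_range(PARTNER_FIELD)
subject_candidate = candidate_range([("同社", "noun", False)])
```

The scan stops at the comma, so the numeral and suffix before it are never examined, which matches the outcome described in the text.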

[0036] Next, the proper noun estimation process of step S14 estimates the proper nouns in the sentence. The company name candidate 「同社」 obtained from @COMPANY-SUBJ and the candidate 「ＸＹＺ信託」 obtained from @COMPANY-PARTNER are judged separately. For the former, the proper noun sentence pattern dictionary D11 gives a likelihood MB of 0.97 that the matched character string contains a proper noun. Because 「同社」 is a two-character word, the proper noun likelihood table D12 (FIG. 6(a)) gives a negative likelihood MD1 of 0.85. Similarly, because 「同社」 is a word made up of kanji, the proper noun likelihood table (FIG. 6(b)) gives a negative likelihood MD2 of 0.77. The combined negative likelihood MD is therefore 0.97 by equation (1), and the overall likelihood CF of being a proper noun is 0.00 by equation (2). With the threshold for acceptance as a proper noun set to 0.10, CF is at or below the threshold, so the candidate is not judged to be a proper noun.

[0037] For 「ＸＹＺ信託」, the same calculation gives MB = 0.94, MD1 = 0.59, and MD2 = 0.45, so by equations (1) and (2) the overall likelihood of being a proper noun is CF = 0.17. Because this is greater than the threshold, the candidate is judged to be a proper noun and is output.
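For reference, the arithmetic behind both results can be written out with equations (1) and (2). The unrounded intermediate values are computed here and are not stated in the text, which gives figures to two decimal places; the first block is the candidate from @COMPANY-SUBJ and the second is the candidate from @COMPANY-PARTNER.

```latex
\begin{align*}
\text{@COMPANY-SUBJ:}\quad
MD &= 0.85 + 0.77 - 0.85 \times 0.77 = 0.9655 \approx 0.97,\\
CF &= 0.97 - 0.9655 = 0.0045 \approx 0.00 \le 0.10 \quad\text{(rejected)},\\[4pt]
\text{@COMPANY-PARTNER:}\quad
MD &= 0.59 + 0.45 - 0.59 \times 0.45 = 0.7745,\\
CF &= 0.94 - 0.7745 = 0.1655 \approx 0.17 > 0.10 \quad\text{(accepted)}.
\end{align*}
```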

[0038] The above embodiment is merely one example of the present invention, and it goes without saying that the invention is not limited to it.

[0039]

[Effect of the Invention] As described in detail above, the present invention makes it possible to realize a proper noun identification processing system that can identify, with high accuracy, proper nouns that cannot be identified either by dictionary lookup or by pattern matching using the prefixes and suffixes that frequently appear before and after proper nouns.

[Brief description of drawings]

FIG. 1 is an operation flowchart outlining the proper noun identification process for Japanese text according to an embodiment of the present invention.

FIG. 2 shows part of the contents of the proper noun sentence pattern dictionary D11 of the embodiment.

FIG. 3 is an operation flowchart of the proper noun pattern matching process of the embodiment.

FIG. 4 is an operation flowchart of the proper noun range determination process of the embodiment.

FIG. 5 is an operation flowchart of the proper noun estimation process of the embodiment.

FIG. 6 shows part of the contents of the proper noun likelihood table of the embodiment.

FIG. 7 is a flowchart of a conventional process for estimating proper nouns.

Continuation of front page

(56) References: JP-A-2-51765 (JP, A); JP-A-3-278176 (JP, A); JP-A-4-188364 (JP, A); Tsuyoshi Kitani, "Morphological analysis processing having a function of identifying proper nouns," IPSJ SIG Technical Report 92-NL-90, Vol. 92, No. 55, pp. 73-80

(58) Field searched (Int. Cl.7, DB name): G06F 17/30

Claims

(57) [Claims]

1. A proper noun identification method for estimating whether a character string in Japanese text is a proper noun, in a proper noun identification processing system that extracts proper nouns from Japanese text supplied from an input device, the method comprising: reading, from a proper noun sentence pattern dictionary stored in storage means, proper noun pattern data that corresponds to the character string and that defines, for each pattern of sentence in which proper nouns frequently appear, the likelihood that a proper noun is present in the pattern; reading, from a likelihood table stored in storage means, statistical data that defines the likelihood of being a proper noun with at least one of the number of characters and the character type of the proper noun as a parameter; estimating, by probabilistic inference with reference to the proper noun pattern data and the statistical data, whether the character string is a proper noun; and outputting the result from an output device.

2. The proper noun identification method according to claim 1, wherein the probabilistic inference adds a first value, obtained from the statistical data with the number of characters of the proper noun candidate as a parameter, and a second value, obtained from the statistical data with the type of the first character of the proper noun candidate as a parameter; obtains a second likelihood by subtracting the product of the first value and the second value from the sum; obtains a third likelihood by subtracting the second likelihood from a first likelihood for the proper noun candidate obtained from the proper noun pattern data; and estimates whether the proper noun candidate is a proper noun according to whether the third likelihood is greater than or equal to a specific value.