JP2775655B2

JP2775655B2 - Abbreviation word completion device and Japanese analysis system

Info

Publication number: JP2775655B2
Application number: JP3143812A
Authority: JP
Inventors: 芳隆平墳; 裕治岡村; 朗高木
Original assignee: SHII ESU KEI KK
Current assignee: SHII ESU KEI KK
Priority date: 1991-05-20
Filing date: 1991-05-20
Publication date: 1998-07-16
Anticipated expiration: 2013-07-16
Also published as: JPH04343173A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to an abbreviation completion device that mechanically supplies the words omitted (elided) from an input Japanese sentence, and to a Japanese analysis system equipped with such a device.

[0002]

[Description of the Related Art] In information processing systems that analyze Japanese by computer, or that use such analysis for machine translation, it is often important to determine which word an omitted element of an input Japanese sentence refers to. Japanese sentences strongly tend to omit any element that the listener or reader can recover. Techniques for completing omitted words are therefore extremely important for such systems.

[0003] One conventional technique for completing omitted words works as follows. It searches the previously input sentences or clauses, starting from the most recent, for a word that filled the same case as the omitted case component. It then supplies that word without regard to any semantic constraints.

[0004] A more advanced technique uses conceptual information such as the semantic features of words. For each predicate, such as a verb, semantic constraints on the words that can fill each of its cases are defined in advance. The nouns of previously input sentences or clauses are then examined from the most recent backward, and the first noun that satisfies the constraint is supplied. Consider, for example, the passage 「太郎の目の前に奇妙な食べ物が出された。ところが、（φが）（φを）食べてみると…。」 ("A strange food was served right in front of Taro. But when (φ-ga) tried eating (φ-wo)..."). Here "φ" marks an omitted element.

[0005] Suppose the case of 「食べる」 (taberu, "eat") marked by the particle "ga" is defined as accepting an animal, and the case marked by "wo" as accepting food. The nouns of the first sentence are then examined from the most recent backward, in the order 「食べ物」 (food), 「前」 (front), 「目」 (eyes), 「太郎」 (Taro), looking for an animal and a food. As a result, 「太郎」, which belongs to the animal class, is supplied as the "ga" case of "eat". 「食べ物」, which is a food, is supplied as the "wo" case.
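The prior-art search of [0004] and [0005] can be sketched as follows. This is a minimal sketch, assuming a toy semantic-class table and case frame; `SEMANTIC_CLASSES`, `CASE_FRAMES`, and `fill_by_recency` are hypothetical names, not part of the patent.

```python
# Hypothetical semantic classes; a real system would read them from a lexicon.
SEMANTIC_CLASSES = {
    "太郎": {"animal", "human"},
    "目": {"body_part"},
    "前": {"location"},
    "食べ物": {"food"},
}

# Case frame for 食べる ("eat") as defined in [0005]: particle -> required class.
CASE_FRAMES = {"食べる": {"ga": "animal", "wo": "food"}}


def fill_by_recency(predicate, omitted_cases, previous_nouns):
    """Prior-art style completion: for each omitted case, take the most
    recent previous noun whose semantic class satisfies the constraint."""
    frame = CASE_FRAMES[predicate]
    filled = {}
    for case in omitted_cases:
        required = frame[case]
        # previous_nouns is in text order, so scan it newest first.
        for noun in reversed(previous_nouns):
            if required in SEMANTIC_CLASSES.get(noun, set()):
                filled[case] = noun
                break
    return filled


# Nouns of 「太郎の目の前に奇妙な食べ物が出された。」 in text order.
nouns = ["太郎", "目", "前", "食べ物"]
print(fill_by_recency("食べる", ["ga", "wo"], nouns))
# {'ga': '太郎', 'wo': '食べ物'}
```

The search relies only on recency plus a class match. This is why [0007] notes that it fails whenever the correct antecedent is not the nearest matching noun.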

[0006]

[Problems to Be Solved by the Invention] In ordinary sentences, however, a word can be supplied for an omission only if it satisfies the predicate's semantic constraints; this is the minimum requirement. A technique that ignores these constraints and simply picks the word that filled the same case earlier therefore cannot be expected to complete omissions accurately.

[0007] The technique that uses conceptual information such as semantic features, in turn, presupposes that the word to be supplied appears close to the omission. That assumption does not necessarily hold, so even this technique did not achieve good accuracy.

Moreover, trying to supply every omitted word with these techniques actually increases the absolute number of errors and burdens post-editing, for example in machine translation. Where an application truly required completion, it was therefore restricted to the case component marked by the particle "ga" or to the grammatical subject.

Furthermore, machine translation needs more than case completion. To choose between "flower" and "blossom" as the translation of 「花」 (hana), for example, an adnominal modifier such as 「桜」 (cherry) or 「バラ」 (rose) must be supplied. Such completion was nearly impossible with conventional techniques.

[0008] The present invention was made to solve these problems. Its object is to complete omitted words with high accuracy by using information about the context, including the topics of previous sentences or clauses. A further object is to supply adnominal modifiers where they are needed.

[0009]

[Means for Solving the Problems] To achieve the above object, the present invention provides an abbreviation completion device that receives clauses of sentences that have undergone two kinds of processing:

(a) morphological analysis, which divides a Japanese sentence into words and attaches each word's syntactic and semantic information; and

(b) syntactic analysis, which analyzes sentence structure, such as the dependency relations and phrase structure among the words produced in (a), according to that syntactic and semantic information and predetermined analysis rules for sentence structure.

The device completes the words omitted from the input clause and comprises:

a case component completion unit that supplies, according to predetermined omission completion rules, those omitted words of the input clause that should be case components;

a topic determination unit that determines, according to predetermined topic determination rules, the topic of the clause whose case components have been supplied; and

a context information holding unit that stores, as information about the context, the clause whose case components have been supplied by the case component completion unit and the topic determined by the topic determination unit.

The case component completion unit comprises:

omitted case detection means for detecting, from the syntactic and semantic information that the morphological analysis attached to the predicate of the input clause, each case whose component is omitted in the clause and the conditions that a noun phrase filling that case must satisfy;

completion candidate selection means for selecting, on the basis of the syntactic and semantic information attached by the morphological analysis, words in the previous sentences or clauses stored in the context information holding unit that satisfy those conditions, as words that can fill the omitted case;

priority determination means for judging, by applying the predetermined omission completion rules, the likelihood that each selected word is the omitted case component, and for ranking the words in descending order of that likelihood; and

omitted word completion means for attaching the highest-ranked word to the input clause as the component of that case.

The topic determination unit comprises:

topic candidate selection means for selecting words that can be the topic of the input clause, either from that clause or from the previous sentences or clauses stored in the context information holding unit; and

topic decision means for judging the likelihood that each selected word is the topic of the input clause, by applying the predetermined topic determination rules and referring to predetermined lexical relation information, and for deciding on the word judged most likely as the topic of the input clause.

[0010] The invention of claim 4 is a Japanese analysis system that performs the morphological analysis and the syntactic analysis described above. The system includes an abbreviation completion device that completes the words omitted from a clause of the input sentence.

That device comprises a case component completion unit, a topic determination unit, and a context information holding unit, configured as in [0009]:

- The case component completion unit has the omitted case detection means, completion candidate selection means, priority determination means, and omitted word completion means.
- The topic determination unit has the topic candidate selection means and the topic decision means. The topic decision means judges each candidate by applying the predetermined topic determination rules and referring to predetermined lexical relation information, and decides on the most likely candidate as the topic of the input clause.

[0011] In the inventions of claims 2 and 5, the omitted case detection means of the case component completion unit works on at least the cases marked by the particles "ga", "wo", or "ni", excluding the time and place cases. For these cases, it detects each case whose component is omitted and the conditions that a noun phrase filling that case must satisfy.

[0012] The inventions of claims 3 and 6 further comprise an adnominal modifier completion unit. This unit supplies, among the omitted words of the clause, those that should be adnominal modifiers. It does so on the basis of the relation between nouns that the topic determination unit used as grounds for determining the clause's topic.

[0013]

[0014]

[Embodiments] An embodiment of the present invention is described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an abbreviation completion device according to one embodiment of the present invention.

[0015] As shown, the abbreviation completion device 1 of this embodiment comprises four units:

- a case component completion unit 2, which supplies those omitted words in a clause of an input Japanese sentence that should be case components;
- a topic determination unit 3, which determines the topic of a clause whose case components have been supplied;
- an adnominal modifier completion unit 4, which supplies those omitted words in such a sentence or clause that should be adnominal modifiers; and
- a context information holding unit 5, which stores the analysis result of each clause and its topic as information about the context (hereinafter, context information).

An external memory is also connected to the device. It stores an omission completion rule set 6, used to complete omitted words, and a topic determination rule set 7 and a lexical relation database 8, used to determine topics.

[0016] The abbreviation completion device 1 of this embodiment receives clauses of sentences that a syntactic analyzer or the like has already processed in two steps:

- morphological analysis, which divides a Japanese sentence into words and attaches syntactic and semantic information to each word; and
- syntactic analysis, which analyzes sentence structure, such as the dependency relations and phrase structure among those words, according to that syntactic and semantic information and predetermined analysis rules for sentence structure.

Conventional means can be used to perform the morphological and syntactic analysis.

[0017] As shown in FIG. 2, the case component completion unit 2 comprises four means:

- omitted case detection means 21, which detects each case component whose word is omitted in the input clause (hereinafter, an omitted case);
- completion candidate selection means 22, which selects words that may fill the omitted case from the context information of previous sentences or clauses stored in the context information holding unit 5;
- priority determination means 23, which applies the rules in the omission completion rule set 6 (the omission completion rules) to the selected words and ranks them in descending order of the likelihood that they fill the omitted case; and
- omitted word completion means 24, which places the highest-ranked word in the position of the omitted case in the input clause.

[0018] When a clause that has undergone morphological and syntactic analysis is input, the omitted case detection means 21 focuses on the predicate of the clause. Using the syntactic and semantic information attached to the predicate, it examines the words that actually depend on the predicate. From this it determines whether any case is omitted and what conditions a noun phrase filling that case must satisfy.
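The check in [0018], combined with the restriction of claims 2 and 5 in [0011], can be sketched as a comparison between a predicate's case frame and the dependents actually present in the clause. This is a minimal sketch under assumptions: the case frame, the role labels, and the names `CaseSlot`, `Clause`, and `detect_omitted_cases` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CaseSlot:
    particle: str   # case particle, e.g. "ga"
    role: str       # semantic role, e.g. "agent", "object", "time", "place"
    condition: str  # semantic class a filler noun phrase must belong to


# Hypothetical case frame for 食べる ("eat"); a real system would derive it
# from the syntactic and semantic information attached to the predicate.
CASE_FRAMES = {
    "食べる": [
        CaseSlot("ga", "agent", "animal"),
        CaseSlot("wo", "object", "food"),
        CaseSlot("ni", "time", "time"),
    ],
}

# Restriction from claims 2 and 5: ga/wo/ni cases other than time and place.
TARGET_PARTICLES = {"ga", "wo", "ni"}
EXCLUDED_ROLES = {"time", "place"}


@dataclass
class Clause:
    predicate: str
    # particle -> word that actually depends on the predicate
    dependents: dict = field(default_factory=dict)


def detect_omitted_cases(clause):
    """Return the eligible case slots that have no dependent in the clause."""
    omitted = []
    for slot in CASE_FRAMES.get(clause.predicate, []):
        if slot.particle not in TARGET_PARTICLES or slot.role in EXCLUDED_ROLES:
            continue
        if slot.particle not in clause.dependents:
            omitted.append(slot)
    return omitted


# 「（φが）（φを）食べてみると…」: neither the ga nor the wo case is present.
print([(s.particle, s.condition) for s in detect_omitted_cases(Clause("食べる"))])
# [('ga', 'animal'), ('wo', 'food')]
```

The `condition` returned with each slot is what completion candidate selection means 22 would then test candidates against.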

[0019] When an omitted case is detected, the completion candidate selection means 22 selects the completion candidates: the words in the context information of the context information holding unit 5 that satisfy the conditions of the omitted case. It does so on the basis of the syntactic and semantic information of the predicate on which the omitted case depends. The context information holding unit 5 stores, as context information, the analysis results of clauses that have already undergone completion, together with their topics.

[0020] Once the completion candidates have been selected, the priority determination means 23 applies the rules of the omission completion rule set 6, stored in the external memory, to each candidate. This judges how likely each candidate is to fill the omitted case. The candidates are then ranked in descending order of that likelihood.

[0021] The omission completion rule set 6 is a collection of scoring rules. Each rule assigns a score to the likelihood that a candidate fills an omitted case, conditioned on the features the candidate had in the previous sentence or clause (being the topic, being a case component, and so on). A candidate's likelihood of filling the omitted case is evaluated as the total score of the rules it matches.
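The summed-score evaluation in [0021] can be sketched as below. This is a minimal sketch assuming feature tags and rule scores invented for illustration; the patent's own rule examples are those of Table 1. The "never fills ga" rule of [0022] is modeled here as a large negative score rather than a hard filter, which is a design assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Candidate:
    word: str
    features: Set[str]  # features the word had in earlier context, e.g. {"topic"}


@dataclass
class Rule:
    description: str
    matches: Callable[[Candidate, str], bool]  # (candidate, omitted particle)
    score: int


# Hypothetical rules and scores loosely modeled on [0022].
RULES = [
    Rule("previous topic fills an omitted ga/wo/ni case",
         lambda c, p: "topic" in c.features and p in {"ga", "wo", "ni"}, 30),
    Rule("case component of the previous main clause",
         lambda c, p: "main_clause_case" in c.features, 20),
    Rule("ga-marked word of a subordinate reason clause never fills ga",
         lambda c, p: "reason_clause_ga" in c.features and p == "ga", -100),
]


def rank_candidates(candidates: List[Candidate], particle: str,
                    rules=RULES) -> List[Tuple[int, str]]:
    """Score each candidate as the sum of matching rule scores, best first."""
    scored = [(sum(r.score for r in rules if r.matches(c, particle)), c.word)
              for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return scored


candidates = [Candidate("花子", {"reason_clause_ga"}),
              Candidate("太郎", {"topic", "main_clause_case"})]
print(rank_candidates(candidates, "ga"))
# [(50, '太郎'), (-100, '花子')]
```

Keeping each rule as data plus a predicate mirrors the idea of a separate rule set 6 that can be tuned without changing the ranking code.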

[0022] The omission completion rules individually encode tendencies such as the following:

- The previous topic readily becomes the component of an omitted case marked by the particles "ga", "wo", or "ni".
- Words likely to become the topic of the next clause readily become components of an omitted case. Such words include the case components of the preceding clause or of the main clause of the preceding sentence, the words that modify them, the words used there to complete omissions, and the words figuratively treated as locations there.
- However, a word marked by "ga" in a subordinate preceding clause, such as a clause stating a cause or reason, does not become the word that completes the nominative case.

[0023] The table below shows examples of omission completion rules. The score of each rule was set on the basis of a survey of linguistic phenomena. The higher the score, the more likely a matching candidate is to fill the omitted case.

[0024]

[Table 1]

[0025] Once the candidates' priorities are fixed, the omitted word completion means 24 supplies the highest-priority candidate to the input clause as the word filling the omitted case. A clause may have several omitted cases to complete, with the same word a candidate for more than one of them. In that situation, the candidate/omitted-case combination with the highest likelihood (the highest total rule score) takes precedence. Each remaining omitted case is then completed with its next-highest-priority candidate.
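The conflict resolution in [0025] can be sketched as a greedy assignment over (case, word) pairs. This is a minimal sketch assuming that a word, once used, is not reused for another case. The name `assign_fillers` and the scores are hypothetical, for instance for a clause such as 「（φが）（φに）本を渡した」 ("(φ) handed a book to (φ)").

```python
from typing import Dict


def assign_fillers(scores: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """Greedy sketch: scores maps omitted case -> {candidate word: total score}.
    The highest-scoring (case, word) pair is fixed first; a word already used
    is skipped, so the other cases fall back to their next-best candidate."""
    pairs = sorted(
        ((score, case, word)
         for case, cands in scores.items()
         for word, score in cands.items()),
        key=lambda t: t[0],
        reverse=True,
    )
    assignment: Dict[str, str] = {}
    used = set()
    for score, case, word in pairs:
        if case in assignment or word in used:
            continue
        assignment[case] = word
        used.add(word)
    return assignment


scores = {"ga": {"太郎": 60, "花子": 30}, "ni": {"太郎": 50, "花子": 40}}
print(assign_fillers(scores))
# {'ga': '太郎', 'ni': '花子'}
```

A greedy pass matches the patent's description of fixing the most likely combination first. An exhaustive search over all assignments would be an alternative, but it is not what [0025] describes.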

[0026] When completion of the omitted cases is finished, the omitted word completion means 24 sends the completed clause to the topic determination unit 3 and the adnominal modifier completion unit 4.

[0027] If the input clause has no omitted case (that is, if the omitted case detection means 21 detects none), the completion processing above is skipped. The clause is then sent directly to the topic determination unit 3 and the adnominal modifier completion unit 4.

[0028] As shown in FIG. 3, the topic determination unit 3 comprises two means:

- topic candidate selection means 31, which receives a clause whose case components have been supplied and selects the words that could be its topic, either from that clause or from the context information of previous sentences or clauses in the context information holding unit 5; and
- topic decision means 32, which applies the rules in the topic determination rule set 7 (the topic determination rules) and the lexical relation information in the lexical relation database 8 to the selected words, and decides on the word most likely to be the topic.

[0029] When a clause whose omitted cases have been completed by the case component completion unit 2 is input, the topic candidate selection means 31 selects the words that could be its topic (the topic candidates). It takes them from the input clause itself or from the context information in the context information holding unit 5.

[0030] The topic candidate selection means 31 selects as topic candidates the words explicitly present in the input clause or in the context information, together with the omitted words supplied by the case component completion unit 2.

Candidates taken from the input clause include:

- words marked by the binding particles 「は」 (wa), 「こそ」 (koso), and 「も」 (mo), or by expressions such as 「の場合」 (no baai, "in the case of") and 「なら」 (nara, "if");
- words supplied to omitted cases marked by "ga", "wo", or "ni", other than the time and place cases; and
- the topic of the clause governed by 「と」 (to) in a 「と思う」 (to omou, "think that") clause.

Candidates taken from the context information include the topics of previous sentences or clauses, the case components of previous sentences or clauses, the words that modify them, and other words likely to become the topic of the next clause, such as words figuratively treated as locations there.
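The candidate collection in [0030] can be sketched as a filter over the clause's words followed by a merge with context words. This is a minimal sketch with assumptions:

- words carry a romanized `marker`;
- context words arrive as an already-filtered list;
- the embedded 「と思う」 clause case is not modeled.

`Word` and `topic_candidates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

# Markers from [0030]: binding particles は/こそ/も and 「の場合」「なら」.
TOPIC_MARKERS = {"wa", "koso", "mo", "no baai", "nara"}
COMPLETABLE_PARTICLES = {"ga", "wo", "ni"}
EXCLUDED_ROLES = {"time", "place"}


@dataclass
class Word:
    surface: str
    marker: str             # particle or expression the word attaches to
    role: str = ""          # semantic role, e.g. "agent", "time"
    supplied: bool = False  # True if filled in by case component completion


def topic_candidates(clause_words: List[Word], context_words: List[str]) -> List[str]:
    """Collect topic candidates from the clause, then from stored context."""
    candidates: List[str] = []
    for w in clause_words:
        marked = w.marker in TOPIC_MARKERS
        completed = (w.supplied and w.marker in COMPLETABLE_PARTICLES
                     and w.role not in EXCLUDED_ROLES)
        if (marked or completed) and w.surface not in candidates:
            candidates.append(w.surface)
    # Context side: previous topics, case components, their modifiers, etc.
    for surface in context_words:
        if surface not in candidates:
            candidates.append(surface)
    return candidates


clause = [Word("それ", "wa"),
          Word("太郎", "ga", role="agent", supplied=True),
          Word("三時", "ni", role="time", supplied=True)]
print(topic_candidates(clause, ["食べ物", "太郎"]))
# ['それ', '太郎', '食べ物']
```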

[0031] Once the topic candidates have been selected, the topic decision means 32 applies the rules of the topic determination rule set 7, stored in the external memory, to each candidate. This judges how likely each candidate is to be the topic. The most likely word is then decided on as the topic of the input clause.

[0032] The topic determination rule set 7 contains rules that score the likelihood of a candidate being the topic. The conditions depend on where the candidate came from.

For a candidate selected from the input clause, the conditions include:

- its position in the clause's word order;
- the role it plays within the clause;
- whether it is identical to the topic of a previous clause; and
- whether it stands in a conceptual relation, such as hypernym/hyponym, to the topic of a previous clause.

For a candidate selected from the context information, the conditions include whether it stands in such a conceptual relation to a noun phrase of the input clause, and whether it was explicitly expressed there, for example as a case component.

The likelihood that each candidate is the topic is evaluated as the total score of the rules it matches. The table below shows examples of topic determination rules. The score of each rule was set on the basis of a survey of linguistic phenomena; the higher the score, the more likely a matching candidate is the topic. In the table, "first noun phrase", "second noun phrase", and so on number the noun phrases of the input clause in order from the beginning.

[0033]

[Table 2]
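The topic decision of [0031] and [0032] can be sketched with a few hypothetical rules standing in for those of Table 2. This is a minimal sketch: the weights, the single related pair, and the names `related`, `score_topic`, and `decide_topic` are illustrative assumptions, not the patent's rules. Each candidate is a `(word, from_clause, position)` triple.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical related-noun pairs; in the patent, such relations come from
# the lexical relation database 8.
RELATED_PAIRS: Set[Tuple[str, str]] = {("桜", "花")}  # cherry tree / blossom


def related(a: str, b: str) -> bool:
    """True if a and b stand in some stored conceptual relation."""
    return (a, b) in RELATED_PAIRS or (b, a) in RELATED_PAIRS


def score_topic(word: str, from_clause: bool, position: Optional[int],
                prev_topic: Optional[str], clause_nouns: List[str]) -> int:
    """Total score of the (hypothetical) rules that a candidate matches."""
    score = 0
    if from_clause:
        if position == 1:
            score += 20   # first noun phrase of the input clause
        if word == prev_topic:
            score += 30   # same as the previous topic
        elif prev_topic is not None and related(word, prev_topic):
            score += 15   # conceptually related to the previous topic
    elif any(related(word, n) for n in clause_nouns):
        score += 25       # context word related to a noun phrase in the clause
    return score


def decide_topic(candidates, prev_topic, clause_nouns):
    """candidates: (word, from_clause, position) triples; returns the best word."""
    best = max(candidates,
               key=lambda c: score_topic(c[0], c[1], c[2], prev_topic, clause_nouns))
    return best[0]


# Previous topic 桜 (cherry trees); input clause 「花が咲いた」 ("the blossoms opened").
cands = [("花", True, 1), ("桜", False, None)]
print(decide_topic(cands, "桜", ["花"]))
# 花  (score 35 vs 25)
```

The relation between 花 and 桜 that supported this decision is also what an adnominal modifier completion step ([0036]) could reuse.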

[0034] The topic decision means 32 refers to the lexical relation database 8 whenever it must judge a conceptual relation between words, such as a hypernym/hyponym relation between a topic candidate and a word in the context information. The lexical relation database 8 stores a large amount of information about which nouns are related to which other nouns, and how. The stored relations between nouns include hypernym and hyponym, set and element, whole and part, and entity and attribute.
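A minimal stand-in for the lexical relation database 8 described in [0034] might store directed noun pairs labeled with the four relation types listed there. The class names, method names, and sample entries are hypothetical. As a simplification, this sketch keeps only one relation per ordered pair.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Relation(Enum):
    # Each relation is stored as (first noun, second noun).
    HYPERNYM = "hypernym"  # first is a broader term for the second
    SET = "set"            # first is a set, second is one of its elements
    WHOLE = "whole"        # first is a whole, second is one of its parts
    ENTITY = "entity"      # first is an entity, second is one of its attributes


class LexicalRelationDB:
    """In-memory stand-in for lexical relation database 8 (hypothetical API)."""

    def __init__(self) -> None:
        self._edges: Dict[Tuple[str, str], Relation] = {}

    def add(self, first: str, second: str, relation: Relation) -> None:
        self._edges[(first, second)] = relation

    def lookup(self, a: str, b: str) -> Optional[Tuple[Relation, str]]:
        """Return (relation, direction) linking a and b, or None.
        direction is "forward" if a plays the first role, else "inverse"."""
        if (a, b) in self._edges:
            return self._edges[(a, b)], "forward"
        if (b, a) in self._edges:
            return self._edges[(b, a)], "inverse"
        return None


db = LexicalRelationDB()
db.add("植物", "花", Relation.HYPERNYM)  # plant / flower
db.add("チーム", "選手", Relation.SET)   # team / player
db.add("桜", "花", Relation.WHOLE)       # cherry tree / blossom
db.add("車", "色", Relation.ENTITY)      # car / colour
print(db.lookup("花", "桜"))
```

Returning the direction together with the relation lets a caller tell "X is a part of Y" from "X has Y as a part". That distinction matters when the relation is later used to build an adnominal modifier.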

[0035] Once the topic has been decided, the topic decision means 32 sends that information to the adnominal modifier completion unit 4. Some input clauses are of a kind that has no topic, for example subordinate clauses and relative clauses. For these, the topic determination processing is skipped and the clause is sent directly to the adnominal modifier completion unit 4.

[0036] The adnominal modifier completion unit 4 supplies those omitted words of the input clause that should be adnominal modifiers. It bases this on the relation between nouns that served as one of the grounds for the topic decision by the topic determination unit 3. It then sends the clause, with its adnominal modifiers supplied, together with its topic to the context information holding unit 5, and outputs the completed clause to the syntactic analyzer or the like.

This completion provides more evidence for completing omitted words and determining the topic of the next sentence or clause, which improves accuracy. It also enables more accurate translation when this embodiment is applied to a machine translation system.
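The adnominal completion of [0036], together with the "flower"/"blossom" example of [0007], can be sketched as follows. This is a minimal sketch under assumptions:

- the evidence for the topic decision is passed in as a `(related_noun, relation)` pair;
- the relations that license a "X の topic" modifier are chosen for illustration;
- the lexical choice table for 花 is a toy.

`complete_modifier`, `translate_hana`, and both constant tables are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical: relations (seen from the related noun) that license "X の topic".
MODIFIER_RELATIONS = {"whole", "hyponym", "entity", "set"}
# Hypothetical lexical choice table for 花, following the example in [0007].
BLOSSOM_SOURCES = {"桜", "梅", "桃"}


def complete_modifier(topic: str,
                      evidence: Optional[Tuple[str, str]]) -> Tuple[Optional[str], str]:
    """Return (modifier, noun_phrase). evidence is the (related_noun, relation)
    pair that supported the topic decision, or None."""
    if evidence is not None:
        related_noun, relation = evidence
        if relation in MODIFIER_RELATIONS:
            return related_noun, f"{related_noun}の{topic}"
    return None, topic


def translate_hana(modifier: Optional[str]) -> str:
    """Choose an English rendering of 花 from its (possibly supplied) modifier."""
    return "blossom" if modifier in BLOSSOM_SOURCES else "flower"


modifier, phrase = complete_modifier("花", ("桜", "whole"))
print(phrase, translate_hana(modifier))   # 桜の花 blossom
modifier, phrase = complete_modifier("花", ("バラ", "hyponym"))
print(phrase, translate_hana(modifier))   # バラの花 flower
```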

【００３７】なお、連体修飾成分補完部４は必須の構成
要件ではなく、連体修飾成分の省略語を補完する必要が
ない場合は設けなくても良い。この場合、格成文省略補
完部２から格成分を補完された節が、また主題判定部３
からその節の主題が直接文脈情報保持部５に送られる。
また、格成分省略補完部２から格成分を補完された節が
直接出力され、構文解析装置等へ送られる。The adnominal modifier completion unit 4 is not an essential component. It may be omitted if adnominal modifiers do not need to be completed. In that case, two items are sent directly to the context information holding unit 5: the clause whose case components have been completed by the case component omission completion unit 2, and the subject of that clause determined by the subject determination unit 3. The clause whose case components have been completed is also output directly from the case component omission completion unit 2 and sent to the parsing device or the like.
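The data flow described in [0036] and [0037] can be sketched as follows. This is a minimal Python sketch, not the patent's implementation. The `Clause` and `ContextEntry` records and the callable signatures are hypothetical. The three units are passed in as functions so that unit 4 can be left out, as [0037] permits. For simplicity, the unit-4 callable here receives only the clause and its subject, not the noun-noun relation that the text says it relies on.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Clause:
    """Hypothetical clause record: surface text plus completed slots."""
    text: str
    slots: Dict[str, str] = field(default_factory=dict)  # e.g. {"に": "バイオ関連株"}


@dataclass
class ContextEntry:
    """What unit 5 keeps per clause: the completed clause and its subject."""
    clause: Clause
    subject: Optional[str]


CaseCompleter = Callable[[Clause, List[ContextEntry]], Clause]          # role of unit 2
SubjectFinder = Callable[[Clause, List[ContextEntry]], Optional[str]]   # role of unit 3
AdnominalCompleter = Callable[[Clause, Optional[str]], Clause]          # role of unit 4


def process_clause(
    clause: Clause,
    context: List[ContextEntry],
    complete_cases: CaseCompleter,
    determine_subject: SubjectFinder,
    complete_adnominal: Optional[AdnominalCompleter] = None,
) -> Clause:
    """Pass one clause through units 2 -> 3 -> (4) and append the result to the context."""
    clause = complete_cases(clause, context)
    subject = determine_subject(clause, context)
    if complete_adnominal is not None:  # unit 4 is optional ([0037])
        clause = complete_adnominal(clause, subject)
    context.append(ContextEntry(clause, subject))  # unit 5 stores clause + subject
    return clause  # handed on to the parser or other downstream processing
```

Because the context list is mutated in place, each call makes the completed clause available to the next one. This mirrors how [0036] describes completion results feeding the next clause.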

【００３８】上記文脈情報保持部５は、入力した省略語
を補完された節及びその主題を文脈情報として格納す
る。格納された文脈情報は次に入力される節の省略語の
補完及び主題の判定に利用されることとなる。The context information holding unit 5 stores, as context information, each input clause whose omitted words have been completed, together with its subject. The stored context information is used to complete the omitted words of the next input clause and to determine its subject.
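[0038] does not specify how context information is stored. The sketch below makes three illustrative assumptions, none of them statements about the patent. First, each entry is a tuple of content words plus the clause's subject. Second, only a bounded number of recent clauses is kept, controlled by the hypothetical `max_entries` parameter. Third, entries are read newest first when searching for candidates.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterator, Optional, Sequence


@dataclass(frozen=True)
class ContextEntry:
    words: tuple            # content words of the completed clause (hypothetical representation)
    subject: Optional[str]  # subject determined for that clause, if any


class ContextStore:
    """Minimal sketch of the context information holding unit 5.

    `max_entries` is an assumption; the patent does not say how much context is kept.
    """

    def __init__(self, max_entries: int = 10) -> None:
        self._entries: deque = deque(maxlen=max_entries)  # oldest entries drop off automatically

    def store(self, words: Sequence[str], subject: Optional[str]) -> None:
        self._entries.append(ContextEntry(tuple(words), subject))

    def recent_first(self) -> Iterator[ContextEntry]:
        """Yield stored clauses newest first (an assumed search order)."""
        return reversed(self._entries)

    def last_subject(self) -> Optional[str]:
        """Subject of the most recently stored clause, or None if there is none."""
        return self._entries[-1].subject if self._entries else None
```

A bounded `deque` keeps memory constant as a long text is processed. Whether older clauses should be discarded at all is a design choice the source leaves open.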

【００３９】以上のように構成した本実施例の省略語補
完装置１は、単独で利用してもよくまた日本語解析シス
テムや機械翻訳システム等の内部に内在させて利用して
もよい。日本語解析システム中に内在させた場合の構成
例を図４に示す。The abbreviation completion device 1 of this embodiment, configured as described above, may be used on its own. It may also be incorporated into a Japanese analysis system, a machine translation system, or the like. FIG. 4 shows an example configuration in which it is incorporated into a Japanese analysis system.

【００４０】図示のように、日本語解析システム中に省
略語補完装置を内在させた場合、構文解析装置９によっ
て形態素解析及び構文解析がなされた文の節が格成分省
略補完部２へ送られ、主題判定部３と連体修飾成分補完
部４を経て節の解析結果が構文解析装置９に返されるこ
ととなる。また、補完されるべき語を一旦構文解析装置
９に送り、文または節の解析結果（文または節内の単語
の係り受け関係を示す解析木等）に加えて新たな解析結
果とすることもできる。As shown in the figure, when the abbreviation completion device is incorporated into a Japanese analysis system, the parsing device 9 first performs morphological analysis and parsing on a sentence. Each clause of that sentence is then sent to the case component omission completion unit 2. The analysis result for the clause is returned to the parsing device 9 via the subject determination unit 3 and the adnominal modifier completion unit 4. Alternatively, the words to be completed may first be sent to the parsing device 9. There they are added to the analysis result of the sentence or clause (for example, a parse tree showing the dependency relations between its words) to form a new analysis result.

【００４１】次に、図４に示す日本語解析システムによ
り、「バイオ関連株が魅力的だが、小型株が多く、機関
投資家に勧めにくい。値動きが予想以上に激しい。」と
いう文の省略語の補完を行った場合を例として、本実施
例の動作について説明する。まず、構文解析装置９で
「バイオ関連株が魅力的だが」まで形態素解析及び構文
解析が進んだところで、解析結果が格成分省略補完部２
へ送られる。Next, the operation of this embodiment is described using an example: the Japanese analysis system shown in FIG. 4 completes the omitted words in the sentence 「バイオ関連株が魅力的だが、小型株が多く、機関投資家に勧めにくい。値動きが予想以上に激しい。」. Literally, this reads: "Biotech-related stocks are attractive, but small-cap stocks are many, hard to recommend to institutional investors. Price movements are more volatile than expected." First, when morphological analysis and parsing by the parsing device 9 have reached 「バイオ関連株が魅力的だが」 ("biotech-related stocks are attractive, but"), the analysis result is sent to the case component omission completion unit 2.

【００４２】格成分省略補完部２は入力した解析結果中
の述語「魅力的だ」に着目し、辞書の「魅力的だ」の項
から取り出された情報に基づき、「魅力的だ」が助詞
「を」「に」の表示する格（以下、「を」格、「に」格
と称す）を持たないこと、及び助詞「が」の表示する格
（以下、「が」格と称す）をもつが解析結果からすでに
「バイオ関連株」が係っていることから、省略格がない
と判断する。そこで省略格の補完を行なわずに解析結果
を連体修飾成分省略補完部４へ送る。The case component omission completion unit 2 focuses on the predicate 「魅力的だ」 ("is attractive") in the input analysis result. Based on the information retrieved from the dictionary entry for 「魅力的だ」, it determines that no case is omitted, for two reasons. First, 「魅力的だ」 has no case marked by the particle 「を」 or 「に」 (hereinafter the "wo" case and the "ni" case). Second, it does have the case marked by the particle 「が」 (hereinafter the "ga" case), but the analysis result shows that 「バイオ関連株」 ("biotech-related stocks") already fills that case. The unit therefore sends the analysis result to the adnominal modifier omission completion unit 4 without completing any omitted case.
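The dictionary check in [0042], repeated in [0045], [0052] and [0060], compares a predicate's case frame with the cases already filled in the clause. Below is a minimal Python sketch of the omitted case detecting means 21. It assumes a hypothetical `CASE_FRAMES` dictionary whose semantic conditions (`"group"`, `"human"`) are placeholders, not the patent's actual dictionary. The exclusion of time and place cases in claim 2 is not modelled.

```python
from typing import Dict, Mapping, Optional

# Hypothetical case-frame dictionary: predicate -> {case particle: semantic condition}.
# None means "no condition"; the condition names are illustrative placeholders.
CASE_FRAMES: Dict[str, Dict[str, Optional[str]]] = {
    "魅力的だ": {"が": None},                     # no を or に case ([0042])
    "多い": {"が": None, "に": "group"},          # ([0045])
    "勧める": {"が": "human", "を": None, "に": "human"},
    "激しい": {"が": None},
}


def detect_omitted_cases(predicate: str, filled: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Sketch of omitted case detecting means 21.

    Returns every case in the predicate's frame that no word in the clause fills,
    mapped to the condition that a filler noun must satisfy.
    """
    frame = CASE_FRAMES.get(predicate, {})
    return {case: cond for case, cond in frame.items() if case not in filled}
```

Applied to the example's predicates, the function reproduces the outcomes the text reports. Nothing is omitted for 「魅力的だ」, the "ni" case is omitted for 「多い」, and the "ga" and "wo" cases are omitted for 「勧める」 when 「機関投資家」 fills its "ni" case.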

【００４３】主題判定部３は、解析結果中にも文脈情報
中にも主題候補となる語がないことからこの節には主題
がないと判断する。そして、かかる判定結果を連体修飾
成分省略補完部４と文脈情報保持部５へ送る。また、主
題がなく上位下位などの概念的関係の存在を主題決定に
利用していないことから、連体修飾成分省略補完部４は
省略語の補完を行わず、解析結果をそのまま文脈情報保
持部５と構文解析装置９へ送る。The subject determination unit 3 determines that this clause has no subject, because neither the analysis result nor the context information contains any word that could be a subject candidate. It sends this result to the adnominal modifier omission completion unit 4 and the context information holding unit 5. The adnominal modifier omission completion unit 4 then completes no omitted words. This is because there is no subject, and no conceptual relation such as a hypernym-hyponym relation was used to determine one. It sends the analysis result unchanged to the context information holding unit 5 and the parsing device 9.

【００４４】次に、構文解析装置９で「バイオ関連株が
魅力的だが、小型株が多く」まで解析され、解析結果が
格成分省略補完部２へ送られる。Next, the parsing device 9 analyzes the text up to 「バイオ関連株が魅力的だが、小型株が多く」 ("biotech-related stocks are attractive, but small-cap stocks are many"). The analysis result is sent to the case component omission completion unit 2.

【００４５】格成分省略補完部２は述語「多い」に着目
して省略格を補完する。「多い」の辞書情報から「多
い」は「が」格、「に」格をもつが、「が」格には「小
型株」が係っているので、まず、省略格検出手段２１に
よって助詞「に」格が省略されていること及び「に」格
に係る名詞の条件が検出される。そして、補完候補選択
手段２２によって補完候補「バイオ関連株」が選出さ
れ、優先順位決定手段２３によって「に」格に該当する
蓋然性が判断され、省略語の補完手段によって「バイオ
関連株」が述語「多い」の「に」格にかけられる。下記
の表に補完候補「バイオ関連株」への省略補完ル−ルの
適用例を示す。The case component omission completion unit 2 focuses on the predicate 「多い」 ("are many") and completes the omitted case. According to the dictionary information for 「多い」, this predicate takes a "ga" case and a "ni" case, and 「小型株」 ("small-cap stocks") already fills the "ga" case. The omitted case detecting means 21 therefore detects that the "ni" case is omitted, together with the conditions that a noun filling it must satisfy. The completion candidate selecting means 22 then selects the completion candidate 「バイオ関連株」. The priority determining means 23 judges the probability that this candidate fills the "ni" case. Finally, the omitted word completion means 24 attaches 「バイオ関連株」 to the "ni" case of the predicate 「多い」. The table below shows how the omission completion rules are applied to the completion candidate 「バイオ関連株」.

【００４６】[0046]

【表３】 [Table 3]
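[0045] places the priority judgment between the completion candidate selecting means 22 and the omitted word completion means 24. The sketch below covers those two ends of the chain and leaves ranking to the caller. It assumes a hypothetical `NOUN_FEATURES` lexicon with made-up semantic features. The patent's actual noun conditions and the rules of Table 3 are not given in the text.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical semantic features; the patent's lexicon is not reproduced in the text.
NOUN_FEATURES: Dict[str, Set[str]] = {
    "バイオ関連株": {"stock", "group"},
    "小型株": {"stock", "group"},
    "機関投資家": {"human"},
    "これ": {"event"},
}


def select_candidates(condition: Optional[str], context_words: Iterable[str]) -> List[str]:
    """Sketch of completion candidate selecting means 22.

    Keeps context words that meet the noun condition (None = unrestricted),
    in the order given and without duplicates.
    """
    result: List[str] = []
    for word in context_words:
        ok = condition is None or condition in NOUN_FEATURES.get(word, set())
        if ok and word not in result:
            result.append(word)
    return result


def fill_case(filled: Dict[str, str], case: str, ranked: List[str]) -> Dict[str, str]:
    """Sketch of omitted word completion means 24: attach the top-ranked candidate."""
    return {**filled, case: ranked[0]} if ranked else dict(filled)
```

Suppose the condition is `"group"` and the context holds the previous clause's word 「バイオ関連株」. Then `select_candidates` returns that single word and `fill_case` attaches it to the "ni" case, as in [0045]. Called with `"human"` on the words 「バイオ関連株」 and 「これ」, it returns an empty list. That is one way to model the finding in [0052] that no word qualifies for the "ga" case of 「勧める」.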

【００４７】主題判定部３は、この補完済みの解析結果
と、文脈情報保持部５の文脈情報をもとに、この節の主
題を判定する。省略格に補完された語は主題候補になる
ので、主題候補選択手段３１によって「バイオ関連株」
が主題候補として選出される。他に前の節の「が」格の
格成分「バイオ関連株」及び前の節全体を示す「これ」
が主題候補となり、主題決定手段３２によって「バイオ
関連株」などがこの節の主題となる蓋然性が判断され、
「バイオ関連株」がこの節の主題として決定され、連体
修飾成分補完部４と文脈情報保持部５に送られる。な
お、同じ語が節内の主題候補であると同時に節外の主題
候補であるとき、節内の主題候補としてのみ扱い主題判
定ルールを適用する。表４に各主題候補に対する主題判
定ルールの適用例を示す。The subject determination unit 3 determines the subject of this clause based on the completed analysis result and the context information in the context information holding unit 5. A word filled into an omitted case becomes a subject candidate, so the subject candidate selecting means 31 selects 「バイオ関連株」 as a subject candidate. The "ga"-case component of the previous clause, 「バイオ関連株」, and 「これ」 ("this", referring to the whole previous clause) are also subject candidates. The subject determining means 32 judges the probability that each candidate is the subject of this clause. It determines 「バイオ関連株」 to be the subject and sends it to the adnominal modifier completion unit 4 and the context information holding unit 5. When the same word is a subject candidate both inside and outside the clause, it is treated only as an in-clause candidate when the subject determination rules are applied. Table 4 shows how the subject determination rules are applied to each subject candidate.

【００４８】[0048]

【表４】 [Table 4]
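The sketch below combines the in-clause precedence rule at the end of [0047] with the highest-probability choice of [0057] and [0062]. The numeric weights in `POSITION_WEIGHT` and the optional `bonus` map are hypothetical stand-ins for the subject determination rules of Table 4, which the text does not reproduce. Only the de-duplication behaviour is taken from the source.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Candidate = Tuple[str, str]  # (word, "in" or "out" of the current clause)


def select_subject_candidates(in_clause: Sequence[str], outside: Sequence[str]) -> List[Candidate]:
    """Sketch of subject candidate selecting means 31.

    A word that is a candidate both inside and outside the clause is kept only
    as an in-clause candidate ([0047]).
    """
    inside = list(dict.fromkeys(in_clause))  # de-duplicate while keeping order
    cands: List[Candidate] = [(w, "in") for w in inside]
    cands += [(w, "out") for w in dict.fromkeys(outside) if w not in inside]
    return cands


# Hypothetical weights standing in for the subject determination rules of Table 4.
POSITION_WEIGHT: Dict[str, float] = {"in": 2.0, "out": 1.0}


def determine_subject(cands: List[Candidate], bonus: Optional[Dict[str, float]] = None) -> Optional[str]:
    """Sketch of subject determining means 32: return the highest-scoring word, or None."""
    if not cands:
        return None  # cf. [0043]: no candidate, so the clause has no subject
    bonus = bonus or {}
    return max(cands, key=lambda c: POSITION_WEIGHT[c[1]] + bonus.get(c[0], 0.0))[0]
```

Take the candidates of [0047]: the completed 「バイオ関連株」 inside the clause, plus 「バイオ関連株」 and 「これ」 from the previous clause. The duplicate is dropped from the outside list, and 「バイオ関連株」 wins.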

【００４９】連体修飾成分補完部４は、主題「バイオ関
連株」と同一の語が文脈情報中にあり、上位下位などの
概念的関係を主題決定の根拠としていないことから連体
修飾成分の補完を行わず補完済みの解析結果を構文解析
装置９と文脈情報保持部５へ送る。The adnominal modifier completion unit 4 does not complete any adnominal modifier. A word identical to the subject 「バイオ関連株」 is already present in the context information, and no conceptual relation such as a hypernym-hyponym relation was used as the basis for determining the subject. The unit sends the completed analysis result to the parsing device 9 and the context information holding unit 5.
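[0043], [0049] and [0058] describe when the adnominal modifier completion unit 4 declines to act, and [0063] describes when it acts. Below is one reading of that rule, sketched in Python. It assumes that unit 3 reports the noun-noun relation it relied on as a hypothetical `(subject, clause_noun, relation_type)` triple, or `None` when the subject was chosen by identity with a word in the context. The patent does not define such an interface.

```python
from typing import Optional, Tuple

# Hypothetical record of the noun-noun relation unit 3 relied on:
# (subject, clause noun, relation type), e.g. a hypernym-hyponym or entity-phenomenon relation.
Relation = Tuple[str, str, str]


def adnominal_target(subject: Optional[str], relation: Optional[Relation]) -> Optional[str]:
    """Return the clause noun that should take `subject` as an adnominal modifier, or None."""
    if subject is None:
        return None  # [0043]: no subject, nothing to attach
    if relation is None:
        return None  # [0049], [0058]: subject found by identity, no relation used
    rel_subject, clause_noun, _relation_type = relation
    return clause_noun if rel_subject == subject else None
```

Returning the target noun rather than a boolean lets the caller perform the attachment directly, as unit 4 does with 「値動き」 in [0063].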

【００５０】構文解析装置９は、以上の補完に基づいて
次の文または節の解析処理を続ける。したがって、文脈
情報保持部５には「バイオ関連株が魅力的だが、｛バイ
オ関連株に｝小型株が多く、」という解析結果とこの節
の主題「バイオ関連株」とが格納される。The parsing device 9 continues analyzing the next sentence or clause on the basis of the completion above. The context information holding unit 5 thus stores the analysis result 「バイオ関連株が魅力的だが、｛バイオ関連株に｝小型株が多く、」 ("biotech-related stocks are attractive, but small-cap stocks are many {among biotech-related stocks},"). It also stores this clause's subject, 「バイオ関連株」.

【００５１】次に、構文解析装置９で「バイオ関連株が
魅力的だが、｛バイオ関連株に｝小型株が多く、機関投
資家に勧めにくい。」まで解析され、解析結果が格成分
省略補完部２へ送られる。Next, the parsing device 9 analyzes the text up to 「バイオ関連株が魅力的だが、｛バイオ関連株に｝小型株が多く、機関投資家に勧めにくい。」, which adds "hard to recommend to institutional investors." The analysis result is sent to the case component omission completion unit 2.

【００５２】格成分省略補完部２は述語「勧める」に着
目して省略語を補完する。まず、省略格検出手段２１に
よって「が」格及び「を」格が省略されていること及
び、それぞれの格に係る名詞の条件が検出される。そし
て、補完候補選択手段２２によってそれぞれの格の補完
候補が文脈情報から選出される。ここでは「が」格には
該当する語がなく、「を」格についてのみ「バイオ関連
株」「小型株」「これ（前の節全体を表す）」が補完候
補として選出される。The case component omission completion unit 2 focuses on the predicate 「勧める」 ("recommend") and completes the omitted words. First, the omitted case detecting means 21 detects that the "ga" case and the "wo" case are omitted. It also detects the conditions on the nouns that can fill each case. The completion candidate selecting means 22 then selects completion candidates for each case from the context information. Here no word qualifies for the "ga" case. For the "wo" case, 「バイオ関連株」, 「小型株」 and 「これ」 ("this", representing the whole previous clause) are selected as completion candidates.

【００５３】補完候補が選出されると、優先順位決定手
段２３によって各補完候補が「を」格に該当する蓋然性
が判断され、「バイオ関連株」「小型株」「これ」の順
で優先順位が決定される。表５に各補完候補に対する省
略補完ルールの適用例を示す。Once the completion candidates have been selected, the priority determining means 23 judges the probability that each candidate fills the "wo" case. It ranks them in the order 「バイオ関連株」, 「小型株」, 「これ」. Table 5 shows how the omission completion rules are applied to each completion candidate.

【００５４】[0054]

【表５】 [Table 5]
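[0053] reports only the resulting order, not the rules of Table 5. The sketch below of the priority determining means 23 uses two invented scoring rules purely to show how rule-based ranking works. One rule favours the previous clause's subject; the other favours a semantic fit with the case. The weights and feature names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def rank_candidates(
    candidates: List[str],
    prev_subject: Optional[str],
    features: Dict[str, Set[str]],
    wanted: str,
) -> List[str]:
    """Sketch of priority determining means 23: score candidates with hypothetical rules, best first."""

    def score(word: str) -> int:
        s = 0
        if word == prev_subject:
            s += 2  # hypothetical rule: the previous clause's subject is favoured
        if wanted in features.get(word, set()):
            s += 1  # hypothetical rule: semantic fit with the case condition
        return s

    # sorted() is stable, so equal scores keep the incoming (e.g. recency) order
    return sorted(candidates, key=score, reverse=True)
```

With the previous clause's subject 「バイオ関連株」 and a `"stock"` condition on the "wo" case, these invented rules happen to reproduce the order reported in [0053]: 「バイオ関連株」, 「小型株」, 「これ」.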

【００５５】補完候補の優先順位が決定すると、省略語
補完手段２４によって最も優先順位の高い「バイオ関連
株」が述語「勧める」の「を」格にかけられる。Once the priorities have been determined, the omitted word completion means 24 attaches the highest-priority candidate, 「バイオ関連株」, to the "wo" case of the predicate 「勧める」.

【００５６】主題判定部３は、この補完済みの解析結果
よりこの節の主題を判定する。まず、主題候補選択手段
３１によって省略格に補完された「バイオ関連株」、前
の節の主題「バイオ関連株」及び前の節の「が」格の格
成分「小型株」が節内の主題候補として選出される。The subject determination unit 3 determines the subject of this clause from the completed analysis result. First, the subject candidate selecting means 31 selects three in-clause subject candidates. These are 「バイオ関連株」, which was filled into the omitted case; 「バイオ関連株」, the subject of the previous clause; and 「小型株」, the "ga"-case component of the previous clause.

【００５７】主題候補が選出されると、主題決定手段３
２によって各主題候補がこの節の主題となる蓋然性が判
断され、最も蓋然性の高い「バイオ関連株」が節内の主
題として決定される。When the subject candidates have been selected, the subject determining means 32 judges the probability that each candidate is the subject of this clause. The most probable candidate, 「バイオ関連株」, is determined to be the in-clause subject.

【００５８】連体修飾成分補完部４は、主題「バイオ関
連株」と同一の語が文脈情報中にあり、概念的な上位下
位関係を主題決定の根拠としていないことから補完を行
わず、補完された解析結果等を構文解析装置９と文脈情
報保持部５に送る。したがって、文脈情報保持部５には
「バイオ関連株が魅力的だが、｛バイオ関連株に｝小型
株が多く、｛バイオ関連株を｝機関投資家に勧めにく
い。」という解析結果とこの節の主題「バイオ関連株」
とが格納される。The adnominal modifier completion unit 4 performs no completion. A word identical to the subject 「バイオ関連株」 is present in the context information, and no conceptual hypernym-hyponym relation was used as the basis for determining the subject. The unit sends the completed analysis result and related data to the parsing device 9 and the context information holding unit 5. The context information holding unit 5 therefore stores the analysis result 「バイオ関連株が魅力的だが、｛バイオ関連株に｝小型株が多く、｛バイオ関連株を｝機関投資家に勧めにくい。」. In English: "biotech-related stocks are attractive, but small-cap stocks are many {among biotech-related stocks}, and it is hard to recommend {biotech-related stocks} to institutional investors." It also stores this clause's subject, 「バイオ関連株」.

【００５９】次に、構文解析装置９で「値動きが予想以
上に激しい。」という文が解析され、解析結果が格成分
省略補完部２へ送られる。Next, the parsing device 9 analyzes the sentence 「値動きが予想以上に激しい。」 ("Price movements are more volatile than expected."). The analysis result is sent to the case component omission completion unit 2.

【００６０】格成分省略補完部２は述語「激しい」に着
目し、「が」格、「を」格、「に」格のいずれにも格成
分の省略がないと判断し、解析結果を主題判定部３へ送
る。The case component omission completion unit 2 focuses on the predicate 「激しい」 ("volatile"). It determines that no case component is omitted in any of the "ga", "wo" or "ni" cases, and sends the analysis result to the subject determination unit 3.

【００６１】主題判定部３はかかる解析結果によりこの
節の主題を判定する。まず、主題候補選択手段３１に
よって「バイオ関連株」、「機関投資家」、「これ（前
の節全体を表す）」が主題候補として選出される。The subject determination unit 3 determines the subject of this clause from the analysis result. First, the subject candidate selecting means 31 selects 「バイオ関連株」, 「機関投資家」 ("institutional investors") and 「これ」 ("this", representing the whole previous clause) as subject candidates.

【００６２】主題候補が選出されると、主題決定手段３
２によって各主題候補がこの節の主題となる蓋然性が判
断され、最も蓋然性の高い「バイオ関連株」が主題とし
て決定される。When the subject candidates have been selected, the subject determining means 32 judges the probability that each candidate is the subject of this clause. The most probable candidate, 「バイオ関連株」, is determined to be the subject.

【００６３】連体修飾成分補完部４は、主題判定の際に
検出した名詞間の関係（ここでは主体現象関係）に基づ
き「バイオ関連株」を「値動き」にかけ、その結果を構
文解析装置９と文脈情報保持部５へ送る。したがって文
脈情報保持部５には「｛バイオ関連株の｝値動きが予想
以上に激しい。」という解析結果とこの文の主題「バイ
オ関連株」とが格納される。The adnominal modifier completion unit 4 relies on the relationship between nouns detected during subject determination, here an entity-phenomenon relation. Based on it, the unit attaches 「バイオ関連株」 to 「値動き」 ("price movements") as an adnominal modifier. It sends the result to the parsing device 9 and the context information holding unit 5. The context information holding unit 5 therefore stores the analysis result 「｛バイオ関連株の｝値動きが予想以上に激しい。」 ("{biotech-related stocks'} price movements are more volatile than expected."). It also stores this sentence's subject, 「バイオ関連株」.
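The completion in [0063], shown with braces in the output of [0064], can be mimicked by a simple string operation. This sketch assumes the clause is available as surface text and that the head noun occurs literally in it. The function name `insert_adnominal` is hypothetical. A real system would more likely attach the modifier to a node of the parse tree.

```python
def insert_adnominal(clause: str, modifier: str, head: str) -> str:
    """Prefix the first occurrence of `head` with "{modifier の}", using the
    full-width braces that mark completed words in the patent's example output.
    Returns the clause unchanged if `head` does not occur in it."""
    idx = clause.find(head)
    if idx < 0:
        return clause
    return clause[:idx] + "｛" + modifier + "の｝" + clause[idx:]
```

Called as `insert_adnominal("値動きが予想以上に激しい。", "バイオ関連株", "値動き")`, it yields 「｛バイオ関連株の｝値動きが予想以上に激しい。」, the form stored in [0063].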

【００６４】以上で「バイオ関連株が魅力的だが、小型
株が多く、機関投資家に勧めにくい。値動きが予想以上
に激しい。」という文の省略語補完処理を終了する。こ
の処理の結果日本語解析システムから出力される解析結
果は、「バイオ関連株が魅力的だが、｛バイオ関連株
に｝小型株が多く、｛バイオ関連株を｝機関投資家に勧
めにくい。｛バイオ関連株の｝値動きが予想以上に激し
い。」となる。This completes the omitted-word completion processing for the sentence 「バイオ関連株が魅力的だが、小型株が多く、機関投資家に勧めにくい。値動きが予想以上に激しい。」. As a result of this processing, the Japanese analysis system outputs the analysis result 「バイオ関連株が魅力的だが、｛バイオ関連株に｝小型株が多く、｛バイオ関連株を｝機関投資家に勧めにくい。｛バイオ関連株の｝値動きが予想以上に激しい。」. In English: "Biotech-related stocks are attractive, but small-cap stocks are many {among biotech-related stocks}, and it is hard to recommend {biotech-related stocks} to institutional investors. {Biotech-related stocks'} price movements are more volatile than expected."

【００６５】[0065]

【発明の効果】以上説明したように、請求項１、２及び
請求項４、５の発明は、日本語文に対して省略語の補完
を行う省略語補完装置において、格成分省略補完部と主
題判定部と文脈情報保持部とを備え、以前に省略語を補
完された文または節に関する文脈情報を該文脈情報保持
部に格納し、該文脈情報を次の文または節の省略語補完
や主題判定の際に利用することとしたため、精度の高い
省略語補完をすることができる。As described above, the inventions of claims 1, 2, 4 and 5 provide an abbreviation completion device that completes omitted words in Japanese sentences. The device comprises a case component omission completion unit, a subject determination unit and a context information holding unit. Context information about sentences or clauses whose omitted words have already been completed is stored in the context information holding unit. This information is then used when completing the omitted words and determining the subject of the next sentence or clause, so omitted words can be completed with high accuracy.

【００６６】また、文脈情報として以前の文または節の
主題をも利用することにより、特に日本語文において主
題と関係の深い助詞「が」、「を」あるいは「に」が表
示する格の格成分の省略に対し、精度の高い補完をする
ことができるという効果がある。Further, the subject of the previous sentence or clause is also used as context information. This has the effect that omitted components of the cases marked by the particles 「が」, 「を」 or 「に」, which in Japanese are closely related to the subject, can be completed with high accuracy.

【００６７】さらに、主題判定部が、主題候補を処理中
の節あるいは文脈情報の中から選び出す主題候補選択手
段と、選び出された各主題候補の主題となる蓋然性を判
断し最も蓋然性の高い主題候補をその節の主題として決
定する主題決定手段とを備えることにより、節の主題の
判定を正確に行うことができ、これによって判定された
主題を利用する省略語補完装置の精度の向上を図ること
ができる。Furthermore, the subject determination unit comprises two means. The subject candidate selecting means selects subject candidates from the clause being processed or from the context information. The subject determining means judges the probability that each selected candidate is the subject and determines the most probable one as the subject of the clause. This allows the subject of a clause to be determined accurately, which in turn improves the accuracy of an abbreviation completion device that uses the determined subject.

【００６８】また、請求項３及び請求項６の発明は、主
題判定部による節の主題決定の根拠のひとつとなった名
詞間の関係に基づき、上記節の省略語のうち連体修飾成
分となるべき語を補完する連体修飾成分補完部を備える
ことにより、次の文または節の解析や省略語補完を行う
際、また機械翻訳システムへの応用の際により正確な処
理を行うことができるという効果がある。The inventions of claims 3 and 6 further comprise an adnominal modifier completion unit. This unit completes those omitted words in a clause that should be adnominal modifiers. It does so based on the relationship between nouns that served as one of the grounds for the subject determination unit's decision on the clause's subject. This makes processing more accurate when the next sentence or clause is analyzed, when its omitted words are completed, and when the invention is applied to a machine translation system.

【００６９】[0069]

[Brief description of the drawings]

【図１】本発明の一実施例による省略語補完装置を示す
ブロック図である。FIG. 1 is a block diagram illustrating an abbreviation completion device according to an embodiment of the present invention.

【図２】図１の格成分省略補完部の機能を示す機能ブロ
ック図である。FIG. 2 is a functional block diagram illustrating the functions of the case component omission completion unit of FIG. 1.

【図３】図１の主題判定部の機能を示す機能ブロック図
である。FIG. 3 is a functional block diagram illustrating functions of a subject determination unit in FIG. 1;

【図４】本発明の一実施例による日本語解析システムを
示すブロック図である。FIG. 4 is a block diagram showing a Japanese language analysis system according to one embodiment of the present invention.

[Explanation of symbols]

１ 省略語補完装置　２ 格成分省略補完部　３ 主題判定部　４ 連体修飾成分省略補完部　５ 文脈情報保持部　６ 省略補完ルール集　７ 主題判定ルール集　８ 語彙関係データベース　９ 構文解析装置　２１ 省略格検出手段　２２ 補完候補選択手段　２３ 優先順位決定手段　２４ 省略語補完手段　３１ 主題候補選択手段　３２ 主題決定手段
DESCRIPTION OF SYMBOLS
1 Abbreviation completion device
2 Case component omission completion unit
3 Subject determination unit
4 Adnominal modifier omission completion unit
5 Context information holding unit
6 Omission completion rule set
7 Subject determination rule set
8 Lexical relation database
9 Parsing device
21 Omitted case detecting means
22 Completion candidate selecting means
23 Priority determining means
24 Omitted word completion means
31 Subject candidate selecting means
32 Subject determining means

フロントページの続き (58)調査した分野(Int.Cl.⁶，ＤＢ名) G06F 17/27 - 17/28
Continued from front page: (58) Fields searched (Int.Cl.⁶, DB name): G06F 17/27 - 17/28

Claims

(57) [Claims]

1. An abbreviation completion device that receives a clause of a sentence and completes the omitted words in the input clause. The clause has undergone morphological analysis processing, which divides a Japanese sentence into words and attaches the syntactic information and semantic information of each word to it. It has also undergone parsing processing, which analyzes the sentence structure, such as the dependency relations and phrase structure of the words produced by the morphological analysis processing, according to the syntactic and semantic information and predetermined analysis rules on sentence structure. The abbreviation completion device comprises:
a case component omission completion unit that completes, according to predetermined omission completion rules, those omitted words in the input clause that should be case components;
a subject determination unit that determines, according to predetermined subject determination rules, the subject of the clause in which the words that should be case components have been completed; and
a context information holding unit that stores, as context information, the clause in which the case-component words have been completed by the case component omission completion unit and the subject determined by the subject determination unit;
wherein the case component omission completion unit comprises:
omitted case detecting means that detects a case whose component is omitted in the input clause, and the conditions that a noun phrase serving as the component of that case must satisfy, based on the syntactic and semantic information attached by the morphological analysis processing to the predicate in the input clause;
completion candidate selecting means that, based on the syntactic and semantic information attached by the morphological analysis processing, selects words satisfying those conditions in the previous sentences or clauses stored in the context information holding unit as words that can be the component of the omitted case;
priority determining means that judges, by applying predetermined omission completion rules, the probability that each word selected by the completion candidate selecting means is the omitted case component, and ranks the words in descending order of that probability; and
omitted word completion means that completes the input clause using the highest-ranked word as the word that should be the case component;
and wherein the subject determination unit comprises:
subject candidate selecting means that selects words that can be the subject of the input clause from the input clause or from the previous sentences or clauses stored in the context information holding unit; and
subject determining means that judges the probability that each word selected by the subject candidate selecting means is the subject of the input clause, by applying predetermined subject determination rules and referring to predetermined lexical relation information, and determines the word judged most probable as the subject of the input clause.

2. The abbreviation completion device according to claim 1, wherein the omitted case detecting means provided in the case component omission completion unit detects at least a case whose component is omitted, marked by the particle 「が」, 「を」 or 「に」 and being neither a time case nor a place case, together with the conditions that the noun phrase serving as the component of that case must satisfy.

3. The abbreviation completion device according to claim 1, further comprising an adnominal modifier completion unit that completes, among the omitted words in the clause, those that should be adnominal modifiers, based on the relationship between nouns used as a basis for the subject determination unit's determination of the clause's subject.

4. A Japanese analysis system in which a clause of a sentence is input to an abbreviation completion device that completes the omitted words in the input clause. The clause has undergone morphological analysis processing, which divides a Japanese sentence into words and attaches the syntactic information and semantic information of each word to it. It has also undergone parsing processing, which analyzes the sentence structure, such as the dependency relations and phrase structure of the words produced by the morphological analysis processing, according to the syntactic and semantic information and predetermined analysis rules on sentence structure. The abbreviation completion device comprises:
a case component omission completion unit that completes, according to predetermined omission completion rules, those omitted words in the input clause that should be case components;
a subject determination unit that determines, according to predetermined subject determination rules, the subject of the clause in which the words that should be case components have been completed; and
a context information holding unit that stores, as context information, the clause in which the case-component words have been completed by the case component omission completion unit and the subject determined by the subject determination unit;
wherein the case component omission completion unit comprises:
omitted case detecting means that detects a case whose component is omitted in the input clause, and the conditions that a noun phrase serving as the component of that case must satisfy, based on the syntactic and semantic information attached by the morphological analysis processing to the predicate in the input clause;
completion candidate selecting means that, based on the syntactic and semantic information attached by the morphological analysis processing, selects words satisfying those conditions in the previous sentences or clauses stored in the context information holding unit as words that can be the component of the omitted case;
priority determining means that judges, by applying predetermined omission completion rules, the probability that each word selected by the completion candidate selecting means is the omitted case component, and ranks the words in descending order of that probability; and
omitted word completion means that completes the input clause using the highest-ranked word as the word that should be the case component;
and wherein the subject determination unit comprises:
subject candidate selecting means that selects words that can be the subject of the input clause from the input clause or from the previous sentences or clauses stored in the context information holding unit; and
subject determining means that judges the probability that each word selected by the subject candidate selecting means is the subject of the input clause, by applying predetermined subject determination rules and referring to predetermined lexical relation information, and determines the word judged most probable as the subject of the input clause.

5. The Japanese analysis system according to claim 4, wherein the omitted case detecting means provided in the case component omission completion unit detects at least a case whose component is omitted, marked by the particle 「が」, 「を」 or 「に」 and being neither a time case nor a place case, together with the conditions that the noun phrase serving as the component of that case must satisfy.

6. The Japanese analysis system according to claim 4, further comprising an adnominal modifier completion unit that completes, among the omitted words in the clause, those that should be adnominal modifiers, based on the relationship between nouns used as a basis for the subject determination unit's determination of the clause's subject.