JPH0877196A

JPH0877196A - Extracting device for document information

Info

Publication number: JPH0877196A
Application number: JP6215070A
Authority: JP
Inventors: Yukari Saitou; 由香梨斎藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1994-09-08
Filing date: 1994-09-08
Publication date: 1996-03-22

Abstract

PURPOSE: To extract desired sentences that contain important information with high precision. Each sentence is morphologically analyzed and matched against a sentence expression pattern based on the notation and part-of-speech information of its morphemes, and the matching sentences are extracted. CONSTITUTION: A document information extracting device 2 morphologically analyzes an input document 1 and extracts the sentences that match a sentence expression pattern 5, based on the notation, part-of-speech information, and semantic information of their morphemes. It consists of a morphological analysis unit 3, a sentence pattern matching unit 4, and the sentence expression pattern 5. The morphological analysis unit 3 morphologically analyzes the input document 1. The sentence pattern matching unit 4 then matches each analyzed sentence against the sentence expression pattern 5, based on the notation and part-of-speech information of its morphemes, and extracts the matching sentences. Because the morphological analysis result of the input document 1 is matched against the sentence expression pattern 5, and a sentence is extracted as a desired sentence only when it matches, sentences containing important information can be extracted with high precision.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a document information extracting device that morphologically analyzes a document and extracts desired sentences. The extraction works by matching the document against sentence expression patterns, using the notation, part-of-speech information, and semantic information of its morphemes.

[0002]

[Prior Art] Conventional document extracting devices use character strings to extract sentences that contain a given expression. For example, to extract sentences containing the expression "○とは○○のことである" ("○ means ○○", where ○ denotes an arbitrary character), such a device searches for and extracts the sentences containing the character string "とは" (to wa).

[0003]

[Problems to Be Solved by the Invention] In this case, the search also extracts sentences that merely contain the string "とは", not just those with the intended expression. An example is "編集することはできない" ("it cannot be edited"), where "とは" appears only because "こと" happens to be followed by the particle "は".

[0004] Likewise, suppose one wants to extract sentences of the form "○○すると○○できる" ("if you do ○○, you can ○○"), such as "ｅコマンドを入力すると、処理を終了できる" ("entering the e command lets you end processing"). One would like to specify that the word immediately before "と" is a verb in its terminal form, such as "○○する". A conventional character-string search cannot express this condition, so it extracts every sentence containing the string pattern "○○と○○できる". This includes sentences such as "文書の更新をするときに、コマンドの指定ができる" ("when updating a document, a command can be specified"), where "と" is merely the first character of "とき" ("when").
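
The following Python snippet is a minimal sketch of the conventional behaviour described in [0002] to [0004]. It runs a plain character-string search over the example sentences quoted above. The function name `string_search` is hypothetical. The regular expression `すると.*できる` is only one way to write the "○○すると○○できる" string condition. The point is that both unwanted sentences are hit.

```python
import re

def string_search(sentences, pattern):
    """Conventional extraction: keep every sentence in which the regex occurs."""
    return [s for s in sentences if re.search(pattern, s)]

docs = [
    "クラスとは、データの属性と動作を規定する抽象的なオブジェクトである。",  # intended
    "編集することはできない",                                            # こと + は
    "ｅコマンドを入力すると、処理を終了できる",                          # intended
    "文書の更新をするときに、コマンドの指定ができる",                    # する + とき
]

# "とは" also occurs inside "することは", so the second sentence is a false hit.
definition_hits = string_search(docs, "とは")
# "すると" also occurs inside "するとき", so the fourth sentence is a false hit.
method_hits = string_search(docs, "すると.*できる")

print(definition_hits)
print(method_hits)
```

Assuming a typical segmentation, morphological analysis splits "することは" into する／こと／は and "するときに" into する／とき／に. Neither sentence then contains the morphemes that a morpheme-level pattern looks for.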

[0005] To solve these problems, the present invention prepares sentence expression patterns for sentences that contain important information. Each pattern describes part-of-speech information and semantic information in association with the notation of the characteristic morphemes of such sentences. The result of morphologically analyzing the input document is matched against these patterns, and a sentence is extracted when it matches. The object is to extract desired sentences containing important information with high precision.

[0006]

[Means for Solving the Problems] FIG. 1 is a block diagram showing the principle of the present invention. In FIG. 1, the input document 1 is the document from which sentences are to be extracted.

[0007] The document information extracting device 2 morphologically analyzes the input document 1. It then extracts the sentences that match the sentence expression pattern 5, based on the notation, part-of-speech information, and semantic information of their morphemes. The device consists of a morphological analysis unit 3, a sentence pattern matching unit 4, and the sentence expression pattern 5.

[0008] The morphological analysis unit 3 morphologically analyzes the input document 1. The sentence pattern matching unit 4 matches each sentence analyzed by unit 3 against the sentence expression pattern 5, based on the notation, part-of-speech information, and semantic information of its morphemes. It then extracts the matching sentences.

[0009] The sentence expression pattern 5 registers the patterns of the sentences to be extracted, namely the notation, part-of-speech information, and semantic information of their characteristic morphemes. The extracted sentence 6 is a sentence extracted from the input document.

[0010]

[Operation] In the present invention, as shown in FIG. 1, the morphological analysis unit 3 morphologically analyzes the input document 1. The sentence pattern matching unit 4 then matches each analyzed sentence against the sentence expression pattern 5, based on the notation and part-of-speech information of its morphemes, and extracts the matching sentences.

[0011] Alternatively, the morphological analysis unit 3 morphologically analyzes the input document 1. The sentence pattern matching unit 4 then matches each analyzed sentence against the sentence expression pattern 5, based on the notation and semantic information of its morphemes, and extracts the matching sentences.

[0012] Alternatively, the morphological analysis unit 3 morphologically analyzes the input document 1. The sentence pattern matching unit 4 then matches each analyzed sentence against the sentence expression pattern 5, based on the notation, part-of-speech information, and semantic information of its morphemes, and extracts the matching sentences.

[0013] Accordingly, sentence expression patterns are prepared that describe the notation, part-of-speech information, and semantic information of the characteristic morphemes of sentences containing important information. The morphological analysis result of the input document is matched against these patterns, and a sentence is extracted as a desired sentence when it matches. This makes it possible to extract sentences containing important information with high precision.

[0014]

[Embodiment] Next, the configuration and operation of an embodiment of the present invention will be described in detail with reference to FIGS. 2 to 9.

[0015] FIG. 2 is a flowchart explaining the operation of the present invention, that is, the overall operation of the configuration of FIG. 1. In S1 of FIG. 2, the input document 1 is read and morphologically analyzed. Specifically, the morphological analysis unit 3 of the document information extracting device 2 of FIG. 1 reads the input document 1 and analyzes it. It generates the notation, part-of-speech information, and semantic information of the morphemes of each sentence. For example, it reads the manual text of FIG. 3, which serves as the input document 1, and generates the morpheme notation, part-of-speech information, and semantic information shown in FIGS. 4 and 5. Sentence number 1 of FIG. 3 is "クラスとは、データの属性と動作を規定する抽象的なオブジェクトである。" ("A class is an abstract object that defines the attributes and operations of data."). It is morphologically analyzed to generate the morpheme notation, part-of-speech information, and semantic information shown in FIG. 4.

[0016] In S2, the sentence pattern matching unit 4 matches the sentences against the sentence expression pattern 5. As described later, the morpheme notation, part-of-speech information, and semantic information of each sentence analyzed in S1 are compared with the sentence expression patterns of FIG. 6, and matching sentences are found.

[0017] In S3, the sentences found to match the sentence expression pattern 5 in S2 are extracted. To summarize the flow: the input document 1 is morphologically analyzed to generate the notation, part-of-speech information, and semantic information of the morphemes of each sentence. These are matched against the pre-registered sentence expression patterns 5 (for example, those of FIG. 6), and the matching sentences are extracted. A sentence is extracted only when its part-of-speech information and semantic information match as well as its morpheme notation. The part-of-speech and semantic information of the important sentences to be extracted can therefore be registered in advance in the sentence expression pattern 5. A sentence whose morpheme notation matches but whose part-of-speech or semantic information does not is not extracted. This prevents unnecessary extractions, so only important sentences are selectively extracted. The details are described below.

[0018] FIG. 3 shows an example of a manual text used with the present invention, that is, an example of the input document 1 of FIG. 1. Sentence numbers 1 to 7 are given on the right for reference in the following description. FIGS. 4 and 5 show the morpheme notation, part-of-speech information, and semantic information generated by morphologically analyzing sentence numbers 1, 5, 6, and 7 of this manual text.

[0019] FIGS. 4 and 5 show examples of morphological analysis according to the present invention. In them, sentences 1, 5, 6, and 7 of the manual text of FIG. 3 are morphologically analyzed to generate the notation, part-of-speech information, and semantic information of their morphemes.

[0020] FIG. 6 shows examples of sentence expression patterns of the present invention. Three sentence expression patterns (1, 2, and 3) are registered in advance, as follows, to express important sentences.

[0021]
・Sentence expression pattern 1: ［名詞］／とは／〜／であ／る, where ［名詞］ is [noun] ("[noun] is ...")
・Sentence expression pattern 2: 〜［動詞終止形］／と／でき／る, where ［動詞終止形］ is [verb, terminal form] ("if one [verb]s, one can ...")
・Sentence expression pattern 3: ［名詞］＜手段＞／に／よ／っ／て／〜, where ＜手段＞ is <means> ("by means of [noun], ...")
In these patterns, ［ ］ encloses part-of-speech information, ＜ ＞ encloses semantic information, and everything else is morpheme notation. 〜 denotes an arbitrary character string, and ／ marks a boundary between morphemes.
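
The pattern notation of [0021] is regular enough to be parsed mechanically. The Python sketch below is one possible reading, not the device's actual implementation. The `Elem` type and its field names are assumptions. So is the rule that a 〜 written directly in front of a bracketed element (as in pattern 2) counts as a separate wildcard element.

```python
import re
from dataclasses import dataclass
from typing import Optional

WILD = "〜"  # arbitrary string of morphemes
# ［POS］, ＜semantic＞, or ［POS］＜semantic＞, with full-width brackets as in FIG. 6
SLOT = re.compile(r"(?:［(?P<pos>[^］]+)］)?(?:＜(?P<sem>[^＞]+)＞)?")

@dataclass(frozen=True)
class Elem:
    kind: str                      # "wild", "literal", or "slot"
    surface: Optional[str] = None  # morpheme notation for a "literal"
    pos: Optional[str] = None      # part-of-speech constraint for a "slot"
    sem: Optional[str] = None      # semantic constraint for a "slot"

def parse_pattern(text: str) -> list:
    """Split a pattern on ／ and classify each piece."""
    elems = []
    for piece in text.split("／"):
        piece = piece.strip()
        # A leading 〜 may be glued to the next element, e.g. "〜［動詞終止形］".
        while piece.startswith(WILD):
            elems.append(Elem("wild"))
            piece = piece[len(WILD):]
        if not piece:
            continue
        m = SLOT.fullmatch(piece)
        if m:  # a non-empty piece fully matches only if it is bracketed
            elems.append(Elem("slot", pos=m["pos"], sem=m["sem"]))
        else:
            elems.append(Elem("literal", surface=piece))
    return elems

PATTERNS = {
    1: "［名詞］／とは／〜／であ／る",
    2: "〜［動詞終止形］／と／でき／る",
    3: "［名詞］＜手段＞／に／よ／っ／て／〜",
}

for no, text in PATTERNS.items():
    print(no, parse_pattern(text))
```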

[0022] Sentence expression patterns 1, 2, and 3 specify the notation, part-of-speech information, and semantic information of morphemes as described above. As a result, only the sentences that match these patterns are extracted from the input document 1.

[0023] FIG. 7 is a flowchart of matching against the sentence expression patterns of the present invention. The input document 1 is morphologically analyzed to generate the notation, part-of-speech information, and semantic information of the morphemes of each sentence. The flowchart shows how these are matched against the sentence expression patterns 5 of FIG. 6.

[0024] In FIG. 7, S11 determines whether all morpheme lists have been processed. If YES, the process ends (END). If NO, morpheme lists remain, and the process proceeds to S12. A morpheme list holds the notation, part-of-speech information, and semantic information of the morphemes obtained by analyzing one sentence. For example, the morpheme list for sentence number 1 of the manual text of FIG. 3 is the list given for sentence number 1 in FIG. 4.

[0025] S12 determines whether all sentence expression patterns (for example, patterns 1, 2, and 3 of FIG. 6) have been tried. If YES, the process returns to S11 and repeats for the morpheme list of the next sentence. If NO, the process proceeds to S13.

[0026] S13 determines whether the morpheme list matches the sentence expression pattern through to the end. If YES, the matching sentence is extracted in S14, and the process returns to S11. If NO, there is no match, and the process returns to S12 to try the next pattern.
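
The loop S11 to S14 can be sketched in Python as follows. This hypothetical illustration rests on several assumptions:
・Each morpheme carries a tuple of POS tags, so a common noun also counts as a noun.
・A pattern must cover the whole morpheme list once sentence-final punctuation is dropped.
・〜 may match zero or more morphemes.
・Patterns are tried in the order 1, 2, 3.
The POS tag "*" is a placeholder for morphemes whose information the patterns do not inspect. Sentence 1 is segmented to end in であ／る, like pattern 1; the analyzer output of FIG. 4 is not reproduced.

```python
from typing import NamedTuple, Optional

class Morpheme(NamedTuple):
    surface: str
    pos: tuple                 # POS tags, coarse to fine, e.g. ("名詞", "普通名詞")
    sem: Optional[str] = None  # semantic class, if any

WILD = ("wild",)
# Pattern elements: WILD, ("lit", surface), or ("slot", pos, sem) with sem possibly None.
PATTERNS = [
    (1, [("slot", "名詞", None), ("lit", "とは"), WILD, ("lit", "であ"), ("lit", "る")]),
    (2, [WILD, ("slot", "動詞終止形", None), ("lit", "と"), ("lit", "でき"), ("lit", "る")]),
    (3, [("slot", "名詞", "手段"), ("lit", "に"), ("lit", "よ"), ("lit", "っ"), ("lit", "て"), WILD]),
]

def elem_ok(elem, m):
    if elem[0] == "lit":
        return m.surface == elem[1]
    _, pos, sem = elem  # "slot": POS must be among the tags; semantics must agree if given
    return pos in m.pos and (sem is None or m.sem == sem)

def match(pat, ms, i=0, j=0):
    """True if pat[i:] covers ms[j:] exactly; 〜 may absorb zero or more morphemes."""
    if i == len(pat):
        return j == len(ms)
    if pat[i] == WILD:
        return any(match(pat, ms, i + 1, k) for k in range(j, len(ms) + 1))
    return j < len(ms) and elem_ok(pat[i], ms[j]) and match(pat, ms, i + 1, j + 1)

def extract(sentences, patterns):
    hits = []
    for sid, ms in sentences:              # S11: END once every list is done
        while ms and ms[-1].surface in ("．", "。"):
            ms = ms[:-1]                   # assumption: ignore final punctuation
        for pid, pat in patterns:          # S12: back to S11 when patterns run out
            if match(pat, ms):             # S13: match through to the end?
                hits.append((sid, pid))    # S14: extract, then back to S11
                break
    return hits

def plain(notation):
    """Morphemes whose POS and semantics are not inspected (placeholder tag "*")."""
    return [Morpheme(s, ("*",)) for s in notation.split("／")]

SENTENCES = [
    (1, [Morpheme("クラス", ("名詞", "普通名詞"))]
        + plain("とは／、／データ／の／属性／と／動作／を／規定／する／抽象的／な／オブジェクト／であ／る／．")),
    (6, [Morpheme("クラス", ("名詞", "普通名詞"), "具体物")]
        + plain("の／種類／に／よ／っ／て／、／生成／さ／れ／る／インスタンス／が／異な／る／．")),
    (7, [Morpheme("エディタ", ("名詞", "普通名詞"), "手段")]
        + plain("に／よ／っ／て／、／インスタンス／の／スロット／を／指定／でき／る／．")),
]

print(extract(SENTENCES, PATTERNS))  # [(1, 1), (7, 3)]
```

Sentence 6 fails pattern 3 at its first slot only, because its semantic label is 具体物 rather than 手段.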

[0027] In this way, the morpheme list of each sentence of the input document 1 is matched against the sentence expression patterns in turn, and a sentence is extracted when it matches. The matching of the manual text of FIG. 3 against the sentence expression patterns of FIG. 6 is described below.

[0028] (1) Sentence number 1 of FIG. 3 reads:
・クラスとは、データの属性と動作を規定する抽象的なオブジェクトである．
("A class is an abstract object that defines the attributes and operations of data.")
It is morphologically analyzed into the following morpheme notation (for its part-of-speech and semantic information, see sentence number 1 in FIG. 4):
・クラス／とは／、／データ／の／属性／と／動作／を／規定／する／抽象的／な／オブジェクト／で／ある．
The notation and part-of-speech information of sentence number 1 are matched against sentence expression pattern 1 of FIG. 6:
・［名詞］／とは／〜／であ／る
The corresponding parts of the morpheme notation match, namely:
・"クラス" (class) and ［名詞］ (noun): クラス is a common noun
・"とは" and "とは"
・"であ" and "であ"
・"る" and "る"
Sentence number 1 is therefore extracted as a match for sentence expression pattern 1, as shown in FIG. 8.

[0029] (2) Sentence number 5 is extracted in the same way as in (1), as a match for sentence expression pattern 2, as shown in FIG. 8.
(3) Sentence number 6 of FIG. 3 reads:
・クラスの種類によって、生成されるインスタンスが異なる．
("The instance that is generated differs depending on the kind of class.")
It is morphologically analyzed into the following morpheme notation (for its part-of-speech and semantic information, see sentence number 6 in FIG. 5):
・クラス／の／種類／に／よ／っ／て／、／生成／さ／れ／る／インスタンス／が／異な／る．
Sentence number 6 is then matched against sentence expression pattern 3 of FIG. 6:
・［名詞］＜手段＞／に／よ／っ／て／〜
The first morpheme does not match ［名詞］＜手段＞ ([noun]<means>):
・"クラス", "普通名詞" (common noun; part-of-speech information), "具体物" (concrete object; semantic information)
The part-of-speech information of クラス matches "noun", but its semantic information, "concrete object", does not match "means". Sentence number 6 therefore does not match sentence expression pattern 3.

[0030] Thus, although the notation matches, the semantic information does not, so the sentence as a whole does not match and sentence number 6 is not extracted.
(4) Sentence number 7, by contrast, reads:
・エディタによって、インスタンスのスロットを指定できる．
("The slots of an instance can be specified with the editor.")
It is morphologically analyzed into the following morpheme notation (for its part-of-speech and semantic information, see sentence number 7 in FIG. 5):
・エディタ／に／よ／っ／て／、／インスタンス／の／スロット／を／指定／でき／る．
The notation, part-of-speech information, and semantic information of sentence number 7 are matched against sentence expression pattern 3 of FIG. 6:
・［名詞］＜手段＞／に／よ／っ／て／〜
The morpheme notation, part-of-speech information, and semantic information of the sentence all match, namely:
・"エディタ" (editor), "普通名詞" (common noun), "手段" (means) and "名詞" (noun), "手段" (means)
・"に" and "に"
・"よ" and "よ"
・"っ" and "っ"
・"て" and "て"
Sentence number 7 is therefore extracted as a match for sentence expression pattern 3, as shown in FIG. 8.

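The decisive step in (3) and (4) is comparing the first morpheme with the slot ［名詞］＜手段＞. The hypothetical Python helper below isolates that single check. It assumes each morpheme carries a tuple of POS tags and one semantic label, with the values stated in (3) and (4). Both head nouns pass the part-of-speech test, but only エディタ passes the semantic test. The literal morphemes に／よ／っ／て occur in both sentences, so notation alone cannot tell them apart.

```python
from typing import NamedTuple, Optional

class Morpheme(NamedTuple):
    surface: str
    pos: tuple          # POS tags, coarse to fine
    sem: Optional[str]  # semantic class

def check_slot(want_pos, want_sem, m):
    """Compare one ［POS］＜semantic＞ slot with one morpheme.
    Returns (pos_ok, sem_ok, overall)."""
    pos_ok = want_pos in m.pos
    sem_ok = want_sem is None or m.sem == want_sem
    return pos_ok, sem_ok, pos_ok and sem_ok

class_6 = Morpheme("クラス", ("名詞", "普通名詞"), "具体物")   # head of sentence 6
editor_7 = Morpheme("エディタ", ("名詞", "普通名詞"), "手段")  # head of sentence 7

print(check_slot("名詞", "手段", class_6))   # (True, False, False)
print(check_slot("名詞", "手段", editor_7))  # (True, True, True)
```
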
[0031] As a result, sentence numbers 1, 5, and 7 are extracted from the manual text of FIG. 3 as the sentences that match sentence expression patterns 1, 2, and 3 of FIG. 6, as shown in FIG. 8.

[0032] FIG. 8 shows example sentences extracted by the present invention. As described above, these are the sentences of the manual text of FIG. 3 that match sentence expression patterns 1, 2, and 3 of FIG. 6. At extraction time, each sentence is also registered with the sentence type of the pattern it matched, as shown in the figure. The type is "定義文" (definition sentence) for pattern 1 and "方法文" (method sentence) for patterns 2 and 3.

[0033] FIG. 9 shows an application example of the present invention. Here, an index is generated automatically from the extracted sentences of FIG. 8, grouped by sentence type. Sentence number 1, a definition sentence in FIG. 8, is taken out and automatically entered in the index (definition) column shown:
・What is a class? ................ p×× (page number)
Similarly, sentence numbers 5 and 7, method sentences in FIG. 8, are taken out and automatically entered in the index (method) column:
・How do I create an arbitrary instance? ..... p×× (page number)
・How do I specify a slot of an instance? ... p×× (page number)
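
FIGS. 8 and 9 describe a small post-processing step. Each hit is tagged with the sentence type registered for its pattern, and the hits are then grouped by type into index entries. The Python sketch below takes the hits of FIG. 8 as input. The document does not describe how index headings are derived from sentences, so the heading text and the page placeholder "××" are passed in exactly as FIG. 9 gives them. All function and variable names are hypothetical.

```python
from collections import defaultdict

# Sentence type registered with each pattern (FIG. 8).
PATTERN_TYPE = {1: "definition", 2: "method", 3: "method"}

def build_index(hits, headings, pages):
    """hits: (sentence_no, pattern_no) pairs from the matcher.
    headings / pages: sentence_no -> index heading / page label.
    Returns {sentence type: [index lines]} in first-seen order."""
    index = defaultdict(list)
    for sid, pid in hits:
        kind = PATTERN_TYPE[pid]  # tag the hit with its sentence type
        index[kind].append(f"{headings[sid]} ........ p{pages[sid]}")
    return dict(index)

hits = [(1, 1), (5, 2), (7, 3)]  # FIG. 8
headings = {
    1: "What is a class?",
    5: "How do I create an arbitrary instance?",
    7: "How do I specify a slot of an instance?",
}
pages = {1: "××", 5: "××", 7: "××"}  # page numbers are not given in the source

for kind, lines in build_index(hits, headings, pages).items():
    print(f"Index ({kind})")
    for line in lines:
        print("  " + line)
```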

[0034] In this way, sentence expression patterns 1, 2, and 3 of FIG. 6 pick out only the sentences characterized by the required morpheme notation, part-of-speech information, and semantic information (sentence numbers 1, 5, and 7). These are extracted from the manual text that forms the input document 1 of FIG. 3. An index can then be automatically edited (generated) from these precisely extracted sentences, divided further by sentence type.

[0035]

[Effects of the Invention] As described above, the present invention prepares sentence expression patterns that describe the notation, part-of-speech information, and semantic information of the characteristic morphemes of sentences containing important information. The result of morphologically analyzing the input document is matched against these patterns, and a sentence is extracted as a desired sentence when it matches. Desired sentences containing important information can therefore be extracted with high precision. In particular, the morpheme notation, part-of-speech information, and semantic information can be freely specified in the sentence expression patterns to extract such sentences accurately. Moreover, unnecessary sentences that match only in notation (character string) are not extracted, which also reduces the effort of deleting them.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing the principle configuration of the present invention.

FIG. 2 is a flowchart explaining the operation of the present invention.

FIG. 3 is an example of a manual text used with the present invention.

FIG. 4 is an example of morphological analysis according to the present invention (part 1).

FIG. 5 is an example of morphological analysis according to the present invention (part 2).

FIG. 6 is an example of sentence expression patterns of the present invention.

FIG. 7 is a flowchart of matching against the sentence expression patterns of the present invention.

FIG. 8 is an example of sentences extracted by the present invention.

FIG. 9 is an application example of the present invention.

[Explanation of Reference Numerals]

1: Input document
2: Document information extracting device
3: Morphological analysis unit
4: Sentence pattern matching unit
5: Sentence expression pattern
6: Extracted sentence

Claims

1. A document information extracting device that extracts a desired sentence from an input document, comprising:
a sentence expression pattern (5) in which part-of-speech information is registered, as needed, in association with the morpheme notation of a sentence to be extracted;
a morphological analysis unit (3) that morphologically analyzes the input document; and
a sentence pattern matching unit (4) that matches each morphologically analyzed sentence against the sentence expression pattern (5), based on the notation and part-of-speech information of its morphemes, and extracts the matching sentences.

2. A document information extracting device that extracts a desired sentence from an input document, comprising:
a sentence expression pattern (5) in which semantic information is registered, as needed, in association with the morpheme notation of a sentence to be extracted;
a morphological analysis unit (3) that morphologically analyzes the input document; and
a sentence pattern matching unit (4) that matches each morphologically analyzed sentence against the sentence expression pattern (5), based on the notation and semantic information of its morphemes, and extracts the matching sentences.

3. A document information extracting device comprising:
a sentence expression pattern (5) in which part-of-speech information and semantic information are registered, as needed, in association with the morpheme notation of a sentence to be extracted;
a morphological analysis unit (3) that morphologically analyzes an input document; and
a sentence pattern matching unit (4) that matches each morphologically analyzed sentence against the sentence expression pattern (5), based on the notation, part-of-speech information, and semantic information of its morphemes, and extracts the matching sentences;
wherein the device extracts a desired sentence from the input document.