JP6056563B2

JP6056563B2 - Example part specifying device, example part specifying method, and example part specifying program

Info

Publication number: JP6056563B2
Application number: JP2013046876A
Authority: JP
Inventors: 和久大野; 直之田村
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2013-03-08
Filing date: 2013-03-08
Publication date: 2017-01-11
Anticipated expiration: 2033-03-08
Also published as: JP2014174744A

Description

The present invention relates to the technical field of devices, methods, and the like for generating summaries of documents and organizing their key points.

Software that generates document summaries and organizes key points has long been developed. For example, Patent Document 1 discloses a technique that calculates the importance of each summary constituent unit in a text (a unit whose minimum is a clause containing a predicate), using clues such as the unit's position in the text, cue words, the title, word importance, and keywords. Units are then extracted in descending order of importance until the summary length requested by the user is reached, and a summary result output means arranges the extracted units in their order of appearance in the original text and outputs them as the summary result. As another example, Patent Document 2 discloses a technique that divides a text into classified discourse constituents and combines those constituents into a structural representation of the discourse.

Patent Document 1: JP 2009-146447 A
Patent Document 2: Japanese Patent No. 4706227

Among the constituent parts (constituent elements) of a document, example parts are particularly effective for organizing knowledge and fixing it in memory. This is because an author who uses an example is presumably trying to convey the content of the document (the author's argument) to the reader in an easily understood way. One conceivable way to identify example parts in a document is to use cue words (such as "for example"). However, this method cannot identify an example part when the document contains no cue word. Another conceivable method applies to a group of sentences: when specific items of an event, state, or the like described in sentence A are presented in sentence B, sentence B is treated as an example part. Because this method considers only the relationship between sentences, it cannot identify an example part when the sentence forming the example appears independently.

The present invention has been made in view of the above and other points. Its object is to provide an example part specifying device, an example part specifying method, and an example part specifying program that can identify example parts even when no cue word such as a conjunction is present in the text or when the connection relationship between sentences cannot be obtained.

To solve the above problem, the invention according to claim 1 is an example part specifying device capable of referring to a broader/narrower term database in which broader terms and narrower terms of a plurality of events or things are registered in association with each other, the device comprising: document acquisition means for acquiring document data from predetermined storage means; subject extraction means for extracting a subject of the document from the document represented by the document data acquired by the document acquisition means; example part candidate extraction means for referring to the broader/narrower term database and extracting, from that document, constituent parts of the document that contain a term registered in the broader/narrower term database as example part candidates of the document; ratio calculation means for calculating, where the number of words contained in an example part candidate extracted by the example part candidate extraction means is a first occurrence count and the number of at least one of the subject extracted by the subject extraction means and a broader term of that subject is a second occurrence count, the ratio of the second occurrence count to the first occurrence count; and example part specifying means for identifying an example part of the document from among the example part candidates extracted by the example part candidate extraction means, based on the ratio calculated by the ratio calculation means.

The invention according to claim 2 is the example part specifying device according to claim 1, wherein the subject of the document extracted by the subject extraction means is limited to terms registered in the broader/narrower term database.

The invention according to claim 3 is the example part specifying device according to claim 1 or 2, further comprising display data output means for outputting display data for causing predetermined display means to display the example part identified by the example part specifying means or a summary of that example part.

The invention according to claim 4 is the example part specifying device according to claim 3, wherein the display data is data for displaying a constituent part of the document represented by the document data acquired by the document acquisition means, or a summary of that constituent part, that is related to the example part identified by the example part specifying means, in association with the example part or the summary of the example part.

The invention according to claim 5 is the example part specifying device according to claim 4, wherein the display data is data that allows display of the example part identified by the example part specifying means, or of the summary of that example part, to be switched on and off in response to a user instruction.

The invention according to claim 6 is an example part specifying method executed by a computer capable of referring to a broader/narrower term database in which broader terms and narrower terms of a plurality of events or things are registered in association with each other, the method comprising: a document acquisition step of acquiring document data from predetermined storage means; a subject extraction step of extracting a subject of the document from the document represented by the document data acquired in the document acquisition step; an example part candidate extraction step of referring to the broader/narrower term database and extracting, from that document, constituent parts of the document that contain a term registered in the broader/narrower term database as example part candidates of the document; a ratio calculation step of calculating, where the number of words contained in an example part candidate extracted in the example part candidate extraction step is a first occurrence count and the number of at least one of the subject extracted in the subject extraction step and a broader term of that subject is a second occurrence count, the ratio of the second occurrence count to the first occurrence count; and an example part specifying step of identifying an example part of the document from among the example part candidates extracted in the example part candidate extraction step, based on the ratio calculated in the ratio calculation step.

The invention according to claim 7 causes a computer capable of referring to a broader/narrower term database, in which broader terms and narrower terms of a plurality of events or things are registered in association with each other, to function as: document acquisition means for acquiring document data from predetermined storage means; subject extraction means for extracting a subject of the document from the document represented by the document data acquired by the document acquisition means; example part candidate extraction means for referring to the broader/narrower term database and extracting, from that document, constituent parts of the document that contain a term registered in the broader/narrower term database as example part candidates of the document; ratio calculation means for calculating, where the number of words contained in an example part candidate extracted by the example part candidate extraction means is a first occurrence count and the number of at least one of the subject extracted by the subject extraction means and a broader term of that subject is a second occurrence count, the ratio of the second occurrence count to the first occurrence count; and example part specifying means for identifying an example part of the document from among the example part candidates extracted by the example part candidate extraction means, based on the ratio calculated by the ratio calculation means.

According to the present invention, example parts can be identified even when no cue word such as a conjunction is present in the text or when the connection relationship between sentences cannot be obtained.

FIG. 1 is a diagram showing an example of the schematic configuration of the document information providing system according to the present embodiment.
FIG. 2 is a flowchart showing an example of the example part specifying process in the control unit 23.
FIG. 3 is a diagram showing an example of the example degree e of each of example part candidates 1 to 3.
FIG. 4(A) is a diagram showing an example of the document content browsing screen displayed on the display of the user terminal 1. FIG. 4(B) is a diagram showing an example of the key point organization screen displayed on the display of the user terminal 1.
FIG. 5(A) is a diagram showing an example of the document content browsing screen displayed on the display of the user terminal 1. FIG. 5(B) is a diagram showing an example of the key point organization screen displayed on the display of the user terminal 1.
FIG. 6 is a diagram showing an example in which the user sets the title contribution.
FIG. 7 is a diagram showing an example in which the user sets the appearance frequency threshold.
FIG. 8 is a diagram showing an example in which the user sets the threshold θ of the example degree e.

An embodiment of the present invention is described below with reference to the drawings. The embodiment described below applies the present invention to a document information providing system.

First, the configuration and functions of the document information providing system according to the present embodiment are described with reference to FIG. 1, which shows an example of the schematic configuration of the system. As shown in FIG. 1, the document information providing system S includes a user terminal 1, a document information providing server 2, and the like. Examples of the user terminal 1 include a tablet, a mobile phone, a smartphone, and a personal computer. The user terminal 1 can access and communicate with the document information providing server 2 via a network NW. The network NW is composed of, for example, the Internet, a dedicated communication line (for example, a CATV (Community Antenna Television) line), a mobile communication network (including base stations and the like), gateways, and the like.

The document information providing server 2 includes a communication unit 21, a storage unit 22 (an example of storage means), a control unit 23, and the like. The communication unit 21 connects to the network NW and controls the state of communication with the user terminal 1. The storage unit 22 is composed of, for example, a hard disk drive, and stores an operating system, the example part specifying program of the present invention, and the like. The example part specifying program may be distributed from a predetermined server or the like via the network NW, or may be provided recorded on a recording medium such as a CD (Compact Disc) or a DVD (Digital Versatile Disc).

The storage unit 22 stores a plurality of pieces of document data that can be provided to the user terminal 1. The document data is electronic data of documents such as books and magazines. Each piece of document data is assigned a unique document ID (identifier). Each constituent part (constituent element) of the document represented by the document data is assigned an element ID, a summary (key point), and a semantic tag, which are stored in association with the document ID. Here, a constituent part is a portion of the document, for example a sentence in the document. The element ID is an identifier unique to each constituent part. The summary is, for example, a sentence obtained by shortening the constituent part by a known technique, or a characteristic word (for example, a noun) contained in the constituent part. The semantic tag indicates the type of the constituent part; examples include "conventional", "current", and "example". The storage unit 22 also stores, in association with the document ID of the document, data indicating the connection relationships between constituent parts of the document (for example, data associating the element IDs of mutually related constituent parts). For example, the connection relationship between the constituent part "Conventionally, distributed on paper ..." and the constituent part "Today, on the other hand, digitally ..." is "contrast". The summary of the former is "distribution on paper" and its semantic tag is "conventional"; the summary of the latter is "digital distribution" and its semantic tag is "current".

A broader/narrower term database 221 is also constructed in the storage unit 22. The broader/narrower term database 221 registers broader terms and narrower terms of a plurality of concrete events or things in association with each other. Examples of broader and narrower terms of concrete events or things include the following (Relation Example 1) to (Relation Example 3).
(Relation Example 1) <broader> person → scholar → physicist → geophysicist → … <narrower>
(Relation Example 2) <broader> food → fermented food → cheese → cheddar cheese → … <narrower>
(Relation Example 3) <broader> sport → sports competition → ball game → baseball → … <narrower>

An existing database may be used as the broader/narrower term database 221, or a new one may be created. Relation Examples 1 to 3 are only examples; the broader/narrower term database 221 contains broader and narrower terms for a large number of events or things. The broader and narrower terms are preferably nouns, and each may be a single word or a compound word. The broader/narrower term database 221 need not reside in the document information providing server 2; it may be constructed in another server that the document information providing server 2 can access via the network NW.
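As an illustration only, the relation examples above could be held in a simple child-to-parent mapping. The sketch below is not the patent's implementation; the names `BROADER`, `is_registered`, and `broader_terms` are hypothetical, and a real database 221 would hold far more entries.

```python
# Hypothetical in-memory stand-in for the broader/narrower term database 221.
# Each key is a term; its value is the immediately broader term.
BROADER = {
    "geophysicist": "physicist",
    "physicist": "scholar",
    "scholar": "person",
    "cheddar cheese": "cheese",
    "cheese": "fermented food",
    "fermented food": "food",
    "baseball": "ball game",
    "ball game": "sports competition",
    "sports competition": "sport",
}


def is_registered(term):
    """Return True if the term is registered as a broader or a narrower term."""
    return term in BROADER or term in BROADER.values()


def broader_terms(term):
    """Return the chain of broader terms for a term, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain
```

A child-to-parent mapping keeps the lookup of a term's broader terms to a simple walk up the chain, which is the direction the later ratio calculation needs.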

Of the semantic tags described above, the present embodiment focuses on "example" and describes in detail a method for identifying the example parts of a document from among the plurality of constituent parts that make up the document.

The control unit 23 is composed of a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. The control unit 23 is an example of the example part specifying device and of the computer of the present invention. By executing the example part specifying program, the control unit 23 functions as the document acquisition means, subject extraction means, example part candidate extraction means, ratio calculation means, example part specifying means, display data output means, and the like of the present invention, and executes the example part specifying process. In the example part specifying process, the control unit 23 refers to the broader/narrower term database 221.

A sentence that contains a word or compound word denoting a concrete event or thing tends to be an example, but the presence of such a word or compound word does not always make the sentence an example. The example part specifying process according to the present embodiment therefore compares the broader and narrower terms (broader and narrower concepts) of the words with the subject of the document to determine whether the sentence is an example.

FIG. 2 is a flowchart showing an example of the example part specifying process in the control unit 23. The process shown in FIG. 2 is started, for example, when an instruction is received from an operator's terminal. When the process shown in FIG. 2 is started, the control unit 23 acquires unprocessed document data from the storage unit 22 or from the storage means of another device (for example, by reading it into the RAM) (step S1).

Next, the control unit 23 extracts the subject of the document from the document represented by the document data acquired in step S1 (step S2). For example, the control unit 23 determines one or more words contained in the title of the document as the subject of the document. The subject may instead be extracted from words contained in the titles given to sections of the document such as chapters and sections, from words contained in the table of contents or index of the document, or from words that appear frequently in the document (frequent words). In the last case, a threshold for the appearance frequency is set in advance to determine whether a word's appearance frequency is high. The subject may also be extracted from those words contained in the title of the document (optionally including words contained in section titles, the table of contents, or the index) that appear frequently in the text. Alternatively, words contained in the title of the document (again optionally including words contained in section titles, the table of contents, or the index) may be distinguished from words that appear frequently in the text, and the contribution of the title words (the weight given to words contained in the title) may be raised or lowered so that the number of subjects determined (the number of kinds of subject) changes.

Because a subject generally denotes a concrete event or thing, the subject extracted in step S2 is preferably limited to terms (broader terms or narrower terms) registered in the broader/narrower term database 221.
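One of the variants of step S2 (title words that also appear frequently in the body, limited to registered terms) can be sketched as follows. This is a minimal illustration under stated assumptions: the text is assumed to be already split into words (Japanese text would need a morphological analyzer first), and the function name `extract_subjects` and the parameter `min_count` are hypothetical.

```python
from collections import Counter


def extract_subjects(title_words, body_words, registered, min_count=2):
    """Pick subjects from the title words.

    A title word becomes a subject only if it is a registered term and
    appears in the body at least `min_count` times (the preset appearance
    frequency threshold). Order of first appearance in the title is kept.
    """
    counts = Counter(body_words)
    subjects = []
    for word in title_words:
        if word in registered and counts[word] >= min_count and word not in subjects:
            subjects.append(word)
    return subjects
```

Raising `min_count` narrows the set of subjects; setting it to 0 reduces the rule to "registered title words", which corresponds to the simplest variant described above.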

Next, the control unit 23 refers to the broader/narrower term database 221 and extracts, from the document represented by the document data acquired in step S1, the constituent parts of the document that contain a term registered in the broader/narrower term database 221 as example part candidates of the document (step S3). That is, for each constituent part of the document, the control unit 23 checks whether the words contained in that constituent part are registered in the broader/narrower term database 221, and if even one of those words is registered, it extracts the constituent part as an example part candidate.
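Step S3 amounts to a membership check per constituent part. The sketch below assumes each constituent part is given as an element ID mapped to its list of words; the function name `extract_candidates` and this data layout are illustrative assumptions, not taken from the patent.

```python
def extract_candidates(parts, registered):
    """Return the element IDs of constituent parts that contain at least
    one term registered in the broader/narrower term database.

    parts: dict mapping element ID -> list of words in that constituent part
    registered: set of registered terms (broader or narrower)
    """
    return [
        element_id
        for element_id, words in parts.items()
        if any(word in registered for word in words)
    ]
```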

Next, taking the number of words contained in an example part candidate extracted in step S3 as a first occurrence count (frequency) n1, and the number of at least one of the subject extracted in step S2 and a broader term of that subject as a second occurrence count n2, the control unit 23 calculates the ratio of the second occurrence count n2 to the first occurrence count n1 (n2/n1). Based on this ratio, it calculates the example degree e of the candidate, e = 1 - (n2/n1), where 0 ≤ e ≤ 1, for each example part candidate (step S4). The ratio indicates how closely the candidate follows the subject: the higher the ratio, the more closely the candidate follows the subject. The example degree e indicates the degree of relation between the broader/narrower concepts of the words contained in the candidate and the subject of the document: the higher its value, the more likely the candidate is an example in the document.
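The calculation in step S4 can be sketched as below. The text does not spell out exactly which words are counted, so this sketch makes two labeled assumptions: n1 counts every word passed in for the candidate (a caller could pre-filter to registered nouns), and n2 counts the words in the candidate that equal a subject or one of the subject's broader terms. The function name `example_degree` is hypothetical.

```python
def example_degree(candidate_words, subjects, subject_broader_terms):
    """Compute e = 1 - (n2 / n1) for one example part candidate.

    n1: number of words in the candidate (assumption: all words passed in)
    n2: number of those words that are a subject or a broader term of a
        subject (assumption: counted within the candidate)
    """
    n1 = len(candidate_words)
    if n1 == 0:
        raise ValueError("candidate must contain at least one word")
    on_subject = set(subjects) | set(subject_broader_terms)
    n2 = sum(1 for word in candidate_words if word in on_subject)
    return 1 - n2 / n1
```

Because n2 can never exceed n1 under these assumptions, the result stays within the range 0 ≤ e ≤ 1 given in the text.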

Next, based on the example degree e calculated in step S4 (that is, based on the ratio n2/n1), the control unit 23 identifies (determines) the example parts of the document from among the example part candidates extracted in step S3 (step S5). For example, a threshold θ for the example degree (0 ≤ θ ≤ 1) is set in advance to determine whether the example degree e is high. The control unit 23 determines whether the example degree e is equal to or greater than the threshold θ (e ≥ θ) and identifies each example part candidate whose example degree e is equal to or greater than θ as an example part of the document. An example part candidate whose example degree e is less than the threshold θ (e < θ) is not identified as an example part.

FIG. 3 is a diagram showing an example of the example degree e of each of example part candidates 1 to 3. The example degree e of each candidate shown in FIG. 3 is calculated by equation (1) shown in FIG. 3. In this case, if the threshold θ is set to 0.3, for example, only example part candidate 3 among example part candidates 1 to 3 shown in FIG. 3 is identified as an example part. The threshold θ may be made arbitrarily settable by the system operator or a system user.
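The threshold comparison of step S5 can be sketched as a simple filter. The function name `identify_examples` is hypothetical, and the example degrees used in the usage note are made-up values chosen for illustration; they are not the values shown in FIG. 3.

```python
def identify_examples(degrees, theta):
    """Return the IDs of candidates whose example degree e satisfies e >= theta.

    degrees: dict mapping candidate ID -> example degree e (0 <= e <= 1)
    theta: preset threshold (0 <= theta <= 1)
    """
    if not 0 <= theta <= 1:
        raise ValueError("theta must be between 0 and 1")
    return [cid for cid, e in degrees.items() if e >= theta]
```

With hypothetical degrees `{"candidate1": 0.1, "candidate2": 0.25, "candidate3": 0.5}` and `theta=0.3`, only `"candidate3"` passes, which mirrors the kind of outcome described for FIG. 3.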

When an example part has been identified in this way, the semantic tag "example" is assigned to the identified example part (associated with the element ID of that example part), and the assigned semantic tag "example" is stored in the storage unit 22.

As described above, the example part specifying process can identify example parts efficiently even when no cue word such as a conjunction is present in the text or when the connection relationship between sentences cannot be obtained.

Next, an example of how the semantic tag assigned to an example part identified by the example part specifying process is used is described. As a premise, it is assumed that the user terminal 1 has downloaded desired document data (with the document ID and element IDs attached) from the document information providing server 2, and that a document content browsing screen containing part of the document represented by the document data is displayed on the display of the user terminal 1 (an example of display means).

FIG. 4(A) is a diagram showing an example of the document content browsing screen displayed on the display of the user terminal 1. While the document content browsing screen shown in FIG. 4(A) is displayed, when the user of the user terminal 1 designates a desired sentence (an example of a constituent part of the document) with a finger or the like as shown in FIG. 4(A), the user terminal 1 accesses the document information providing server 2 and transmits a request containing the element ID of the designated sentence. In response, the document information providing server 2 returns (outputs) to the user terminal 1 display data for displaying, on the display, other sentences related to the sentence (element ID) designated by the user (including the identified example parts) or summaries of those sentences. The display data includes data indicating the connection relationships between the designated sentence and the other related sentences. This display data is data for displaying a constituent part of the document represented by the document data (or a summary of that constituent part) that is related to an identified example part, in association with the example part (or the summary of the example part). On receiving the display data transmitted from the document information providing server 2, the user terminal 1 displays a key point organization screen on the display.

FIG. 4(B) is a diagram showing an example of the key point organizing screen displayed on the display of the user terminal 1. In FIG. 4(B), "mobile phone", "smartphone", and "tablet" are summaries (key points) of the example parts specified as described above. The display data returned from the document information providing server 2 to the user terminal 1 may also be data that allows the display of the example parts related to the user-designated sentence, or of the summaries of those example parts, to be switched on and off according to a user instruction. In this case, a key point organizing screen as shown in FIG. 5(B) is displayed on the display. In FIG. 5(B), the display of the broken line and of the portion inside it can be switched according to a user instruction. In other words, a configuration in which the examples are hidden and only the main argument is displayed is also possible. Note that the display data returned from the document information providing server 2 to the user terminal 1 may instead be display data for displaying only the example parts related to the user-designated sentence, or only the summaries of those example parts.
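The display variants described in this paragraph (examples shown, examples hidden, examples only) can be illustrated with a small filter. This is a sketch under assumptions: the function name `visible_items`, its two flags, and the `(text, is_example)` pair representation are hypothetical and chosen only for this example.

```python
def visible_items(items, show_examples=True, examples_only=False):
    """Return the texts to display.

    items: list of (text, is_example) pairs (assumed representation).
    show_examples: when False, example parts are hidden and only the main argument remains.
    examples_only: when True, only example parts (or their summaries) are shown.
    """
    if examples_only:
        return [text for text, is_example in items if is_example]
    if show_examples:
        return [text for text, _ in items]
    return [text for text, is_example in items if not is_example]
```

Toggling `show_examples` corresponds to switching the broken-line portion of FIG. 5(B) on and off.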

In the above embodiment, a configuration was described in which words included in the title of the document are distinguished from words that appear frequently in the text (frequent words), and the contribution of the words included in the title (hereinafter, "title contribution") is raised or lowered. This title contribution may be made arbitrarily settable by the user. The appearance-frequency threshold used to determine whether an appearance frequency is high may likewise be made arbitrarily settable by the user, as may the threshold θ used to determine whether the degree of exemplification e is high.

FIG. 6 is a diagram showing an example in which the user sets the title contribution. With the key point organizing screen of FIG. 6(A) displayed, the user of the user terminal 1 changes the title contribution by moving the slider 51 to the right or left, as shown in FIG. 6(A). The changed title contribution is transmitted from the user terminal 1 to the document information providing server 2 together with the document ID of the document, the element IDs of the constituent parts whose summaries are displayed on the screen, and so on. In response, the document information providing server 2 performs the example part specifying process shown in FIG. 2, specifies the example parts again, and returns to the user terminal 1 display data for displaying the specified example parts or their summaries on the display. As a result, the number of examples displayed increases or decreases, as shown in FIG. 6(B) or (C).

FIG. 7 is a diagram showing an example in which the user sets the appearance-frequency threshold. With the key point organizing screen of FIG. 7(A) displayed, the user of the user terminal 1 changes the appearance-frequency threshold by moving the slider 52 to the right or left, as shown in FIG. 7(A). The changed appearance-frequency threshold is transmitted from the user terminal 1 to the document information providing server 2 together with the document ID of the document, the element IDs of the constituent parts whose summaries are displayed on the screen, and so on. In response, the document information providing server 2 performs the example part specifying process shown in FIG. 2, specifies the example parts again, and returns to the user terminal 1 display data for displaying the specified example parts or their summaries on the display. As a result, the number of examples displayed increases or decreases, as shown in FIG. 7(B) or (C).

FIG. 8 is a diagram showing an example in which the user sets the threshold θ of the degree of exemplification e. With the key point organizing screen of FIG. 8(A) displayed, the user of the user terminal 1 changes the threshold θ by moving the slider 53 to the right or left, as shown in FIG. 8(A). The changed threshold θ is transmitted from the user terminal 1 to the document information providing server 2 together with the document ID of the document, the element IDs of the constituent parts whose summaries are displayed on the screen, and so on. In response, the document information providing server 2 performs the example part specifying process shown in FIG. 2, specifies the example parts again, and returns to the user terminal 1 display data for displaying the specified example parts or their summaries on the display. As a result, the number of examples displayed increases or decreases, as shown in FIG. 8(B) or (C).

As described above, if the title contribution, the appearance-frequency threshold, or the threshold θ of the degree of exemplification e can be set arbitrarily by the user, the user can change the number of examples displayed as appropriate while viewing the summaries of the example parts on the key point organizing screen. Examples that better match the user's intent can therefore be displayed.
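How a changed slider value might flow through a re-identification request can be sketched as below. Everything named here is hypothetical: the `ReidentifyRequest` fields, the `handle` function, and the default value of θ are assumptions for illustration. The sketch also assumes that a candidate counts as an example when its degree of exemplification e is at least θ, and it takes the e values as precomputed inputs; in the described system, changing the title contribution or the appearance-frequency threshold would rerun the example part specifying process rather than reuse fixed values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReidentifyRequest:
    """Payload sent from the user terminal after a slider moves (hypothetical)."""
    document_id: str
    element_ids: list                          # parts whose summaries are on screen
    title_contribution: Optional[float] = None
    frequency_threshold: Optional[int] = None
    theta: Optional[float] = None              # threshold for the degree of exemplification e


def select_examples(e_by_element, theta):
    """Keep element IDs whose e is at least theta (assumed decision rule)."""
    return [eid for eid, e in e_by_element.items() if e >= theta]


def handle(request, e_by_element, default_theta=0.5):
    """Re-select examples using the user-supplied theta, if one was sent."""
    theta = default_theta if request.theta is None else request.theta
    return select_examples(e_by_element, theta)
```

Under this assumed rule, lowering θ can only keep or increase the number of examples displayed, and raising it can only keep or decrease that number, which matches the increase or decrease shown in FIG. 8(B) and (C).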

In the above embodiment, the document information providing server 2 performs the example part specifying process. As another example, a user terminal 1 that can access the broader/narrower term database 221 may perform the example part specifying process instead. In this case, the control unit of the user terminal 1 (an example of a computer and of an information processing device) executes the example part specifying program of the present invention and thereby functions as the document acquisition means, subject extraction means, example part candidate extraction means, ratio calculation means, example part specifying means, display data output means, and so on of the present invention.
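Two of the means listed above, candidate extraction and ratio calculation, can be illustrated with a minimal sketch. It follows the wording of the claims: the first appearance count is the number of words in a candidate, and the second appearance count is the number of occurrences of the subject or one of its broader terms. The function names, the use of pre-tokenized word lists, and the choice to count the second appearance count inside the candidate are assumptions. The rule that turns the ratio into a final decision is deliberately left out, since it is not restated in this section.

```python
def extract_candidates(parts, registered_words):
    """Return the constituent parts containing at least one registered word.

    parts: list of word lists (tokenization is assumed to be done already).
    registered_words: set of words registered in the broader/narrower term database.
    """
    return [p for p in parts if any(w in registered_words for w in p)]


def candidate_ratio(candidate_words, subject, broader_terms):
    """Ratio of the second appearance count to the first appearance count."""
    first = len(candidate_words)              # first appearance count
    if first == 0:
        return 0.0                            # avoid division by zero for empty parts
    targets = {subject} | set(broader_terms)
    # Second appearance count: occurrences of the subject or its broader terms.
    second = sum(1 for w in candidate_words if w in targets)
    return second / first
```

A caller would compute `candidate_ratio` for every extracted candidate and then apply whatever selection rule the example part specifying means uses.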

1 User terminal
2 Document information providing server
21 Communication unit
22 Storage unit
23 Control unit

Claims

1. An example part specifying device capable of referring to a broader/narrower term database in which broader terms and narrower terms of a plurality of events or things are registered in association with each other, the device comprising:
document acquisition means for acquiring document data from predetermined storage means;
subject extraction means for extracting a subject of a document from the document related to the document data acquired by the document acquisition means;
example part candidate extraction means for referring to the broader/narrower term database and extracting, as an example part candidate, a constituent part of the document that contains a word registered in the broader/narrower term database from the document related to the document data acquired by the document acquisition means;
ratio calculation means for calculating, where the number of words included in the example part candidate extracted by the example part candidate extraction means is a first appearance count and the number of occurrences of at least one of the subject extracted by the subject extraction means and a broader term of the subject is a second appearance count, a ratio of the second appearance count to the first appearance count; and
example part specifying means for specifying an example part of the document from among the example part candidates extracted by the example part candidate extraction means, based on the ratio calculated by the ratio calculation means.

2. The example part specifying device according to claim 1, wherein the subject of the document extracted by the subject extraction means is limited to words registered in the broader/narrower term database.

3. The example part specifying device according to a preceding claim, further comprising display data output means for outputting display data that causes predetermined display means to display the example part specified by the example part specifying means or a summary of the example part.

4. The example part specifying device according to claim 3, wherein the display data is data for displaying a constituent part of the document related to the document data acquired by the document acquisition means, or a summary of that constituent part, that is related to the example part specified by the example part specifying means, in association with the example part or the summary of the example part.

5. The example part specifying device according to claim 4, wherein the display data is data that allows display of the example part specified by the example part specifying means, or of the summary of the example part, to be switched on and off according to a user instruction.

6. An example part specifying method executed by a computer capable of referring to a broader/narrower term database in which broader terms and narrower terms of a plurality of events or things are registered in association with each other, the method comprising:
a document acquisition step of acquiring document data from predetermined storage means;
a subject extraction step of extracting a subject of a document from the document related to the document data acquired in the document acquisition step;
an example part candidate extraction step of referring to the broader/narrower term database and extracting, as an example part candidate, a constituent part of the document that contains a word registered in the broader/narrower term database from the document related to the document data acquired in the document acquisition step;
a ratio calculation step of calculating, where the number of words included in the example part candidate extracted in the example part candidate extraction step is a first appearance count and the number of occurrences of at least one of the subject extracted in the subject extraction step and a broader term of the subject is a second appearance count, a ratio of the second appearance count to the first appearance count; and
an example part specifying step of specifying an example part of the document from among the example part candidates extracted in the example part candidate extraction step, based on the ratio calculated in the ratio calculation step.

7. An example part specifying program causing a computer capable of referring to a broader/narrower term database, in which broader terms and narrower terms of a plurality of events or things are registered in association with each other, to function as:
document acquisition means for acquiring document data from predetermined storage means;
subject extraction means for extracting a subject of a document from the document related to the document data acquired by the document acquisition means;
example part candidate extraction means for referring to the broader/narrower term database and extracting, as an example part candidate, a constituent part of the document that contains a word registered in the broader/narrower term database from the document related to the document data acquired by the document acquisition means;
ratio calculation means for calculating, where the number of words included in the example part candidate extracted by the example part candidate extraction means is a first appearance count and the number of occurrences of at least one of the subject extracted by the subject extraction means and a broader term of the subject is a second appearance count, a ratio of the second appearance count to the first appearance count; and
example part specifying means for specifying an example part of the document from among the example part candidates extracted by the example part candidate extraction means, based on the ratio calculated by the ratio calculation means.