JP4496900B2 - Event information extraction apparatus and program

Info

Publication number: JP4496900B2
Application number: JP2004263725A
Authority: JP
Inventors: 晴美川島; 裕一郎関口; 雅且大久保
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-09-10
Filing date: 2004-09-10
Publication date: 2010-07-07
Anticipated expiration: 2024-09-10
Also published as: JP2006079412A

Description

The present invention relates to an event information extraction apparatus and program. More particularly, in the field of providing topics about events that start at a specific date and time, such as seasonal events and sporting events, it relates to an event information extraction apparatus and program that acquire document information describing various events from one or more information providing servers connected to a network such as the Internet, and that extract and provide topical phrases according to the attention level of each event.

In recent years, with the development of computer networks such as the Internet, large volumes of electronic document information have continued to accumulate. In particular, bulletin boards and blog services have made it easy for individuals to publish impressions and opinions on matters that interest them. Consequently, by collecting and analyzing large amounts of continually published information such as news, bulletin board posts, and blogs, it becomes possible to identify the news and events that are currently topical.

Conventionally, an information trend search method has been proposed in which information published by a plurality of information providing servers is classified into per-topic categories, and the temporal transition of each topic is presented and searched (see, for example, Patent Document 1).

In this information trend search method, similar documents are gathered from the set of documents published in a given period and assigned to a category. Because a category is assigned only after multiple similar documents have been published, a topic can be presented only some time after the information was first published.

Individuals' interests also include events that start at a specific date and time, and in most cases impressions and opinions are written and published only after the event has been experienced. A topic can therefore be presented only after the event has ended.
Patent Document 1: JP 2000-242652 A

However, as described above, the conventional technology can provide topics only after an event has already ended, and cannot promptly provide the latest information about events that may become topical.

The present invention has been made in view of the above points, and its object is to provide an event information extraction apparatus and program capable of providing, for an event that starts at a specific time, phrases that attract users' interest before the event starts.

FIG. 1 is a diagram for explaining the principle of the present invention.

The present invention is an event information extraction method for extracting topics related to events that start at a specific date and time, such as seasonal events and sporting events, and performs:
an event-specific document extraction step of acquiring event names and holding periods from event information storage means, which stores a set of event information describing the dates, times, and summaries of events (step 1), searching document information storage means, which stores a set of documents having time information, for documents containing an event name, and storing the extracted documents per event in event-specific document information storage means (step 2);
an event attention level calculation step (step 3) of periodically counting, for each event, the extracted documents whose time information falls within a specified counting period, and calculating as the attention level the sum of the document counts to date divided by the number of counting days; and
a phrase extraction step (step 4) of selecting high-attention events that satisfy a predetermined condition and, if the event has not yet started, morphologically analyzing the summary text of the event at time interval T1 and extracting phrases based on their position of appearance and number of characters, or, if the event has started, morphologically analyzing the documents stored in the event-specific document information storage means at time interval T2, which is shorter than T1, determining for each phrase the number of documents in which it appears, extracting the phrases that appear in many documents, and outputting them to phrase storage means.
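The attention level of step 3 can be illustrated with a minimal Python sketch. It assumes that documents are counted once per day and that the number of counting days is the inclusive number of calendar days from the start of the counting period to the current date; the function name `attention_level` and the sample counts are hypothetical and do not appear in the specification.

```python
from datetime import date


def attention_level(daily_counts, period_start, today):
    """Sum of per-day document counts in [period_start, today],
    divided by the number of counted days (inclusive; an assumption)."""
    days = (today - period_start).days + 1
    if days <= 0:
        raise ValueError("today must not precede period_start")
    total = sum(n for d, n in daily_counts.items() if period_start <= d <= today)
    return total / days


# Hypothetical per-day counts of documents that mention one event.
counts = {
    date(2004, 8, 28): 4,
    date(2004, 8, 29): 6,
    date(2004, 8, 30): 11,
}
level = attention_level(counts, date(2004, 8, 28), date(2004, 8, 30))
print(level)  # (4 + 6 + 11) / 3 days
```

Dividing by the number of days keeps events with different counting periods comparable, which is presumably why the method normalizes the sum rather than ranking on raw totals.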

In the present invention, the event attention level calculation step (step 3) counts the number of users of search terms related to an event from the search terms that were submitted within the specified counting period and that match the event name, using information identifying the user who entered each search term so that the same keyword entered multiple times by the same user within a short time interval is counted once; periodically obtains a total value by adding the count from the documents and the count from the search terms; and calculates as the attention level the sum to date of the periodically obtained total values divided by the number of counting days.
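The per-user de-duplication of search terms might be sketched as follows. The specification does not say how long a "short time interval" is or from which entry it is measured; this sketch assumes a 30-minute window measured from the last entry that was counted for that user. The log layout `(user_id, term, timestamp)` and the name `count_search_users` are likewise illustrative assumptions.

```python
from datetime import datetime, timedelta


def count_search_users(log, event_name, window=timedelta(minutes=30)):
    """Count search entries equal to event_name, treating repeats by the
    same user within `window` of that user's last counted entry as one."""
    last_counted = {}  # user_id -> timestamp of the last counted entry
    count = 0
    for user_id, term, ts in sorted(log, key=lambda row: row[2]):
        if term != event_name:
            continue
        prev = last_counted.get(user_id)
        if prev is not None and ts - prev < window:
            continue  # same user, same keyword, short interval: not recounted
        last_counted[user_id] = ts
        count += 1
    return count


# Hypothetical search log.
log = [
    ("u1", "Event A", datetime(2004, 8, 30, 10, 0)),
    ("u1", "Event A", datetime(2004, 8, 30, 10, 5)),   # repeat, ignored
    ("u2", "Event A", datetime(2004, 8, 30, 10, 6)),
    ("u1", "Event A", datetime(2004, 8, 30, 11, 0)),   # outside the window
    ("u1", "Event B", datetime(2004, 8, 30, 11, 1)),   # different event
]
print(count_search_users(log, "Event A"))
```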

In the present invention, in the event attention level calculation step (step 3), for an event that has not yet been held, a past attention level, which is the sum of the document counts for a past event with the same event name, is retrieved from end event storage means; the retrieved past attention level is added to the sum of the document counts to date, and the result divided by the sum of the number of days the past event was held and the number of counting days is calculated as the attention level. For an event that has ended, after a preset period, the value obtained by adding all the document counts for the current event is stored in the end event storage means as its past attention level.
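The blending of a past attention level into the calculation for a not-yet-held event can be written out as a short Python sketch. The function and parameter names are hypothetical; the arithmetic follows the text: the past total is added to the current sum, and the divisor is the number of days the past event was held plus the number of counting days.

```python
def attention_with_history(current_counts, counted_days,
                           past_attention=None, past_event_days=0):
    """Attention level for an event that has not yet been held.

    past_attention is the stored total of counts for a past event with the
    same name (None when no such event was found)."""
    total = sum(current_counts)
    if past_attention is None:
        past_attention, past_event_days = 0, 0
    days = past_event_days + counted_days
    if days <= 0:
        raise ValueError("at least one day must be counted")
    return (past_attention + total) / days


# Hypothetical values: three counted days so far, past event held for 2 days.
print(attention_with_history([2, 3, 5], 3, past_attention=30, past_event_days=2))
print(attention_with_history([2, 3, 5], 3))  # no matching past event
```

With a past record, an upcoming event that has few mentions so far can still rank highly if the previous edition drew attention.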

In the present invention, in the event attention level calculation step (step 3), for an event that has not yet been held, a past attention level, which is the sum of the total values (each being the count from the documents plus the count from the search terms) for a past event with the same event name, is retrieved from the end event storage means; the retrieved past attention level is added to the sum to date of the periodically obtained total values, and the result divided by the sum of the number of days the past event was held and the number of counting days is calculated as the attention level. For an event that has ended, after a preset period, the value obtained by adding all the total values for the current event is stored in the end event storage means as its past attention level.

FIG. 2 is a diagram showing the principle configuration of the present invention.

The present invention (claim 1) is an event information extraction apparatus 100 for extracting topics related to events that start at a specific date and time, such as seasonal events and sporting events, comprising:
event-specific document extraction means 101 for acquiring event names and holding periods from event information storage means 201, which stores a set of event information describing the dates, times, and summaries of events, searching document information storage means 202, which stores a set of documents having time information, for documents containing an event name, and storing the extracted documents per event in event-specific document information storage means 103;
event attention level calculation means 102 for periodically counting, for each event, the extracted documents whose time information falls within a specified counting period, and calculating as the attention level the sum of the document counts to date divided by the number of counting days; and
phrase extraction means 104 for selecting high-attention events that satisfy a predetermined condition and, if the event has not yet started, morphologically analyzing the summary text of the event at time interval T1 and extracting phrases based on their position of appearance and number of characters, or, if the event has started, morphologically analyzing the documents stored in the event-specific document information storage means 103 at time interval T2, which is shorter than T1, determining for each phrase the number of documents in which it appears, extracting the phrases that appear in many documents, and outputting them to phrase storage means 204.

In the present invention (claim 2), the event attention level calculation means 102 includes means for counting the number of users of search terms related to an event from the search terms that were submitted within the specified counting period and that match the event name, using information identifying the user who entered each search term so that the same keyword entered multiple times by the same user within a short time interval is counted once; periodically obtaining a total value by adding the count from the documents and the count from the search terms; and calculating as the attention level the sum to date of the periodically obtained total values divided by the number of counting days.

The present invention (claim 3) further comprises end event storage means in which past attention levels corresponding to past event names are stored, and the event attention level calculation means 102 includes means for:
for an event that has not yet been held, retrieving from the end event storage means a past attention level, which is the sum of the document counts for a past event with the same event name, adding the retrieved past attention level to the sum of the document counts to date, and calculating as the attention level the result divided by the sum of the number of days the past event was held and the number of counting days; and
for an event that has ended, storing in the end event storage means, after a preset period, the value obtained by adding all the document counts for the current event as its past attention level.

The present invention (claim 4) further comprises end event storage means in which past attention levels corresponding to past event names are stored, and the event attention level calculation means 102 includes means for:
for an event that has not yet been held, retrieving from the end event storage means a past attention level, which is the sum of the total values (each being the count from the documents plus the count from the search terms) for a past event with the same event name, adding the retrieved past attention level to the sum to date of the periodically obtained total values, and calculating as the attention level the result divided by the sum of the number of days the past event was held and the number of counting days; and
for an event that has ended, storing in the end event storage means, after a preset period, the value obtained by adding all the total values for the current event as its past attention level.

The present invention (claim 5) is an event information extraction program for causing a computer to function as each means of the event information extraction apparatus according to any one of claims 1 to 4.

As described above, according to the present invention, the attention level of an event held during a specific period is calculated and, for an event that has not yet started, phrases are extracted from event information introducing its content (for example, the organizer's announcement), so that information likely to become topical can be provided before the event starts.

After the event has started, phrases that are being talked about can be extracted from document information (for example, word-of-mouth information) published by people attending the event, and provided to users.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 3 shows the configuration of the event information extraction apparatus in one embodiment of the present invention.

The event information extraction apparatus 100 shown in the figure takes as input information from the event information storage unit 201, the document information storage unit 202, and the search term storage unit 203, and outputs the extracted phrases to the phrase storage unit 204.

The event information extraction apparatus 100 comprises an event information extraction unit 101, an event attention level calculation unit 102, an event-specific document information storage unit 103, a phrase extraction unit 104, and an end event storage unit 105.

The event information storage unit 201 stores event information. FIG. 4 shows an example of the event information storage unit 201. As shown in the figure, event information consists of information about an event such as an event name 401, a start date 402, an end date 403, a time 404, a place 405, and a summary text 406.

The document information storage unit 202 collects newly created and updated document information from documents published on the Internet, such as home pages, blogs, and bulletin boards, and stores it together with time information such as the creation or update date and time. From this document information it is possible to collect the impressions of people who attended an event, news articles reporting on how the event is going, and the like.

The search term storage unit 203 collects the search terms entered by users at a search site that provides a search service on the Internet, and stores them together with the time of each search request. Because a search term is a word that a user enters as a clue to finding information of interest, periodically collecting and analyzing search terms reveals which information users were most interested in.

The event attention level calculation unit 102 periodically acquires event information from the event information storage unit 201 and manages it by event name. Because event information is expected to be registered in the event information storage unit 201 several days before the event at the latest, the attention level of an event can be calculated before the event is held even if the processing of the event attention level calculation unit 102 is executed only once a day.

In addition, once an event ends, new information about it can no longer be collected (impressions and accounts from participants also tend to stop appearing within a few days of the end of the event), so the event is removed from the managed events.

FIG. 5 shows an example of the information held by the event attention level calculation unit 102. FIG. 5(a) is a configuration example of the event-specific count result table 500. New event information is acquired from the event information storage unit 201, each event is assigned an event ID 501, which is a unique number identifying the event, and the event information is managed under that ID.

The event name 502 is the name of the event, the start date 503 is its start date, and the end date 504 is its end date; the time 505 records the start and end times between which the event is held each day.

The past attention level 506 is the attention level recorded when a similar event was held in the past; it is obtained by searching the end event storage unit 105 when a new event is added to the event-specific count result table 500.

The next phrase extraction date and time 507 is time information for controlling the interval at which phrases related to the event are extracted; it is updated each time the phrase extraction unit 104 performs phrase extraction.

The counting end date and time 508 is set to a date and time after the event ends; events past their counting end date and time 508 are deleted from the event-specific count result table 500.

The count value 509 is the data used to calculate the attention level of an event; data for each fixed interval can be accumulated over a specific period. From the count values 509 and the past attention level 506, an event attention level ranking such as that shown in FIG. 5(b) is generated. If the attention level is calculated at too short an interval, it hardly changes between calculations; if it is calculated at too long an interval, time passes between the moment an event begins to attract attention and the moment this is detected. In addition, whenever a new event is added, its attention level must be calculated. The ranking generation process is therefore executed after new events have been added to the event-specific count result table 500. For example, if events are added once a day, the ranking is also generated once a day.

The event information extraction unit 101 runs independently of the processing in the event attention level calculation unit 102. New search terms accumulate continually in the search term storage unit 203, and newly collected document information likewise accumulates in the document information storage unit 202. The event information extraction unit 101 periodically acquires the information stored in the search term storage unit 203 and the document information storage unit 202 and extracts only the information related to events. This extraction interval is the minimum interval at which information can be provided to users. First, the unit acquires event names from the event attention level calculation unit 102 and counts, among the search terms acquired from the search term storage unit 203, those that match an event name. It also counts, per event name, the documents acquired from the document information storage unit 202 that contain the event name. It then adds the count from the search terms to the count from the documents and records the result in the count value 509 column of the event-specific count result table 500 of the event attention level calculation unit 102. Finally, it organizes the event-related document information by event and records it in the event-specific document information storage unit 103.
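The counting performed by the event information extraction unit 101 can be sketched in Python as follows. The sketch assumes plain substring matching for documents and exact matching for search terms, and it omits the per-user de-duplication of search terms; `tally_by_event` and the sample data are hypothetical.

```python
from collections import defaultdict


def tally_by_event(event_names, documents, search_terms):
    """Return (totals, docs_by_event): the per-event total of matching
    documents plus matching search terms, and the documents grouped by event."""
    totals = {name: 0 for name in event_names}
    docs_by_event = defaultdict(list)
    for doc in documents:
        for name in event_names:
            if name in doc:          # document contains the event name
                docs_by_event[name].append(doc)
                totals[name] += 1
    for term in search_terms:
        if term in totals:           # search term equals the event name
            totals[term] += 1
    return totals, dict(docs_by_event)


# Hypothetical inputs standing in for storage units 202 and 203.
documents = [
    "Going to Event A tomorrow",
    "Event C was great",
    "Nothing relevant here",
    "Event A and Event C overlap this year",
]
search_terms = ["Event A", "Event A", "weather"]
totals, grouped = tally_by_event(["Event A", "Event C"], documents, search_terms)
print(totals)
```

The `totals` values correspond to what would be recorded in the count value 509 column, and `grouped` to what would be written to the event-specific document information storage unit 103.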

After the event information extraction unit 101 finishes processing, the phrase extraction unit 104 acquires the names of high-attention events from the event attention level calculation unit 102. If a high-attention event has not yet been held, the unit acquires its summary text from the event information storage unit 201, extracts topical phrases, and outputs them to the phrase storage unit 204. Once a high-attention event is under way, the unit acquires document information from the event-specific document information storage unit 103, extracts phrases representing topics, and outputs them to the phrase storage unit 204. A few days after an event ends, it is no longer returned as a high-attention event, and phrase extraction for it also ends.

Before an event is held, even if there is document information written by people interested in the event, it will not contain more information than the summary text of the event, so phrases are extracted from the summary text 406 in the event information storage unit 201. Moreover, because event information in the event information storage unit 201 is not updated frequently (once registered, it is updated only when something changes), phrase extraction before the event need not be performed frequently; about once a day is considered sufficient. More efficiently, identification information marking event information as newly registered or updated may be attached to each event in the event information storage unit 201, and phrase extraction may be executed only upon registration or update by checking this identification information. In that case, the identification information must be reset when phrase extraction finishes.
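The registration/update check described above might look like the following Python sketch. It assumes the identification information is a simple boolean flag per event; the `EventRecord` class, its `updated` field, and the use of `str.split` as a stand-in for morphological analysis are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class EventRecord:
    name: str
    summary: str
    updated: bool = True  # set on registration and on every later update


def extract_pending(events, extract):
    """Run `extract` only on events whose flag is set, then reset the flag."""
    results = {}
    for ev in events:
        if ev.updated:
            results[ev.name] = extract(ev.summary)
            ev.updated = False  # reset once phrase extraction has finished
    return results


events = [
    EventRecord("Event A", "summer fireworks by the harbor"),
    EventRecord("Event C", "city marathon"),
]
first = extract_pending(events, str.split)    # both events are new
second = extract_pending(events, str.split)   # nothing changed since
events[1].summary = "city marathon on a new route"
events[1].updated = True                      # organizer updated the entry
third = extract_pending(events, str.split)
print(sorted(first), second, sorted(third))
```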

While an event is being held, people interested in it publish new document information continually, so the phrase extraction interval is shortened so that the latest information can be provided to users as it appears. After the event ends, new information dwindles, and topics about the event provided at that point can serve users only as reference for the next occasion. The phrase extraction interval after the event ends is therefore set longer than during the event. In this way, phrase extraction is executed frequently during the event so that the latest topics are always available, while before and after the event it is executed at longer intervals, which reduces the processing load of phrase extraction.
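The choice of extraction interval could be sketched as below. The concrete values are assumptions: one day before and after the event (the text suggests about once a day before the event and only says that the interval after the event is longer than during it) and one hour while the event is being held. The function name `next_extraction_time` is hypothetical.

```python
from datetime import datetime, timedelta

T1 = timedelta(days=1)    # assumed interval before the start and after the end
T2 = timedelta(hours=1)   # assumed shorter interval while the event is held


def next_extraction_time(now, start, end):
    """Return when phrase extraction should next run for an event
    held from `start` to `end`."""
    interval = T2 if start <= now <= end else T1
    return now + interval


start = datetime(2004, 8, 30, 10, 0)
end = datetime(2004, 8, 31, 17, 0)
print(next_extraction_time(datetime(2004, 8, 29, 9, 0), start, end))   # before
print(next_extraction_time(datetime(2004, 8, 30, 12, 0), start, end))  # during
print(next_extraction_time(datetime(2004, 9, 1, 9, 0), start, end))    # after
```

The returned value plays the role of the next phrase extraction date and time 507 kept for each event.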

FIG. 6 shows an example of the phrases stored in the phrase storage unit 204. Each extracted phrase 1004 is stored in association with the event name 1001, the event start date 1002, and the end date 1003. By periodically executing the processing of the event information extraction unit 101 and the phrase extraction unit 104, high-attention events and the phrases representing their topics are output continually. Because the phrase storage unit 204 records the start date 1002 and end date 1003 of each event, it is possible to provide information such as what is being talked about at a high-attention event currently under way, or which high-attention events are coming up.

FIG. 7 illustrates the timing of each process of the event information extraction apparatus 100, taking the event named "Event A" as an example. First, "Event A" is registered in the event information storage unit 201 (FIG. 4, 401). When the event attention level calculation unit 102 is next started after the registration, it adds "Event A" to the event-specific count result table 500 of FIG. 5 and manages it under the event ID "0000101". The start date 402, end date 403, and time 404 of "Event A" in FIG. 4 are copied to the start date 503, end date 504, and time 505 of FIG. 5.

Next, the end event storage unit 105 is searched for "Event A" to check whether a similar event was held in the past and how much attention it received. As shown in FIG. 8, the end event storage unit 105 consists of an event name 1101, a start date 1102, an end date 1103, and a past attention level 1104. The event names in the end event storage unit 105 are searched using the event name "Event A". If "Event A" contains a year or the like, the search is performed with the character string that remains after the year is removed, and event names containing that string are retrieved from the event name 1101 column of the end event storage unit 105. If the search returns multiple event names, the event with the most recent end date is selected and its past attention level 1104 is acquired. In the example of the event-specific count result table 500 in FIG. 5, no event was found for "Event A", so its past attention level is left blank. For "Event C", the search of the end event storage unit 105 returned "Event X"; its past attention level "30" was acquired and recorded in the past attention level 506 of the event-specific count result table 500.
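The lookup in the end event storage unit 105 can be sketched in Python. The sketch assumes that "a year or the like" means a standalone four-digit number and that ended events are held as dictionaries with `name`, `end`, and `attention` keys; these details, the function names, and the sample records are hypothetical.

```python
import re
from datetime import date


def strip_year(name):
    """Remove a standalone four-digit year from an event name (an assumption
    about what counts as a year)."""
    return re.sub(r"\s*(?<!\d)\d{4}(?!\d)\s*", " ", name).strip()


def find_past_attention(event_name, ended_events):
    """Return the past attention level of the most recently ended event whose
    name contains the year-stripped event name, or None if there is none."""
    key = strip_year(event_name)
    if not key:
        return None
    matches = [e for e in ended_events if key in e["name"]]
    if not matches:
        return None  # the past attention level column stays blank
    return max(matches, key=lambda e: e["end"])["attention"]


# Hypothetical contents of the end event storage unit.
ended = [
    {"name": "Harbor Fireworks 2002", "end": date(2002, 8, 3), "attention": 18},
    {"name": "Harbor Fireworks 2003", "end": date(2003, 8, 2), "attention": 30},
    {"name": "City Marathon 2003", "end": date(2003, 11, 9), "attention": 12},
]
print(find_past_attention("Harbor Fireworks 2004", ended))
print(find_past_attention("Jazz Night 2004", ended))
```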

When a new event is added, the next phrase extraction date and time 507 column is left empty.

The aggregation end date and time 508 is set to "September 5, 2004, 17:00:00", five days after the event end date and time. Events whose aggregation end date and time has passed are deleted from the event-specific aggregation result table 500. The aggregation end date and time 508 is obtained by adding a number of days to the event end date and time; that number is determined in advance by examining how the numbers of documents and search terms related to events change over time. The number of days added may be the same for all events, or it may be set per event type to allow for the fact that public interest fades at different speeds for different types of event. For example, for an event held on a single day, such as a fireworks display, interest tends to disappear within about two days. When the number of days is set per type, rules for determining the type are defined (for example, the event name contains "fireworks"), and each event that matches a rule is given the number of days determined for that type.
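A minimal sketch of this rule-based setting is shown below. The rule table, the default of five days, and the function names are illustrative assumptions; the event end of 31 August 2004, 17:00 used in the usage note is inferred from the five-day example in the text.

```python
from datetime import datetime, timedelta

# Assumption: five days is the default, as in the example in the text.
DEFAULT_EXTRA_DAYS = 5

# Hypothetical rule table: (substring of the event name, days to add).
TYPE_RULES = [("花火", 2), ("fireworks", 2)]


def extra_days(event_name: str) -> int:
    """Pick the number of days to add from the first matching type rule."""
    lowered = event_name.lower()
    for keyword, days in TYPE_RULES:
        if keyword in lowered:
            return days
    return DEFAULT_EXTRA_DAYS


def aggregation_end(event_name: str, event_end: datetime) -> datetime:
    """Aggregation end date and time 508 = event end + type-dependent days."""
    return event_end + timedelta(days=extra_days(event_name))
```

With these assumptions, `aggregation_end("Event A", datetime(2004, 8, 31, 17, 0, 0))` gives 5 September 2004, 17:00:00, and an event whose name contains "fireworks" is kept for only two days after it ends.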

Next, the event attention level calculation unit 102 generates a ranking of event attention levels and records it in the event attention level table 510.

FIG. 9 shows the flow of the ranking generation process of the event attention level calculation unit in one embodiment of the present invention.

First, the set of all event IDs is acquired from the event-specific aggregation result table 500 (step 101), one event is taken from the set (step 103), and it is checked whether the event has yet to be held (step 104). If it has not yet been held (step 104, Yes), the event attention level is calculated from the past attention level 506 and the aggregated values to date using the following formula (step 105).

If the event has already started (step 104, No), it is checked whether the aggregation end date and time 508 has passed (step 106). If it has not passed (step 106, No), all values of the aggregated value 509 up to the present are added together, and the sum divided by the number of days from the event start date to the current date is taken as the event attention level (step 107). If the aggregation end date and time has passed (step 106, Yes), the sum of all aggregated values up to the aggregation end date and time is calculated and recorded in the past attention level 1104 column of the end event storage unit 105 (step 108). The event is then deleted from the event-specific aggregation result table 500 (step 109).

Steps 103 to 109 are executed, as applicable, for every event ID in the set acquired in step 101. When all event IDs have been processed (step 102, Yes), the event attention levels calculated for the event IDs are ranked in descending order and output to the event attention level table 510. In this example, the current event attention level is the average number of attentions per day, while the past attention level is the total number of attentions over the whole period in which people were interested, from before the event until after it ended. For an event that attracts steady interest, the average per day (= event attention level) stays roughly constant; for an event that attracted interest only just after it started, the average per day falls as the event period lengthens.
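The flow of steps 101 to 109 can be sketched roughly as below. The data layout (`EventRow`, a dictionary keyed by event ID, an `ended` dictionary keyed by event name) is an assumption made for illustration. The pre-event formula of step 105 is written here the way the claims word it (past attention level plus the counts so far, divided by the days of the past event plus the days counted), and clamping the day count to at least one is an added safeguard, not something the text states.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class EventRow:
    name: str                                # event name 502
    start: datetime                          # start date 503 and time 505
    aggregation_end: datetime                # aggregation end date and time 508
    past_attention: Optional[float] = None   # past attention level 506
    past_days: int = 0                       # assumed field: days the past event ran
    counts: List[Tuple[datetime, int]] = field(default_factory=list)  # 509


def _days(a: datetime, b: datetime) -> int:
    # Assumption: clamp to one day so the division never hits zero.
    return max((b - a).days, 1)


def attention(row: EventRow, now: datetime) -> float:
    total = sum(v for t, v in row.counts if t <= now)
    if now < row.start:                                   # step 104, Yes
        first = min((t for t, _ in row.counts), default=now)
        past = row.past_attention or 0.0
        return (past + total) / (row.past_days + _days(first, now))  # step 105
    return total / _days(row.start, now)                  # step 107


def rank_events(table: Dict[str, EventRow], ended: Dict[str, int],
                now: datetime) -> List[Tuple[str, float]]:
    scores = {}
    for event_id in list(table):                          # steps 101 to 103
        row = table[event_id]
        if now >= row.start and now > row.aggregation_end:    # step 106, Yes
            ended[row.name] = sum(v for t, v in row.counts
                                  if t <= row.aggregation_end)  # step 108
            del table[event_id]                           # step 109
            continue
        scores[event_id] = attention(row, now)
    # Highest attention level first, as in the event attention level table 510.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because the in-progress score is an average per day, an event with a burst of interest at the start drifts down the ranking as its period lengthens, which matches the behaviour described above.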

The event information extraction unit 101 acquires event names from the event attention level calculation unit 102 at regular intervals and extracts information related to each event. Thus, once "Event A" has been added to the event-specific aggregation result table 500 of the event attention level calculation unit 102, document information related to "Event A" begins to be extracted the next time the event information extraction unit 101 runs. The event information extraction unit 101 runs periodically (for example, every hour) until "Event A" is deleted from the event-specific aggregation result table 500. Through this processing, the aggregated counts of document information and search terms are recorded every hour in the aggregated value 509 of the event-specific aggregation result table 500, and document information organized by event accumulates in the event-specific document information storage unit 103.

The processing flow of the event information extraction unit 101 is described with reference to FIG. 10.

FIG. 10 shows the processing flow of the event information extraction unit in one embodiment of the present invention.

The event information extraction unit 101 first acquires event IDs and event names from the event attention level calculation unit 102 (step 201). Next, it acquires search terms from the search term storage unit 203 for a specified time range (step 202) and counts, for each event name, the search terms that match the event name (step 203). If the process runs every hour, the time range is the hour preceding the current time, and the range specified on the next run is the hour immediately following the range specified this time.

The same user may enter a search term more than once. Therefore, using information that identifies the user who entered the search term (for example, the cookie information of the Web browser), repeated entries of the same keyword by the same user within a short time interval are counted as one, so that the number of people who entered each search term is obtained.
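One way to realise this counting is sketched below. The log format (timestamp, user identifier, keyword) and the 30-minute window are assumptions; the text only says "a short time interval" and gives cookie information as one example of a user identifier.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def count_searchers(log: Iterable[Tuple[datetime, str, str]],
                    window: timedelta = timedelta(minutes=30)) -> Dict[str, int]:
    """Count entries per keyword, collapsing quick repeats by the same user.

    log holds (timestamp, user_id, keyword) tuples; user_id stands in for
    whatever identifies the user, such as a browser cookie value.
    """
    last_seen = {}
    counts = defaultdict(int)
    for ts, user, keyword in sorted(log):
        key = (user, keyword)
        previous = last_seen.get(key)
        # Count only the first entry, or an entry made after the window
        # has elapsed since the same user's previous entry of this keyword.
        if previous is None or ts - previous > window:
            counts[keyword] += 1
        last_seen[key] = ts
    return dict(counts)
```

Measuring the window from the user's most recent entry means a run of rapid repeats is counted once, however long the run lasts.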

Next, document information is likewise acquired from the document information storage unit 202 for a specified time range (step 204), document information containing an event name is selected as document information related to that event, and the number of documents is counted for each event name. The time range is the same as the one specified in step 202. The document information related to each event is organized by event and recorded in the event-specific document information storage unit 103 (step 205).

Event names are used to select the search terms and document information related to an event. When an event name contains a year or a place name (for example, "○○○ in Tokyo"), the character string with the year or place name removed is used for the selection instead.

As for how the event-specific document information storage unit 103 stores data, an area for storing document information is set up for each event ID, and the document information is stored in that area in a way that keeps it associated with time information, for example by giving it a file name that includes the time.
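As one possible layout, each event ID could map to a directory and each document to a file whose name begins with its timestamp. The directory structure, the file name pattern, and the `seq` counter below are illustrative assumptions only.

```python
from datetime import datetime
from pathlib import Path


def document_path(root: Path, event_id: str,
                  fetched_at: datetime, seq: int) -> Path:
    """Build <root>/<event ID>/<timestamp>_<sequence>.txt for one document.

    seq is a hypothetical counter that keeps apart documents
    sharing the same timestamp.
    """
    return root / event_id / f"{fetched_at:%Y%m%d%H%M%S}_{seq:04d}.txt"
```

Because the timestamp leads the file name, listing a directory in name order returns an event's documents in time order.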

Then, for each event ID, the search term count and the document count obtained from the document information are added together and recorded in the aggregated value 509 of the event-specific aggregation result table 500 of the event attention level calculation unit 102 (step 206). The search term count is the number of users who issued a search request using the event name, and so can be regarded as the number of people interested in the event. The document count likewise represents the number of people interested in the event, since these people put their impressions and opinions of the event into writing. In other words, the aggregated value 509 accumulates, for each period, the number of people interested in the event.
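The addition performed in step 206 can be sketched as follows, assuming the search-term counts for the time range have already been computed and the documents for the same range are available as plain strings. The function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, List


def hourly_aggregate(event_names: Iterable[str],
                     searcher_counts: Dict[str, int],
                     documents: List[str]) -> Dict[str, int]:
    """Return {event name: search-term count + document count} for one range."""
    totals = {}
    for name in event_names:
        # A document is related to an event if it contains the event name.
        doc_count = sum(1 for text in documents if name in text)
        totals[name] = searcher_counts.get(name, 0) + doc_count
    return totals
```

Each returned value would be appended to the aggregated value 509 for the corresponding event and time range.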

Like the event information extraction unit 101, the phrase extraction unit 104 runs at regular intervals (for example, every hour). It acquires the N event IDs with the highest attention levels from the event attention level table 510 of the event attention level calculation unit 102 and extracts phrases that represent the topics of each event (topic words). Thus, topic words for "Event A" begin to be extracted when the phrase extraction unit 104 runs after "Event A" has been listed among the N highest-attention events in the event attention level table 510. No topic words are extracted for "Event A" if it is not among the N highest-attention events, or if its aggregation end date and time has passed and it has been deleted from the event-specific aggregation result table 500.

Each time it runs, the phrase extraction unit 104 extracts topic words for the N highest-attention events and outputs them to the phrase storage unit 204.

The processing flow of the phrase extraction unit 104 is described with reference to FIG. 11.

FIG. 11 shows the processing flow of the phrase extraction unit in one embodiment of the present invention.

The phrase extraction unit 104 is started after the event information extraction unit 101 finishes processing. That is, if the event information extraction unit 101 runs every hour, the phrase extraction unit 104 also runs every hour.

First, the top N event IDs 512 with the highest attention levels are acquired from the event attention level table 510 of the event attention level calculation unit 102 (step 301). Next, one event ID is taken, and the event name 502, start date 503, end date 504, time 505, next phrase extraction date and time 507, and aggregation end date and time 508 for that event ID are acquired from the event-specific aggregation result table 500 (step 303). It is then checked whether the next phrase extraction date and time is empty (a newly registered event being processed for the first time) or the current date and time has already passed the next phrase extraction date and time (step 304). If either condition holds, it is checked whether the event has yet to be held (step 305). If so, the time T1 is added to the current time to set the next phrase extraction date and time (step 306), the summary sentence is acquired from the event information storage unit 201 (step 307), and phrases are extracted from it and output to the phrase storage unit 204 together with the event name, start date, and end date (step 308). Because the event has not yet been held, a fairly long time suffices for T1, for example one day (24 hours). Also, before the event, any document information written by people interested in the event is unlikely to contain more detail than the event summary, so phrases are extracted from the summary 406 in the event information storage unit 201.

If the event has already started (step 305, No), it is checked whether the event is in progress (step 309). If it is (step 309, Yes), the time T2 is added to the start time of the phrase extraction process to set the next phrase extraction date and time (step 310). While the event is in progress, document information written by people interested in the event can be collected continuously, so the set of document information for the event is acquired from the event-specific document information storage unit 103 (step 311), and phrases are extracted from that set and output to the phrase storage unit 204 together with the event name, start date, and end date (step 312). Because the event is in progress, a short time is set for T2 in step 310; here it is set to, for example, one hour to match the processing interval of the event information extraction unit 101. Running the extraction more often than the event information extraction unit 101 would yield no new topic words, because no new document information would have accumulated in the event-specific document information storage unit 103. In other words, to extract phrases at shorter intervals, the processing interval of the event information extraction unit 101 must also be shortened.

If the event is not in progress (step 309, No), it has already ended, and the time T3 is added to the start time of the phrase extraction process to set the next phrase extraction date and time (step 313). Because the event has ended, T3 does not need to be short and is set to, for example, 12 hours. New document information may still be obtained after the event ends, so the set of document information is again acquired from the event-specific document information storage unit 103 (step 311) and phrases are extracted (step 312).
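The branching of steps 304 to 313 can be summarised in a short sketch. The values of T1, T2, and T3 are the examples given in the text; the function name, the returned source labels, and the use of a single `now` value for both "current time" and "start time of the phrase extraction process" are simplifying assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

T1 = timedelta(hours=24)  # before the event (example value from the text)
T2 = timedelta(hours=1)   # while the event is in progress
T3 = timedelta(hours=12)  # after the event has ended


def plan_extraction(now: datetime, start: datetime, end: datetime,
                    next_due: Optional[datetime]) -> Optional[Tuple[str, datetime]]:
    """Return (phrase source, new next extraction time), or None if not due."""
    if next_due is not None and now < next_due:    # step 304: not yet due
        return None
    if now < start:                                # step 305: before the event
        return "summary", now + T1                 # steps 306 to 308
    if now <= end:                                 # step 309: in progress
        return "documents", now + T2               # steps 310 to 312
    return "documents", now + T3                   # steps 313, 311, 312
```

An empty next phrase extraction date and time (`None`) is always treated as due, which covers a newly registered event being processed for the first time.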

When all N event IDs have been processed (step 302, Yes), the processing of the phrase extraction unit 104 ends.

In the embodiment above, phrases are extracted in the same way regardless of an event's attention level. Alternatively, the phrase extraction interval may be shortened for events with higher attention levels so that the latest information is provided at shorter intervals.

For events that generate new information every time they are held, such as sports and martial arts events, setting a short phrase extraction interval during the event makes it possible to provide the latest information. Conversely, for events whose content changes little, such as exhibitions, a long interval of about once a day causes no problem even while the event is in progress. The phrase extraction interval may thus be varied according to the type of event.

Furthermore, even within the event period, the phrase extraction interval may be shortened only for a specific period that includes the times at which the event is actually taking place, based on that time information.

Next, the phrase extraction method in step 308 is described in more detail.

In step 308, the summary sentence acquired from the event information storage unit 201 in step 307 is input to the phrase extraction unit 104. First, morphological analysis is performed on the summary sentence to break it down into morphemes, such as individual parts of speech and punctuation marks. From these morphemes, nouns, compound nouns (sequences of consecutive nouns), and noun phrases (sequences of words that together function as a noun, such as "approval rating of the Koizumi administration") are extracted, because these are better suited than verbs or adjectives to expressing a topic. In the following description, nouns, compound nouns, and noun phrases are collectively called phrases. Because a summary sentence tends to state the most important information at its beginning, each extracted phrase is given an evaluation value A that is higher the closer it appears to the beginning. And because a phrase with more characters conveys information more precisely, it is given an evaluation value B that is higher the more characters it has. The phrases with the highest evaluation values are extracted on the basis of A and B.
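The scoring step can be sketched as below, taking the candidate phrases as already produced by a morphological analyser (not shown). The text does not say how A and B are scaled or combined, so the normalisation to the range 0 to 1 and the simple sum used here are assumptions.

```python
from typing import List


def rank_phrases(summary: str, candidates: List[str]) -> List[str]:
    """Order candidate phrases by evaluation value A + B, best first."""
    found = [(summary.find(p), p) for p in candidates if p and p in summary]
    if not found:
        return []
    longest = max(len(p) for _, p in found)

    def score(item):
        position, phrase = item
        a = 1.0 - position / len(summary)  # evaluation value A: earlier is higher
        b = len(phrase) / longest          # evaluation value B: longer is higher
        return a + b                       # assumed way of combining A and B

    return [p for _, p in sorted(found, key=score, reverse=True)]
```

Taking the first one or two entries of the returned list corresponds to extracting the phrases with the highest evaluation values.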

FIG. 6 shows an example of the data output by the phrase extraction unit 104 when it started processing at "2004/07/20 21:00". In the figure, "Event C" has not yet been held, and the phrases stored in association with it, "great aerial Niagara" and "acclaimed by fireworks makers", are examples of phrases extracted in step 308.

Next, the phrase extraction method in step 312 is described in more detail. In step 312, the set of document information acquired from the event-specific document information storage unit 103 in step 311 is input to the phrase extraction unit 104. Morphological analysis is performed on the content of each document in the set, and nouns, compound nouns, and noun phrases (= phrases) are extracted. After phrases have been extracted from all the document information, the number of documents in which each phrase appears is counted. Phrases contained in the title are excluded from the count. From the remaining phrases, the M phrases that appear in the most documents are extracted as phrases representing the topic. In FIG. 6, "Event A" is in progress, and the phrases stored in association with it, such as "feathered dinosaur" and "first 1000 visitors", are examples of phrases extracted in step 312.
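The counting in step 312 amounts to a document-frequency ranking, sketched below. The per-document phrase lists are assumed to come from a morphological analyser (not shown), and passing the title phrases in as a separate collection is an assumption about how the exclusion is applied.

```python
from collections import Counter
from typing import Iterable, List


def topic_phrases(doc_phrases: Iterable[Iterable[str]],
                  title_phrases: Iterable[str], m: int) -> List[str]:
    """Return the m phrases that appear in the most documents."""
    excluded = set(title_phrases)
    document_frequency = Counter()
    for phrases in doc_phrases:
        # Use a set so that each phrase counts once per document.
        document_frequency.update(set(phrases) - excluded)
    return [phrase for phrase, _ in document_frequency.most_common(m)]
```

Counting documents rather than raw occurrences prevents a single long document that repeats a phrase from dominating the result.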

The operations of the event information extraction unit 101, the event attention level calculation unit 102, and the phrase extraction unit 104 of the event information extraction device 100 in the above embodiment can be implemented as a program, which can be installed and executed on a computer used as the event information extraction device or distributed via a network.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

The present invention is applicable to techniques for extracting, on a network, topics related to events that start at a specific date and time, such as seasonal events and sports events.

FIG. 1 is a diagram for explaining the principle of the present invention.
FIG. 2 is a diagram showing the principle configuration of the present invention.
FIG. 3 is a configuration diagram of the information providing apparatus in one embodiment of the present invention.
FIG. 4 is a diagram showing an example of the event information storage unit in one embodiment of the present invention.
FIG. 5 is a diagram showing an example of the data of the event attention level calculation unit in one embodiment of the present invention.
FIG. 6 is a diagram showing a storage example of the phrase storage unit in one embodiment of the present invention.
FIG. 7 is a diagram showing the processing timing in one embodiment of the present invention.
FIG. 8 is a diagram showing an example of the end event storage unit in one embodiment of the present invention.
FIG. 9 is a diagram showing the flow of the ranking generation process of the event attention level calculation unit in one embodiment of the present invention.
FIG. 10 is a diagram showing the processing flow of the event information extraction unit in one embodiment of the present invention.
FIG. 11 is a diagram showing the processing flow of the phrase extraction unit in one embodiment of the present invention.

Explanation of symbols

100 Event information extraction device
101 Event-specific document extraction means, event information extraction unit
102 Event attention level calculation means, event attention level calculation unit
103 Event-specific document information storage means, event-specific document information storage unit
104 Phrase extraction means, phrase extraction unit
105 End event storage unit
201 Event information storage means, event information storage unit
202 Document information storage means, document information storage unit
203 Search term storage unit
204 Phrase storage means, phrase storage unit
401 Event name
402 Start date
403 End date
404 Time
405 Location
406 Summary
500 Event-specific aggregation result table
501 Event ID
502 Event name
503 Start date
504 End date
505 Time
506 Past attention level
507 Next phrase extraction date and time
508 Aggregation end date and time
509 Aggregated value
510 Event attention level table
511 Rank
512 Event ID
513 Attention level
1001 Event name
1002 Start date
1003 End date
1004 Phrase
1101 Event name
1102 Start date
1103 End date
1104 Past attention level

Claims

An event information extraction device that extracts topics related to events that start at a specific date and time, such as seasonal events and sports events, the device comprising:
event-specific document extraction means that acquires an event name and an event period from event information storage means storing a set of event information describing the dates and times at which events are held, searches document information storage means storing a set of documents having time information for documents containing the event name, extracts those documents, and stores them by event in event-specific document information storage means;
event attention level calculation means that periodically counts, for each event, the extracted documents whose time information falls within a specified aggregation period, and calculates, as an attention level, a value obtained by dividing the sum of the document counts up to the present by the number of days counted; and
phrase extraction means that selects events whose attention level is high enough to satisfy a predetermined condition and, for an event that has not yet started, performs morphological analysis on the summary sentence of the event at a time interval T1 and extracts phrases on the basis of their position of appearance and number of characters, and, for an event that has started, performs morphological analysis at a time interval T2 shorter than T1 on the documents stored in the event-specific document information storage means, counts the number of documents in which each phrase appears, extracts phrases that appear in a large number of documents, and outputs the extracted phrases to phrase storage means.

The event information extraction device according to claim 1, wherein the event attention level calculation means includes means that:
counts the number of users who issued a search request related to the event within the specified aggregation period, by taking the search terms that match the event name and, using information that identifies the user who entered each search term, counting as one the case where the same user entered the same keyword multiple times at short time intervals; periodically obtains a combined value by adding the search term count to the document count; and calculates, as the attention level, a value obtained by dividing the sum of the combined values obtained up to the present by the number of days counted.

The event information extraction device according to claim 1, further comprising end event storage means in which past attention levels corresponding to past event names are stored,
wherein the event attention level calculation means includes means that:
for an event that has not yet been held, searches the end event storage means for the past attention level of a past event having the same event name, the past attention level being the sum of the document counts for that past event, adds the retrieved past attention level to the sum of the document counts up to the present, and calculates, as the attention level, a value obtained by dividing the result by the sum of the number of days for which the past event was held and the number of days counted; and stores in the end event storage means, as a past attention level, a value obtained by adding all the document counts for the event held this time.

The event information extraction device according to claim 2, further comprising end event storage means in which past attention levels corresponding to past event names are stored,
wherein the event attention level calculation means includes means that:
for an event that has not yet been held, searches the end event storage means for the past attention level of a past event having the same event name, the past attention level being the sum of the combined values of the document counts and the search term counts for that past event, adds the retrieved past attention level to the sum of the combined values obtained up to the present, and calculates, as the attention level, a value obtained by dividing the result by the sum of the number of days for which the past event was held and the number of days counted; and, for an event that has ended and for which a preset period has elapsed since its end, stores in the end event storage means, as a past attention level, a value obtained by adding all the combined values for the event held this time.

An event information extraction program for causing a computer to function as each means of the event information extraction device according to claim 1.