JP4396444B2

JP4396444B2 - Phrase extraction device and program

Info

Publication number: JP4396444B2
Application number: JP2004238605A
Authority: JP
Inventors: 晴美川島; 裕一郎関口; 雅且大久保
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2004-08-18
Filing date: 2004-08-18
Publication date: 2010-01-13
Anticipated expiration: 2024-08-18
Also published as: JP2006059024A

Description

The present invention relates to a phrase extraction device and program. More specifically, it relates to a phrase extraction device and program that, when providing topics about events starting at a specific time (such as television and radio programs), acquire document information written about the programs from one or more information-providing servers connected to a network such as the Internet, and then extract and provide phrases that are likely to become topics according to how much attention each program receives.

In recent years, with the development of computer networks such as the Internet, large amounts of electronic document information have been accumulating continuously. In particular, bulletin boards and blog services have made it easy for individuals to post their impressions and opinions about matters that interest them. Therefore, by collecting and analyzing large amounts of continuously posted information, such as news, bulletin boards, and blogs, it is possible to identify the news and events currently being talked about.

Conventionally, an information trend search method has been proposed that classifies information posted from multiple information-providing servers into per-topic categories and presents and searches the evolution of each topic over time. In this method, similar documents are gathered from the set of documents posted within a given period and assigned to a category (see, for example, Patent Document 1). Because a category is assigned only after several similar documents have been posted, a topic can be presented only some time after information about it first appears.

Personal interests also include events that start at a specific date and time, such as programs. Impressions and opinions about such events are usually written and posted after the event has been experienced, so the topic can be presented only after the program has ended.

Some users now post to bulletin boards and similar services while watching a program, but a topic still cannot be presented while only a small amount of document information exists.
Patent Document 1: JP 2000-242652 A

As described above, the conventional technique can provide a topic only after a program has already ended, and cannot promptly provide the latest information about programs that may become topics.

The present invention has been made in view of the above, and its object is to provide a phrase extraction device and program capable of providing, for an event that starts at a specific time, phrases likely to interest users before the event begins.

FIG. 1 is a diagram for explaining the principle of the present invention.

The present invention is a phrase extraction method in a phrase extraction device used to extract topics related to events that start at a specific time, such as television and radio programs, from document information published on a network and from a program guide in which program schedules are recorded, and to provide those topics. The method performs:
a program-specific document extraction step (step 1) of searching a set of documents having time information, read from document information storage means that stores documents having time information, for documents that contain a program title, extracting them, and storing them per program in program-specific document information storage means;
a program-specific attention-level calculation step (step 2) of counting, for each program, the extracted documents whose time information falls within a predetermined attention-level calculation period, which starts at the program's start date and time and depends on the program's broadcast cycle, and storing the resulting document count in a program attention-level table as the program's attention level; and
a phrase extraction step (step 3) of referring to the program attention-level table, selecting a program that received a high attention level in its previous broadcast, and, for the next broadcast of that program: if the current time is before the broadcast and within a preset time of its start, acquiring the program's subtitle or summary from program guide storage means that stores the program guide, performing morphological analysis, and extracting phrases based on their position of appearance and number of characters; and if the current time is after the broadcast has started and within a preset time of its start, performing morphological analysis on the documents stored in the program-specific document information storage means, counting the number of documents in which each phrase appears, and extracting the phrases that appear in many documents.
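The post-broadcast branch of step 3 ranks phrases by the number of documents they appear in. The sketch below shows only that counting and ranking. It assumes morphological analysis (for example with a Japanese analyzer such as MeCab) has already turned each document into a list of candidate phrases; `extract_top_phrases` and `top_n` are hypothetical names, not taken from the patent.

```python
from collections import Counter


def extract_top_phrases(tokenized_docs: list[list[str]], top_n: int = 5) -> list[tuple[str, int]]:
    """Rank phrases by document frequency (number of documents containing them).

    Assumes each document is already a list of phrases produced by a
    morphological analyzer; that step is outside this sketch.
    """
    df: Counter[str] = Counter()
    for tokens in tokenized_docs:
        # set() so a phrase repeated inside one document counts once for that document
        df.update(set(tokens))
    # Highest document count first; ties broken alphabetically for a stable order
    return sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


docs = [["goal", "striker", "goal"], ["goal", "keeper"], ["striker", "goal"]]
print(extract_top_phrases(docs, 2))
```

Counting documents rather than raw occurrences keeps a single long post that repeats one word from dominating the ranking.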

In the program-specific attention-level calculation step (step 2), the present invention also counts, among the search terms submitted during the attention-level calculation period, the number of users of search terms related to each program. It uses information that identifies the user who entered each search term, so that the same keyword entered several times by the same user within a short interval is counted once. The sum of this user count and the document count is stored in the program attention-level table as the program's attention level.

In the program-specific attention-level calculation step (step 2), the present invention also sets a high attention level for new programs.

In the phrase extraction step (step 3), the present invention also:
sets in advance a maximum period before broadcast start according to the program's broadcast cycle; and
determines, for each program to be broadcast, the period from the current date and time to its start date and time, and extracts phrases only for programs whose period until broadcast start does not exceed the maximum period.

FIG. 2 is a diagram of the principle configuration of the present invention.

The present invention (claim 1) is a phrase extraction device used to extract topics related to events that start at a specific time, such as television and radio programs, from document information published on a network and from a program guide in which program schedules are recorded, and to provide those topics, the device comprising:
program-specific document extraction means 101 for searching a set of documents having time information, read from document information storage means 202 that stores documents having time information, for documents that contain a program title, extracting them, and storing them per program in program-specific document information storage means 103;
program-specific attention-level calculation means 102 for counting, for each program, the extracted documents whose time information falls within a predetermined attention-level calculation period, which starts at the program's start date and time and depends on the program's broadcast cycle, and storing the resulting document count in a program attention-level table as the program's attention level; and
phrase extraction means 104 for referring to the program attention-level table, selecting a program that received a high attention level in its previous broadcast, and, for the next broadcast of that program: if the current time is before the broadcast and within a preset time of its start, acquiring the program's subtitle or summary from program guide storage means 203 that stores the program guide, performing morphological analysis, and extracting phrases based on their position of appearance and number of characters; and if the current time is after the broadcast has started and within a preset time of its start, performing morphological analysis on the documents stored in the program-specific document information storage means 103, counting the number of documents in which each phrase appears, and extracting the phrases that appear in many documents.

In the present invention (claim 2), the program-specific attention-level calculation means 102 includes means for counting, among the search terms submitted during the attention-level calculation period, the number of users of search terms related to each program, using information that identifies the user who entered each search term so that the same keyword entered several times by the same user within a short interval is counted once, and for storing the sum of this user count and the document count in the program attention-level table as the program's attention level.

In the present invention (claim 3), the program-specific attention-level calculation means 102 sets a high attention level for new programs.

In the present invention (claim 4), the phrase extraction means 104 includes means for:
setting in advance a maximum period before broadcast start according to the program's broadcast cycle; and
determining, for each program to be broadcast, the period from the current date and time to its start date and time, and extracting phrases only for programs whose period until broadcast start does not exceed the maximum period.
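The maximum-period check in claim 4 can be sketched as a filter over upcoming programs. The concrete lead times below (6 hours for daily programs, 2 days for weekly ones, 12 hours otherwise) are hypothetical placeholders; the patent only says the maximum period is preset according to the broadcast cycle. `programs_to_extract` and the tuple layout are likewise assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical values: broadcast cycle (days) -> maximum period before broadcast start.
MAX_LEAD_BY_CYCLE = {1: timedelta(hours=6), 7: timedelta(days=2)}
DEFAULT_MAX_LEAD = timedelta(hours=12)  # hypothetical fallback for other cycles


def programs_to_extract(programs, now):
    """Return IDs of upcoming programs whose time until start does not exceed the maximum period.

    `programs` is an iterable of (program_id, start_datetime, cycle_days) tuples.
    """
    selected = []
    for program_id, start, cycle_days in programs:
        lead = start - now
        if lead < timedelta(0):
            continue  # already started: handled by the post-broadcast branch instead
        max_lead = MAX_LEAD_BY_CYCLE.get(cycle_days, DEFAULT_MAX_LEAD)
        if lead <= max_lead:
            selected.append(program_id)
    return selected


now = datetime(2004, 1, 20, 12, 0)
upcoming = [
    ("0000101", datetime(2004, 1, 20, 18, 0), 1),  # daily, starts in 6 h
    ("0000102", datetime(2004, 1, 22, 0, 0), 7),   # weekly, starts in 1 day 12 h
    ("0000103", datetime(2004, 1, 25, 20, 0), 7),  # weekly, starts in 5 days 8 h
]
print(programs_to_extract(upcoming, now))
```

Tying the window to the cycle means a daily program is only previewed shortly before air, while a weekly program can be previewed earlier.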

The present invention (claim 5) is a phrase extraction program that causes a computer to function as each of the means constituting the phrase extraction device according to any one of claims 1 to 4.

As described above, the present invention calculates the attention level of programs that start at a specific time, such as television and radio programs, and, for programs that are about to start, extracts phrases from the program guide entries that describe their content. This makes it possible to provide information that may become a topic before the program starts.

After a program has started, phrases that are being talked about can be extracted from the document information posted by people watching the program and then provided.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 3 is a configuration diagram of the phrase extraction device according to an embodiment of the present invention.

The phrase extraction device 100 shown in the figure is connected to a search term storage unit 201, a document information storage unit 202, a program information storage unit 203, and a phrase storage unit 204 as external storage devices. It takes information from the search term storage unit 201, the document information storage unit 202, and the program information storage unit 203 as input, and outputs the extracted phrases to the phrase storage unit 204.

The phrase extraction device 100 consists of a program information extraction unit 101, a program attention-level calculation/storage unit 102, a program-specific document information storage unit 103, and a phrase extraction unit 104. The search term storage unit 201 and the document information storage unit 202 are connected to the program information extraction unit 101, and the program information storage unit 203 is connected to the program attention-level calculation/storage unit 102.

The search term storage unit 201 collects the search terms that users enter on search sites providing search services on the Internet and stores them together with the time of each search request. Because users enter search terms as clues for finding information they are interested in, periodically collecting and analyzing search terms reveals which information users were most interested in.

The document information storage unit 202 collects newly created and updated documents from document information published on the Internet, such as web pages, blogs, and bulletin boards, and stores them together with time information such as their creation or update date and time. From these documents, the impressions of people who have watched a program can be collected. When a television station provides a web page for a program, the latest content of the program can also be collected.

The program information storage unit 203 stores television program guide information. The program guide consists of information about each program, such as broadcast date, day of the week, genre, start time, end time, program title, subtitle, and summary text; broadcasts with the same program title are stored separately for each broadcast date. The program information storage unit 203 is assumed to hold program guide information for about one week ahead.

The program attention-level calculation/storage unit 102 of the phrase extraction device 100 periodically acquires program information from the program information storage unit 203 and manages it by program title. FIG. 4 shows an example of the per-program aggregation table 500 and the program attention-level table 510 stored in the program attention-level calculation/storage unit 102. FIG. 4(a) shows a configuration example of the per-program aggregation table 500, which holds the latest program information acquired from the program information storage unit 203, organized by program title. The per-program aggregation table 500 consists of a program ID 501, a program title 502, a cycle 503, a start date and time 504, an end date and time 505, an attention-level calculation date and time 506, and aggregated data 507.

The program ID 501 is a unique number identifying the program, and the program title 502 is the name of the television program. Programs with the same program title 502 are managed under the same program ID. For a regularly broadcast program (weekly or daily), the cycle 503 records its period in days (for example, a program broadcast every week has a 7-day cycle, so "7" is recorded). The start date and time 504 is the program's broadcast start date and time, and the end date and time 505 is its broadcast end date and time. The attention-level calculation date and time 506 is the start date and time 504 plus a period that depends on the cycle 503, and indicates when the program's attention level is to be calculated. The aggregated data 507 is the data used to calculate the program's attention level; one value per fixed interval can be accumulated over a specified span. A ranking of program attention levels, as shown in FIG. 4(b), is generated from the values of the aggregated data 507 and stored in the program attention-level table 510, which holds a rank 511, a program ID 512, and an attention level 513.
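One row of the per-program aggregation table 500 and the derivation of the ranking in table 510 can be sketched as below. The field names mirror items 501 to 507 and 511 to 513, but the Python names are hypothetical, and scoring a program by the plain sum of its aggregated data 507 is an assumption: the patent only says the ranking is generated from those values.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ProgramRecord:
    """Hypothetical model of one row of the per-program aggregation table 500."""
    program_id: str              # 501
    title: str                   # 502
    cycle_days: int              # 503 (0 = unknown, e.g. a new program)
    start: datetime              # 504
    end: datetime                # 505
    attention_calc_at: datetime  # 506
    tallies: list[int] = field(default_factory=list)  # 507: one value per interval


def build_attention_table(records):
    """Build table 510 rows (rank 511, program ID 512, attention 513).

    Assumption: attention = sum of the interval tallies.
    """
    scored = sorted(((sum(r.tallies), r.program_id) for r in records),
                    key=lambda x: (-x[0], x[1]))  # highest score first, then by ID
    return [(rank, pid, score) for rank, (score, pid) in enumerate(scored, start=1)]


t = datetime(2004, 1, 20, 18, 0)
a = ProgramRecord("0000101", "Information A", 1, t, t, t, [3, 5])
b = ProgramRecord("0000102", "Drama B", 7, t, t, t, [10])
print(build_attention_table([a, b]))
```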

For a regularly broadcast program, the entry is updated to the next broadcast after its attention level has been calculated. For example, in FIG. 4(a), suppose the program with program ID "0000101" and program title "Information A" starts at 18:00 and ends at 18:27 every day. Once the attention-level calculation date and time "January 21, 2004, 12:00:00" has passed, the entry is updated to the next broadcast (start: January 21, 2004, 18:00; end: January 21, 2004, 18:27).
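The roll-forward in the "Information A" example can be sketched as follows. This is a simplification under stated assumptions: the next broadcast is computed as the current one shifted by the cycle, whereas the embodiment takes the next start and end times from the program guide (step 207). `roll_forward` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta


def roll_forward(start, end, attention_calc_at, cycle_days, calc_period, now):
    """Move a periodic program's entry to its next broadcast once the calculation time has passed.

    Assumption: next broadcast = current broadcast + cycle (the patent reads it
    from the program guide instead).
    """
    if cycle_days <= 0 or now <= attention_calc_at:
        return start, end, attention_calc_at  # not periodic, or not yet due
    step = timedelta(days=cycle_days)
    new_start = start + step
    return new_start, end + step, new_start + calc_period


entry = roll_forward(
    datetime(2004, 1, 20, 18, 0), datetime(2004, 1, 20, 18, 27),
    datetime(2004, 1, 21, 12, 0), 1, timedelta(hours=18),
    now=datetime(2004, 1, 21, 12, 0, 1),
)
print(entry)  # next broadcast: Jan 21 18:00-18:27, next calculation Jan 22 12:00
```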

The program information extraction unit 101 runs independently of the processing in the program attention-level calculation/storage unit 102. A fixed period is chosen according to how often information about television programs should be provided to users, and the unit runs once per period.

First, the program information extraction unit 101 acquires the program titles from the program attention-level calculation/storage unit 102 and, among the search terms acquired from the search term storage unit 201, counts for each program title the search terms that match it. It also counts, for each program title, the documents acquired from the document information storage unit 202 that contain that title. It then adds the count from the search terms to the count from the documents and records the sum in the per-program aggregation table 500 of the program attention-level calculation/storage unit 102. The documents related to each program are also organized by program and recorded in the program-specific document information storage unit 103.

After the program information extraction unit 101 finishes, the phrase extraction unit 104 acquires the titles of programs with high attention levels from the program attention-level calculation/storage unit 102. If a high-attention program is due to start within a preset time, the unit acquires the program's subtitle and summary text from the program information storage unit 203, extracts phrases likely to become topics, and outputs them to the phrase storage unit 204. After a high-attention program has started broadcasting, the unit instead acquires documents from the program-specific document information storage unit 103, extracts phrases representing the topics being discussed, and outputs them to the phrase storage unit 204.
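The choice between the two phrase sources can be sketched as a small decision function. The window lengths are hypothetical parameters (the patent only calls them preset times), and returning the strings "guide" and "documents" is an illustrative convention, not part of the described device.

```python
from datetime import datetime, timedelta


def extraction_source(start, now, before_window, after_window):
    """Decide where phrases for a high-attention program should come from.

    Returns "guide" shortly before the broadcast, "documents" shortly after it
    starts, and None outside both preset windows.
    """
    if start - before_window <= now < start:
        return "guide"      # subtitle / summary text from the program information storage unit 203
    if start <= now <= start + after_window:
        return "documents"  # documents in the program-specific document information storage unit 103
    return None


start = datetime(2004, 1, 20, 18, 0)
for hour in (10, 16, 19):
    print(hour, extraction_source(start, datetime(2004, 1, 20, hour, 0),
                                  timedelta(hours=3), timedelta(hours=6)))
```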

FIG. 5 shows an example of the phrases output to and stored in the phrase storage unit 204. Each phrase 903 extracted by the phrase extraction unit 104 is stored in the phrase storage unit 204 in association with a program title 901 and the program's start date and time 902.

By periodically running the program information extraction unit 101 and the phrase extraction unit 104 in this way, high-attention programs and phrases representing their topics are output to the phrase storage unit 204 one after another. Because the phrase storage unit 204 records the program start date and time 902, it can provide information such as which phrases are being talked about for a high-attention program currently on air, or which high-attention programs will be broadcast next.

Next, the processing flow of the program information extraction unit 101 is described with reference to FIG. 6.

FIG. 6 is a flowchart of the processing of the program information extraction unit according to an embodiment of the present invention.

The program information extraction unit 101 runs periodically (for example, every 15 minutes). First, it acquires the program titles from the program attention-level calculation/storage unit 102 (step 101). When the process runs every 15 minutes, the time range specified in a run is the 15 minutes preceding the current time, and the range specified in the next run is the 15 minutes immediately following the current range.
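The contiguous 15-minute time ranges can be sketched as below. `next_window` is a hypothetical helper; passing `None` for the first run is an assumption made to cover the "15 minutes before now" case, and each later range simply starts where the previous one ended so that no interval is skipped or counted twice.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)  # example period from the text


def next_window(previous_end, now):
    """Return the (start, end) time range for this run.

    First run (previous_end is None): the 15 minutes before `now`.
    Later runs: the 15 minutes immediately after the previous range.
    """
    if previous_end is None:
        return now - INTERVAL, now
    return previous_end, previous_end + INTERVAL


first = next_window(None, datetime(2004, 1, 20, 12, 0))
second = next_window(first[1], datetime(2004, 1, 20, 12, 15))
print(first, second)
```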

Next, the unit acquires search terms from the search term storage unit 201 for the specified time range (step 102) and counts, for each program title, the search terms that match the title (step 103). The same user may enter a search term several times. Therefore, using information that identifies the user who entered each search term (for example, web browser cookie information), the same keyword entered several times by the same user within a short interval is counted once, yielding the number of users for each search term (step 104).
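Steps 103 and 104 can be sketched as counting exact title matches while collapsing repeats by the same user. The log format `(user_id, query, timestamp)`, the 30-minute gap, and `count_searchers` are hypothetical; the patent only says "a short time interval" and gives cookies as an example of user-identifying information. In this sketch each repeat refreshes the timer, so a chain of closely spaced repeats counts once.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def count_searchers(log, titles, gap=timedelta(minutes=30)):
    """Count searches matching a program title, one per user per burst of repeats.

    `log` holds (user_id, query, timestamp) tuples; user_id stands in for e.g. a
    cookie value. A repeat within `gap` of the same user's previous identical
    query is not counted again.
    """
    last_seen = {}              # (user_id, title) -> time of that user's previous query
    counts = defaultdict(int)
    for user, query, ts in sorted(log, key=lambda e: e[2]):
        if query not in titles:
            continue            # only queries that exactly match a program title
        key = (user, query)
        prev = last_seen.get(key)
        if prev is None or ts - prev > gap:
            counts[query] += 1
        last_seen[key] = ts     # refresh so chained repeats stay collapsed
    return dict(counts)


t0 = datetime(2004, 1, 20, 18, 0)
log = [
    ("c1", "Information A", t0),
    ("c1", "Information A", t0 + timedelta(minutes=5)),  # repeat: not counted
    ("c2", "Information A", t0 + timedelta(minutes=1)),
    ("c1", "Information A", t0 + timedelta(hours=2)),    # long after: counted again
    ("c3", "weather", t0),                               # not a program title
]
print(count_searchers(log, {"Information A"}))
```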

Next, the unit acquires documents from the document information storage unit 202 for a specified time range, selects the documents related to each program, and counts the documents for each program title. The time range specified here is the same as the one specified in step 102.

The documents related to each program are organized by program and recorded in the program-specific document information storage unit 103 (step 105). One way to select documents related to a program is to search for documents containing the program title. This is effective, and fast, when the title is a word not generally used in other contexts (for example, "Pocket Monster"). However, program titles are often abbreviated (for example, "Pocket Monster" is often shortened to "Pokemon"), so also storing abbreviations and aliases of program titles in the program attention-level calculation/storage unit 102 allows more documents to be selected.

When a program title is also used in contexts unrelated to the program, selecting documents solely on whether they contain the title mixes in documents unrelated to the program. Documents about a program often contain words such as "watch", "broadcast", "say", or "do" near the program title, so documents in which such words occur can be selected as related to the program.
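Title and alias matching, with an optional check for context words near the mention, can be sketched as below. The 20-character window, the `require_context` flag, and `is_program_document` are hypothetical; the patent does not fix how "near" is measured. Plain substring matching is used for brevity, so short context words can also match inside longer words.

```python
def is_program_document(text, names, context_words, window=20, require_context=False):
    """Return True if `text` mentions the program under any of `names` (title or alias).

    With require_context=True, one of `context_words` must also occur within
    `window` characters of a mention; intended for titles that are common words.
    """
    for name in names:
        pos = text.find(name)
        while pos != -1:
            if not require_context:
                return True
            around = text[max(0, pos - window): pos + len(name) + window]
            if any(w in around for w in context_words):
                return True
            pos = text.find(name, pos + 1)  # try the next mention
    return False


names = ["Pocket Monster", "Pokemon"]          # title plus a stored alias
context = ["watch", "broadcast", "say", "do"]  # English glosses of the example words
print(is_program_document("Did you watch Pokemon?", names, context, require_context=True))
```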

As a storage method, an area for storing documents is set up for each program ID, and documents are stored in that area under file names that include their time, so that each document can be matched to its time information.
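A per-program storage layout with time-stamped file names might look like the following. The directory root, the `YYYYMMDDhhmmss_NNNN.txt` naming scheme, and both helper names are hypothetical; the point is only that the time can be recovered from the file name.

```python
from datetime import datetime
from pathlib import PurePosixPath


def document_path(root, program_id, created, seq):
    """Path for one document: <root>/<program_id>/<timestamp>_<seq>.txt (hypothetical layout)."""
    name = f"{created:%Y%m%d%H%M%S}_{seq:04d}.txt"  # seq disambiguates same-second documents
    return PurePosixPath(root) / program_id / name


def document_time(path):
    """Recover the document's time from the first 14 characters of its file name."""
    return datetime.strptime(PurePosixPath(path).name[:14], "%Y%m%d%H%M%S")


p = document_path("/data/docs", "0000101", datetime(2004, 1, 20, 18, 5, 30), 7)
print(p, document_time(p))
```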

Then, for each program title, the search term count and the document count are added, and the sum is recorded in the aggregated data 507 of the per-program aggregation table 500 in the program attention-level calculation/storage unit 102 (step 106). The search term count is the number of users who made search requests using the program title, and can be regarded as the number of people interested in the program. The document count likewise represents the number of people interested in the program, since those documents put impressions and opinions about the program into writing. In other words, the aggregated data 507 accumulates the number of people interested in the program for each interval.
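Step 106 can be sketched as adding the two per-title counts and appending the result as one new interval value of the aggregated data 507. Keying the history by title and appending 0 for titles with no activity are assumptions for illustration; `append_interval` is a hypothetical name.

```python
from collections import Counter


def append_interval(tallies_by_title, searcher_counts, doc_counts):
    """Append this interval's total (unique searchers + related documents) for every tracked title."""
    combined = Counter(searcher_counts) + Counter(doc_counts)
    for title, history in tallies_by_title.items():
        history.append(combined.get(title, 0))  # 0 when the title saw no activity


tallies = {"Information A": [], "Drama B": [4]}
append_interval(tallies, {"Information A": 3}, {"Information A": 2, "Drama B": 1})
print(tallies)
```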

Next, the processing in the program attention-level calculation/storage unit 102 is described in two parts: management of the per-program aggregation table 500, and program attention-level calculation. Management of the per-program aggregation table 500 is described first, with reference to FIG. 7.

FIG. 7 is a flowchart of the management processing of the per-program aggregation table according to an embodiment of the present invention.

The program attention-level calculation/storage unit 102 periodically acquires, from the program information storage unit 203, the set of programs starting within a specified period (step 201). If the program information storage unit 203 registers or updates its information once a day, this processing can run once a day after that registration or update. The specified period covers programs that have yet to start, for example those starting within 120 hours (5 days) of the current time. Programs are taken from the acquired set one at a time, in order of earliest start time, and the following processing is performed.

First, the program attention-level calculation/storage unit 102 takes the program title from the program information set and checks whether it exists among the program titles 502 of the per-program aggregation table 500 (step 202). If it does not, the entry is a new program and is added to the per-program aggregation table 500 (step 203); since the cycle of a new program is unknown, "0" is recorded as its cycle. If the title already exists, the unit checks whether the cycle 503 in the per-program aggregation table 500 is "0". If it is (step 204, Yes), the unit calculates the cycle from the start date and time 504 stored in the table and the program's start date and time in the program information set, and updates the cycle 503 (step 205).
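Steps 202 to 205 can be sketched with a dictionary keyed by title. This is a simplification: other table columns are omitted, and deriving the cycle as the whole-day gap between the stored start and the newly seen start is an assumption about how "calculate the cycle" is done. `upsert_program` is a hypothetical name.

```python
from datetime import datetime


def upsert_program(table, title, start):
    """Register one program guide entry (steps 202-205, simplified).

    `table` maps title -> {"cycle_days": int, "start": datetime}.
    """
    rec = table.get(title)
    if rec is None:
        # Steps 202/203: unseen title -> new program, cycle unknown (0)
        table[title] = {"cycle_days": 0, "start": start}
        return
    if rec["cycle_days"] == 0:
        # Steps 204/205: assumed cycle = whole days between the stored and new start times
        rec["cycle_days"] = (start - rec["start"]).days


table = {}
upsert_program(table, "Information A", datetime(2004, 1, 20, 18, 0))
upsert_program(table, "Information A", datetime(2004, 1, 21, 18, 0))
print(table["Information A"]["cycle_days"])
```

Because programs are processed in order of earliest start time, the first sighting of a title records its start and the second sighting yields the gap.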

Next, the unit checks whether the attention level has already been calculated (step 206). Only if it has (step 206, Yes) are the start date and time 504, end date and time 505 and attention level calculation date and time 506 in the per-program tally table 500 updated (step 207). The attention level calculation date and time 506 is obtained by adding to the start date and time 504 an attention level calculation period that depends on the cycle 503.

The attention level calculation period runs from the broadcast of a program, through its becoming a topic of conversation, until that conversation dies down. It is determined in advance by examining how past tallied data changed over time. For example, 18 hours is preset for programs with a 1-day cycle and 72 hours for programs with a 7-day cycle. Attention level calculation periods for new programs and for programs broadcast only once are also set in advance. Because the period is set according to each program's start date and time and cycle, the attention level can be calculated for each individual broadcast.

For example, the program with program ID “0000101” and program title “Information A” has the start date and time “January 20, 2004, 18:00” and a cycle of “1” day. If the attention level calculation period for programs with a 1-day cycle is 18 hours, its attention level calculation date and time is “January 21, 2004, 12:00:00”.
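The deadline computed in step 207, and the worked example above, can be sketched in Python as follows. The 18-hour and 72-hour values are the examples from the text. The 24-hour default for new or one-off programs (cycle 0) is a hypothetical placeholder, because the text only says that such periods are set in advance.

```python
from datetime import datetime, timedelta

# Example attention level calculation periods from the text: daily -> 18 h, weekly -> 72 h
CALC_PERIOD_BY_CYCLE = {1: timedelta(hours=18), 7: timedelta(hours=72)}
# Hypothetical default for new / one-off programs; not a value given in the text
DEFAULT_CALC_PERIOD = timedelta(hours=24)


def attention_calc_datetime(start: datetime, cycle_days: int) -> datetime:
    """Start date and time 504 plus the cycle-dependent attention level calculation period."""
    return start + CALC_PERIOD_BY_CYCLE.get(cycle_days, DEFAULT_CALC_PERIOD)
```

For instance, `attention_calc_datetime(datetime(2004, 1, 20, 18, 0), 1)` returns 2004-01-21 12:00, which matches the example for “Information A”.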

For example, when several broadcasts of a daily program are acquired in step 201, only the one with the oldest start time is recorded in step 207, and only after the attention level calculation for the previous broadcast has finished. The broadcast with the next oldest start time is not recorded until step 206 finds that the attention level calculation has finished.

Whether the attention level has been calculated is recorded by clearing the attention level calculation date and time 506 when the attention level calculation is executed. Alternatively, a field indicating whether the attention level has been calculated may be added to the per-program tally table 500.

Steps 202 to 207 are repeated for every item of the program information set acquired from the program information storage unit 203. When all programs have been processed (step 208, Yes), the series of processing ends.

Next, the flow of the program attention level calculation process will be described with reference to FIG. 8.

FIG. 8 is a flowchart of the program attention level calculation process according to an embodiment of the present invention.

The program attention level calculation process performed by the program attention level calculation/storage unit 102 calculates the attention level of a single broadcast for each program title. The calculated attention level is used the next time a program with the same title is broadcast. Television programs are often organized in units of 30 minutes or 1 hour, so the process is executed at 30-minute intervals. Other intervals may be used, but the attention level must be calculated before the next broadcast start date and time.

Every 30 minutes, the program attention level calculation/storage unit 102 searches the per-program tally table 500 for programs whose attention level calculation date and time 506 is earlier than the current date and time, and acquires the set of their program IDs 501 (step 301). It takes the program IDs from this set one at a time. For each, it sums the tallied data for that program ID from the start date and time 504 to the attention level calculation date and time 506, and the sum is the attention level. When the attention level calculation is complete, the attention level calculation date and time is cleared (step 302).
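A minimal Python sketch of steps 301 and 302 follows. It assumes that the tallied data 507 is held as a list of `(timestamp, count)` pairs, for example one pair per 15-minute tally. The text does not fix this representation, and `Row` and `calculate_due` are hypothetical names. Following the text, a cleared (`None`) calculation date and time marks a row whose attention level has already been calculated.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Row:
    # Hypothetical in-memory stand-in for one row of the per-program tally table 500
    program_id: str
    start: datetime                               # start date and time 504
    calc_at: Optional[datetime]                   # attention level calculation date and time 506
    tallies: list = field(default_factory=list)   # tallied data 507 as (timestamp, count) pairs


def calculate_due(rows, now):
    """Steps 301-302 sketch: sum the tallies of every row whose calculation time has passed."""
    levels = {}
    for row in rows:
        if row.calc_at is None or row.calc_at >= now:   # step 301: not due yet, or already done
            continue
        levels[row.program_id] = sum(
            count for ts, count in row.tallies if row.start <= ts <= row.calc_at
        )
        row.calc_at = None   # step 302: an empty value means "attention level calculated"
    return levels
```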

Next, the program information storage unit 203 is searched by program title (step 303). If program information for the next broadcast exists (step 304, Yes), its start date and time 504 and end date and time 505 are acquired and the attention level calculation date and time 506 is set (step 305).

When every program ID in the set acquired in step 301 has been processed (step 306, Yes), the program IDs are sorted by attention level in descending order, ranked, and stored in the program attention level table 510 (step 307). The stored attention level indicates how many people took an interest in a single broadcast of each program title. This processing yields attention levels only for periodic programs, so no attention level can be calculated for new programs or programs broadcast only once. Such programs are therefore inserted at the top of the ranking computed in step 307 (step 308).
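Steps 307 and 308 amount to a descending sort followed by placing the new or one-off programs first. The sketch below illustrates this. Its output tuples loosely mirror rank 511, program ID 512 and attention level 513 of table 510. Storing `None` as the level of new programs, and the function name `rank_programs`, are assumptions made for illustration.

```python
def rank_programs(levels: dict, new_or_one_off: list) -> list:
    """Steps 307-308 sketch: rank by attention level, then put new/one-off programs on top.

    Returns (rank, program_id, attention_level) tuples. New or one-off programs
    have no computed level, so None is stored for them (an assumption).
    """
    ranked = sorted(levels.items(), key=lambda kv: kv[1], reverse=True)  # step 307
    ordered = [(pid, None) for pid in new_or_one_off] + ranked            # step 308
    return [(i + 1, pid, level) for i, (pid, level) in enumerate(ordered)]
```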

Giving new and one-off programs a high attention level in this way makes it possible to extract topics from the documents written by people who watched those programs, and to provide those topics. A program's attention level is calculated for each broadcast, so its current rank does not affect its rank for the next broadcast.

Next, the processing flow of the phrase extraction unit 104 will be described with reference to FIG. 9.

FIG. 9 is a flowchart of the processing of the phrase extraction unit according to an embodiment of the present invention.

The phrase extraction unit 104 runs after the program information extraction unit 101 has finished. That is, if the program information extraction unit 101 runs every 15 minutes, the phrase extraction unit 104 also runs every 15 minutes.

Periodically, the top N program IDs 512 by attention level are acquired from the program attention level table 510 of the program attention level calculation/storage unit 102. The program title 502, start date and time 504, attention level calculation date and time 506 and cycle 503 of each of these program IDs are then acquired from the per-program tally table 500 (step 401). One program ID is taken together with its start date and time, attention level calculation date and time and cycle (step 402). The unit then checks whether the current date and time is later than the start date and time and no later than the attention level calculation date and time (step 403).

The attention level calculation date and time is the start date and time plus a period set according to the cycle. It is therefore the time by which conversation about the program has died down after its broadcast. If the condition of step 403 is satisfied (step 403, Yes), documents written by people are acquired from the program-specific document information storage unit 103, and phrases representing topics are extracted from them. The storage area in the program-specific document information storage unit 103 is identified by the program ID. The documents from the start date and time up to the current date and time are acquired together with their time information (step 404).
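Steps 401 and 403 can be sketched in Python as below. The condition uses the inclusive form "start date and time ≤ current date and time ≤ attention level calculation date and time" stated in the worked example later in this description. A cleared calculation date and time is treated as "window closed", which is an assumption. Both function names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def top_n_ids(attention_table, n):
    """Step 401 sketch: attention_table holds (rank, program_id, level) rows."""
    return [pid for _, pid, _ in sorted(attention_table, key=lambda row: row[0])[:n]]


def in_discussion_window(start: datetime, calc_at: Optional[datetime], now: datetime) -> bool:
    """Step 403 sketch: True while the broadcast is still being talked about.

    Assumption: a cleared calculation date and time (None) means the window has closed.
    """
    return calc_at is not None and start <= now <= calc_at
```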

Phrases representing topics are extracted from the acquired documents. They are output to the phrase storage unit 204 in association with the program title and start date and time (step 405).

If the condition of step 403 is not satisfied (step 403, No), the program has yet to be broadcast, and a period T is set so that programs to be broadcast in the near future can be selected (step 406). For a periodically broadcast program, T is set to a value smaller than (cycle - attention level calculation period - attention level calculation interval). Take the 30-minute attention level calculation interval from the program attention level calculation example above. A daily program then has an attention level calculation period of 18 hours, so T is set to less than
24 - 18 - 0.5 = 5.5
hours. For a weekly program, T is set to less than
168 - 72 - 0.5 = 95.5
hours. The unit then selects programs whose start date and time is later than the current date and time and earlier than the current date and time + T (step 407, Yes). For each, it acquires from the program information storage unit 203 the subtitle or summary of the program information whose program title and start date and time match. A subtitle is a phrase that describes the program content precisely, so it conveys that content better than the summary does. The subtitle is therefore acquired if one exists, and the summary otherwise (step 408).
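The bound on T is simple arithmetic, shown below as a short Python sketch. One reading of why the interval is subtracted is the following; it is an interpretation, not stated explicitly in the text. The preview window for the next broadcast should open only after the previous broadcast's attention level has been calculated, and that calculation can run up to one interval after its calculation time.

```python
def max_preview_hours(cycle_hours: float, calc_period_hours: float, interval_hours: float) -> float:
    """Exclusive upper bound on period T: cycle - attention calc period - calc interval."""
    return cycle_hours - calc_period_hours - interval_hours


# The two cases from the text, with a 30-minute calculation interval
daily_bound = max_preview_hours(24, 18, 0.5)     # 5.5 hours
weekly_bound = max_preview_hours(168, 72, 0.5)   # 95.5 hours
```

The preset values used in the worked example later in this description (T = 3 hours for daily programs, 24 hours for weekly ones) both fall below these bounds.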

Phrases are extracted from the acquired information and output to the phrase storage unit 204 in association with the program title and start date and time (step 409). The period T determines how long before the broadcast date and time information about high-attention programs can be provided. For example, if T is set to 3 hours, information can be provided about high-attention programs that start within the next 3 hours. If T is set to 24 hours for weekly programs, high-attention programs can be offered to a user who wonders, “Is anything attracting attention at this time tomorrow?”
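Steps 407 and 408 can be sketched in Python as a window check followed by a subtitle-or-summary fallback. `GuideEntry` is a hypothetical stand-in for a program-guide record in the program information storage unit 203. It assumes that a missing subtitle is stored as `None` or an empty string.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class GuideEntry:
    # Hypothetical program-guide record from the program information storage unit 203
    title: str
    start: datetime
    subtitle: Optional[str]
    summary: str


def preview_text(entry: GuideEntry, now: datetime, period_t: timedelta) -> Optional[str]:
    """Steps 407-408 sketch: return the text to mine for an upcoming broadcast, or None."""
    if not (now < entry.start < now + period_t):   # step 407
        return None
    # Step 408: prefer the subtitle, which states the content precisely; else use the summary
    return entry.subtitle if entry.subtitle else entry.summary
```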

All N programs acquired in step 401 are processed in this way. When all of them have been processed (step 410, Yes), the series of processing ends.

Next, FIG. 10 is used to show an example of when each of the processes in FIGS. 6, 8 and 9 is started, taking the program “Information A” with program ID “0000101” as an example.

According to FIG. 4, program ID “0000101” has the start date and time 504(a) “January 20, 2004, 18:00:00” and a cycle 503 of “1”, so it is broadcast daily. Its attention level calculation date and time has already been calculated from the start date and time 504 and cycle 503 in step 207 of FIG. 7. For a daily program with an attention level calculation period of 18 hours, the attention level calculation date and time 506(a) is “January 21, 2004, 12:00:00”.

Suppose “Information A” is a high-attention program. The processing of the program information extraction unit 101 (FIG. 6) and of the phrase extraction unit 104 (FIG. 9) is then executed, for example, every 15 minutes, and phrases representing topics are output in turn. In addition, every 15 minutes step 106 of FIG. 6 accumulates tally values in the tallied data 507 of FIG. 4.

The program attention level calculation process (FIG. 8) is first executed after the attention level calculation date and time “January 21, 2004, 12:00:00” has passed. At that point, the tallied data 507 from the start date and time 504(a) “January 20, 2004, 18:00:00” to the attention level calculation date and time 506(a) “January 21, 2004, 12:00:00” is summed and recorded in the program attention level table 510 (step 302).

The program information storage unit 203 is also searched by program title, which may return the next program information: start date and time (b) “January 21, 2004, 18:00:00” and end date and time (b) “January 21, 2004, 18:27:00”. When it does, the start and end dates and times are updated and the next attention level calculation date and time (b) is calculated. These values are recorded in the per-program tally table 500 (step 305).

From the previous start date and time (a) “January 20, 2004, 18:00:00” until the attention level calculation date and time (a) “January 21, 2004, 12:00:00”, the condition of step 403 in the processing of the phrase extraction unit 104 is satisfied. Phrases are then extracted from the program-specific document information storage unit 103 (steps 404 and 405).

When the time reaches T = 3 hours before the next start date and time “January 21, 2004, 18:00:00”, the condition of step 407 is satisfied and phrases are extracted from the program information storage unit 203 (steps 408 and 409).

In this way, a periodically broadcast program may have attracted high attention in its previous broadcast. For such a program, phrases likely to become topics can be provided even before the broadcast.

The processing flow of the phrase extraction unit 104 will now be described more concretely, using the storage example of the program attention level calculation/storage unit 102 shown in FIG. 4.

The following description assumes that the current date and time is “January 20, 2004, 19:15” and that the high-attention program IDs are “0000103” and “0000105”.

First, program ID “0000103” is taken from the per-program tally table 500 together with its start date and time “January 20, 2004, 19:00:00” and its attention level calculation date and time “January 23, 2004, 19:00:00” (step 402). The unit checks whether program ID 501 “0000103” satisfies the condition of step 403 (start date and time ≤ current date and time ≤ attention level calculation date and time). Because it does, documents are acquired from the program-specific document information storage unit 103 (step 404). Phrases are extracted from the documents and output to the phrase storage unit 204 in association with the program title “Anime C” and the start date and time “January 20, 2004, 19:00:00” (step 405).

Next, program ID “0000105” and its start date and time “January 20, 2004, 21:00:00” are taken. The unit checks the condition of step 403, and because it is not satisfied, the period T is set in step 406. Here T is assumed to be preset to 3 hours for daily programs and 24 hours for weekly programs. Program ID 501 “0000105” has a cycle 503 of “7” and is broadcast weekly, so T is set to 24 hours (step 406). The unit then checks whether the condition of step 407 (current date and time < start date and time < current date and time + T) is satisfied. It is, so the summary of the program is acquired from the program information storage unit 203 (step 408). Phrases are extracted from the summary and output to the phrase storage unit 204 in association with the program title 502 “Drama E” and the start date and time 504 “January 20, 2004, 21:00:00” (step 409).

Next, the phrase extraction method of step 405 will be described in more detail.

In step 405, morphological analysis is performed on the content of each document in the set acquired in step 404, decomposing it into morphemes such as individual parts of speech and punctuation marks. From these morphemes, the unit extracts three kinds of expression:
- nouns;
- compound nouns consisting of consecutive nouns;
- noun phrases, that is, word sequences that act as a noun as a whole, such as “Koizumi administration support rate”.

These are extracted because they express topics better than verbs or adjectives do. Below, they are collectively called phrases.

After phrases have been extracted from all documents, the number of documents in which each phrase appears is counted. Phrases contained in the program title are excluded from the count. If abbreviations or alternative names of the program title were used in step 105, those are excluded as well. From the remaining phrases, the M phrases that appear in the most documents are extracted as phrases representing topics. In the phrase storage unit 204 of FIG. 5, phrases such as “○○ City”, “Satoshi Pinch” and “Pikachu's Special Move” are stored in association with the program title “Anime C”. They are examples of phrases extracted in step 405.
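The counting and selection part of step 405 can be sketched in Python as follows. The sketch assumes the morphological analysis has already been done, so each document arrives as a list of extracted phrases. Producing those lists would require a Japanese morphological analyzer, which is outside this sketch. "Contained in the program title" is read as "is a substring of the title or of one of its aliases". The function name and the alphabetical tie-break are hypothetical choices.

```python
from collections import Counter


def top_topic_phrases(doc_phrases, title, aliases=(), m=3):
    """Step 405 sketch, starting after morphological analysis.

    doc_phrases: one list of noun / compound-noun / noun-phrase strings per document.
    """
    excluded = {title, *aliases}
    df = Counter()
    for phrases in doc_phrases:
        df.update(set(phrases))   # count each phrase at most once per document
    # Drop phrases contained in the program title or in one of its aliases
    candidates = [(p, n) for p, n in df.items()
                  if not any(p in name for name in excluded)]
    candidates.sort(key=lambda pn: (-pn[1], pn[0]))   # most documents first, ties alphabetical
    return [p for p, _ in candidates[:m]]
```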

The phrase extraction in step 409 differs from step 405 because it works on a single piece of text, a subtitle or a summary. Treating the phrases that appear in the most documents as topic phrases therefore does not work there, and the following processing is performed instead.

First, as in step 405, morphological analysis is performed and nouns, compound nouns and noun phrases (that is, phrases) are extracted. A summary tends to state its most important information at the beginning, so the earlier a phrase appears, the higher its evaluation value A. Longer phrases can convey more precise information, so the more characters a phrase contains, the higher its evaluation value B. The phrase rated highest on evaluation values A and B is extracted. In FIG. 5, “Reunion with △△” is stored in association with the program title 901 “Drama E”. It is an example of a phrase extracted in step 409.
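The text says only that A rewards earlier phrases and B rewards longer ones. It does not say how A and B are computed or combined. The Python sketch below therefore assumes normalized scores summed with equal weight, and breaks ties in favor of the earlier phrase. Treat both choices, and the name `best_preview_phrase`, as illustrative assumptions rather than the patent's formula.

```python
def best_preview_phrase(phrases):
    """Step 409 sketch: phrases are given in order of appearance in the subtitle or summary.

    Assumed scoring (not specified in the text):
      A = 1 - index / count      (1.0 for the first phrase, lower for later ones)
      B = len(phrase) / longest  (1.0 for the longest phrase)
    The phrase with the largest A + B wins; the earliest phrase wins ties.
    """
    if not phrases:
        return None
    count = len(phrases)
    longest = max(len(p) for p in phrases)
    scores = [(1 - i / count) + len(p) / longest for i, p in enumerate(phrases)]
    return phrases[scores.index(max(scores))]
```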

The operations shown in FIGS. 6, 7, 8 and 9 can also be built as a program. The program can be installed and executed on a computer used as the phrase extraction device, or distributed over a network.

The program can also be stored on a hard disk, flexible disk or CD-ROM connected to a computer used as the phrase extraction device, and installed and executed on that computer.

The present invention is not limited to the above embodiments and examples. Various modifications and applications are possible within the scope of the claims.

The present invention is applicable to technologies that use topical phrases according to the attention level of programs that start at a specific time, such as television and radio programs.

FIG. 1 is a diagram for explaining the principle of the present invention.
FIG. 2 is a diagram of the principle configuration of the present invention.
FIG. 3 is a configuration diagram of the phrase extraction device according to an embodiment of the present invention.
FIG. 4 shows a storage example of the program attention level calculation/storage unit according to an embodiment of the present invention.
FIG. 5 shows a storage example of the phrase storage unit according to an embodiment of the present invention.
FIG. 6 is a flowchart of the processing of the program information extraction unit according to an embodiment of the present invention.
FIG. 7 is a flowchart of the management processing of the per-program tally table according to an embodiment of the present invention.
FIG. 8 is a flowchart of the program attention level calculation process according to an embodiment of the present invention.
FIG. 9 is a flowchart of the processing of the phrase extraction unit according to an embodiment of the present invention.
FIG. 10 shows the processing timing according to an embodiment of the present invention.

Explanation of symbols

100 Phrase extraction device
101 Program-specific document extraction means; program information extraction unit
102 Program attention level calculation means; program attention level calculation/storage unit
103 Program-specific document information storage means; program-specific document information storage unit
104 Phrase extraction means; phrase extraction unit
201 Search term storage unit
202 Document information storage means; document information storage unit
203 Program guide storage means; program information storage unit
204 Phrase storage means; phrase storage unit
500 Per-program tally table
501 Program ID
502 Program title
503 Cycle
504 Start date and time
505 End date and time
506 Attention level calculation date and time
507 Tallied data
510 Program attention level table
511 Rank
512 Program ID
513 Attention level
901 Program title
902 Start date and time
903 Phrase

Claims

A phrase extraction device that extracts phrases used to extract and provide topics relating to events that start at a specific time, such as television programs and radio programs, from document information published on a network and from a program guide recording program schedules, the device comprising:
program-specific document extraction means for searching a set of documents having time information, read from document information storage means that stores documents having time information, for documents containing a program title, extracting those documents, and storing them, program by program, in program-specific document information storage means;
program attention level calculation means for tallying, for each program, the extracted documents whose time information falls within a predetermined attention level calculation period, where that period starts at the program's start date and time and depends on the program's broadcast cycle, and for storing the result in a program attention level table as the attention level of the program; and
phrase extraction means for referring to the program attention level table and selecting a program that attracted high attention in its previous broadcast, and, for the next broadcast of that program: when the current time is within a preset time before the broadcast start time, acquiring the subtitle or summary of the program from program guide storage means that stores the program guide, performing morphological analysis on it, and extracting phrases based on the position of appearance and number of characters of each phrase; and, after the broadcast, when the current time is within a preset time from the broadcast start time, performing morphological analysis on the documents stored in the program-specific document information storage means, obtaining the number of documents in which each phrase appears, and extracting the phrases that appear in many documents.

The phrase extraction device according to claim 1, wherein the program attention level calculation means further comprises means for:
counting, among the search terms requested during the attention level calculation period, the search terms associated with the program, using information that identifies the user who entered each search term so that the same keyword entered several times by the same user at short intervals is counted only once; and storing in the program attention level table, as the attention level of the program, the sum of this count and the tallied number of documents.

The phrase extraction device according to claim 1 or 2, wherein the program attention level calculation means sets a high attention level for a new program.

The phrase extraction device according to claim 1, wherein the phrase extraction means comprises means for:
presetting, according to the broadcast cycle of each program, a maximum period before the start of broadcasting;
obtaining, for a program yet to be broadcast, the time remaining until its broadcast starts from its start date and time and the current date and time; and
extracting phrases for a program whose remaining time does not exceed the maximum period.

A phrase extraction program for causing a computer to function as each means constituting the phrase extraction device according to claim 1.