JP2000242652A

JP2000242652A - Information stream retrieval method and device and storage medium recorded with information stream retrieval program

Info

Publication number: JP2000242652A
Application number: JP11040271A
Authority: JP
Inventors: Masayuki Sugizaki; 正之杉崎; Masakatsu Okubo; 雅且大久保; Daijiro Mori; 大二郎森; Kazuo Tanaka; 一男田中
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1999-02-18
Filing date: 1999-02-18
Publication date: 2000-09-08

Abstract

PROBLEM TO BE SOLVED: To retrieve the desired topic transitions from among the topic transitions of documents extracted from information transmission media, and to present them ranked in an order suited to the retrieval request. SOLUTION: A document storage part 102 stores the document information input through a document input part 101. An information stream extraction part 103 applies classification by time information and classification by topic to the document set input through the document input part 101, arranges the documents classified under each topic using the time information, and generates information streams. In a retrieval request input part 104, a retrieval request is issued by specifying, from the extracted information streams, a word to be retrieved or a document containing the information to be retrieved. An information stream retrieval part 105 extracts the documents or classification categories that satisfy the retrieval condition from the information streams extracted by the information stream extraction part 103, computes their similarity to the retrieval condition, and displays the information streams on a display part 106 as retrieval results in descending order of similarity.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

[Technical Field of the Invention] The present invention relates to an information flow search method and apparatus that extract topics and their temporal transitions from a large amount of information transmitted from information transmission media and search them.

【０００２】[0002]

[Prior Art] In recent years, it has become possible to exchange large volumes of electronic documents over computer networks such as the Internet. As a result, services that let individuals search for the information they need have been implemented on networks. However, the amount of information a user acquires in this way becomes large, making it difficult to extract the characteristics of each individual piece of information. A technique for classifying and organizing the acquired information is therefore needed.

【０００３】[0003]

Methods for automatically classifying document information have long been studied. Representative approaches include one in which the divisions used for classification (called categories) are known in advance, as in a library, and each new item of information is assigned to the category judged appropriate for it ("Automatic Text Classification Using the Relationships Between Classification Systems", Yamamoto and Masuyama (Toyohashi University of Technology), Naito (NTT), 1995), and one in which the categories are unknown, so that similar documents are gathered from the document set to create categories and assign documents to them ("Automatic Partitioning by a Competitive Learning Neural Network", Kikuchi, Matsuoka et al. (Utsunomiya University and others), 1995). These techniques are used to classify and organize large numbers of documents.

【０００４】[0004]

There is also a technique, described in the "Information Flow Presentation Device" (Japanese Patent Laid-Open No. 10-154150) previously proposed by the present applicant, that extracts topic transitions from documents carrying time information, such as newspaper articles. This technique automatically collects documents with similar topics from a document set to create categories, then divides the categories at fixed time intervals and presents them, thereby extracting changes over time such as the number of articles on a topic and the branching of topics.

【０００５】[0005]

FIGS. 5 and 6 show examples in which topics were extracted from newspaper articles by the information flow presentation device (Japanese Patent Laid-Open No. 10-154150). FIG. 5 shows the information flow for "Typhoon No. 3", and FIG. 6 shows the information flow for the "elementary school student murder case".

【０００６】[0006]

FIG. 5 shows that the topic "Typhoon No. 3" was transmitted on the date "1998/06/08" and that the topic "intensification" of Typhoon No. 3 was extracted on the date "1998/06/09". It also shows that two articles were transmitted for "Typhoon No. 3" and six for the "passage over Kinki" of Typhoon No. 3. Incidentally, the empty category at the date "1998/06/07" indicates that the information flow starts on 1998/06/08. FIG. 6 shows that various topics exist within the information flow "elementary school student murder case". For example, a "murder case" occurred on the date "1998/06/28", and on the date "1998/06/29" the coverage branched into the topic of the murdered "elementary school student", the topic of the "doctor's" diagnosis, and the topic of the perpetrator's "junior high school".

【０００７】[0007]

In other words, the information flow presentation device collects similar articles from the set of documents transmitted on each date or at each time, assigns them to categories, determines the category names, links the categories of similar topics that exist at each time, and presents the result.

【０００８】[0008]

[Problems to be Solved by the Invention] With the invention described in the information flow presentation device previously proposed by the present applicant (Japanese Patent Laid-Open No. 10-154150), when the desired information was known in advance, entering that information into the device made it possible to extract the topic transitions corresponding to it.

【０００９】[0009]

An object of the present invention is, further, to provide an information flow presentation method and apparatus that search for the required topic transitions among the large number of topic transitions extracted from information transmission media and present them in ranked order.

【００１０】[0010]

[Means for Solving the Problems] First, topic transitions (hereinafter called information flows) are extracted from the input article set. The documents to be classified carry time information; the time at which a document was created or the time expressed within the document is used. The article set uses information transmitted by media that transmit text information, such as newspaper companies, radio stations, and television stations, either separately or in combination. Temporal changes in topics are obtained by taking the result of classifying the article set by time and the result of classifying it by topic, and linking the categories of a common topic in chronological order.

【００１１】[0011]

Before the information flow search technique is described, a feature vector is defined for each document to be searched. A feature vector is a real-valued vector whose elements correspond one-to-one to the words occurring in the documents. The feature vector of document i,

【００１２】[0012]

[External character 1], can be expressed by

【００１３】[0013]

[Equation 1]. Here n is the number of words used to represent the features of a document. The value of each element is commonly determined from the word's frequency of occurrence, its distribution across the whole document set, the length in characters, and so on ("Automatic Text Processing", Gerard Salton, Addison-Wesley, 1989). Using these feature vectors, a similarity between documents can be defined. For example, a function such as cos θ computed from the inner product of the feature vectors (where θ is the angle between them) is used.
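As a minimal sketch of the feature vector and cos θ similarity described above, the following Python builds one vector element per vocabulary word and compares documents by the cosine of the angle between their vectors. Several things here are assumptions rather than details from the source: the function names are hypothetical, words are split on whitespace (Japanese text would need a morphological analyzer), and raw term frequency is used as the element value, which is only one of the weightings the text mentions.

```python
import math
from collections import Counter

def build_vocabulary(docs):
    """Return the sorted list of distinct words used as vector dimensions."""
    return sorted({word for doc in docs for word in doc.split()})

def feature_vector(doc, vocabulary):
    """Map a document to a real-valued vector, one element per vocabulary word.

    Raw term frequency is an assumed weighting; the text only says that
    frequency, distribution, or length may be used.
    """
    counts = Counter(doc.split())
    return [float(counts[word]) for word in vocabulary]

def cosine_similarity(u, v):
    """cos(theta) between two vectors, computed from their inner product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_u * norm_v)

# Toy documents, made up for illustration.
docs = ["typhoon approaches kinki", "typhoon passes kinki", "soccer match postponed"]
vocab = build_vocabulary(docs)
vectors = [feature_vector(d, vocab) for d in docs]
```

With these toy documents, the first two share two of their three words and so score well above zero, while the third shares no word with the first and scores exactly zero.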

【００１４】[0014]

Next, the information flow search technique is described. A search request consists of words to be searched for or documents containing the information to be searched for. A search request may specify one or more of either, or some combination of them. When the search request is a word, all documents containing that word are selected from the full document set. In that case, the similarity between the search word and a document is taken from the value of the corresponding element of the document's feature vector. When the search request is a document, a feature vector is generated for that document, its similarity to the feature vector of each document to be searched is computed, and all similar documents are selected.

【００１５】[0015]

Let the set of documents satisfying the search request be $SDoc = \{doc_1, \ldots, doc_m\}$ and their similarities be $SDocv = (docv_1, \ldots, docv_m)$, where m is the number of documents satisfying the search request. Next, the similarity of each information flow to the search request is computed. For example, if the similarity $Rel_k$ of information flow k is defined as the sum of the similarities to the search request of the documents assigned to that flow, then

【００１６】[0016]

it can be expressed as [Equation 2].
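The equation image itself does not survive in this text. Going only by the sentence that introduces it, one plausible reconstruction is the following, where $F_k$ is a symbol assumed here (it does not appear in the source) for the set of documents assigned to information flow k:

```latex
Rel_k = \sum_{doc_j \,\in\, SDoc \cap F_k} docv_j
```

Read this way, a flow gains relevance from every matching document it contains, so a flow with many weakly matching documents can outrank one with a single strong match.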

【００１７】[0017]

Using the similarities of the information flows, the search results are displayed in descending order of similarity. Alternatively, the time information of each information flow can be used to rank flows containing many recent documents higher, or the reverse.
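The two orderings just described can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: `ScoredDoc` and the function names are hypothetical, the similarity values are invented toy numbers, and recency is approximated by the latest document year in each flow, whereas the text speaks more loosely of flows "containing many recent documents".

```python
from dataclasses import dataclass

@dataclass
class ScoredDoc:
    flow_id: str       # information flow the document is assigned to
    year: int          # time information attached to the document
    similarity: float  # similarity of this document to the search request

def flow_relevance(matches):
    """Sum the document similarities per information flow."""
    rel = {}
    for doc in matches:
        rel[doc.flow_id] = rel.get(doc.flow_id, 0.0) + doc.similarity
    return rel

def rank_by_similarity(matches):
    """Flow ids ordered from the largest summed similarity to the smallest."""
    rel = flow_relevance(matches)
    return sorted(rel, key=lambda k: rel[k], reverse=True)

def rank_by_recency(matches, newest_first=True):
    """Flow ids ordered by the latest document year found in each flow."""
    latest = {}
    for doc in matches:
        latest[doc.flow_id] = max(latest.get(doc.flow_id, doc.year), doc.year)
    return sorted(latest, key=lambda k: latest[k], reverse=newest_first)

# Toy matches for a hypothetical query; all values are made up.
matches = [
    ScoredDoc("typhoon-1996", 1996, 3.0),
    ScoredDoc("typhoon-1996", 1996, 2.0),
    ScoredDoc("typhoon-1997", 1997, 4.0),
    ScoredDoc("j-league-1994", 1994, 1.0),
]
```

Here `rank_by_similarity(matches)` puts `typhoon-1996` first because its two documents sum to the largest score, while `rank_by_recency(matches)` puts `typhoon-1997` first.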

【００１８】[0018]

Another representative method computes the similarity between each information flow and a search word (search expression) and presents the similar information flows as the search result. To compute the similarity between an information flow and a search word, a feature vector of the information flow,

【００１９】[0019]

[External character 2], is introduced. Since an information flow is made up of documents, the feature vector of the information flow,

【００２０】[0020]

[External character 3], is taken to be the average (Equation (3)) of the feature vectors

【００２１】[0021]

[External character 4] of the documents that make up the information flow.

【００２２】[0022]

[Equation 3] Here, let the feature vector of each search word q be

【００２３】[0023]

[External character 5]. In the feature vector

【００２４】[0024]

[External character 6], each element corresponds one-to-one to a word in the text; the element corresponding to the search word q is set to 1 and all others to 0. The similarity between an information flow and the search word q is then given by the inner product of the information flow's feature vector with

【００２５】[0025]

[External character 7] (the trigonometric function cos θ in Euclidean space).

【００２６】[0026]

Information flows whose feature vector has a similarity greater than 0 with the feature vector of the search word are then extracted and presented as search results in descending order of similarity. In this way, information flows made up of documents that use a common word are retrieved.
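The second method can be sketched as below: each flow's feature vector is the element-wise average of its documents' vectors, the search word becomes a vector with 1 at its own position and 0 elsewhere, and flows with similarity greater than 0 are returned best first. The function names, the three-word vocabulary, and the vector values are hypothetical. The sketch uses the plain inner product; dividing by the vector norms would give cos θ instead, which the text mentions alongside the inner product.

```python
def centroid(doc_vectors):
    """Feature vector of a flow: element-wise average of its documents' vectors.

    Assumes the flow contains at least one document.
    """
    n_docs = len(doc_vectors)
    return [sum(column) / n_docs for column in zip(*doc_vectors)]

def query_vector(word, vocabulary):
    """Vector with 1 at the search word's position and 0 elsewhere."""
    return [1.0 if term == word else 0.0 for term in vocabulary]

def search_flows(word, flows, vocabulary):
    """Return (flow name, similarity) pairs with similarity > 0, best first."""
    q = query_vector(word, vocabulary)
    scored = []
    for name, doc_vectors in flows.items():
        f = centroid(doc_vectors)
        similarity = sum(a * b for a, b in zip(f, q))  # plain inner product
        if similarity > 0:
            scored.append((name, similarity))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data, made up for illustration: one list of document vectors per flow.
vocabulary = ["match", "soccer", "typhoon"]
flows = {
    "typhoon-18": [[0.0, 0.0, 2.0], [0.0, 0.0, 4.0]],
    "j-league": [[1.0, 2.0, 0.0], [1.0, 1.0, 1.0]],
    "world-cup": [[2.0, 2.0, 0.0]],
}
```

Searching for "typhoon" returns `typhoon-18` ahead of `j-league` and omits `world-cup`, whose documents never use the word. Because the query vector has a single non-zero element, the inner product simply reads off the flow's average weight for that word.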

【００２７】[0027]

The retrieved information flows are topics that share a common word or are similar to the specified document, so they can be compared with one another.

【００２８】[0028]

[Embodiments of the Invention] Next, embodiments of the present invention are described with reference to the drawings.

【００２９】[0029]

Referring to FIG. 1, an information flow search device according to one embodiment of the present invention comprises a document input unit 101, a document storage unit 102, an information flow extraction unit 103, a search request input unit 104, an information flow search unit 105, and a display unit 106.

【００３０】[0030]

In the information flow presentation device of this embodiment, the user inputs the documents to be processed through the document input unit 101. The documents to be processed include any document entered into a computer, for example newspaper articles, HTML files on the Internet, netnews, teletext, FM multiplex broadcasts, and television broadcast scripts.

【００３１】[0031]

The document storage unit 102 stores the information entered through the document input unit 101 so that documents can be retrieved for each information transmission medium individually, for several of the media, or for all of the media together.

【００３２】[0032]

The information flow extraction unit 103 extracts information flows from the document set. To the document set entered through the document input unit 101, it applies classification by time information, such as "1999", "January 1999", or "January 3, 1999", and classification by topic, such as a "traffic accident" category or a "birth of octuplets" category. The documents classified under each topic are then arranged using the time information to generate an information flow.
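A minimal sketch of this extraction step is shown below, assuming each document already carries a date and a topic label. In the source the topic is derived by automatic classification, which is not reproduced here; `extract_flows` and the sample articles are hypothetical.

```python
from collections import defaultdict
from datetime import date

def extract_flows(documents):
    """Group documents by topic, then by date, with each topic's dates ascending.

    `documents` is an iterable of (date, topic, text) tuples. The topic label
    is assumed to be given; the source obtains it by automatic classification.
    Returns {topic: [(date, [text, ...]), ...]}.
    """
    by_topic = defaultdict(lambda: defaultdict(list))
    for day, topic, text in documents:
        by_topic[topic][day].append(text)
    # Dates are unique keys within a topic, so sorting the items orders by date.
    return {topic: sorted(by_day.items()) for topic, by_day in by_topic.items()}

# Toy input, made up for illustration.
documents = [
    (date(1999, 1, 4), "traffic accident", "article C"),
    (date(1999, 1, 3), "traffic accident", "article A"),
    (date(1999, 1, 3), "traffic accident", "article B"),
    (date(1999, 1, 3), "octuplets born", "article D"),
]
flows = extract_flows(documents)
```

Each entry of `flows` is one information flow: a topic followed through time, with the articles of each day grouped together. Coarser time slices such as month or year could be obtained by truncating the date before grouping.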

【００３３】[0033]

In the search request input unit 104, the user issues a search request by specifying, from among the extracted information flows, a word to be searched for or a document containing the information to be searched for.

【００３４】[0034]

The information flow search unit 105 searches the information flows extracted by the information flow extraction unit 103. That is, it extracts from each information flow the documents or classification categories that satisfy the search condition entered from the search request input unit 104, computes their similarity to the search condition, and displays the information flows on the display unit 106 as search results in descending order of similarity.

【００３５】[0035]

The processing flow of this embodiment is described using a concrete example.

【００３６】[0036]

The document set input to the device consists of the newspaper articles of newspaper company A, newspaper company B, and newspaper company C from 1988 to 1998. The document set is input through the document input unit 101, and the input documents are stored in the document storage unit 102.

【００３７】[0037]

Next, the information flow extraction unit 103 extracts information flows from the input document set. Suppose the result includes, for example, the 1996 information flow for "Typhoon No. 18", the 1997 information flow for "Typhoon No. 18", the 1993 information flow for the "Japan Professional Soccer League (J-League)", the 1994 information flow for the "J-League", the 1994 information flow for the "World Cup USA", and the 1998 information flow for the "World Cup France" (FIG. 3).

【００３８】[0038]

Next, the information flow search unit 105 searches for the required information flows. Suppose, for example, that information on "soccer" is wanted. When "information flows containing a document that includes the word 'soccer'" is entered as the search request from the search request input unit 104 (step 201), the 1993 "Japan Professional Soccer League (J-League)" flow, the 1994 "J-League" flow, the 1994 "World Cup USA" flow, and the 1998 "World Cup France" flow all contain documents that include the word "soccer" and are extracted as information flows matching the condition (FIG. 3; steps 202 and 203). As for the display order, if the ordering condition is, for example, "most recent information flows first", the flows are ordered from the newest year and displayed on the display unit 106 (steps 204 and 205).

【００３９】[0039]

Suppose also that information on "typhoon" is wanted. When "information flows containing a document that includes the word 'typhoon'" is used as the search request (step 201), the 1996 "Typhoon No. 18" flow, the 1997 "Typhoon No. 18" flow, and the 1994 "J-League" flow, which contains an article reporting that a match was postponed because of a typhoon, all contain documents that include the word "typhoon" and are extracted as information flows matching the condition (steps 202 and 203). As for the display order, if the ordering condition is, for example, "largest sum of word-document similarities first", the 1996 and 1997 "Typhoon No. 18" flows are ranked above the 1994 "J-League" flow, which contains little typhoon information, and are displayed on the display unit 106 (steps 204 and 205).

【００４０】[0040]

FIG. 4 shows an information flow presentation device according to another embodiment of the present invention. The device of this embodiment comprises an input device 301, storage devices 302 and 303, an output device 304, a recording medium 305, and a data processing device 306. The input device 301 corresponds to the document input unit 101 in FIG. 1. The storage device 302 corresponds to the document storage unit 102 in FIG. 1. The storage device 303 is a hard disk. The output device 304 corresponds to the display unit 106 in FIG. 1. The recording medium 305 is a recording medium such as an FD (floppy disk), CD-ROM, or MO (magneto-optical disk) on which the information flow search program comprising the processing shown in FIG. 2 is recorded. The data processing device 306 is a CPU that reads the information flow search program from the recording medium 305 into the storage device 303 and executes it.

【００４１】[0041]

[Effects of the Invention] As described above, the present invention makes it possible to retrieve only the desired information flows from the large number of information flows generated, for a large document set, from the topic information of the individual documents and the time at which each document was created. The user can thereby grasp the desired information and the transitions of its topics.

[Brief description of the drawings]

[FIG. 1] A block diagram showing the schematic configuration of an information flow search device according to one embodiment of the present invention.

[FIG. 2] An example flowchart showing the processing in the information flow search device of FIG. 1 from input of a search request to display of the search results.

[FIG. 3] A diagram showing an example of the search result output of the information flow search device of FIG. 1.

[FIG. 4] A configuration diagram of an information flow search device according to another embodiment of the present invention.

[FIG. 5] A diagram showing the information flow for "Typhoon No. 3" obtained by the conventional information flow presentation device.

[FIG. 6] A diagram showing the information flow for the "elementary school student murder case" obtained by the conventional information flow presentation device.

[Explanation of symbols]

101 Document input unit
102 Document storage unit
103 Information flow extraction unit
104 Search request input unit
105 Information flow search unit
106 Display unit
201-205 Steps
301 Input device
302, 303 Storage devices
304 Output device
305 Recording medium
306 Data processing device

Continuation of front page
(72) Inventor: Daijiro Mori, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo
(72) Inventor: Kazuo Tanaka, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo
F-terms (reference): 5B075 ND03 NK06 NK10 NK31 NR12 PQ76 QM08

Claims

[Claims]

1. An information flow search method for extracting topics and their temporal transitions from a large amount of information transmitted from an information transmission medium and searching them, comprising: a document input step of inputting documents written in natural language; a document storage step of storing the input data obtained in the document input step; an information flow extraction step of extracting information flows, which are topic transitions, by classifying the input data stored in the document storage step by topic and arranging it in the order of the times at which the documents were transmitted; a search request input step of issuing a search request by specifying, from among the extracted information flows, a word to be searched for or a document containing the information to be searched for; an information flow search step of comparing the documents containing the word specified in the search request input step, or the specified document, with the information flows, obtaining the similarity to the search request of the documents assigned to each information flow, and ordering the information flows from the highest similarity or from the most recent; and a display step of displaying the ordered information flows.

2. An information flow search apparatus for extracting topics and their temporal transitions from a large amount of information transmitted from an information transmission medium and searching them, comprising: a document input unit for inputting documents written in natural language; a document storage unit for storing the input data obtained by the document input unit; an information flow extraction unit for extracting, from the input data stored in the document storage unit, information flows, which are topic transitions; a search request input unit for issuing a search request by specifying, from among the extracted information flows, a word to be searched for or a document containing the information to be searched for; an information flow search unit for comparing the documents containing the word specified in the search request input unit, or the specified document, with the information flows, obtaining the similarity to the search request of the documents assigned to each information flow, and ordering the information flows from the highest similarity or from the most recent; and a display unit for displaying the ordered information flows.

3. A recording medium storing an information flow search program for extracting topics and their temporal transitions from a large amount of information transmitted from an information transmission medium and searching them, the program causing a computer to execute: a document storage procedure of storing documents written in natural language in a storage device; an information flow extraction procedure of extracting information flows, which are topic transitions, by classifying the input data stored in the document storage procedure by topic and arranging it in the order of the times at which the documents were transmitted; an information flow search procedure of comparing, with the information flows, the documents containing the word specified in a search request or the specified document, the search request specifying, from among the extracted information flows, a word to be searched for or a document containing the information to be searched for, obtaining the similarity to the search request of the documents assigned to each information flow, and ordering the information flows from the highest similarity or from the most recent; and a display procedure of displaying the ordered information flows on a display device.