JP4087769B2

JP4087769B2 - Server and related word proposal method

Info

Publication number: JP4087769B2
Application number: JP2003324503A
Authority: JP
Inventors: 武士辻; 勝俊飯伏
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-09-17
Filing date: 2003-09-17
Publication date: 2008-05-21
Anticipated expiration: 2023-09-17
Also published as: JP2005092491A

Description

The present invention relates to a service that, when a document containing a keyword (word) registered in advance by a user is collected, provides that document to the user concerned. More specifically, it relates to a technique for proposing new words, related words, and search expressions for keywords registered in advance by a user.

In recent years, services built on web document collection and document retrieval have been offered, for example clipping services that provide real-time documents such as news articles. A clipping service automatically delivers a document to the corresponding user whenever a document containing a keyword registered in advance by that user is collected.

In conventional clipping services, a keyword registered by a user stays in use until the user changes it, so registered keywords tend to become obsolete. Moreover, the keywords a user registers are in most cases words and phrases within the user's existing knowledge, so they may be insufficient for extracting the information the user actually wants.

As techniques for presenting information using registered keywords and the like, there are methods that present related words for a keyword (search term) by computing relevance from the occurrence frequencies of words and phrases (Patent Documents 1 and 2).

Other techniques for presenting information using keywords are disclosed in Patent Documents 3 and 4.
Patent Document 1: JP-A-9-044523
Patent Document 2: JP-A-2000-222427
Patent Document 3: JP-A-5-151273
Patent Document 4: JP-A-10-134075

However, conventional related-word proposal methods do not account for temporal change, that is, for how recent or old the documents are. They therefore cannot propose, as related words, words or phrases that have suddenly become related because of social circumstances or events that have drawn public attention, which makes them unsuitable for services that deliver real-time information, such as clipping services.

An object of the present invention is to solve the above problems by providing a technique that can present timely words and phrases arising from changes over time and can help users maintain the keywords they have registered.

To solve the above problems, the present invention is configured as follows. The server of the present invention comprises: means for classifying documents into categories based on the words they contain, using a classification dictionary that expresses the relationship among a document, the words contained in it, and the field to which it belongs as the relationship between each word and its field; means for calculating, for each unit interval and for each word contained in the documents, the document frequency, that is, the number of documents in the same category that contain the word; means for recording the transition of the document frequency over a plurality of unit intervals in a word information database; and means for referring to the word information database and extracting, within the same category, a second word whose document-frequency transition is similar to that of a first word, as a related word of the first word.

According to the present invention, documents are classified into categories based on the words they contain, using a classification dictionary that expresses the relationship among a document, the words it contains, and the field to which it belongs as the relationship between each word and its field. For each word contained in the documents, the document frequency within the same category (the number of documents in that category containing the word) is calculated for each unit interval, and the transition of the document frequency over a plurality of unit intervals is recorded in a word information database. By referring to that database, a second word whose document-frequency transition within the same category is similar to that of a first word can be extracted as a related word of the first word. Here, the first word can be defined as a word registered in the server in advance, and the transition can be defined as the rate of increase or decrease per unit period. Thus, for example, using a pre-registered word as the key, words whose per-unit-period document-frequency transition resembles that of the registered word can be extracted as related words.

Preferably, the server may further comprise means for obtaining a degree of coincidence, defined as the ratio of the number of documents containing both the first word and its related word to the number of documents containing the first word or its related word, and means for outputting a search expression that includes the first word and its related word according to the degree of coincidence.

According to the present invention, the degree of coincidence (the number of documents containing both the first word and its related word, divided by the number of documents containing either of them) is obtained, and a search expression including the first word and its related word is output according to that degree. Thus, for example, a related word of a pre-registered word can be proposed in combination with the registered word.
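The degree of coincidence defined above is the Jaccard coefficient of the two words' document sets. A minimal Python sketch, assuming each word's documents are available as a set of document IDs (as in the document/word correspondence list); the function name `coincidence` and the related word's document IDs are hypothetical:

```python
def coincidence(docs_w: set[int], docs_x: set[int]) -> float:
    """Degree of coincidence between a first word W and a related word X.

    Number of documents containing both words divided by the number of
    documents containing either word (the Jaccard coefficient).
    """
    either = docs_w | docs_x
    if not either:  # neither word occurs anywhere; define the degree as 0
        return 0.0
    both = docs_w & docs_x
    return len(both) / len(either)


# Document IDs for "mad cow disease" follow the FIG. 2 example;
# the related word's document IDs are made up for illustration.
docs_bse = {1, 105, 201}
docs_related = {105, 201, 300}
print(coincidence(docs_bse, docs_related))  # 2 shared / 4 total -> 0.5
```

A value of 1.0 means the two words always appear together; 0.0 means they never share a document.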

Preferably, the output means of the server may output a search expression consisting of the logical AND of the first word and its related word when the degree of coincidence reaches a predetermined value, and a search expression consisting of their logical OR when it does not.

According to the present invention, an AND expression of the first word and its related word is output when the degree of coincidence reaches the predetermined value, and an OR expression is output otherwise. Thus, for example, when a related word of a pre-registered word is proposed in combination with the registered word, the way they are combined can be varied according to the degree of coincidence.

Preferably, even when the degree of coincidence reaches the predetermined value, the output means may refrain from outputting the AND expression of the first word and its related word if the two words are registered in advance as synonyms.

According to the present invention, even when the degree of coincidence reaches the predetermined value, the AND expression is not output if the first word and its related word are pre-registered synonyms. Thus, for example, synonyms registered in advance in a synonym dictionary or the like can be excluded even when the coincidence value alone would not mark the pair as synonyms. As a result, rather than mere synonyms (paraphrases) of a registered word, words that have become related to the registered word over time can be extracted as related words.
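The AND/OR choice and the synonym exception can be sketched together as follows. This is a hypothetical Python illustration: the `AND`/`OR` syntax, the threshold of 0.5, the word pairs, and returning `None` for synonym pairs are assumptions; the source says only that the AND expression is not output for registered synonyms.

```python
from typing import Optional


def build_expression(w: str, x: str, degree: float, threshold: float,
                     synonyms: set[frozenset[str]]) -> Optional[str]:
    """Combine a registered word w with a related word x into a search expression.

    degree    -- degree of coincidence between w and x
    threshold -- the "predetermined value" (no value is given in the source)
    synonyms  -- pre-registered synonym pairs, stored order-independently
    """
    if degree >= threshold:
        if frozenset((w, x)) in synonyms:
            # The AND form is suppressed for registered synonyms. Returning
            # None (no proposal) is an assumption of this sketch.
            return None
        return f"{w} AND {x}"
    return f"{w} OR {x}"


synonyms = {frozenset(("mad cow disease", "BSE"))}
print(build_expression("mad cow disease", "beef", 0.6, 0.5, synonyms))  # mad cow disease AND beef
print(build_expression("mad cow disease", "beef", 0.2, 0.5, synonyms))  # mad cow disease OR beef
print(build_expression("mad cow disease", "BSE", 0.9, 0.5, synonyms))   # None
```

Storing synonym pairs as `frozenset`s makes the lookup independent of which word is the registered one.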

Preferably, the server may further comprise means for accepting registration of the first word from a user, and the output means may include: means for sending a notification containing a search expression to the user who registered the first word; means for receiving, in reply to that notification, a response from the user indicating whether the search expression is adopted or rejected; and means for sending, when the response indicates adoption, a notification after a certain period asking whether the user wishes to continue using the search expression.

According to the present invention, registration of the first word is accepted from a user, a notification containing a search expression is sent to that user, a response indicating adoption or rejection of the search expression is received in reply, and, if the response indicates adoption, a notification asking whether the user wishes to continue using the search expression is sent after a certain period. Thus, for example, a user who has once adopted a search expression can be asked again after a certain period whether they still wish to use it, which helps the user maintain the registered words.
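The notify, respond, and reconfirm cycle can be sketched as below. This Python sketch is illustrative only: the class and function names, the user ID, and the 30-day period are hypothetical, since the source speaks only of "a certain period".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Proposal:
    """A search expression sent to the user who registered the first word."""
    user_id: str
    expression: str
    adopted_on: Optional[date] = None  # set only if the user adopts it


def record_response(p: Proposal, adopted: bool, today: date) -> None:
    """Store the user's adopt/reject reply to the notification."""
    if adopted:
        p.adopted_on = today


def reconfirmation_due(p: Proposal, today: date, period: timedelta) -> bool:
    """True once an adopted expression has been in use for at least `period`."""
    return p.adopted_on is not None and today - p.adopted_on >= period


p = Proposal("user-001", "mad cow disease AND beef")
record_response(p, adopted=True, today=date(2003, 9, 17))
print(reconfirmation_due(p, date(2003, 10, 1), timedelta(days=30)))   # False
print(reconfirmation_due(p, date(2003, 10, 17), timedelta(days=30)))  # True
```

A rejected proposal never sets `adopted_on`, so it never triggers a reconfirmation notice.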

The present invention may also be a method by which a computer or other device or machine executes any of the above processes, a program that causes a computer or other device or machine to realize any of the above functions, or such a program recorded on a computer-readable recording medium.

According to the present invention, timely words and phrases arising from changes over time can be presented to the user in combination with the original keywords. This makes it possible to present timely words and phrases to the user while maintaining the original keywords the user registered.

The best mode for carrying out the present invention is described below with reference to the drawings. The description of this embodiment is illustrative, and the configuration of the present invention is not limited to it.
<Embodiment>
In this embodiment, the present invention is described as applied to an information system that provides a clipping service to users.

<Overview>
This information system proposes new words, related words, and search expressions to users of the clipping service, thereby helping them maintain the keywords they have registered in advance. The characteristics of the new words, related words, and search expressions proposed to the user are described below.

A new word is a word or phrase that, within a field (genre) in which the user's registered keywords appear frequently, has an occurrence frequency above a preset threshold and an occurrence increase rate, relative to an earlier point such as the previous day, above another preset threshold. Such words are presented to the user as new words.
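The two threshold tests for new words can be sketched in Python as follows. The source does not define the increase rate precisely or give threshold values, so the relative-growth formula, the thresholds, and the word counts are all assumptions for illustration.

```python
def new_words(freq_now: dict[str, int], freq_prev: dict[str, int],
              min_freq: int, min_increase: float) -> list[str]:
    """Words whose frequency and increase rate both exceed their thresholds.

    freq_now / freq_prev -- occurrence frequencies in the user's field for the
    current and the earlier period (e.g. today and the previous day).
    """
    found = []
    for word, f in freq_now.items():
        prev = freq_prev.get(word, 0)
        # Assumed definition of the increase rate: relative growth over the
        # earlier period, with max(prev, 1) guarding against division by zero.
        rate = (f - prev) / max(prev, 1)
        if f > min_freq and rate > min_increase:
            found.append(word)
    return sorted(found)


today = {"word A": 50, "word B": 40, "word C": 3}
yesterday = {"word A": 10, "word B": 38}
print(new_words(today, yesterday, min_freq=5, min_increase=1.0))  # ['word A']
```

"word B" is frequent but barely growing, and "word C" is growing from nothing but still rare, so only "word A" passes both tests.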

A related word is a word or phrase whose trend in occurrence frequency, within a field (genre) in which the user's registered keywords appear frequently, is similar to that of the user's keyword. Such words are presented to the user as related words and, once adopted by the user, become new keywords for the clipping service. The trend in occurrence frequency is judged using, for example, values such as those plotted in FIG. 12, which shows an example graph of a keyword's occurrence frequency in each fixed period. The similarity between keywords is determined from the similarity of the slopes of their time-series frequency trends, that is, by judging whether the slopes change in a similar way overall.

A search expression specifies how a related word is combined with the user's registered keyword when it is presented, and such combinations are proposed to the user as search expressions. Its purpose is to determine under what condition the related word is used together with the registered keyword to provide the clipping service. The embodiment is described in detail below, taking the proposal of related words and search expressions as an example.

<System configuration>
FIG. 1 shows the system configuration of this information system. The system is realized by device 1 providing a service to user 20. Device 1 is an information device such as a personal computer, workstation, or server. It includes a content collection system 2, a content classification processing unit 3, a search index creation unit 4, a keyword/search expression proposal processing unit 5, a keyword proposal management processing unit 6, a clipping processing system 7, and various databases. User 20 has a user terminal that can communicate with device 1; since this is assumed to be an ordinary information terminal, its detailed description is omitted.

The databases are a content database (DB) 8, a search database 9, a document/word correspondence list 10, a category master database 11, a classification information database 12, a word information database 13, and a proposed search expression management database 14. They are described in detail in the <Data structure> section below.

Content collection system 2 collects documents from the Internet using a web collection robot (crawler). Content classification processing unit 3 classifies the collected documents into fields (genres) using a generally known document classification technique, for example a classification dictionary that indicates the relationship between words contained in documents and fields. The classification results are stored in classification information database 12.
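The source leaves the classification technique open. As one hypothetical example of dictionary-based classification, the sketch below assigns a document to each category for which it contains at least `min_hits` distinct dictionary words; a document may therefore belong to several categories, as in FIG. 4. The dictionary entries, category IDs, and the `min_hits` rule are made up for illustration.

```python
from collections import Counter


def classify(words: list[str], dictionary: dict[str, set[int]],
             min_hits: int = 1) -> list[int]:
    """Assign a document to every category with enough dictionary-word hits.

    words      -- the words extracted from one document
    dictionary -- classification dictionary: word -> IDs of the fields it indicates
    min_hits   -- distinct dictionary words required per category (assumed rule)
    """
    hits = Counter()
    for word in set(words):  # count each distinct word once
        for category_id in dictionary.get(word, ()):
            hits[category_id] += 1
    return sorted(c for c, n in hits.items() if n >= min_hits)


dictionary = {"bank": {1}, "loan": {1}, "beef": {2}, "mad cow disease": {2, 5}}
doc_words = ["mad cow disease", "beef", "import", "beef"]
print(classify(doc_words, dictionary))              # [2, 5]
print(classify(doc_words, dictionary, min_hits=2))  # [2]
```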

Search index creation unit 4 uses an indexer to build search database 9 from the collected documents. At the same time, it creates document/word correspondence list 10, which identifies which keywords each document contains.

Based on the document classification results, document/word correspondence list 10, and the keywords registered by user 20, keyword/search expression proposal processing unit 5 proposes to user 20 new keywords together with search expressions for presenting them.

Based on user 20's responses to the proposals, keyword proposal management processing unit 6 updates proposed search expression management database 14 and performs other service-related processing. For example, it provisionally or formally registers keywords and search expressions, and after a certain period it asks user 20 to confirm the formal registration.

Clipping processing system 7 holds the information related to the clipping service, such as the keywords registered in advance by user 20, and provides the clipping service to user 20 based on that information. It also serves as the means for accepting keyword registrations from user 20.

<Data structure>
Next, the databases of device 1 are described with reference to FIGS. 2 to 6. FIG. 2 shows the data structure of document/word correspondence list 10, which manages the correspondence between words and documents. Documents are managed as lists of document IDs, a document ID being a unique ID assigned to each collected document. In the example of FIG. 2, the word "mad cow disease" is contained in the documents with IDs 1, 105, and 201.

Next, the data structure of search database 9 is described. Search database 9 holds the same information as document/word correspondence list 10 but, unlike the list, can be accessed at high speed. In search database 9, for example, an address is generated from each word by hashing or a similar method, and the document IDs are retrieved from that address.
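The document/word correspondence list is an inverted index, and a hash-based lookup like that of the search database corresponds closely to a Python `dict`. A minimal sketch with hypothetical documents, chosen so that "mad cow disease" reproduces the FIG. 2 example; the function name is an assumption:

```python
from collections import defaultdict


def build_word_document_list(docs: dict[int, list[str]]) -> dict[str, list[int]]:
    """Invert document ID -> words into word -> sorted document IDs.

    The returned dict is itself a hash table: looking up a word hashes it
    to locate its document IDs.
    """
    index = defaultdict(set)  # word -> set of document IDs
    for doc_id, words in docs.items():
        for word in words:
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


docs = {
    1: ["mad cow disease", "beef"],
    105: ["mad cow disease"],
    201: ["mad cow disease", "import"],
    300: ["beef"],
}
index = build_word_document_list(docs)
print(index["mad cow disease"])  # [1, 105, 201]
print(index["beef"])             # [1, 300]
```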

FIG. 3 shows the data structure of category master database 11, which manages each category by associating a category ID with a category name. A category is an information type used to classify words; in the example of FIG. 3, industry types such as "Finance" and "Food" are shown as categories. A category ID uniquely identifies a category, the unit of classification, and a category name is the name identifying the category.

FIG. 4 shows the data structure of classification information database 12, which manages the correspondence between document IDs and category IDs. In the example of FIG. 4, the document with document ID 1 belongs to the categories with IDs 2, 5, and 10.

FIG. 5 shows the data structure of word information database 13. For each combination of "word" and "category ID", it holds the items "a(k)", "df(1)", "update date", and "user use". For a(k), k = 1, ..., n, so the results of the last n processing runs are recorded. a(k) is the slope of the document frequency converted to a per-unit-interval value (for the interval between k runs ago and k+1 runs ago). The document frequency is the number of documents in the category in which the word is detected, and the slope is the increase or decrease per unit interval (for example, per day). df(1) is the document frequency at the previous run. The update date is the date on which the keyword proposal process was last executed. "User use" indicates whether a user is using (has specified) the word as a clipping keyword, with the value 0 (unused) or 1 (used). Word information database 13 makes it possible to track, for each category, the trend and transition of the occurrence frequency of the words in the collected documents.

FIG. 6 shows the data structure of proposed search expression management database 14. For each user ID it holds a search expression, a proposal date, and a status flag. The user ID identifies the user to whom the search expression was proposed, the search expression is the proposed expression, and the proposal date is the date on which it was proposed. The status flag represents the status of the proposed expression with a value from 1 to 4: 1 while it is being proposed to the user, 2 while the user is evaluating it, 3 if the user adopted the proposal, and 4 if the user did not adopt it (rejected). Entries whose status flag is 4 (rejected) may be deleted after a certain time.
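The status flag maps naturally onto an enumeration. The sketch below is hypothetical: the source says only that rejected entries "may be deleted after a certain time", so measuring that time from the proposal date and the 7-day retention are assumptions, as are the user IDs and expressions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import IntEnum


class Status(IntEnum):
    """Status flag values of the proposed search expression management database."""
    PROPOSING = 1   # being proposed to the user
    EVALUATING = 2  # user is evaluating it
    ADOPTED = 3     # user adopted the proposal
    REJECTED = 4    # user did not adopt it


@dataclass
class ProposedExpression:
    user_id: str
    expression: str
    proposed_on: date
    status: Status


def purge_rejected(rows: list[ProposedExpression], today: date,
                   retention: timedelta) -> list[ProposedExpression]:
    """Drop rejected rows older than `retention`, measured from the proposal date."""
    return [r for r in rows
            if not (r.status == Status.REJECTED
                    and today - r.proposed_on >= retention)]


rows = [
    ProposedExpression("u1", "A AND B", date(2003, 9, 1), Status.REJECTED),
    ProposedExpression("u1", "A OR C", date(2003, 9, 15), Status.REJECTED),
    ProposedExpression("u2", "D AND E", date(2003, 9, 1), Status.ADOPTED),
]
kept = purge_rejected(rows, date(2003, 9, 17), timedelta(days=7))
print([r.expression for r in kept])  # ['A OR C', 'D AND E']
```

Using `IntEnum` keeps the stored values identical to the numeric flags 1 to 4 while giving them readable names.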

Content database 8 holds data including the correspondence between documents and document IDs; documents collected by the collection robot are managed by document ID.

<Processing flow>
Next, the processing executed by keyword/search expression proposal processing unit 5 of device 1 is described, beginning with its overall flow with reference to FIG. 7.

《Keyword/search expression proposal processing flow》
FIG. 7 is a flowchart of the keyword/search expression proposal process. As shown in FIG. 7, keyword/search expression proposal processing unit 5 performs the process for user 20 in the order preprocessing, word information processing, and search expression proposal processing.

As preprocessing, keyword/search expression proposal processing unit 5 tallies the document frequency and its increase for every word in the collected documents (S10). When a search expression is to be proposed to user 20, the unit proceeds to word information processing (S20), in which, for example, a keyword (word) X likely to be related to keyword W is extracted. Then, as search expression proposal processing, it performs, for example, a process that proposes to the user a search expression combining keyword W with the extracted keyword X (S30). Each processing step is described in more detail below.

《Preprocessing flow》
FIG. 8 is a flowchart showing the details of the preprocessing in FIG. 7. Preprocessing is the update of word information database 13, performed by keyword/search expression proposal processing unit 5 for every word in document/word correspondence list 10. In the following description, "d(t, Ck, 0)" denotes the processing date.

First, keyword/search expression proposal processing unit 5 uses document/word correspondence list 10 and classification information database 12 to tally the document frequency df(t, Ck, 0) of each word t for each category Ck (S11). The "0" in the document frequency indicates the current run.
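Step S11 joins the word-to-document list with the document-to-category table and counts. A minimal Python sketch; the function name is hypothetical, the categories of document 1 follow the FIG. 4 example, and the entries for documents 105 and 201 are made up:

```python
from collections import Counter


def category_document_frequency(word_docs: dict[str, list[int]],
                                doc_categories: dict[int, list[int]]
                                ) -> dict[tuple[str, int], int]:
    """df(t, Ck, 0): number of documents in category Ck that contain word t.

    word_docs      -- document/word correspondence list (word -> document IDs)
    doc_categories -- classification information (document ID -> category IDs)
    """
    df = Counter()
    for word, doc_ids in word_docs.items():
        for doc_id in set(doc_ids):  # count each document once per word
            for category_id in doc_categories.get(doc_id, []):
                df[(word, category_id)] += 1
    return dict(df)


word_docs = {"mad cow disease": [1, 105, 201]}
doc_categories = {1: [2, 5, 10], 105: [2], 201: [5]}
df = category_document_frequency(word_docs, doc_categories)
print(df[("mad cow disease", 2)], df[("mad cow disease", 10)])  # 2 1
```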

Next, the unit uses the word and category ID as keys to obtain the corresponding document frequency df(1) and update date from word information database 13 (S12). In the subsequent processing, these are referred to as df(t, Ck, 1) and d(t, Ck, 1). The "1" indicates a value computed in the run before the current one.

Next, the unit calculates a(0), the slope of the document frequency in the current run (S13), by dividing the difference in document frequency by the difference in dates (update dates):

$$a(0) = \frac{df(t, C_k, 0) - df(t, C_k, 1)}{d(t, C_k, 0) - d(t, C_k, 1)}$$

Next, the unit updates word information database 13 (S14). The computed a(0) is stored as a(1), and the existing slope values (a(1), a(2), ...) are each shifted back by one position. At the same time, the document frequency df(1) and update date obtained in S12 are overwritten with the document frequency tallied in S11 and the current date.
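Steps S13 and S14 together amount to computing a per-day slope and pushing it onto a fixed-length history. A hypothetical Python sketch: the unit interval is taken to be one day, and the handling of the very first run (no previous date) and of two runs on the same day are assumptions not covered by the source.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class WordInfo:
    """One row of the word information database for a (word, category ID) pair."""
    slopes: list[float] = field(default_factory=list)  # a(1), ..., a(n), newest first
    df_prev: int = 0                # df(1): document frequency at the previous run
    updated: Optional[date] = None  # update date of the previous run
    user_use: int = 0               # 1 if a user uses the word as a clipping keyword


def update_word_info(row: WordInfo, df_now: int, today: date, n: int) -> None:
    """Steps S13-S14: compute a(0), store it as a(1), keep the newest n slopes."""
    if row.updated is not None:
        days = (today - row.updated).days
        if days > 0:  # assumption: no slope if run twice on the same day
            a0 = (df_now - row.df_prev) / days    # per-day change, i.e. a(0)
            row.slopes = ([a0] + row.slopes)[:n]  # old a(k) becomes a(k+1); a(n) drops
    row.df_prev = df_now  # df(1) <- df(t, Ck, 0)
    row.updated = today   # update date <- current date


row = WordInfo(slopes=[1.0, 2.0, 3.0], df_prev=10, updated=date(2003, 9, 16))
update_word_info(row, df_now=16, today=date(2003, 9, 17), n=3)
print(row.slopes, row.df_prev)  # [6.0, 1.0, 2.0] 16
```

Dividing by the number of elapsed days keeps the slopes comparable even when runs are not evenly spaced.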

Through the above processing, word information database 13 always holds the latest N values for each word.

《Word information processing flow》
FIG. 9 is a flowchart showing the details of the word information processing in FIG. 7. Word information processing searches for keywords to propose for a keyword registered by user 20, that is, it extracts new keywords related to the registered keyword. In the following, the keyword registered by user 20 is denoted W.

First, the keyword/search expression proposal processing unit 5 uses the "word" and the "category ID" as keys to retrieve the corresponding document frequency slopes "a(1), a(2), ..., a(n)" from the word information database 13 (S21). In the subsequent processing, these slopes are referred to as "a(W, Ck, 1), a(W, Ck, 2), ..., a(W, Ck, n)".

Next, using the same "category ID" as a key, the keyword/search expression proposal processing unit 5 likewise retrieves from the word information database 13 the document frequency slopes of a word t that has not yet been processed (S22). This step is performed for every word in the same category. The retrieved slopes "a(1), a(2), ..., a(n)" are referred to as "a(t, Ck, 1), a(t, Ck, 2), ..., a(t, Ck, n)".

Next, the keyword/search expression proposal processing unit 5 calculates the similarity between the document frequency slopes of keyword W and word t (S23). The similarity is obtained by summing, over a fixed interval, the absolute differences between corresponding slopes (for example, |a(W, Ck, 1) - a(t, Ck, 1)|).
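Taking the fixed interval to be the n stored slopes (an assumption; the text does not define the interval), the S23 value can be written as follows, where S(W, t) is a symbol introduced here for convenience. Although it is called a similarity, it behaves like a distance: identical trends give 0, which is why S24 keeps words whose value is at or below the threshold.

$$
S(W, t) = \sum_{i=1}^{n} \left| a(W, C_k, i) - a(t, C_k, i) \right|
$$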

Next, the keyword/search expression proposal processing unit 5 determines whether the computed similarity value is less than or equal to a threshold (S24). If it is, the unit 5 takes the word t as a keyword X to be proposed and performs the search expression proposal processing (S25).

The keyword/search expression proposal processing unit 5 then determines whether the keyword search has been completed for all words (S26). If so, the processing ends (S27). Otherwise, the processing returns to S22 and repeats until all words have been processed.

If, in S24, the similarity value is not less than or equal to the threshold, the processing proceeds directly to S26, which determines whether all words have been processed.

Through this processing, for a keyword registered by the user 20, words that belong to the same field and whose appearance trend (the change in appearance frequency over a fixed interval) is similar can be extracted as new keywords. In this way, phrases that have become related to the registered keyword over time can be extracted.
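The S21 to S27 loop for one category can be sketched in Python as below. This is an illustrative sketch only: `slope_distance`, `extract_related_words`, and the in-memory dictionary standing in for the word information database 13 are hypothetical, and skipping W itself is an assumption (the text only says unprocessed words t are examined). Where the flowchart hands each X straight to the search expression proposal processing (S25), the sketch simply collects the candidates.

```python
def slope_distance(a_w: list[float], a_t: list[float]) -> float:
    """S23: sum of |a(W,Ck,i) - a(t,Ck,i)|; 0 means identical trends."""
    # zip stops at the shorter history, so words with fewer stored slopes
    # are compared only over the slopes both words have
    return sum(abs(x - y) for x, y in zip(a_w, a_t))


def extract_related_words(
    keyword: str, slopes_by_word: dict[str, list[float]], threshold: float
) -> list[str]:
    """S21-S27 for one category Ck.

    slopes_by_word is a hypothetical in-memory view of word information
    database 13: every word in Ck mapped to its slopes [a(1), ..., a(n)].
    """
    a_w = slopes_by_word[keyword]                  # S21
    proposals = []
    for word, a_t in slopes_by_word.items():       # S22; the loop ends when S26 is satisfied
        if word == keyword:                        # assumption: skip W itself
            continue
        if slope_distance(a_w, a_t) <= threshold:  # S23, S24
            proposals.append(word)                 # S25: word becomes a keyword X
    return proposals
```

With slopes {"W": [2.0, 1.0, 0.5], "t1": [1.8, 1.1, 0.4], "t2": [-1.0, 0.0, 3.0]} and a threshold of 0.5, only "t1" (distance about 0.4) is returned; "t2" (distance 6.5) is rejected.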

《Search expression proposal processing flow》
FIG. 10 is a flowchart showing details of the search expression proposal processing shown in FIG. 7. This processing proposes to the user 20 a search expression that combines a keyword extracted by the word information processing with the keyword used in the search (the keyword registered by the user).

First, the keyword/search expression proposal processing unit 5 retrieves the document ID list "L(W)" of keyword W and the document ID list "L(X)" of keyword X from the document/word correspondence list 10 (S31).

Next, the keyword/search expression proposal processing unit 5 calculates the matching rate between the retrieved document ID lists "L(W)" and "L(X)" (S32). The matching rate may be computed, for example, by squaring the number of document IDs in the intersection (logical AND) of the two lists and dividing the result by the product of the numbers of document IDs in each list. In other words, the squared number of documents containing both keyword W and keyword X is divided by the product of the number of documents containing keyword W and the number of documents containing keyword X. For example, in FIG. 2 the document ID lists of the words "mad cow disease" and "Takebe" share the two document IDs "1" and "105", so the size of their intersection is 2. The resulting value indicates how much the two document ID lists overlap.
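The example matching rate from S32 can be sketched in Python as follows. The function name `matching_rate` and the choice to return 0.0 for an empty list are assumptions for illustration.

```python
def matching_rate(l_w: list[int], l_x: list[int]) -> float:
    """S32: |L(W) ∩ L(X)|^2 / (|L(W)| * |L(X)|), computed on document ID sets."""
    s_w, s_x = set(l_w), set(l_x)
    if not s_w or not s_x:
        return 0.0  # assumption: a keyword with no documents has no overlap
    both = len(s_w & s_x)  # number of documents containing both W and X
    return both * both / (len(s_w) * len(s_x))
```

For instance, with L(W) = [1, 7, 105, 230] and L(X) = [1, 105] (the shared IDs 1 and 105 follow FIG. 2; IDs 7 and 230 are made up), the rate is 2² / (4 × 2) = 0.5. The value always lies between 0 and 1 and reaches 1 only when both lists contain the same IDs; it is the square of the Ochiai (cosine) coefficient of the two ID sets.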

Next, the keyword/search expression proposal processing unit 5 determines whether the matching rate is greater than or equal to a threshold (S33). If it is, the proposed search expression Y is set to the logical AND of keyword W and keyword X ("W AND X") (S35). That is, among the keywords extracted by the word information processing, a keyword that frequently appears in documents containing a keyword registered by the user 20 is considered a related word, so the search expression "registered keyword" AND "proposed related word" is proposed. A keyword regarded as a related word thus serves as a narrowing condition.

If the matching rate is below the threshold, the proposed search expression Y is set to the logical OR of keyword W and keyword X ("W OR X") (S36). That is, among the extracted keywords, a keyword that frequently appears in documents that do not contain the keyword registered by the user 20 may be a synonym of the registered word, so the search expression "registered keyword" OR "proposed keyword" is proposed.

Before proceeding to S35, the unit 5 determines whether the proposed keyword X is a pre-registered synonym (paraphrase) of keyword W (S34). This can be done, for example, by registering synonyms in advance in a database (such as a synonym dictionary) and checking whether the proposed keyword is registered there. If keyword X is not a registered synonym of keyword W, the processing of proposing the search expression Y consisting of the logical AND (W AND X) continues. If keyword X is a registered synonym of keyword W, no search expression is proposed.

When a search expression is to be proposed to the user, the keyword/search expression proposal processing unit 5 registers the proposed search expression Y in the proposed search expression management database 14 and sets its status flag to "1 (proposed)" (S37).
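The S33 to S37 decision can be sketched in Python as below. This is a hedged illustration: `decide_search_expression` and `register_proposal` are hypothetical names, the set of unordered word pairs stands in for the synonym dictionary, and the list of dictionaries (with an assumed `user_id` field) stands in for the proposed search expression management database 14.

```python
from typing import Optional

PROPOSED = 1  # status flag "1 (proposed)" in proposed search expression management database 14


def decide_search_expression(
    w: str, x: str, rate: float, threshold: float, synonyms: set[frozenset[str]]
) -> Optional[str]:
    """S33-S36: return the search expression Y to propose, or None to propose nothing."""
    if rate >= threshold:                  # S33: X tends to co-occur with W
        if frozenset((w, x)) in synonyms:  # S34: X is a registered paraphrase of W
            return None
        return f"{w} AND {x}"              # S35: related word used as a narrowing condition
    return f"{w} OR {x}"                   # S36: X may be a synonym, so broaden the search


def register_proposal(db: list[dict], user_id: str, expression: str) -> None:
    """S37: store Y with status flag 1 (proposed)."""
    db.append({"user_id": user_id, "expression": expression, "status": PROPOSED})
```

Removing the S34 check from this sketch gives the variant described under <Modification>, in which registered synonyms are not filtered out before an AND expression is proposed.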

Through this processing, instead of simply proposing an extracted keyword as a related word, the system can propose its combination with a keyword registered in advance by the user 20 as a search expression.

In addition, because synonyms (paraphrases) are filtered out (S34) before a logical-AND search expression is proposed, the related word that is proposed is not a mere paraphrase of the keyword but a phrase that has become related to it over time.

《Search expression proposal management processing flow》
FIG. 11 is a flowchart showing details of the search expression proposal management processing shown in FIG. 7. This processing is executed mainly by the keyword proposal management processing unit 6 when a newly proposed search expression is registered in the proposed search expression management database 14.

First, the keyword proposal management processing unit 6 proposes the search expression Y to the corresponding user 20 (S41). Specifically, the unit 6 identifies, from the proposed search expression management database 14, the user 20 whose user ID is associated with search expression Y, and transmits (outputs) to that user a proposal notification containing search expression Y. In the following, the identified user 20 is referred to as user A.

On receiving the proposal notification, user A decides whether to adopt the search expression Y it contains and enters "adopt" or "reject" through an input operation on the user terminal (S42). User A's input is transmitted to the apparatus 1 as a response to the proposal notification.

On receiving the response, the keyword proposal management processing unit 6 of the apparatus 1 determines from its content whether the proposal was adopted (S43). If it was, the unit 6 updates the status flag of the corresponding record (user A's record) in the proposed search expression management database 14 to "2 (under evaluation)" (S44).

The keyword proposal management processing unit 6 then adds search expression Y to the keyword conditions used when executing the clipping service for user A (S45); that is, it registers search expression Y in the clipping processing system 7.

After a fixed period has elapsed since search expression Y was registered, the keyword proposal management processing unit 6 asks user A whether to continue using search expression Y (S46) by transmitting a confirmation notification to user A.

On receiving the confirmation notification, user A decides whether to continue using the search expression Y it contains and enters "end" or "continue" through an input operation on the user terminal (S47). User A's input is transmitted to the apparatus 1 as a response to the confirmation notification.

On receiving the response, the keyword proposal management processing unit 6 of the apparatus 1 determines from its content whether the user chose to continue (S48). If so, the unit 6 updates the status flag of the corresponding record in the proposed search expression management database 14 to "3 (adopted)" (S49). Otherwise, the unit 6 deletes search expression Y from the clipping processing system 7 (S50) and, at the same time, sets the status flag of the corresponding record in the proposed search expression management database 14 to "4 (not adopted)" (S51).

If, in S43, the proposal was not adopted, the keyword proposal management processing unit 6 likewise sets the status flag of the corresponding record in the proposed search expression management database 14 to "4 (not adopted)" (S44).
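The status-flag transitions of FIG. 11 can be sketched in Python as a small state machine. This is a sketch under stated assumptions: `Status`, `on_proposal_response`, and `on_confirmation_response` are hypothetical names, the set of strings is a stand-in for the keyword conditions held by the clipping processing system 7, and the notification transport (S41, S46) and the length of the fixed period are omitted.

```python
from enum import IntEnum


class Status(IntEnum):
    """Status flags of proposed search expression management database 14."""
    PROPOSED = 1
    UNDER_EVALUATION = 2
    ADOPTED = 3
    NOT_ADOPTED = 4


def on_proposal_response(adopted: bool, expression: str, clipping: set[str]) -> Status:
    """S43-S45: handle user A's answer to the proposal notification."""
    if adopted:
        clipping.add(expression)        # S45: register Y in clipping processing system 7
        return Status.UNDER_EVALUATION  # S44
    return Status.NOT_ADOPTED           # proposal rejected at S43


def on_confirmation_response(keep: bool, expression: str, clipping: set[str]) -> Status:
    """S48-S51: handle user A's answer to the continue-or-end confirmation."""
    if keep:
        return Status.ADOPTED           # S49
    clipping.discard(expression)        # S50: delete Y from the clipping conditions
    return Status.NOT_ADOPTED           # S51
```

Using `IntEnum` keeps the flags comparable with the plain integers 1 to 4 that the database stores, while giving each state a readable name.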

Through this processing, extracted keywords can be combined with the keywords registered in advance by the user 20 and proposed as search expressions. Moreover, once the user 20 adopts a proposed search expression, the clipping service can be carried out using keywords, obtained from that search expression, that reflect changes over time.

Even after a proposed search expression has been adopted, the user is asked after a fixed period whether to keep using it, so the service provider can prompt the user to maintain the keywords.

According to the present invention, new words, related words, and search expressions can be proposed, so the user's knowledge can be supplemented and keywords and search expressions can be progressively improved. As a result, the clipping service can be carried out with keywords outside the range of the user's own knowledge.

According to the present invention, phrases that have newly become related can be proposed as related words by using information generated over time, which keeps keywords from becoming obsolete. For example, newly related phrases can be used to clip noteworthy new topics, such as news, at an early stage.

<Modification>
In the embodiment described above, the system determines whether the proposed keyword X is a pre-registered synonym (paraphrase) of keyword W before determining the search expression to be proposed in S35. However, the system need not make this synonym determination before determining the proposed search expression Y in S35.

With this configuration, phrases that are normally treated as synonyms, but that a particular user does not regard as synonyms, are not excluded as synonyms. The system can therefore accommodate differences between users in how they recognize synonyms.


<Others>
The present invention can also be specified as follows.
(Supplementary note 1) A server comprising:
means for classifying documents into categories;
means for calculating, for each word contained in the documents, a document frequency, which is the number of documents in the same category that contain the word;
means for recording the transition of the document frequency over a predetermined period; and
means for extracting, within the same category, a second word whose document frequency transition is similar to that of a first word as a related word of the first word.
(Supplementary note 2) The server according to supplementary note 1, comprising:
means for obtaining a degree of coincidence, which is the ratio of the number of documents containing both the first word and its related word to the number of documents containing the first word or its related word; and
means for outputting a search expression including the first word and its related word according to the degree of coincidence.
(Supplementary note 3) The server according to supplementary note 2, wherein the outputting means outputs a search expression consisting of the logical AND of the first word and its related word when the degree of coincidence reaches a predetermined value, and outputs a search expression consisting of the logical OR of the first word and its related word when the degree of coincidence does not reach the predetermined value.
(Supplementary note 4) The server according to supplementary note 3, wherein, even when the degree of coincidence reaches the predetermined value, the outputting means does not output a search expression consisting of the logical AND of the first word and its related word if the first word and its related word are pre-registered synonyms.
(Supplementary note 5) The server according to supplementary note 1, further comprising means for accepting registration of the first word from a user, wherein the outputting means includes:
means for transmitting a notification including a search expression to the user who registered the first word;
means for accepting, in response to the transmitted notification, a response from the user indicating adoption or non-adoption of the search expression; and
means for transmitting, when the response indicates adoption, a notification asking whether to continue using the search expression after a fixed period has elapsed.
(Supplementary note 6) The server according to any one of supplementary notes 1 to 5, wherein the first word is a registered word registered in advance in the server.
(Supplementary note 7) The server according to supplementary note 1, wherein the transition is the rate of increase or decrease per unit period.
(Supplementary note 8) A related word proposal method in which a computer executes the steps of:
classifying documents into categories;
calculating, for each word contained in the documents, a document frequency, which is the number of documents in the same category that contain the word;
recording the transition of the document frequency over a predetermined period; and
extracting, within the same category, a second word whose document frequency transition is similar to that of a first word as a related word of the first word.
(Supplementary note 9) The related word proposal method according to supplementary note 8, wherein the computer executes the steps of:
obtaining a degree of coincidence, which is the ratio of the number of documents containing both the first word and its related word to the number of documents containing the first word or its related word; and
outputting a search expression including the first word and its related word according to the degree of coincidence.
(Supplementary note 10) The related word proposal method according to supplementary note 9, wherein, in the outputting step, the computer outputs a search expression consisting of the logical AND of the first word and its related word when the degree of coincidence reaches a predetermined value, and outputs a search expression consisting of the logical OR of the first word and its related word when the degree of coincidence does not reach the predetermined value.
(Supplementary note 11) The related word proposal method according to supplementary note 10, wherein, in the outputting step, even when the degree of coincidence reaches the predetermined value, the computer does not output a search expression consisting of the logical AND of the first word and its related word if the first word and its related word are pre-registered synonyms.
(Supplementary note 12) The related word proposal method according to supplementary note 11, wherein the computer accepts registration of the first word from a user and, in the outputting step, further executes the steps of:
transmitting a notification including a search expression to the user who registered the first word;
accepting, in response to the transmitted notification, a response from the user indicating adoption or non-adoption of the search expression; and
transmitting, when the response indicates adoption, a notification asking whether to continue using the search expression after a fixed period has elapsed.

The present invention is applicable to service industries that use various information devices, such as services that provide real-time information.

FIG. 1 is a diagram showing the system configuration of the information system.
FIG. 2 is a diagram showing the data structure of the document/word correspondence list 10.
FIG. 3 is a diagram showing the data structure of the category master database 11.
FIG. 4 is a diagram showing the data structure of the classification information database 12.
FIG. 5 is a diagram showing the data structure of the word information database 13.
FIG. 6 is a diagram showing the data structure of the proposed search expression management database 14.
FIG. 7 is a flowchart showing the keyword/search expression proposal processing.
FIG. 8 is a flowchart showing details of the preprocessing shown in FIG. 7.
FIG. 9 is a flowchart showing details of the word information processing shown in FIG. 7.
FIG. 10 is a flowchart showing details of the search expression proposal processing shown in FIG. 7.
FIG. 11 is a flowchart showing details of the search expression proposal management processing shown in FIG. 7.
FIG. 12 is a graph showing an example of the appearance frequency of a keyword for each fixed period.

Explanation of symbols

1 Apparatus
2 Content collection system
3 Content classification processing unit
4 Search index creation unit
5 Keyword/search expression proposal processing unit
6 Keyword proposal management processing unit
7 Clipping processing system
8 Content database
9 Search database
10 Document/word correspondence list
11 Category master database
12 Classification information database
13 Word information database
14 Proposed search expression management database
20 User

Claims

A server comprising:
means for classifying documents into categories on the basis of the words they contain, using a classification dictionary that indicates the relationship between words and fields to determine the field of each word contained in a document;
means for calculating, for each unit interval and for each word contained in the documents, a document frequency, which is the number of documents in the same category that contain the word;
means for recording, in a word information database, the transition of the document frequency over a plurality of the unit intervals; and
means for referring to the word information database and extracting, within the same category, a second word whose document frequency transition is similar to that of a first word as a related word of the first word.

The server according to claim 1, comprising:
means for obtaining a degree of coincidence, which is the ratio of the number of documents containing both the first word and its related word to the number of documents containing the first word or its related word; and
means for outputting a search expression including the first word and its related word according to the degree of coincidence.

The server according to claim 1, further comprising means for accepting registration of the first word from a user, wherein the outputting means includes:
means for transmitting a notification including a search expression to the user who registered the first word;
means for accepting, in response to the transmitted notification, a response from the user indicating adoption or non-adoption of the search expression; and
means for transmitting, when the response indicates adoption, a notification asking whether to continue using the search expression after a fixed period has elapsed.

A related word proposal method in which a computer executes the steps of:
classifying documents into categories on the basis of the words they contain, using a classification dictionary that indicates the relationship between words and fields to determine the field of each word contained in a document;
calculating, for each unit interval and for each word contained in the documents, a document frequency, which is the number of documents in the same category that contain the word;
recording, in a word information database, the transition of the document frequency over a plurality of the unit intervals; and
referring to the word information database and extracting, within the same category, a second word whose document frequency transition is similar to that of a first word as a related word of the first word.

The related word proposal method according to claim 4, wherein the computer executes the steps of:
obtaining a degree of coincidence, which is the ratio of the number of documents containing both the first word and its related word to the number of documents containing the first word or its related word; and
outputting a search expression including the first word and its related word according to the degree of coincidence.