JP2003006221A

JP2003006221A - Predictive analysis type retrieval system, predictive analysis type retrieval method, and computer program

Info

Publication number: JP2003006221A
Application number: JP2001185913A
Authority: JP
Inventors: Masakatsu Morii; 昌克森井
Original assignee: KANDA TOYOMI
Current assignee: KANDA TOYOMI
Priority date: 2001-06-20
Filing date: 2001-06-20
Publication date: 2003-01-10

Abstract

PROBLEM TO BE SOLVED: To provide a technology by which the information a user wants can be accurately collected from the flood of information on the Internet. SOLUTION: A predictive analysis type retrieval system is provided with a keyword creation means which creates keywords for content retrieval on the Internet and scores the importance of the keywords, a content evaluation means which scores the text data in content in accordance with the keywords or the like, and a URL collection means which collects URLs at which content desired by the user is predicted to appear, by using the keyword importance scored by the keyword creation means and the content scored by the content evaluation means.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a technique for efficiently collecting desired information, and the URLs that carry it, from the flood of information on the Internet.

[0002]

2. Description of the Related Art

In recent years the number of Internet users has grown rapidly, and the number of pages available for browsing increases daily. To reach the information they want, Internet users must first perform an information search. Typically, a user accesses a search site and enters keywords related to the desired information. Most major search sites then use the direct and simple method of extracting the sites that contain those keywords as text data.

[0003]

Methods for extracting sites that contain a keyword are classified into the "registration type" and the "robot type". In the registration type, the author of a web page or the staff of the search site registers the site with the search site, and sites containing the keyword are extracted from among the registered sites. In the robot type, a robot automatically crawls websites around the world and registers them in the search site's database, and sites containing the keyword are extracted based on that registration. Some robot-type search sites note which crawled websites are updated frequently and, when a keyword is entered, extract those frequently updated sites preferentially.

[0004]

Problems to be Solved by the Invention

The information available on the Internet is a mixture of gems and stones, that is, of the valuable and the worthless. For this reason, various methods have been devised for picking out the "gems". For example, the robot type described above, which preferentially extracts "frequently updated websites", can be regarded as a search site that equates "gem" with "fresh". However, fresh information is not necessarily the information that a search site user wants. The problem to be solved by the present invention is therefore to provide a technique that can accurately collect the information a user wants from the flood of information on the Internet.

[0005]

Means for Solving the Problems

The present invention achieves the above object by being installed on the server of a search site operator and used by the users of that search site.

[0006]

(Claim 1) The invention according to claim 1 relates to a predictive analysis type search system comprising: keyword creation means that creates keywords for content search on the Internet and has a function of scoring the importance of those keywords; and content evaluation means that scores the text data in content according to the keywords and the like. Content in the search results obtained with the keywords created by the keyword creation means is evaluated and scored by the content evaluation means; the scored content is compared with the keywords to score the importance of each keyword; and the keyword creation means creates the keywords for the next search based on the scored importance, so that the keywords used for searching are successively updated.

[0007]

(Explanation of terms) A "keyword" may be specified by the user, or may be reset by the search site operator with reference to the user's specification. "Scoring" by the keyword creation means is performed, for example, by weighting according to the keyword itself and the context of the sentence in which the keyword appears. Because simply adding up the number of occurrences produces a large error, a keyword that appears multiple times may also be given a corrected count before it is added. For "scoring" by the content evaluation means, several kinds of scoring functions are prepared in advance, an appropriate one is selected according to the user's search purpose, and the score is calculated with it. For example, when the user is a corporation that wants to collect rumors likely to affect its stock price, a scoring function is adopted that gives a high score to content judged to be a "rumor".
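The corrected occurrence count described above could be sketched as follows. The specification does not state the correction formula; the logarithmic damping, the function names `damped_count` and `score_text`, and the example weights are all assumptions made purely for illustration.

```python
import math


def damped_count(occurrences: int) -> float:
    """Correct a raw occurrence count so that repeats add progressively less.

    Logarithmic damping is an assumption; the source only says the count
    is corrected before it is added.
    """
    return math.log2(1 + occurrences) if occurrences > 0 else 0.0


def score_text(text: str, weights: dict[str, float]) -> float:
    """Sum, over all keywords, the keyword weight times its damped count."""
    lowered = text.lower()
    return sum(
        weight * damped_count(lowered.count(keyword.lower()))
        for keyword, weight in weights.items()
    )


# Hypothetical weights for a "rumor"-oriented scoring function.
rumor_weights = {"rumor": 2.0, "stock": 1.0}
page_score = score_text("rumor rumor rumor stock", rumor_weights)
```

With this damping, three occurrences of a keyword contribute twice as much as one occurrence rather than three times as much. A different correction, or a different set of weights per search purpose, would slot into the same structure.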

[0008]

(Operation) The keyword creation means creates keywords for content search on the Internet. The content evaluation means, in turn, scores the text data in content on the Internet according to the keywords and the like. Using the content scored by the content evaluation means, the keyword creation means scores the importance of the keywords that were used. The keywords for the next search are updated to reflect this scored importance. As this cycle repeats, the keywords are progressively refined, and useful content matching the user's preferences can be collected.

[0009]

(Claim 2) The invention according to claim 2 relates to a predictive analysis type search system comprising: keyword creation means that creates keywords for content search on the Internet and scores the importance of those keywords; content evaluation means that scores the text data in content according to the keywords and the like; and URL collection means that, using the keyword importance scored by the keyword creation means and the content scored by the content evaluation means, collects URLs at which content desired by the user is predicted to appear.

[0010]

(Operation) The keyword creation means creates keywords for content search on the Internet. The content evaluation means, in turn, scores the text data in content on the Internet according to the keywords and the like. Using the content scored by the content evaluation means, the keyword creation means scores the importance of the keywords that were used. The keywords for the next search are updated to reflect this scored importance. Because the importance scored by the keyword creation means is used, the keywords fully reflect the user's preferences, and the URL collection means can collect URLs at which content desired by the user is predicted to appear. The collected URLs are predicted to be of high importance, so the user can obtain URLs that lead to useful content without searching personally.

[0011]

(Claim 3) The invention according to claim 3 limits the predictive analysis type search system according to claim 1 or 2, and relates to a predictive analysis type search system in which the content evaluation means includes tag information analysis means that analyzes tag information in content.

[0012]

(Explanation of terms) "Tag information" is, for example, the information obtained from the tags of a web page written in HTML. One example of useful tag information is a link destination.

[0013]

(Operation) Because the content evaluation means includes the tag information analysis means, tag information can be used as one of the criteria for evaluating content.

[0014]

(Claim 4) The invention according to claim 4 limits the predictive analysis type search system according to any one of claims 1 to 3, and relates to a predictive analysis type search system in which the keyword creation means includes input reception means that accepts keyword input from the user, and the keyword creation means performs rescoring using the keywords accepted by the input reception means.

[0015]

(Operation) Because the input reception means is provided, the system can handle keywords added or changed by the user.

[0016]

(Claim 5) The invention according to claim 5 limits the predictive analysis type search system according to any one of claims 1 to 4, and relates to a predictive analysis type search system in which the keyword creation means has a function of extracting new wording, context, and the like from the collected information based on the evaluation results of the content evaluation means, and automatically generating new keywords.

(Operation) When the content evaluation means gives a high evaluation, new wording, context, and the like are extracted from that information to generate new keywords. In this way new keywords can be generated from the content in which the user has shown interest, and the keywords are refined further.

(Claim 6) The invention according to claim 6 limits the predictive analysis type search system according to any one of claims 1 to 5, and relates to a predictive analysis type search system that includes sorting means for sorting the content scored by the content evaluation means in descending order of score, and in which the user interface that provides search results to the user can output the result of sorting by the sorting means.

[0017]

(Operation) The sorting means sorts the content scored by the content evaluation means in descending order of score. The user interface through which the user obtains search results can output this sorted result. The user therefore only needs to browse from the highest-scoring content downward, which is efficient.
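As a minimal sketch of the sorting means, scored content can be ordered by descending score before it is handed to the user interface. The `(url, score)` tuple layout, the function name `sort_by_score`, and the example URLs are assumptions for illustration, not details from the specification.

```python
def sort_by_score(pages: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Return (url, score) pairs ordered from highest to lowest score."""
    return sorted(pages, key=lambda page: page[1], reverse=True)


# Hypothetical scored search results.
scored_pages = [
    ("http://example.com/a", 3.0),
    ("http://example.com/b", 12.5),
    ("http://example.com/c", 7.0),
]
ranked = sort_by_score(scored_pages)
```

Because Python's `sorted` is stable, pages with equal scores keep their original relative order.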

[0018]

(Claim 7) The invention according to claim 7 relates to a predictive analysis type search method comprising: a keyword creation procedure that creates keywords for content search on the Internet and has a function of scoring the importance of those keywords; and a content evaluation procedure that scores the text data in content according to the keywords and the like. Content in the search results obtained with the keywords created by the keyword creation procedure is evaluated and scored by the content evaluation procedure; the scored content is compared with the keywords to score the importance of each keyword; and the keyword creation procedure creates the keywords for the next search based on the scored importance, so that the keywords used for searching are successively updated.

[0019]

(Claim 8) The invention according to claim 8 relates to a predictive analysis type search method comprising: a keyword creation procedure that creates keywords for content search on the Internet and scores the importance of those keywords; a content evaluation procedure that scores the text data in content according to the keywords and the like; and a URL collection procedure that, using the keyword importance scored by the keyword creation procedure and the content scored by the content evaluation procedure, collects URLs at which content desired by the user is predicted to appear.

[0020]

(Claim 9) The invention according to claim 9 limits the predictive analysis type search method according to claim 7 or 8, and relates to a predictive analysis type search method in which the content evaluation procedure includes a tag information analysis procedure that analyzes tag information in content.

[0021]

(Claim 10) The invention according to claim 10 limits the predictive analysis type search method according to any one of claims 7 to 9, and relates to a predictive analysis type search method in which the keyword creation procedure includes an input reception procedure that accepts keyword input from the user and a rescoring procedure that performs rescoring using the keywords accepted by the input reception procedure.

[0022]

(Claim 11) The invention according to claim 11 limits the predictive analysis type search method according to any one of claims 7 to 10, and relates to a predictive analysis type search method in which the keyword creation procedure includes a procedure of extracting new wording, context, and the like from the collected information based on the evaluation results of the content evaluation procedure, and automatically generating new keywords.

(Claim 12) The invention according to claim 12 limits the predictive analysis type search method according to any one of claims 7 to 11, and relates to a predictive analysis type search method that includes a sorting procedure for sorting the content scored by the content evaluation procedure in descending order of score, and a user output procedure that makes the result of that sorting available for output to the user.

[0023]

(Claim 13) The invention according to claim 13 relates to a program for causing a computer to implement a predictive analysis type search method. The program comprises: a keyword creation procedure that creates keywords for content search on the Internet and has a function of scoring the importance of those keywords; and a content evaluation procedure that scores the text data in content according to the keywords and the like. The program causes the computer to evaluate and score, by the content evaluation procedure, the content in the search results obtained with the keywords created by the keyword creation procedure; to compare the scored content with the keywords and score the importance of each keyword; and to create, by the keyword creation procedure, the keywords for the next search based on the scored importance, so that the keywords used for searching are successively updated.

[0024]

(Claim 14) The invention according to claim 14 relates to a program for causing a computer to implement a predictive analysis type search method. The program causes the computer to execute: a keyword creation procedure that creates keywords for content search on the Internet and scores the importance of those keywords; a content evaluation procedure that scores the text data in content according to the keywords and the like; and a URL collection procedure that, using the keyword importance scored by the keyword creation procedure and the content scored by the content evaluation procedure, collects URLs at which content desired by the user is predicted to appear.

[0025]

(Claim 15) The invention according to claim 15 relates to the program according to claim 13 or 14, in which the content evaluation procedure includes a tag information analysis procedure that analyzes tag information in content.

[0026]

(Claim 16) The invention according to claim 16 relates to the program according to any one of claims 13 to 15, in which the keyword creation procedure includes an input reception procedure that accepts keyword input from the user and a rescoring procedure that performs rescoring using the keywords accepted by the input reception procedure.

[0027]

(Claim 17) The invention according to claim 17 relates to the program according to any one of claims 13 to 16, in which the keyword creation procedure includes a procedure of extracting new wording, context, and the like from the collected information based on the evaluation results of the content evaluation procedure, and automatically generating new keywords.

(Claim 18) The invention according to claim 18 relates to the program according to any one of claims 13 to 17, which includes a sorting procedure for sorting the content scored by the content evaluation procedure in descending order of score, and a user output procedure that makes the result of that sorting available for output to the user.

[0028]

In the inventions according to claims 13 to 18, the program can also be provided stored on a recording medium. Here, a "recording medium" is a medium capable of carrying a program, which cannot occupy space by itself; examples include a flexible disk, a hard disk, a CD-ROM, an MO (magneto-optical disk), and a DVD-ROM.

[0029]

DESCRIPTION OF THE EMBODIMENTS

The present invention is described below in more detail based on embodiments and the drawings. The drawings referred to are FIGS. 1 to 9. A "user" is a person (which may be a corporation) who uses the search system according to the present embodiment.

[0030]

(Overall configuration) The search system according to the embodiment consists of two basic systems: a "predictive analysis type search engine (SEIP, Search Engine with Intelligent Predictions)", which runs in the background and acquires information from the Internet, and a "result display CGI program", which retrieves the acquired information from the database and presents it to the user through a Web browser.

[0031]

The predictive analysis type search engine includes: "content evaluation means", which examines and scores the content on the Internet extracted according to the keywords; "keyword creation means", which analyzes the user's preferences, creates the keywords that seed a search, and scores the importance of those keywords; and "URL collection means", which, by analyzing the search results and the user's preferences, collects URLs at which content desired by the user is predicted to appear. The keyword creation means also has a function of automatically generating new keywords, based on the results of the content evaluation means, from the wording, context, and the like used in content in which the user has shown interest.

[0032]

(Predictive analysis type search engine) The processing by which the predictive analysis type search engine acquires information from the Internet and stores it in the database is broadly divided into three stages: a first stage of acquiring the page specified by a URL from the Internet; a second stage of analyzing the content of the acquired page; and a third stage of storing the page in the database if the analysis shows it to be the target information.

[0033]

(Functions) The system also has the following six functions, whose processing units are written in an interpreted language (Perl). First, URL database processing means (URL collection unit), which adds and deletes URLs to be searched and updates URL information. Second, page acquisition means, which acquires header information, checks the page's update time, and acquires the page body. Third, tag information analysis means, which removes tags and acquires link information. Fourth, a page content analysis unit (content evaluation unit), which performs pattern matching using keywords and context and scores the acquired page based on the result. Fifth, keyword creation means, which creates or extracts new keywords based on the content evaluation and re-evaluates keywords to revise their scores. Sixth, database processing means, which adds acquired pages to and deletes them from the database and detects identical messages in the database.

[0034]

(URL database processing means) The predictive analysis type search system of this embodiment collects URLs by following page links, and expands its search range by examining the relevance of the information specified by those URLs. Expanding the search range based on the relevance of information amounts to examining the degree of preference; at the same time, the "expectation of information" for the content updated at each URL is also calculated. URLs are collected according to this expectation, and new searches are performed based on the collected URLs. The expectation of information is calculated by the "page content analysis means". Based on this expectation, the present means collects, deletes, and orders URLs as follows.

[0035]

(Description of the associative array) URLs are managed with an associative array. FIG. 2 shows the structure of the associative array. This embodiment uses a dbm database, long a standard in the UNIX (registered trademark) world. The URL is assigned to the "key", and the page's score and search count are assigned to the "value". An associative array can store only one value per key. Since both the score and the search count must be stored for each URL, the two values are stored together, separated by ":". For example, a value of 12.5:5 means that the page's score is 12.5 points and its search count is 5.
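The "score:count" packing can be illustrated with a small sketch. The embodiment itself is written in Perl on top of dbm; the Python below is only an illustration, and the function names `encode_value` and `decode_value` are hypothetical. A plain `dict` stands in for the dbm file.

```python
def encode_value(score: float, searches: int) -> str:
    """Pack a page score and a search count into one 'score:count' string."""
    return f"{score:g}:{searches}"


def decode_value(value: str) -> tuple[float, int]:
    """Split a 'score:count' string back into (score, search count)."""
    score_text, searches_text = value.split(":")
    return float(score_text), int(searches_text)


# A dict stands in for the dbm database: key = URL, value = "score:count".
url_db = {"http://example.com/news": encode_value(12.5, 5)}
score, searches = decode_value(url_db["http://example.com/news"])
```

The separator is safe here because neither a decimal score nor an integer count can contain ":". The URL, which does contain ":", is only ever used as the key.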

[0036]

(Adding and deleting URLs) When a new URL is added, the key is set to the URL and the value is set to 0:0. A registered URL is deleted only when its value satisfies a certain condition. In the current version, the deletion condition is a search count of 10 or more together with a score of 0. This condition is only a provisional initial setting and can be changed.
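The add and delete rules above might look like the following sketch. A `dict` stands in for the dbm database, and the names `add_url`, `should_delete`, `prune`, and `MIN_SEARCHES_BEFORE_DELETE` are hypothetical. Leaving an already registered URL untouched on a repeated add is also an assumption; the text only describes adding a new URL.

```python
# Provisional initial setting from the text; intended to be changeable.
MIN_SEARCHES_BEFORE_DELETE = 10


def add_url(db: dict[str, str], url: str) -> None:
    """Register a new URL with score 0 and search count 0 ("0:0")."""
    db.setdefault(url, "0:0")  # assumption: do not reset an existing entry


def should_delete(value: str) -> bool:
    """True when the URL was searched 10 or more times and still scores 0."""
    score_text, searches_text = value.split(":")
    return (
        int(searches_text) >= MIN_SEARCHES_BEFORE_DELETE
        and float(score_text) == 0.0
    )


def prune(db: dict[str, str]) -> None:
    """Delete every URL whose value meets the deletion condition."""
    for url in [u for u, v in db.items() if should_delete(v)]:
        del db[url]


url_db = {"http://example.com/stale": "0:10", "http://example.com/good": "8:12"}
add_url(url_db, "http://example.com/new")
prune(url_db)
```

After `prune`, only the stale entry is gone: the newly added URL has not yet reached 10 searches, and the other URL has a nonzero score.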

[0037]

(Updating URL information) Each search yields a score for the URL. This score must be recorded as information about the URL, and the number representing the search count must also be updated. To do so, the value stored under the URL as key is changed. (Example) Obtained score = 10 points, current value = 0:4. This value shows that four searches have been performed so far and that the score was 0 each time. Because 10 points were obtained this time, the updated value is 10:5.
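The worked example can be traced with a short sketch. One point is an assumption: the example (0:4 becoming 10:5) is consistent both with replacing the stored score and with adding to it, and the code below replaces it. The function name `update_url` is hypothetical, and a `dict` stands in for the dbm database.

```python
def update_url(db: dict[str, str], url: str, new_score: float) -> None:
    """Record the latest score for a URL and increment its search count.

    Assumption: the new score overwrites the stored one rather than
    being accumulated; the source example does not distinguish the two.
    """
    _, searches_text = db[url].split(":")
    db[url] = f"{float(new_score):g}:{int(searches_text) + 1}"


# Four earlier searches, all scoring 0; the fifth search scores 10.
url_db = {"http://example.com/page": "0:4"}
update_url(url_db, "http://example.com/page", 10)
```

After the call, the stored value is "10:5", matching the example in the text.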

[0038]

(Page acquisition means) FIG. 3 shows the concept of the page acquisition unit. The page acquisition unit acquires the information of the page indicated by a URL. When doing so, it checks the page's update date and time and acquires only information updated within a predetermined period (for example, the past month). In other words, information is assumed to have an expiration date. One of the values of information usually lies in its "freshness". Except for purposes such as searching for reference material, the value of information often drops to zero after a certain period. This system therefore focuses on the validity period of information and sets such a period. The validity period can be set arbitrarily.

[0039] Specifically, the update date and time is checked based on the header information acquired by the page acquisition means. The Last-Modified field in the header indicates when the page was last updated, and this is used. If the date falls within the predetermined period, the entire page is fetched. This shortens the search time and the page acquisition time. Some headers contain no update date and time. In that case, a fixed number of characters at the beginning of the body written in HTML (several hundred bytes to several kilobytes) is read and searched for an item stating the update date and time; the update date and time is estimated from it, and whether to acquire the page is decided accordingly. This is because it is known from experience that the opening part of document data often contains date and time data.
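The freshness check described above might look like the following Python sketch. It is an illustration under stated assumptions, not the patented implementation: the date pattern searched for in the body (`YYYY/MM/DD` and similar), the 30-day default, the function names, and the choice to fetch when no date can be found (`fetch_if_unknown`) are all assumptions. Only `Last-Modified` and the "first few hundred bytes to few kilobytes" rule come from the text.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed pattern for a date written near the top of the body, e.g. 2001/06/15.
DATE_IN_BODY = re.compile(r"(\d{4})[/.-](\d{1,2})[/.-](\d{1,2})")


def estimate_update_time(headers, body_head):
    """Return the update time from Last-Modified, else a date found in body_head."""
    value = headers.get("Last-Modified")
    if value:
        try:
            parsed = parsedate_to_datetime(value)
            return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            pass  # unparsable header: fall back to the body
    match = DATE_IN_BODY.search(body_head)
    if match:
        try:
            year, month, day = map(int, match.groups())
            return datetime(year, month, day, tzinfo=timezone.utc)
        except ValueError:
            return None  # digits that do not form a real date
    return None


def should_fetch(headers, body_head, now, max_age=timedelta(days=30),
                 fetch_if_unknown=True):
    """Decide whether the whole page is worth fetching."""
    updated = estimate_update_time(headers, body_head)
    if updated is None:
        return fetch_if_unknown  # assumption: unknown age does not block the fetch
    return now - updated <= max_age
```

Here `body_head` stands for the first part of the HTML body that has already been read, and `now` is passed in explicitly so the decision is reproducible.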

[0040] (HTML analysis means) FIG. 4 shows the concept of the HTML analysis means. A page acquired by an HTTP request still contains HTML tags, and so includes information unrelated to its content, such as links and layout information. These are removed using Perl regular expressions to put the page into a form suitable for keyword matching. On the other hand, some useful information can be obtained only while the tags are still present, for example the page title and the links leading out of the page. The HTML analysis means acquires the useful tag information and removes the useless tag information, thereby converting the page into a format suitable for keyword matching (for example, plain text). Link information is handled, for example, as follows. First, when the score of the page is at or above a threshold, the URLs of the links leading out of the page are acquired using regular expressions. The acquired URLs are passed to the URL database processing unit and stored in the database, because examining the content at those URLs may yield more valuable information.
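A rough Python counterpart of this regular-expression approach is sketched below; the embodiment uses Perl, and the specific patterns and the names `analyze_html` and `links_to_store` are assumptions made for illustration. Regular expressions handle only simple, well-formed markup, which matches the spirit of the description but is not a general HTML parser.

```python
import html
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)
LINK_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']?([^"'\s>]+)""", re.I)
SCRIPT_STYLE_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.I | re.S)
TAG_RE = re.compile(r"<[^>]+>")


def analyze_html(page):
    """Return (title, link URLs, plain text) extracted from an HTML string."""
    title_match = TITLE_RE.search(page)
    title = html.unescape(title_match.group(1)).strip() if title_match else ""
    links = LINK_RE.findall(page)          # useful tag information
    text = SCRIPT_STYLE_RE.sub(" ", page)  # drop script/style bodies
    text = TAG_RE.sub(" ", text)           # drop the remaining tags
    text = re.sub(r"\s+", " ", html.unescape(text)).strip()
    return title, links, text


def links_to_store(page_score, threshold, links):
    """Links are kept only when the page score is at or above the threshold."""
    return list(links) if page_score >= threshold else []
```

Note that the title text also remains in the plain text, since only the tags themselves are stripped.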

[0041] (Page content analysis means) The page content analysis means performs pattern matching based on the keywords and the contexts of text data created by the keyword generation unit described later, and scores the content, for example, as follows. Matching using keywords and contexts is performed on the information that the HTML analysis means has turned into plain text by removing the tags. The keywords and contexts are newly generated and scored by the keyword generation means described later.

[0042] The default (initial) keywords and contexts are either entered by the user or entered in advance by the system designer for each use (for example, searching for slanderous remarks, for the spreading of rumors about stock prices, or for talk about a specific product). This embodiment is described assuming that a public relations officer of a certain company (Company M) checks whether there are unfair remarks, slander, or the like concerning the company. Keywords and contexts appearing in the text of a page are detected, their respective points (weights) are added, and the total score of the page is obtained. The higher the score, the more likely the information is assumed to suit the purpose. The addition may be a simple sum, or points may be weighted according to, for example, the position and number of occurrences of the keywords and contexts. In the latter case, when the page contains links, it is preferable to judge the importance of the acquired link information and weight the points accordingly, that is, to judge the importance of the page's link structure. A page that links to (cites) a highly important page (or URL) is itself reasonably important, and a page (or URL) that is linked to (cited) by a highly important page is likewise considered correspondingly important, so link structure is a suitable index for evaluating a page's importance (score).

[0043] For example, suppose there is a page carrying the following text: In country A, where corporate ethics are strict, the fact of the "defect" was concealed for more than 20 years. The "responsibility" for concealing this "defect" is grave, and it seems certain that Company M has badly damaged its credibility. "Defect" and "responsibility" are set as keywords, and the score of this text is calculated. The weights of these keywords are "defect = 5 points" and "responsibility = 2 points". The score of this text is therefore 12 points (5 points × 2 + 2 points).
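The simple-addition case in this example can be reproduced with a short Python sketch. The function name `score_text` is hypothetical, and plain substring counting stands in for the pattern matching of the embodiment; the sample text is the original Japanese sentence, in which 「欠陥」 (defect) appears twice and 「責任」 (responsibility) once.

```python
def score_text(text, weights):
    """Sum weight x number of occurrences over all keywords (simple addition)."""
    return sum(weight * text.count(keyword) for keyword, weight in weights.items())


sample = (
    "企業倫理に厳しいＡ国で、20年以上も「欠陥」の事実を隠していた。"
    "この「欠陥」隠しに対する「責任」は重大であり、"
    "Ｍ社が大きく信用を落としたのは確かなようだ。"
)
weights = {"欠陥": 5, "責任": 2}  # defect = 5 points, responsibility = 2 points
total = score_text(sample, weights)
```

Here `total` is 12, matching 5 × 2 + 2 in the text.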

[0044] (Keyword generation means) As described above, a "weight" is set for each keyword and context. The "weight" serves to reflect differences in the amount of information each carries in giving meaning to a text. By default (in the initial setting), the weights are either assigned by the user or, as noted above, supplied by the system designer together with the keywords and contexts. However, these may not always match the preferences of the user of this search system. This is true even when the user supplies the default keywords, contexts, and weights.

[0045] FIG. 5 conceptually shows the page content analysis function. In the search system of this embodiment, the system can weight the keywords and contexts based on the user's evaluation of the information the system initially provided, that is, the information it retrieved. The system also has a function of weighting automatically according to which information the user handled and how, even when the user is not conscious of evaluation or weighting.

[0046] (Score addition) The keywords the user is paying attention to are analyzed, and a score is calculated from the number of times they appear. This is treated as user points and added to the normally obtained score. The procedure is as follows: when the user clicks the title of a search result, the full text of the page is displayed; at the same time, keywords are detected, and each keyword is given a score of "0.1 × number of occurrences". The per-keyword scores obtained in this way are used the next time the same search is performed. That is, at the time of the first search, the keywords in the page body are analyzed in real time, and when a keyword the user is paying attention to is detected, the per-keyword score obtained earlier is added to the normal score. The next search is based on this summed value.
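One way to read this user-point rule is sketched below in Python. The source states only the "0.1 × number of occurrences" rule and that the per-keyword points are added to the normal score on a later search; how the points are stored and when exactly they are applied are assumptions, as are the names `keyword_user_points` and `total_with_user_points`.

```python
def keyword_user_points(page_text, watched_keywords):
    """Per-keyword user points: 0.1 x occurrences in the page the user opened."""
    return {k: round(0.1 * page_text.count(k), 2) for k in watched_keywords}


def total_with_user_points(normal_score, stored_points, page_text):
    """Add the stored points of every watched keyword found in a later page."""
    bonus = sum(points for k, points in stored_points.items() if k in page_text)
    return round(normal_score + bonus, 2)


# Points learned when the user opened one search result ...
stored = keyword_user_points(
    "defect found; defect hidden; responsibility", ["defect", "responsibility"]
)
# ... and applied to a page scored in a later search.
later_total = total_with_user_points(12, stored, "a new defect report")
```

Keeping the user points separate from the normal score, as the two functions do, mirrors the description of the two values being obtained and then summed.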

[0047] With this function, information containing the keywords the user is paying attention to scores higher, so the user can quickly find the desired information in the database. In addition, when the user selects YES in response to the question "Was the above message useful information for you?", a further 0.5 points are added in the same manner. FIG. 6 shows scores added per user. In the score display in the figure, "15.53" is the normally obtained score and "2.5" is the score obtained for the individual user. The per-user score changes constantly with repeated use, enabling searches that better match the user's intent.

[0048] (Adding keywords) To handle cases where the user wants to narrow the search or raise its precision, the user can add keywords at will. However, adding keywords places a heavy load on the system, so the number of words that can be added is limited according to system performance. In this embodiment, up to three keywords can be added.

[0049] (Adding URLs) The user can add URLs as search targets at will. There is no limit on their number. A URL is added when the user wants a page found by chance to be searched every time.

[0050] (Generating new keywords) When the user evaluates information retrieved by the system, or when the system analyzes the user's behavior and evaluates the result of that analysis, the weights of the currently registered keywords and contexts are updated. In addition, when the evaluation or the analysis result is judged to be high, new keywords and contexts are extracted from that information and registered in the keyword database.

[0051] (Adjustment for repeated keywords) If the same word appears many times in one page, the page's score becomes meaninglessly high. To avoid this and obtain a fair score, the score added is varied with the number of times the word appears. Various evaluation formulas are conceivable; this embodiment uses the following function: score added for a keyword = normal score of the keyword / (number of occurrences of the word × 2). For example, if the normal score (weight) of a keyword is "4 points", 4 points are added for the first occurrence, 2 for the next, and 1 for the last, so that the points added are halved each time.
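The Python sketch below follows the worked example (4, 2, 1), in which each further occurrence adds half of the previous amount. This is an interpretation: applying the stated formula literally, weight / (occurrences × 2), yields different numbers, and the source does not say which reading is intended. The function names are hypothetical.

```python
def points_per_occurrence(weight, occurrences):
    """Points added for the 1st, 2nd, 3rd ... occurrence, halving each time."""
    return [weight / 2 ** i for i in range(occurrences)]


def diminishing_keyword_score(weight, occurrences):
    """Total contribution of one keyword under the halving schedule."""
    return sum(points_per_occurrence(weight, occurrences))
```

For a weight of 4 and three occurrences, the schedule is 4, 2, 1 and the total is 7, whereas plain multiplication would give 12.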

[0052] (Keyword thesaurus) For example, when the user is known by several names, such as "X Motors", "XXC", and "X Jiko", these are investigated in advance, or alternative names are collected from the user, and registered beforehand. They are then treated as the same keyword. Registered company names are detected by the same means as the keyword matching described above. When a registered company name appears in a page, the information of that page is stored in the database.
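Treating several names as one keyword can be sketched in Python as a lookup from a canonical name to its registered aliases. The data structure and the names `THESAURUS`, `count_keyword`, and `registered_names_in` are assumptions for illustration; substring matching again stands in for the embodiment's matching process.

```python
# Hypothetical thesaurus: canonical name -> all registered names for it.
THESAURUS = {"X Motors": ["X Motors", "XXC", "X Jiko"]}


def count_keyword(text, aliases):
    """Occurrences of a keyword, counting every registered alias as the same word."""
    return sum(text.count(alias) for alias in aliases)


def registered_names_in(text, thesaurus):
    """Canonical names whose aliases appear in the text (page should be stored)."""
    return {name for name, aliases in thesaurus.items()
            if count_keyword(text, aliases) > 0}
```

A page mentioning only "XXC" is then handled exactly like one mentioning "X Motors".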

[0053] (Information acquired from a page) Immediately before saving, all the information to be stored in the database is available. In this embodiment the information stored in the database is the title, body text, URL, acquisition time, link information, detected keywords, and score. The user interface uses these to retrieve information from the database. A sorting means that can change the display order of content, for example by score or by acquisition time, is provided for the user's convenience.

[0054] When the score of a page exceeds the threshold, its information is stored in the database. Care is needed, however, in handling newly arrived information. For example, suppose information obtained from the same URL is already in the database. In that case, as shown in FIG. 7, the previously stored information is deleted from the database and the newly arrived information is stored.

[0055] As mentioned above, some pages have no Last-Modified field in the header. The update date and time is then unknown, so even though the page has not been updated, the system acquires the same information again and tries to add it to the database. To prevent this, identical information in the database is detected immediately before saving. When the same URL is found, the previously acquired information for that URL is deleted from the database when the newly arrived information is added.
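The save rule of the last two paragraphs (store only above the threshold, and replace any earlier record for the same URL) can be sketched with Python's built-in `sqlite3` module. The table layout, column names, and function names are assumptions; the source does not say which database software the embodiment uses.

```python
import sqlite3


def open_db():
    """In-memory database with a hypothetical `pages` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (url TEXT, title TEXT, body TEXT, "
        "fetched_at TEXT, score REAL)"
    )
    return conn


def save_page(conn, url, title, body, fetched_at, score, threshold):
    """Store the page if its score exceeds the threshold, replacing older data."""
    if score <= threshold:
        return False
    with conn:  # delete and insert are committed together
        conn.execute("DELETE FROM pages WHERE url = ?", (url,))
        conn.execute(
            "INSERT INTO pages VALUES (?, ?, ?, ?, ?)",
            (url, title, body, fetched_at, score),
        )
    return True
```

Deleting by URL before inserting means a page fetched again without a usable update date never produces a duplicate row.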

[0056] (User interface) Perl was adopted as the language for implementing the user interface in this embodiment, because it is the language most widely used for CGI on the Internet. The information stored in the database for each page is the title, body text, URL, acquisition time, detected keywords, and score. The main purpose is to use these to retrieve the information the user wants from the database and display it. The user interface implemented in this embodiment allows the database to be searched in various ways. FIG. 8 shows an example of the interface using web browser software.

[0057] (Search using acquisition time) Searches using the acquisition date and time are provided: display information acquired within the last 3 days, display information acquired within the last week, or display everything. The interface can of course also sort the output from newest to oldest acquisition date and time.

[0058] (Search by score obtained by the HTML analysis unit page) An evaluation value (risk level) is assigned to each page according to its score. This embodiment uses three risk levels: High, Middle, and Low. Risk level High denotes information judged most likely to be very harmful, with a score of 16 points or more. Risk level Middle denotes information judged likely to be harmful, with a score of 8 points or more. Risk level Low denotes information that might be harmful, with a score of 3 points or more. Using these, searches can be performed in four modes: display High only, display Middle and above, display Low and above, or display everything.
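The three thresholds and four display modes map directly onto a small Python sketch. The thresholds (16, 8, 3) come from the text; the function names, the `(title, score)` tuple layout, and the use of `None` for the "display everything" mode are assumptions.

```python
THRESHOLDS = [("High", 16), ("Middle", 8), ("Low", 3)]  # checked from highest down


def risk_level(score):
    """Return "High", "Middle", "Low", or None when the score is below 3."""
    for level, minimum in THRESHOLDS:
        if score >= minimum:
            return level
    return None


def filter_pages(pages, minimum_level=None):
    """Keep (title, score) pairs at or above the given level; None shows all."""
    if minimum_level is None:
        return list(pages)
    floor = dict(THRESHOLDS)[minimum_level]
    return [page for page in pages if page[1] >= floor]
```

For instance, the score of 15.53 mentioned for FIG. 6 would fall into the Middle level under these thresholds.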

[0059] (Sorting by score and by newest first) The user can view pages sorted in descending order of score. Pages can also be displayed sorted from newest to oldest acquisition time. These functions can be combined when searching. Clicking the title of a search result displays the full text of the page. FIG. 9 shows the full text of a page being displayed.

[0060] (Database backup) The search system of this embodiment crawls and searches the URLs recorded in the database in an order determined by their importance. The contents of the database are therefore very important, and their destruction by a sudden system fault would be a serious operational problem. A backup of the database is therefore created automatically before each crawl begins.

【００６１】[0061]

[Effects of the Invention] According to the inventions of claims 1 to 6, a search system can be provided that accurately collects the information a user wants from the flood of information on the Internet. According to the inventions of claims 7 to 12, a search method can be provided that accurately collects the information a user wants from the flood of information on the Internet. According to the inventions of claims 13 to 18, a search program can be provided that accurately collects the information a user wants from the flood of information on the Internet.

[Brief description of drawings]

FIG. 1 is a conceptual diagram of an embodiment of the present invention.

FIG. 2 is a diagram showing the structure of an associative array.

FIG. 3 is a diagram showing the concept of the page acquisition unit.

FIG. 4 is a diagram showing the concept of the HTML analysis means.

FIG. 5 is a diagram conceptually showing the page content analysis function.

FIG. 6 is a diagram showing a screen display of this embodiment on which scores are shown and the user can select an evaluation.

FIG. 7 is a diagram illustrating the functions of the database processing unit.

FIG. 8 is a diagram showing an example of the interface using web browser software.

FIG. 9 is a diagram showing an example of full-text viewing.

Continuation of front page
(51) Int. Cl.7: G06F 13/00, identification code 540; FI: G06F 13/00 540E
(71) Applicant: 501247223, Toyomi Kanda, 1-10-21 Nakamachi, Machida-shi, Tokyo
(72) Inventor: Masakatsu Morii, 326 Suketo-honcho, Tokushima-shi, Tokushima
F-term (reference): 5B075 ND03 NK02 NR05 NR15 PP22 PR04 PR08

Claims

[Claims]

1. A predictive analysis type search system comprising: keyword creation means having a function of creating keywords for content search on the Internet and scoring the importance of the keywords; and content evaluation means for scoring text data in content according to the keywords and the like, wherein content in search results obtained based on the keywords created by the keyword creation means is evaluated and scored by the content evaluation means, the importance of the keywords is scored by comparing the scored content with the keywords, and the keyword creation means creates the keywords to be used in the next search based on the scored importance of the keywords, thereby continually updating the keywords used in searching.

2. A predictive analysis type search system comprising: keyword creation means for creating keywords for content search on the Internet and scoring the importance of the keywords; content evaluation means for scoring text data in content according to the keywords and the like; and URL collection means for collecting URLs at which content desired by the user is predicted to appear, using the importance of the keywords scored by the keyword creation means and the content scored by the content evaluation means.

3. The predictive analysis type search system according to claim 1 or 2, wherein the content evaluation means comprises tag information analysis means for analyzing tag information in the content.

4. The predictive analysis type search system according to any one of claims 1 to 3, wherein the keyword creation means comprises input accepting means for accepting a keyword input by a user, and the keyword creation means performs scoring again using the keyword accepted by the input accepting means.

5. The predictive analysis type search system according to any one of claims 1 to 4, wherein the keyword creation means has a function of automatically generating new keywords by extracting new wording or context from the collected information based on the evaluation result of the content evaluation means.

6. The predictive analysis type search system according to any one of claims 1 to 4, comprising sorting means for sorting the content scored by the content evaluation means in descending order of score, wherein the sorting result of the sorting means can be output to a user interface that provides search results to the user.

7. A predictive analysis type search method comprising: a keyword creation procedure having a function of creating keywords for content search on the Internet and scoring the importance of the keywords; and a content evaluation procedure for scoring text data in content according to the keywords and the like, wherein content in search results obtained based on the keywords created by the keyword creation procedure is evaluated and scored by the content evaluation procedure, the importance of the keywords is scored by comparing the scored content with the keywords, and the keywords to be used in the next search are created by the keyword creation procedure based on the scored importance of the keywords, thereby continually updating the keywords used in searching.

8. A predictive analysis type search method comprising: a keyword creation procedure for creating keywords for content search on the Internet and scoring the importance of the keywords; a content evaluation procedure for scoring text data in content according to the keywords and the like; and a URL collection procedure for collecting URLs at which content desired by the user is predicted to appear, using the importance of the keywords scored by the keyword creation procedure and the content scored by the content evaluation procedure.

9. The predictive analysis type search method according to claim 7 or 8, wherein the content evaluation procedure comprises a tag information analysis procedure for analyzing tag information in the content.

10. The predictive analysis type search method according to any one of claims 7 to 9, wherein the keyword creation procedure comprises an input acceptance procedure for accepting a keyword input by a user, and a re-scoring procedure for scoring again using the keyword accepted by the input acceptance procedure.

11. The predictive analysis type search method according to any one of claims 7 to 10, wherein the keyword creation procedure comprises a procedure for automatically generating new keywords by extracting new wording or context from the collected information based on the evaluation result of the content evaluation procedure.

12. The predictive analysis type search method according to any one of claims 7 to 11, further comprising a sorting procedure for sorting the content scored by the content evaluation procedure in descending order of score, and a user output procedure for outputting the sorting result of the sorting procedure to the user.

13. A program for causing a computer to realize a predictive analysis type search method, the program comprising a keyword creation procedure having a function of creating keywords for content search on the Internet and scoring the importance of the keywords, and a content evaluation procedure for scoring text data in content according to the keywords and the like, the program causing the computer to execute: evaluating and scoring, by the content evaluation procedure, content in search results obtained based on the keywords created by the keyword creation procedure; scoring the importance of the keywords by comparing the scored content with the keywords; and creating, by the keyword creation procedure, the keywords to be used in the next search based on the scored importance of the keywords, thereby continually updating the keywords used in searching.

14. A program for causing a computer to realize a predictive analysis type search method, the program causing the computer to execute: a keyword creation procedure for creating keywords for content search on the Internet and scoring the importance of the keywords; a content evaluation procedure for scoring text data in content according to the keywords and the like; and a URL collection procedure for collecting URLs at which content desired by the user is predicted to appear, using the importance of the keywords scored by the keyword creation procedure and the content scored by the content evaluation procedure.

15. The program according to claim 13, wherein the content evaluation procedure includes a tag information analysis procedure for analyzing tag information in the content.

16. The program according to any one of claims 13 to 15, wherein the keyword creation procedure comprises an input acceptance procedure for accepting a keyword input by a user, and a re-scoring procedure for scoring again using the keyword accepted by the input acceptance procedure.

17. The program according to any one of claims 13 to 16, wherein the keyword creation procedure comprises a procedure for automatically generating new keywords by extracting new wording or context from the collected information based on the evaluation result of the content evaluation procedure.

18. The program according to any one of claims 13 to 17, further comprising a sorting procedure for sorting the content scored by the content evaluation procedure in descending order of score, and a user output procedure for outputting the sorting result of the sorting procedure to the user.