JP3896014B2

JP3896014B2 - Information collection system, information collection method, and program causing computer to collect information

Info

Publication number: JP3896014B2
Application number: JP2002081642A
Authority: JP
Inventors: 和之後藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2002-03-22
Filing date: 2002-03-22
Publication date: 2007-03-22
Anticipated expiration: 2022-03-22
Also published as: JP2003281173A

Description

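[0001]
[Technical Field of the Invention]
The present invention relates to an information collection system that collects information satisfying a user's request from a plurality of information sources distributed over an information network such as the Internet or an intranet.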
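[0002]
[Prior Art]
With the spread of large-scale information network technology, anyone can now freely use large amounts of information. On information networks such as the Internet and intranets, a vast amount of information is published as hypertext documents (web pages), said to number in the billions. The usual way to use this information is to follow items of interest (hyperlinks) one after another (browsing) with information-viewing software called a browser. Various search service sites, which retrieve information satisfying conditions specified by keywords or the like, and directory sites, which classify information into an easy-to-use form, are also in operation. To obtain the desired information, a user repeats a series of operations: first finding documents likely to match his or her interests through a search service site or directory site, then examining, by browsing, the contents of those documents and of other documents linked from them. For frequently used or particularly important information, users record its location (URL) with a browser feature called a bookmark, or create and use a document listing the locations of useful information (a link collection).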
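[0003]
However, gathering the necessary information out of a vast amount of information by searching and browsing takes time and effort. Search service sites and directory sites also do not adequately provide the latest or highly specialized information. Automatic crawling is one known technique for solving these problems. It uses software (a crawler) that recursively follows the hyperlinks of hypertext (that is, crawls) to scan a large number of documents automatically and collect those satisfying conditions specified by the user. Collection conditions a user can give a crawler include limits on the number and total size of documents to collect, the starting document, the maximum number of links to follow from the starting document, the scope of collection (such as web server domains), and conditions on document update dates. Conditions on document content include the frequency with which keywords or phrases occur in the target document, the similarity between example documents and the target document, and the similarity between a description of the user's interests (a profile) and the target document. Methods have also been proposed that compute the importance of target documents from access counts or hyperlink structure and collect highly important documents first. Publicly known literature on automatic crawling includes "Focused Crawling: A New Approach for Topic-Specific Resource Discovery", Soumen Chakrabarti et al., The Eighth International World Wide Web Conference, 1999 (hereinafter "Reference 1"), and Japanese Patent Application Laid-Open No. 10-260978, "Information Collection Method and Apparatus" (hereinafter "Reference 2").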
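[0004]
Meanwhile, e-mail and mailing lists, electronic bulletin boards, chat, and similar means are widely used for exchanging information among multiple users. A mailing list groups the e-mail addresses of multiple users so that a message can be sent to all of them at once. An electronic bulletin board provides a shared space on the network in which registered or anonymous users can freely post messages. Chat provides a shared space like a bulletin board but lets text messages be sent and received in real time. On communication means intended for message exchange among a relatively large number of users (not only one-to-one), such as mailing lists, bulletin boards, and chat, the messages exchanged often concern topics that most participating members share an interest in. In this specification, such a group of users who exchange electronic messages with a common purpose or topic is hereinafter called a "community".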
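[0005]
When one member of a community obtains useful information, it is common practice to share it by notifying the other members through the communication means described above. For especially useful information exchanged this way, volunteer members sometimes list it manually, organize it into a form other members can easily use, such as a link collection, and maintain it periodically. The topics that interest community members usually stay within the scope of the community's purpose, but they shift somewhat over time. Publicly known techniques for automatically determining which topics community members are interested in include Japanese Patent Application Laid-Open No. 2000-293526, "Preference Information Collection System" (hereinafter "Reference 3"), and Japanese Patent Application Laid-Open No. 2001-92755, "Profile Creation Method and System" (hereinafter "Reference 4").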
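[0006]
[Problems to Be Solved by the Invention]
Automatic crawling has the problem that its collection efficiency is poor relative to the large amount of time and network resources it consumes. The harvest rate of crawling on the Internet, that is, the proportion of collected web pages containing information relevant to the user's request, is said to be about 50% at best (Reference 1), so the remaining 50% of pages are discarded unused. References 1 and 2 disclose methods for improving collection efficiency, but the Internet inherently contains much information that is not useful. For example, when a user's collection request is described as a set of keywords, even a document containing many of those keywords is not necessarily useful to the user; it may be outdated, incorrect, or redundant. There is therefore a limit to how far collection efficiency can be improved, and judging whether the collected information is useful must be left to the user. It is also impractical for every user to run a crawler individually, because of the load this places on communication networks, proxy servers, web servers, and the like. A more efficient collection method, and a way to reuse collection results instead of wasting them, are therefore desired.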
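[0007]
Furthermore, to collect web pages by crawling, the user must specify collection conditions such as the starting URL, the scope, and keywords. However, it is not known in advance which conditions will yield useful information, and, as noted above, collection efficiency is poor. Using a crawler therefore generally requires more skill than using a search service site or a push-type information filtering system. It is consequently desirable for users to share the knowledge and know-how needed to collect useful information efficiently.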
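[0008]
For these reasons, crawlers are used mainly for only two purposes: by search service sites, to collect and index large numbers of web pages of arbitrary content, and to visit a limited set of known websites periodically and monitor them for updates. At present, crawlers are not used to actively collect information from unknown sources or to discover new information likely to match users' latent interests.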
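[0009]
On the other hand, when community members exchange information through conventional communication means such as electronic bulletin boards, information can be shared flexibly, drawing on each member's knowledge and expertise. This approach, however, depends heavily on the ability and initiative of individual users. Finding useful information and telling other members about it takes effort, and it is impossible in the first place to discover new information that no member of the community knows about. References 3 and 4 disclose inventions that analyze messages exchanged in a community to derive users' interests and preferences (profiles), but these inventions do not provide a means of newly collecting information that matches the interests and preferences of community members.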
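[0010]
Moreover, even if much useful information is obtained through the efforts of individual members, it cannot be used effectively while each item remains unorganized and scattered across separate messages. Picking useful information out of a large number of messages and organizing it into a form that community members can share takes effort, and this too relies on the voluntary manual work of individual members. The invention of Reference 3 aims at surveying user preferences, and the invention of Reference 4 aims at smoothing communication by presenting the result of categorizing users by their interests and preferences. Neither invention supports the work of organizing and maintaining useful information for community members.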
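[0011]
The present invention has been made to solve the above problems. Its object is to collect information satisfying users' requests efficiently, to let multiple users make effective use of the collection results, and to support the ongoing work of organizing and maintaining useful information.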
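[0012]
[Means for Solving the Problems]
To solve the above problems, an information collection system according to the present invention, which collects and presents information satisfying users' requests, comprises: community management means for managing a plurality of communities, each having a plurality of users as members; message transmission/reception means by which members of each community send and receive messages; community information presentation means by which users browse the information shared in each of the plurality of communities; collection request editing means by which the members of each community jointly edit that community's collection request; information collection means for collecting, from a plurality of information sources on an information network, information satisfying any of the plurality of collection requests edited in the respective communities; collection result generation means for generating, based on the collected information, a collection result corresponding to each of the plurality of collection requests; and collection result editing means by which the members of each community jointly edit that community's collection result. The community information presentation means presents the plurality of collection results created in the respective communities, each associated with its community or with the messages sent and received within that community, to both members and non-members of that community.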
[0013]
Preferred embodiments of the information collection system according to the present invention are as follows. Each may be applied alone or in any suitable combination.
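[0014]
(1) At least one of a community's collection request and collection result is updated automatically, based on the messages that members of the community send and receive through the message transmission/reception means.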
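[0015]
(2) The collection request corresponding to a collection result is updated based on the edits that community members have made to that collection result through the collection result editing means.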
[0016]
(3) A community's collection result is presented in association with the collection results of other communities that also contain information included in it.
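[0017]
(4) The system further comprises collected-information search means for searching the information collected by the information collection means for information satisfying search conditions entered by a user; the collected-information search means presents each retrieved item in association with those community-created collection results that contain it.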
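[0018]
An information collection method according to the present invention, for collecting and presenting information satisfying users' requests, comprises: the members of each community jointly editing that community's collection request; collecting, from a plurality of information sources on an information network, information satisfying any of the plurality of collection requests edited in the respective communities; generating, based on the collected information, a collection result corresponding to each of the plurality of collection requests; the members of each community jointly editing that community's collection result; and presenting the information shared in each of the plurality of communities to both members and non-members of that community, with the collection results created in the respective communities each associated with its community or with the messages sent and received within that community.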
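[0019]
A program according to the present invention causes a computer to execute information collection that collects and presents information satisfying users' requests. The program causes the computer to: input each community's collection request as jointly edited by the members of that community; collect, from a plurality of information sources on an information network, information satisfying any of the plurality of collection requests edited in the respective communities; generate, based on the collected information, a collection result corresponding to each of the plurality of collection requests; input each community's collection result as jointly edited by the members of that community; and present the information shared in each of the plurality of communities to both members and non-members of that community, with the collection results created in the respective communities each associated with its community or with the messages sent and received within that community.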
[0020]
[Embodiments of the Invention]
Embodiments of the present invention are described below with reference to the drawings.
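[0021]
FIG. 1 shows the configuration of an information collection system according to an embodiment of the present invention. In FIG. 1, a community management unit 1 manages a plurality of communities: it stores and manages information on the users who are members of each community and the messages exchanged among users in each community. Like conventional management means for electronic bulletin boards or mailing lists, the community management unit 1 has a user information storage unit 11 and a message storage unit 12. Members and non-members of a community usually have different access rights, that is, different permissions to view user information, send and receive messages, and so on; the community management unit 1 enforces this access control. In this specification, "users" includes both members and non-members. As described in detail later, the community management unit 1 also has a collection request storage unit 13, which stores a plurality of information collection requests from users, and a collection result storage unit 14, which stores a plurality of sets of information to be presented to users as the results of information collection.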
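[0022]
A community information presentation unit 2 presents to users basic information about the communities, such as their names and members, together with information such as the messages and shared documents exchanged within each community. This lets users browse a variety of information.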
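[0023]
A message transmission/reception unit 3 is the means by which community members send messages to and receive messages from other members. Messages sent and received through the message transmission/reception unit 3 are organized by community and stored in the message storage unit 12.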
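[0024]
A collection request editing unit 4 is the means by which multiple members of a community jointly edit and register an information collection request; the edited results are stored per community in the collection request storage unit 13. Similarly, a collection result editing unit 5 is the means for editing the results of information collection into a form that multiple community members can easily use; the edited results are stored per community in the collection result storage unit 14.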
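[0025]
An information collection unit 6 takes as input the collection requests stored in the collection request storage unit 13 and collects, from an information network such as the Internet or an intranet, information (web documents in this embodiment) satisfying any of those requests. Web documents collected by the information collection unit 6 are indexed and stored in a web document storage unit 7.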
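[0026]
A collection result generation unit 8 selects and processes the collected web documents that match each community's registered collection request, generating a collection result for each community. The collection result is stored in the collection result storage unit 14, and users can, as needed, use the collection result editing unit 5 to edit it into a more usable form and save it.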
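[0027]
The configuration described above is the minimum configuration for carrying out the present invention; a collection request generation unit 9 may also be provided. The collection request generation unit 9 automatically generates or extends a community's collection request based on the messages its members send and receive. Likewise, the collection result generation unit 8 may be given the function of generating or extending collection results based on messages. The collection result generation unit 8 may further be given the function of modifying the corresponding collection request, when a user edits a collection result, based on the content of that edit.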
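[0028]
A web document search unit 10 is the means by which users search and use the web documents collected from the information network and stored in the web document storage unit 7. Its search function is largely the same as that of conventional web document search means. The web document search unit 10 of this embodiment, however, also presents the collection results stored in the collection result storage unit 14 together with the search results.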
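[0029]
The differences between the configuration of this embodiment and that of a conventional information collection system are explained with reference to FIG. 2, a schematic block diagram of a typical conventional information collection system. The system in FIG. 2 comprises components that also appear in FIG. 1: the collection request editing unit 4, collection request storage unit 13, information collection unit 6, web document storage unit 7, collection result generation unit 8, collection result storage unit 14, collection result editing unit 5, and in some cases the web document search unit 10. However, the conventional system is designed for a single user to do everything from creating a collection request to creating and editing the collection result. It therefore cannot be used by multiple users, that is, a community, to collect information cooperatively. The conventional system also provides no means for discussing or exchanging information about collected information or information that should newly be collected, and no means for multiple users to share and maintain collection results. With such a configuration, not only is the burden on users large, but collection results cannot be shared or reused among users.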
[0030]
The embodiment of the present invention is described in detail below.
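[0031]
FIG. 3 shows the user information and community information stored in the user information storage means. FIG. 3(a) is an example of user information 31, and FIG. 3(b) is an example of community information 32. User information 31 describes each registered user of the system (a known user granted certain permissions) and has items such as user ID, password, name, e-mail address, communities joined, and home page URL. Community information 32 describes each community managed by the system and has items such as community ID, community name, mailing list address, bulletin board URL, and the user IDs of the participating members. The mailing list address is the destination used to send a message to all community members at once. The bulletin board URL is the location of the space in which messages are posted and shared. If either the mailing list address or the bulletin board URL is set, members can exchange messages by that means; if both are set, users can use whichever is more convenient. The member item of community information 32 is described by the user IDs of user information 31; conversely, the communities-joined item of user information 31 is described by community IDs.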
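The cross-referenced records of FIG. 3 can be sketched as two Python dataclasses. This is a minimal illustration, not the patented implementation: the class, field, and function names are hypothetical, and storing a password hash instead of the password itself is an assumption added here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class User:
    # Mirrors the items of user information 31 (FIG. 3(a)); names are hypothetical.
    user_id: str
    password_hash: str  # assumption: a hash is stored rather than the raw password
    name: str
    email: str
    communities: set[str] = field(default_factory=set)  # community IDs joined
    homepage_url: str | None = None


@dataclass
class Community:
    # Mirrors the items of community information 32 (FIG. 3(b)).
    community_id: str
    name: str
    mailing_list: str | None = None
    bbs_url: str | None = None
    members: set[str] = field(default_factory=set)  # user IDs

    def can_exchange_messages(self) -> bool:
        # Either a mailing list or a bulletin board is enough for members to talk.
        return bool(self.mailing_list or self.bbs_url)


def join(user: User, community: Community) -> None:
    """Keep both cross-references consistent: community -> user IDs, user -> community IDs."""
    community.members.add(user.user_id)
    user.communities.add(community.community_id)
```

Keeping the membership in both records, as FIG. 3 does, makes "my communities" and "members of this community" single lookups, at the cost of having to update both sides together, which is why `join` touches both.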
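[0032]
The procedure by which a user uses communities to exchange information is explained with the flowchart in FIG. 4. First, if the user is a registered user (step 41), user authentication is performed (step 42). If authentication succeeds (step 43), the user can use the communities with that registered user's permissions. As in conventional methods, authentication may verify the user ID and password entered by the user. If the user is unregistered and wishes to register (step 44), a user registration procedure is performed (step 45). If registration completes correctly (step 46), the user can use the communities with the permissions of a newly registered user. As in conventional methods, registration is done by having the user enter the required items of user information 31 shown in FIG. 3(a), such as name and password, and issuing a new user ID. The above processing is performed by the community management unit 1.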
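[0033]
Next, the community information presentation unit presents a list of existing communities to the user. First, for registered users only, it presents the list of communities in which that user participates (step 47). Next, for both registered users and unregistered anonymous users, it presents the list of communities in which they do not participate (step 48). For users who do not belong to a community, and for anonymous users, only limited information about that community is presented. FIG. 5 shows an example screen presenting a list of community information: "aoki's portal page" 51, presented after the registered user "aoki" (the user with user ID u1 in FIG. 3) has been authenticated. In FIG. 5, the part 52 showing the communities the user participates in (step 47 in FIG. 4) lists communities such as "e-Commerce Research Group" 53 and "Pro Baseball Fan Club". For each community, "New messages" 54 and "New information" 55 are shown. New messages are the messages newly sent to that community; new information is information newly collected by the information collection processing described later. In this way, the community list screen explicitly shows, per community, the new information members should pay attention to. Non-participating communities 56 are communities that user "aoki" has not joined, such as "Linux User Group" 57 and "Gardening Club". The "Topics" 58 displayed with a non-participating community are the topics on which that community is collecting information. For example, non-participating users are shown that "Linux User Group" 57 is interested in topics such as "Linux" and "distributions", while specific information such as messages is not shown to them. The community information presentation described above is performed by the community information presentation unit 2 in FIG. 1.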
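[0034]
Next, the flow of processing by which a user selects and enters a community and carries out activities such as sending and receiving messages is explained. When the user enters the community selected in step 49 of FIG. 4, it is first checked whether the user is a member of the community (step 410). If the user is not a member and wishes to join (step 411), a joining procedure is performed (step 412). Only registered users can join a community; anonymous users cannot. The joining procedure (step 412) adds the new user's ID to the member item of community information 32 shown in FIG. 3(b), and may include a step in which the community administrator or other members decide whether to admit the user. Community members can send and receive messages within the community and view and edit its collection request and collection result (step 414). Non-members and anonymous users, on the other hand, may use the community with restrictions (step 415). In the example of FIG. 4, non-members may only view messages and collection results and may not edit them, but permissions may be granted or withheld differently depending on the nature of the community. After carrying out activities in the community, the user may leave it (step 416) and finish (step 417), or enter another community and continue. Although omitted from FIG. 4, the information collection system of this embodiment also provides the functions a conventional community management system should have, such as leaving a community, changing user information, and creating a new community. Further, although this embodiment is explained mainly with screen examples resembling a conventional electronic bulletin board, processing such as user registration, joining communities, and viewing information can also be carried out by e-mail, using a means such as a mailing list.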
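The permissions of the FIG. 4 example (members may post and edit; registered non-members and anonymous users may only read; only registered users may join) can be sketched as a lookup table. The role and action names below are hypothetical, and the table encodes only the example policy of FIG. 4, which the text says may differ from community to community.

```python
from enum import Enum, auto


class Role(Enum):
    MEMBER = auto()
    NON_MEMBER = auto()  # registered user who has not joined this community
    ANONYMOUS = auto()   # unregistered user


class Action(Enum):
    READ_MESSAGES = auto()
    POST_MESSAGE = auto()
    READ_RESULTS = auto()
    EDIT_REQUEST = auto()
    EDIT_RESULTS = auto()
    JOIN = auto()


# Example policy from FIG. 4; a community could substitute its own table.
DEFAULT_POLICY = {
    Role.MEMBER: {Action.READ_MESSAGES, Action.POST_MESSAGE, Action.READ_RESULTS,
                  Action.EDIT_REQUEST, Action.EDIT_RESULTS},
    Role.NON_MEMBER: {Action.READ_MESSAGES, Action.READ_RESULTS, Action.JOIN},
    Role.ANONYMOUS: {Action.READ_MESSAGES, Action.READ_RESULTS},  # cannot join
}


def role_of(user_id, members):
    """user_id is None for an anonymous (unauthenticated) user."""
    if user_id is None:
        return Role.ANONYMOUS
    return Role.MEMBER if user_id in members else Role.NON_MEMBER


def allowed(user_id, members, action, policy=DEFAULT_POLICY):
    return action in policy[role_of(user_id, members)]
```

Passing the policy as a parameter reflects the remark that permissions may be granted or withheld differently depending on the nature of the community.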
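[0035]
FIGS. 6 to 8 show the flow of processing for user activities within a community: sending a message, editing a collection request, and editing a collection result. FIGS. 9 to 13 show example screens for this processing. In this embodiment, messages are sent and received with bulletin-board-style means, by the message transmission/reception unit 3 in FIG. 1. In the example screen of FIG. 9, when a user who has entered the community "e-Commerce Research Group" selects the bulletin board menu 91, recently sent messages 92, 94, 95, and so on are displayed. Messages carry reply relationships; for example, messages 95 and 96 are both replies to message 94. When the user selects a message on the screen, its content is presented; FIG. 9 shows text 97 displayed as the content of the selected message 96, "Well-known auction sites" (sent by user yamada on January 10). Messages related to the information collection results described later are presented in association with those results. For example, in FIG. 9, message 92 "Music distribution business" is displayed in association with the topic "Content delivery" 93, on which this community is jointly collecting information.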
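[0036]
Messages are sent by the processing shown in FIG. 6. In step 61, the user first chooses whether the message to be sent is a reply to an existing message or a new message. On the screen of FIG. 9, this choice is made by pressing button 98 or button 99. Pressing the "Reply" button 98 creates a reply to message 96, currently displayed in FIG. 9. FIG. 10 shows an example screen for creating a reply (step 62 in FIG. 6). The user creates the reply by editing its title 101 and body 102, quoting the original message if needed. When the reply is sent as a reply to the existing message (step 63), it is stored in the system with the reply relationship described above. A message is sent by pressing the "Send" button 103 shown in FIG. 10. A new message is sent in the same way as a reply, through steps 64 and 65 of FIG. 6. Sent messages are stored in the message storage unit 12 of FIG. 1, where community members can view them as described with reference to FIG. 9 and reply to them with new messages.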
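[0037]
FIG. 7 shows the flow of processing by which community members edit a collection request. In this specification, a collection request is data describing what information a user wants to collect, that is, the request and its conditions; it is the input to the information collection unit 6 in FIG. 1. Because multiple community members edit a collection request jointly in this embodiment, the edits must be kept consistent. First, therefore, it is checked whether a collection request already exists (step 71). If none exists, a new collection request is created (step 76). If one already exists, it is checked that the request is not checked out by another user (step 72); only then can the user edit it. If the request is not checked out by another user (Yes in step 72), it is first checked out to the user (step 73). After the user edits it (step 74), it is checked in (step 75) and registered in the system (step 77). If in step 72 the request is checked out by another user (No in step 72), the user cannot edit it, and the processing simply ends.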
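The checkout/checkin protocol of FIG. 7 (and, identically, FIG. 8) amounts to an exclusive edit lock on each shared document. A minimal in-memory sketch follows, assuming a single process; `SharedDocument` and `CheckoutError` are hypothetical names, and keeping superseded versions in `revisions` corresponds to the optional revision history mentioned in [0038]. Unlike FIG. 7, which registers a newly created request directly (step 76 to step 77), this sketch routes creation through the same checkout/checkin path for brevity.

```python
import threading


class CheckoutError(Exception):
    pass


class SharedDocument:
    """A collection request or result edited under checkout/checkin."""

    def __init__(self, content=None):
        self._lock = threading.Lock()  # guards the fields below
        self.content = content         # None means "does not exist yet" (step 71)
        self.checked_out_by = None
        self.revisions = []            # optional revision history ([0038])

    def checkout(self, user_id):
        with self._lock:
            if self.checked_out_by not in (None, user_id):
                # Step 72 "No": another user holds it, so this user cannot edit.
                raise CheckoutError(f"checked out by {self.checked_out_by}")
            self.checked_out_by = user_id  # step 73
            return self.content

    def checkin(self, user_id, new_content):
        with self._lock:
            if self.checked_out_by != user_id:
                raise CheckoutError("not checked out by this user")
            if self.content is not None:
                self.revisions.append(self.content)
            self.content = new_content     # steps 75 and 77: check in and register
            self.checked_out_by = None
```

Per [0040], the same class could be instantiated once per topic instead of once per request, so that members editing different topics do not block each other.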
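[0038]
The collection request editing described above is performed by the collection request editing unit 4 in FIG. 1, and the edited results are stored in the collection request storage unit 13. An edited collection request may replace the previous one, or past revisions may be kept and each edit stored as a new collection request.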
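[0039]
FIG. 11 shows an example screen for editing a collection request. When the user selects the collection request menu 111, the means for editing the collection request are displayed. Since a community usually wants to collect information on several topics, a single community's collection request can describe multiple topics.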
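[0040]
The example in FIG. 11 shows a collection request of the community "e-Commerce Research Group" with the topics "E-mall", "Content delivery", and "Online trading". Besides editing these existing topics, the user can add new topics (button 116) and delete topics that are no longer needed (button 113). The unit of checkout and checkin described with FIG. 7 need not be the whole collection request; each topic may be a unit instead. As shown in FIG. 11, the data described for each topic are the topic name 112, keywords 114, and collection start URLs 115. Keywords 114 holds a logical expression of keywords that the collected information (web documents in this embodiment) must contain. Collection start URLs are the URLs of the web documents from which crawling starts. They need not always be set: even if a topic has no start URL, the information the user wants on that topic is likely to be collected by crawling from any of the start URLs that the communities have specified for their topics. In some cases, a representative directory site or the like may be chosen as a default start URL. After editing these items on the screen of FIG. 11, the user presses the "Register" button 117 to register the edited collection request in the system.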
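A collection request as described in [0039] and [0040] can be sketched as a list of topics, each with a name, a keyword logical expression, and optional start URLs. The text does not specify how the logical expression is written; this sketch assumes nested tuples such as `("and", "music", ("not", "free"))` and matches keywords by case-insensitive substring containment, which suits unsegmented Japanese text but is only an assumption. `positive_keywords` yields the keyword set used by the similarity in [0049], that is, all keywords outside a negation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def matches(expr, text):
    """Evaluate a keyword expression: a keyword string, or ("and"|"or"|"not", operands...)."""
    if isinstance(expr, str):
        return expr.lower() in text.lower()
    op, *args = expr
    if op == "and":
        return all(matches(a, text) for a in args)
    if op == "or":
        return any(matches(a, text) for a in args)
    if op == "not":
        return not matches(args[0], text)
    raise ValueError(f"unknown operator {op!r}")


def positive_keywords(expr):
    """Keywords occurring outside any negation."""
    if isinstance(expr, str):
        return {expr}
    op, *args = expr
    if op == "not":
        return set()
    return set().union(*(positive_keywords(a) for a in args))


@dataclass
class Topic:
    name: str
    keywords: object  # keyword expression in the tuple form above
    start_urls: list[str] = field(default_factory=list)  # optional crawl seeds


@dataclass
class CollectionRequest:
    community_id: str
    topics: list[Topic] = field(default_factory=list)

    def keyword_set(self):
        # Union over all topics of the request (the keyword set of r).
        return set().union(*(positive_keywords(t.keywords) for t in self.topics))
```

`matches` is also the kind of test step 154 of FIG. 15 would apply to a fetched page.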
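[0041]
FIG. 8 shows the flow of processing by which community members edit a collection result. A collection result is data in which the information collected by the system in response to a request has been processed into a form that community members can easily use; it is mainly the output of the collection result generation unit 8 in FIG. 1. A collection result need not consist only of information collected by crawling: users may explicitly describe information they consider useful, and, as described later, information contained in messages exchanged among community members may be added. In this embodiment, collection results, like collection requests, are edited jointly by multiple community members, so the edits must be kept consistent. First, therefore, it is checked whether a collection result already exists (step 81). If none exists, a new collection result is created (step 86). If one already exists, it is checked that the result is not checked out by another user (step 82); only then can the user edit it. If it is not checked out by another user (Yes in step 82), the result to be edited is first checked out (step 83). After the user edits it (step 84), it is checked in (step 85) and registered in the system (step 87). If in step 82 the result is checked out by another user (No in step 82), the user cannot edit it, and the processing simply ends.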
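[0042]
The collection result editing described above is performed by the collection result editing unit 5 in FIG. 1, and the edited results are stored in the collection result storage unit 14. FIG. 12 shows an example screen displaying a collection result. When the user selects the collection result menu 121, the means for displaying collection results are shown. A collection result is displayed organized by the topics of the collection request described above. In the example of FIG. 12, the collection result of the "e-Commerce Research Group" is organized by topics such as "E-mall" 122 and "Content delivery" 126. The information within each topic is further organized by site. A site is an entity providing an information service on the Internet and is also the unit of an information source. In the example of FIG. 12, the site "○○ Mall" 123 is classified under the topic "E-mall" 122. Text 124 is a comment describing "○○ Mall" 123, written by one or more community members so that members can easily understand the content of that site. Particularly useful or new information within each site is presented as detailed information 125 within the site, as shown in FIG. 12.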
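The topic, site, and detail hierarchy of FIG. 12 can be sketched as nested dataclasses. The names are hypothetical, and treating URLs with the same host name as belonging to the same site is an assumption; the text defines a site only as the provider of an information service.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class Site:
    name: str
    url: str
    comment: str = ""  # member-written description (like text 124)
    details: list[str] = field(default_factory=list)  # notable URLs within the site (like 125)


@dataclass
class ResultTopic:
    name: str
    sites: list[Site] = field(default_factory=list)

    def site_for(self, url):
        """Return the known site whose host matches url, or None for an unknown site."""
        host = urlsplit(url).netloc
        return next((s for s in self.sites if urlsplit(s.url).netloc == host), None)
```

`site_for` separates the two cases of [0043]: a hit means new material becomes detailed information of a known site, and `None` means a new site entry is needed.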
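[0043]
Information collection by crawling can yield either information within such a known site (see information 125 in FIG. 12) or a new site (see information 128 in FIG. 12). In the latter case, no user has yet written text describing the new site, so text from that site's web documents is presented as is (see information 129 in FIG. 12); this needs to be rewritten into an easier-to-understand comment. In general, moreover, not all information collected by a crawler is useful, and the information worth sharing among community members must be selected and organized. The collection result editing unit 5 is provided so that multiple community members can carry out this work; FIG. 13 shows an example screen for editing a collection result.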
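[0044]
When the user presses the "Edit" button (1210) on the screen shown in FIG. 12, a screen like the one in FIG. 13 is displayed. As described above, a collection result is organized by topics (such as "E-mall" 131), and each topic is organized by sites (such as "○○ Mall" 134). The user can add new topics and delete unneeded ones (buttons 1311 and 133 in FIG. 13), and can add new sites and delete unneeded ones (buttons 132 and 136 in FIG. 13). The items to edit for each site are the site name 134, the site URL 135, a comment 137 describing the site, and the detailed information 138 within the site. Of these, the comment is the one item that cannot be obtained automatically by crawling, so writing comments is one of the users' main editing tasks; a comment can be written based on text obtained from the site's web documents. The other main task is selecting among sites and detailed information and deleting what is not needed.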
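[0045]
The above explanation has focused on user activities within a community and the means this embodiment provides for them. The following explains the processing that collects the information users request from the information network and generates collection results matching the requests. FIG. 14 shows the flow of processing performed by the information collection unit 6 in FIG. 1. Several steps in FIG. 14 call the processing of FIG. 15, which adds collected information to collection results; that processing is performed by the collection result generation unit 8 in FIG. 1.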
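[0046]
The information collection unit 6 holds a set of URLs that are candidates for collection and, for each URL, stores in the web document storage unit 7 of FIG. 1 whether its web document has already been fetched, the date and time it was last fetched, and the URLs that link to it together with the anchor text of those links. This URL set is called U. The set of collection requests created by all communities is called R.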
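[0047]
First, U is initialized to the empty set (step 141). Thereafter, each time a new collection request r is created in R, it is checked whether a new URL has been registered as a collection start URL of any topic of r (step 142). When a new URL u (hereinafter simply "u") is registered, its score is computed (step 143). The score s(u, r) of u with respect to a collection request r is computed by the following formula.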
[0048]
[Equation 1]
(The formula image is not reproduced in this copy; its terms are defined in [0049].)
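[0049]
Here, α, β, and γ are constants. v is a URL contained in U (hereinafter simply "v") that links to u. s(v, r) is the score of v with respect to collection request r. a: v→u is the anchor text attached to the link from v to u. sim(a, r) is the similarity between anchor text a and the keyword set of collection request r. d_u is the text of u's web document, and sim(d_u, r) is the similarity between d_u and the keyword set of r. The keyword set of collection request r is the set of all keywords (other than negated ones) that appear in the keyword logical expressions of all topics of r. The similarity between a text t and a keyword set is computed by multiplying the weight w_k of each keyword k by the frequency f(t, k) of k in t and summing over the elements of the keyword set, that is,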
[Equation 2]
$$\mathrm{sim}(t, r) = \sum_{k=1}^{n_r} w_k \, f(t, k)$$
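where n_r is the number of elements in the keyword set of r. The keyword weight w_k is commonly obtained as the IDF (inverse document frequency: a weight that becomes smaller the more texts a keyword appears in). The frequency f(t, k) may simply be the number of occurrences of k in t, or it may be normalized by the length of t. If u's web document d_u has not yet been fetched when s(u, r) is computed, sim(d_u, r) is taken as 0. As Equation 1 shows, even before d_u is fetched, the likelihood that u satisfies request r can be estimated from the scores of the pages v that link to u and from the anchor text of those links. In this way s(u, r) is obtained for each collection request r, and its maximum over all requests in R is defined as s(u, R):
$$s(u, R) = \max_{r \in R} s(u, r)$$
The larger s(u, R) is, the higher the priority with which u should be collected when all requests in R are taken into account.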
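Equation 2, the IDF keyword weights, and the maximum over R can be sketched as follows. The smoothed IDF form `log((N + 1) / (df + 1)) + 1` is one common variant, chosen here as an assumption (the text says only that IDF is typical), and counting keyword occurrences by substring is likewise a simplification.

```python
import math


def idf_weights(keywords, corpus):
    """IDF weight per keyword: the more texts contain k, the smaller w_k.
    The smoothed formula is an assumption, not taken from the patent."""
    n = len(corpus)
    weights = {}
    for k in keywords:
        df = sum(1 for doc in corpus if k in doc)
        weights[k] = math.log((n + 1) / (df + 1)) + 1.0
    return weights


def frequency(text, keyword, normalize=False):
    """f(t, k): raw occurrence count, or count divided by text length."""
    count = text.count(keyword)  # non-overlapping occurrences
    return count / max(len(text), 1) if normalize else count


def sim(text, keywords, weights, normalize=False):
    """Equation 2: sum over the request's keyword set of w_k * f(t, k)."""
    if not text:  # document not fetched yet: sim(d_u, r) = 0
        return 0.0
    return sum(weights[k] * frequency(text, k, normalize) for k in keywords)


def score_over_requests(per_request_scores):
    """s(u, R) = max over r in R of s(u, r); 0 when R is empty."""
    return max(per_request_scores.values(), default=0.0)
```

Taking the maximum rather than the sum means a URL is prioritized when it is highly relevant to at least one community's request, without being inflated by weak relevance to many.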
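[0050]
The methods of computing s(u, r) and s(u, R) are not limited to those described above. Any other method may be used, as long as it determines with sufficient accuracy the order in which not-yet-fetched URLs should be fetched. The more accurate this ordering, the higher the proportion of information satisfying the collection requests obtained per unit cost of fetching web documents. s(u, r) and s(u, R) are always computed for new URLs, as in steps 143 and 1414 of FIG. 14. For known URLs, as in steps 145 and 1412, they are recomputed whenever the contents of R change and whenever u's web document or the scores of u's link sources change. When the keyword conditions of some collection request r are changed in step 144 of FIG. 14, s(u, r) and s(u, R) are recomputed in step 145.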
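[0051]
With s(u, r) and s(u, R) kept up to date, step 146 selects from the URL set U the URL u with the largest score s(u, R) among those whose web documents have not yet been fetched or whose last fetch was at least a threshold time ago. If such a u exists (step 147), it is the URL to be fetched from the information network with the highest priority. If no such u exists, there is no URL to fetch, so processing either ends (step 148) or waits while checking whether the collection request set R has changed. In step 149, u's web document is fetched; for the Internet web documents targeted by this embodiment, fetching follows the HTTP protocol. If the fetch fails (step 1410), processing returns to the previous step and repeats the above for another URL. If it succeeds, the document is stored in the web document storage unit 7 of FIG. 1 (step 1411). Next, the term sim(d_u, r) described above is computed from the content of u's web document, and the scores s(u, r) and s(u, R) are recomputed (step 1412). The fetched web document is then parsed (its tags analyzed) to extract the URLs it links to, and for each such v (step 1413) the scores s(v, r) and s(v, R) are computed and v is added to the URL set U (step 1414). The information collection unit 6 repeats this processing recursively, collecting, in a single batch and in parallel for all collection requests of all communities, the web documents most likely to satisfy them. Compared with crawling separately for each collection request, this reduces the proportion of unnecessary web documents fetched and increases the chance of discovering new information that would be hard to find by crawling focused on a single topic.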
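The loop of FIG. 14 can be sketched as a single shared frontier scored against every request in R. Because Equation 1 is not reproduced in this copy, the `score` method uses an assumed form (α times the best parent score, plus β times the best anchor-text similarity, plus γ times the page similarity) with arbitrary constants, and Equation 2 is simplified to unit weights. The `fetch` callback is a hypothetical stand-in for an HTTP client plus link parser. Rescoring only the direct links of a newly fetched page is a further simplification of the propagation described in [0050].

```python
import time

# Hypothetical constants; the patent names alpha, beta, gamma but gives no values.
ALPHA, BETA, GAMMA = 0.5, 1.0, 1.0


def sim(text, keywords):
    """Equation 2 with every weight w_k = 1 (a simplification for this sketch)."""
    return sum(text.count(k) for k in keywords)


class Crawler:
    def __init__(self, requests, fetch, refetch_after=86400.0):
        self.requests = requests  # request id -> keyword set (the set R)
        self.fetch = fetch        # url -> (text, [(link url, anchor text)]) or None on failure
        self.refetch_after = refetch_after
        self.pages = {}           # url -> fetched text (web document store)
        self.fetched_at = {}      # url -> time of last fetch attempt
        self.inlinks = {}         # url -> [(source url v, anchor text a)]
        self.scores = {}          # url -> {request id: s(u, r)}; keys form the URL set U

    def score(self, u):
        """Assumed form of Equation 1 (see the lead-in)."""
        parents = self.inlinks.get(u, [])
        s = {}
        for r, kw in self.requests.items():
            parent = max((self.scores.get(v, {}).get(r, 0.0) for v, _ in parents), default=0.0)
            anchor = max((sim(a, kw) for _, a in parents), default=0.0)
            s[r] = ALPHA * parent + BETA * anchor + GAMMA * sim(self.pages.get(u, ""), kw)
        self.scores[u] = s

    def add_seed(self, url):
        """Steps 142-143: register a new collection start URL and score it."""
        self.inlinks.setdefault(url, [])
        self.score(url)

    def next_url(self, now):
        """Step 146: highest s(u, R) among unfetched or stale URLs, else None."""
        due = [u for u in self.scores
               if u not in self.fetched_at or now - self.fetched_at[u] >= self.refetch_after]
        return max(due, key=lambda u: max(self.scores[u].values(), default=0.0), default=None)

    def step(self, now=None):
        """One pass through FIG. 14; returns the URL tried, or None (step 148)."""
        now = time.time() if now is None else now
        u = self.next_url(now)
        if u is None:
            return None
        self.fetched_at[u] = now
        result = self.fetch(u)          # step 149
        if result is None:              # step 1410: failed; the next call picks another URL
            return u
        text, links = result
        self.pages[u] = text            # step 1411
        self.score(u)                   # step 1412
        for v, anchor in links:         # steps 1413-1414
            sources = self.inlinks.setdefault(v, [])
            if (u, anchor) not in sources:
                sources.append((u, anchor))
            self.score(v)
        return u
```

With a toy link graph such as `{"http://a/": ("portal", [("http://b/", "music delivery"), ("http://c/", "gardening")]), ...}` and a request `{"r1": {"music"}}`, the page reached through the anchor text "music delivery" is fetched before the one reached through "gardening", even though neither has been downloaded yet; this is the effect [0049] describes.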
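[0052]
Among the URLs whose scores were computed in steps 145, 1412, and 1414 of FIG. 14, some whose web documents have been fetched should be added to the collection results of individual communities. Conversely, URLs already included in a collection result that no longer satisfy the conditions of the collection request must be removed from it. The processing performed by the collection result generation unit 8 for this purpose is explained with reference to FIG. 15.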
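[0053]
First, if the web document of the URL u in question has been fetched (step 151), the following processing is repeated for each collection request r in R whose score s(u, r) has changed (step 152). If u is already included in the collection result c corresponding to r (step 153), it is checked whether u satisfies the conditions written, as keyword logical expressions, in each topic of r (step 154). This is done by checking whether the text of u's web document contains the keywords in a way that satisfies the logical expressions of r. If the text satisfies the condition of none of the topics in r, u should be removed from c. However, if a user has previously judged u useful and explicitly edited c to include it (step 155), u is not removed from c. An explicit edit in step 155 means an edit made with editing means such as those shown in FIG. 13 that adds u or creates additional information for it, such as a comment. If no user has made such an explicit edit, u is removed from c (step 156). On the other hand, if u is not included in c (step 153) and u satisfies the conditions of r (step 157), u should be added to c. However, if a user has previously judged u unnecessary and explicitly edited c so as not to include it (step 158), u is not added to c. An explicit edit in step 158 means deleting u with editing means such as those shown in FIG. 13. Otherwise, u is added to c (step 159). Because the collection results of this embodiment are organized by topics and sites, as described with FIGS. 12 and 13, u is added to the topic of c whose conditions it satisfies best. If u is a URL within a known site, it is added as detailed information of that site, in the form of information 125 in FIG. 12; if it belongs to an unknown site, it is added as a new site, as in information 128 in FIG. 12, with text obtained from the web document attached as comment 129.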
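The add/remove decision of FIG. 15 (steps 153 to 159), in which explicit user edits override the automatic keyword test, can be sketched as one function. Representing a collection result as a flat set of URLs and passing the edit history as two booleans are assumptions; placing u into the best-matching topic and site is omitted.

```python
def update_result(result, u, satisfies, user_added, user_removed):
    """Decide whether URL u belongs in collection result c (FIG. 15, steps 153-159).

    result       -- set of URLs currently in collection result c
    satisfies    -- True if u's text meets some topic's keyword expression in r
    user_added   -- members explicitly added u or wrote a comment for it (step 155)
    user_removed -- members explicitly deleted u on the edit screen (step 158)
    Returns "kept", "removed", "added" or "skipped".
    """
    if u in result:
        if satisfies or user_added:
            return "kept"    # an explicit edit protects u from automatic removal
        result.discard(u)    # step 156
        return "removed"
    if satisfies and not user_removed:
        result.add(u)        # step 159
        return "added"
    return "skipped"         # fails the conditions, or members deleted it before
```

The asymmetry matters: automatic processing never undoes a human decision, so members' curation effort is not lost when keyword conditions drift.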
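[0054]
In the information collection system of this embodiment, collection requests and collection results are not only edited explicitly by users but also updated automatically from the messages exchanged within the community. This keeps collection requests and results in line with users' dynamically changing interests.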
[0055]
The flow of processing that updates collection requests and collection results based on messages is explained with reference to FIG. 16.
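[0056]
For an unprocessed message m (step 161), the replies to m are first collected recursively, and the set of these messages including m is called M_m (step 162). In the message example of FIG. 17, messages 172, 173, and so on are replies to message 171. Next, URL descriptions, that is, strings beginning with "http://" or the like, are extracted from each message in M_m, and the set of URLs gathered from all messages in M_m is called U_m (step 163). In the example of FIG. 17, 174, 176, 178, and 1712 are URLs. URL 1710 is identical to URL 174 and lies in the part of a reply that quotes message 171, so it is not processed. At the same time as step 163, the comment text written in the messages for each URL in U_m is extracted, giving a comment set D_m whose elements correspond to those of U_m (step 164). In step 164, comments on URLs can be extracted simply by taking the text of the paragraph containing the URL in the same message; a more elaborate method uses the reply relationships to understand the context, including quoted text, and extracts comments spanning several messages. In the example of FIG. 17, text 175 for URL 174, text 177 for URL 176, text 179 for URL 178, and text 1711 for URL 1712 are extracted as comments. Moreover, since URL 1712 is within the site of URL 1710 (that is, 174), and URL 1710 lies in the part quoting message 171, text 1711 and URL 1712 can be interpreted as information describing URL 174 in more detail.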
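Steps 162 to 164, in their simple variant (a URL's comment is the text of the paragraph containing it), can be sketched as below. Treating lines that start with `>` as quotations, and keeping the first comment seen for each URL, are assumptions; the URL pattern is a rough approximation, not a full RFC 3986 parser.

```python
import re

URL_RE = re.compile(r"https?://[^\s<>\"')]+")


def collect_thread(msg_id, replies):
    """Step 162: M_m = m plus all replies to it, collected recursively."""
    thread, stack = [], [msg_id]
    while stack:
        m = stack.pop()
        thread.append(m)
        stack.extend(replies.get(m, []))
    return thread


def extract_urls_with_comments(body):
    """Steps 163-164 (simple variant): pair each URL with the rest of its paragraph.
    Lines starting with '>' are treated as quotations and skipped (an assumed convention)."""
    pairs = []
    for para in re.split(r"\n\s*\n", body):
        lines = [ln for ln in para.splitlines() if not ln.lstrip().startswith(">")]
        text = "\n".join(lines)
        for url in URL_RE.findall(text):
            comment = URL_RE.sub("", text).strip()
            pairs.append((url.rstrip(".,;:"), comment))
    return pairs


def thread_urls(msg_id, replies, bodies):
    """U_m with its comment set D_m, as a dict url -> comment."""
    found = {}
    for m in collect_thread(msg_id, replies):
        for url, comment in extract_urls_with_comments(bodies[m]):
            found.setdefault(url, comment)  # keep the first comment seen per URL
    return found
```

Skipping quoted lines is what stops a URL repeated inside a reply's quotation (like URL 1710) from being counted a second time.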
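[0057]
After the URL set U_m and the comment set D_m have been obtained from the message set M_m in this way, the system decides to which topic of the community's collection request r (or collection result c) they should be added.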
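[0058]
First, in step 165, the collection start URLs of each topic of r (or the URLs of each topic of c) are compared with U_m, and an attempt is made to select the topic t_m with the most overlap. When checking URL overlap, not only exact URL matches but also matches of the URL's site are considered. If t_m cannot be selected in step 165 (step 166), the keyword set of each topic of r (or text such as the site names and comments of each topic of c) is compared with the text of D_m, and the topic with the most overlap is taken as t_m (step 167). If t_m still cannot be selected (step 168), a new topic is created and taken as t_m (step 169), with the message title used as its topic name. When the collection request is being updated, key terms extracted from D_m are further selected as the keywords of the new topic t_m (step 1610). Key terms here are words that occur frequently in the comment text but only rarely in the comment text of other topics (they can be found by conventional statistical methods). After t_m has been selected or created in steps 165 to 1610, U_m is added to t_m (step 1611); when the collection result is being updated, each URL is added together with its associated comment from D_m.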
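The three-stage topic choice of steps 165 to 1610 (URL overlap, then word overlap, then a new topic named after the message title) can be sketched as follows. Weighting an exact URL match as 2 and a same-host match as 1, and ranking key terms by `count_here / (1 + count_elsewhere)`, are assumptions standing in for the unspecified overlap measure and the "conventional statistical methods" of the text.

```python
from collections import Counter
from urllib.parse import urlsplit


def _host(url):
    return urlsplit(url).netloc.lower()


def url_overlap(urls_a, urls_b):
    """Exact URL matches count 2, same-site matches count 1 (weights are assumptions)."""
    exact = set(urls_b)
    hosts = {_host(u) for u in urls_b}
    return sum(2 if u in exact else 1 if _host(u) in hosts else 0 for u in urls_a)


def choose_topic(topics, urls, comment_words, thread_title):
    """topics: name -> (topic URLs, topic words). Returns (topic name, is_new)."""
    def best(score_fn):
        top = max(((score_fn(v), name) for name, v in topics.items()), default=(0, None))
        return top[1] if top[0] > 0 else None

    name = best(lambda v: url_overlap(urls, v[0]))                   # step 165
    if name is None:
        name = best(lambda v: len(set(comment_words) & set(v[1])))  # step 167
    if name is None:
        return thread_title, True                                   # step 169
    return name, False


def key_terms(comment_words, other_topic_words, n=3):
    """Step 1610 (sketch): words frequent in this thread's comments, rare elsewhere."""
    here, elsewhere = Counter(comment_words), Counter(other_topic_words)
    ranked = sorted(here, key=lambda w: (-here[w] / (1 + elsewhere[w]), w))
    return ranked[:n]
```

Trying URLs before words follows the order of FIG. 16: a shared site is stronger evidence of a shared topic than shared vocabulary.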
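[0059]
By the above processing, the collection request shown in FIG. 18 and the collection result shown in FIG. 19 are generated from the messages of FIG. 17. Topic name 181 in FIG. 18 is the title of message 171 in FIG. 17, and keywords 182 are a logical expression formed by the OR of key terms extracted from texts 175, 177, 179, and 1711 in FIG. 17. URLs 174, 176, 178, and 1712 are set as the collection start URLs 183. Users can modify these automatically generated items as needed with the collection request editing means described above, and thus easily create a collection request for information related to the topics discussed in the messages. In the collection result of FIG. 19, the title of message 171 is used as topic name 191, and URLs 174, 176, and 178 are used for sites 192, 195, and 197, respectively. Texts 175, 177, and 179 are used as the comments 193, 196, and 198 for the respective sites. Part 1711 of message 173 is embedded as detailed information of site 192, in the form shown as information 194. A collection result generated automatically in this way is not always easy to use; for example, it may contain superfluous text, as in comment 198. In that case, users can easily edit it into a readable form with the collection result editing means described above.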
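[0060]
Through the above processing, a topic t_m of the collection request or collection result is either associated with the series of messages M_m (steps 165 and 167) or newly created (step 169). Presenting such associations between messages and topics helps users understand the messages and access related information. This is done, for example, as shown in FIG. 9, by displaying the message "Music distribution business" 92 in association with the related topic "Content delivery" 93.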
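[0061]
A collection request can also be updated automatically in response to edits that users make to the collection result, by processing similar to that of FIG. 16. Unlike messages, which users write in free form, collection results are written in the prescribed format of the collection result editing means (FIG. 13), so this processing is easier to implement than that of FIG. 16. The keywords used as conditions of the collection request are derived from the comments and other text in the collection result.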
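[0062]
The flow of processing of the web document search unit 10 in FIG. 1 is explained with reference to FIG. 20. The web document search unit 10 is the means by which users search and use the web documents collected by the information collection unit 6 in FIG. 1 and stored in the web document storage unit 7.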
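[0063]
In FIG. 20, when a user first enters a search condition q (step 201), documents satisfying q are retrieved from the collected web documents, and the resulting set of URLs is called U_q (step 202). Next, for each element u of U_q (step 203), a collection result c containing u is searched for (step 204). This collection result c may contain u itself, or it may contain a URL in the same site as u or a URL that links to u. If such a collection result c exists (step 205), the site name and comment text in c are used as the heading and description of u, and u is presented to the user in association with c (step 206). If no such collection result exists, u is presented using text from its own web document, such as its title and body, as the heading and description (step 207).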
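Steps 203 to 207 can be sketched as a lookup that tries an exact URL match, then a same-site match, then a page linking to u, and otherwise falls back to the page's own title and snippet. The tuple layout of `results`, the host-based site test, and the argument names are hypothetical.

```python
from urllib.parse import urlsplit


def find_result(u, results, inlinks):
    """Step 204: find a collection result entry describing u.
    results: list of (community, topic, site_name, site_url, comment) tuples.
    Tries an exact URL, then the same site, then a page that links to u."""
    host = urlsplit(u).netloc
    for match in (
        lambda e: e[3] == u,
        lambda e: urlsplit(e[3]).netloc == host,
        lambda e: e[3] in inlinks.get(u, ()),
    ):
        for entry in results:
            if match(entry):
                return entry
    return None


def present(hits, results, inlinks, page_titles, page_snippets):
    """Steps 203-207: label each hit with curated text when a collection result exists."""
    rows = []
    for u in hits:
        entry = find_result(u, results, inlinks)
        if entry:
            community, topic, name, _, comment = entry
            rows.append((u, name, comment, f"{community} / {topic}"))  # step 206
        else:
            rows.append((u, page_titles.get(u, u), page_snippets.get(u, ""), None))  # step 207
    return rows
```

The same containment lookup, run against other communities' results instead of search hits, is what [0066] uses to relate one community's collection result to another's.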
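[0064]
FIG. 21 shows an example screen of search results presented to a user by the processing of FIG. 20. For each web document retrieved for the user's search condition "auction" 211, such as the one at URL "http://xyz.com/" 212, the heading "○○ Auction" 213, the description 214, and so on are taken from the collection result found in step 204, for example the site name 192 and comment 193 shown in FIG. 19. In addition, as shown in FIG. 21, the topic 215 of the collection result is presented in association with it. If no collection result relates to a URL in the search results, part 217 of the web document's text (generally its opening text or text near where the search term appears) is presented as the description. Text taken directly from a web document in this way can be hard to understand or may not describe the site's content appropriately. In contrast, text that community members have written in a collection result, like description 214, is usually concise and easy to understand. Displaying the topic of a collection result alongside a search result, as in FIG. 21, also makes it easy to see what field or context the information belongs to, and users can further make use of other useful information in that topic. Since a community collecting information on a topic can be regarded as a group of experts interested in it, users can also see at a glance which communities do, or do not, consider each item in the search results useful.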
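[0065]
The processing described above presents search results in association with collection results; by a similar method, one community's collection result can also be displayed in association with the collection results of other communities.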
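[0066]
In the example of information 127 in FIG. 12, the information "×× Entertainment", collected by the "e-Commerce Research Group" under the topic "Content delivery", is presented in association with topic 127, "Home entertainment content", collected by another community, the "Karaoke Friends Club". Like step 204 in FIG. 20, this is implemented by checking whether a given URL is included in a collection result. Presenting search results and collection results together with the topics and collected information of other communities in this way not only helps users make use of them, but also gives users more opportunities to learn which topics other communities they have not joined are pursuing. As a result, interaction among communities becomes more active.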
[0067]
The present invention is not limited to the embodiment described above. It can of course be modified in various ways without departing from its gist.
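[0068]
[Effects of the Invention]
As described above, according to the present invention, members of a community with common interests jointly edit collection requests and collection results and can refine and maintain them continuously, so that, with each member contributing only a little effort, information useful to the whole community can be collected, organized, and shared. Furthermore, because collection requests and collection results are updated automatically from the messages routinely exchanged within the community, users' editing work is reduced, and information collection can follow interests that change dynamically with the community's activity.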
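[Brief Description of the Drawings]
[FIG. 1] Diagram showing the configuration of an information collection system according to an embodiment of the present invention.
[FIG. 2] Diagram showing an example configuration of a conventional information collection system.
[FIG. 3] Diagram showing examples of user information.
[FIG. 4] Flow of user registration, authentication, and joining a community.
[FIG. 5] Example of a screen presenting a list of community information.
[FIG. 6] Flow of sending a message.
[FIG. 7] Flow of editing a collection request.
[FIG. 8] Flow of editing a collection result.
[FIG. 9] Example of a message viewing screen.
[FIG. 10] Example of a message editing screen.
[FIG. 11] Example of a collection request editing screen.
[FIG. 12] Example of a collection result viewing screen.
[FIG. 13] Example of a collection result editing screen.
[FIG. 14] Flow of information collection.
[FIG. 15] Flow of collection result generation.
[FIG. 16] Flow of generating a collection request or collection result from messages.
[FIG. 17] Example of messages.
[FIG. 18] Example of a collection request generated from messages.
[FIG. 19] Example of a collection result generated from messages.
[FIG. 20] Flow of web page search.
[FIG. 21] Example of a web page search result screen.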
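[Explanation of Reference Numerals]
1 … Community management unit
2 … Community information presentation unit
3 … Message transmission/reception unit
4 … Collection request editing unit
5 … Collection result editing unit
6 … Information collection unit
7 … Web document storage unit
8 … Collection result generation unit
9 … Collection request generation unit
10 … Web document search unit
11 … User information storage unit
12 … Message storage unit
13 … Collection request storage unit
14 … Collection result storage unit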
BACKGROUND OF THE INVENTION
The present invention relates to an information collection system that collects information satisfying a user's request from a plurality of information sources distributed in an information network such as the Internet or an intranet.
[0002]
[Prior art]
With the spread of large-scale information network technology, everyone can freely use a large amount of information. A large amount of information is disclosed as hypertext documents (web pages) on information networks such as the Internet and intranet, and the number is said to be several billion. As a method of using such information, a method of selecting (browsing) an item of interest (hyperlink) using information browsing software called a browser is generally used. Also, there are various types of search service sites for searching for information that satisfies the conditions specified by keywords, etc. from a large amount of information, and directory sites that classify and provide information in a form that is easy to use. . In order to obtain the desired information, the user first searches for a document that is likely to meet his interests using a search service site or directory site, and then the content of the document or other linked documents. A series of operations of repeatedly examining the contents of a document by browsing is performed. In addition, for frequently used information and particularly important information, the location (URL) of the information is stored using a means called bookmark, which is a browser-attached function, or the location of useful information is listed. (Links) are created and used.
[0003]
However, it takes time and labor to collect necessary information from a large amount of information by searching and browsing. In addition, search service sites and directory sites have a problem that the latest information and highly specialized information are not sufficiently provided. One of the techniques for solving these problems is an automatic crawling technique. It automatically scans a large amount of document information using software (ie, a crawler) that recursively follows (ie, crawling) hypertext hyperlinks, and documents that satisfy user-specified conditions. How to collect. The collection conditions that can be given to the crawler by the user include restrictions on the number and capacity of documents to be collected, the document at the starting point of collection, the upper limit of the number of links to be traced from the document at the starting point, the range to be collected (web server Domain etc.), document update date and time conditions, etc. In addition, as conditions regarding the contents of the document, the frequency of occurrence of keywords, phrases, etc. in the target document, the similarity between the illustrated document and the target document, the description (profile) of the user's interest / interest and the target document There are conditions for similarity, etc. Furthermore, a method has been proposed in which the importance of a target document is calculated based on the number of accesses and the structure of hyperlinks, and documents with a high importance are preferentially collected. Known documents on automatic crawling techniques include “Focused Crawling: A New Approach for Topic-Specific Resource Discovery”, Soumen Chakrabarti et al., The Eighth International World Wide Web Conference, 1999 (hereinafter referred to as “Reference 1”), No. 10-260978 “Information Collection Method and Apparatus” (hereinafter referred to as “Document 2”).
[0004]
On the other hand, as means for exchanging information among a plurality of users, means such as e-mail, mailing list, electronic bulletin board, and chat are widely used. The mailing list is a means for collecting e-mail addresses of a plurality of users and sending a message to all of them at once. The electronic bulletin board is a means for providing a space for information sharing on the network so that a plurality of registered users or anonymous users can freely enter messages. Chat is a means for providing an information sharing space in the same way as an electronic bulletin board so that text messages can be transmitted and received in real time. In communication methods aimed at exchanging messages (not only one-on-one) by a relatively large number of users, such as mailing lists, electronic bulletin boards, chats, etc., messages related to topics that the majority of participating members are commonly interested in exchange Often done. A group of users exchanging electronic messages with a common purpose and topic is hereinafter referred to as a “community” in this specification.
[0005]
When one member of a community obtains useful information, it is routinely performed to share information among members by notifying other members using the communication means described above. Of the information exchanged in this way, especially useful information is voluntarily listed by member volunteers, and useful information is listed manually so that other members can use it easily. In some cases, it is organized into shapes and regularly maintained. The topics of interest to community members are often within a range that does not depart from the spirit of the community, but change somewhat dynamically. For techniques for automatically examining what topics community members are interested in, see Japanese Patent Application Laid-Open No. 2000-293526 “Preference Information Collection System” (hereinafter referred to as “Reference 3”), There are publicly known documents such as 2001-92755, “Profile creation method and system” (hereinafter referred to as “Document 4”).
[0006]
[Problems to be solved by the invention]
The automatic crawling has a problem that the collection efficiency is not good although the time required for collection and the consumption of network resources are large. Harvest rate due to crawling from the Internet, that is, the ratio of information related to user requests in the collected web pages is about 50% in the best case (Reference 1), and the remaining 50% Will be discarded without being used. References 1 and 2 disclose methods for improving the efficiency of collection, but the Internet contains a lot of information that is not useful in the first place. For example, when a user's collection request is described in a keyword set, even a document containing a large number of keyword sets is not always useful to the user, and may be old information, incorrect information, or redundant information. There is sex. Therefore, there is a limit to improving the collection efficiency, and it is left to the user to determine whether the collected information is useful. Moreover, it is not realistic for each user to use the crawler individually because the load on the communication network, the proxy server, the web server, and the like increases. Therefore, a more efficient collection method and a method of reusing the collection results without wasting them are desired.
[0007]
Furthermore, to collect web pages by crawling, the user must specify collection conditions such as a starting URL, the range to be collected, and keywords. However, it is not obvious which conditions will yield useful information, and, as noted above, collection efficiency is poor. Using a crawler therefore generally requires more skill than using a search service site or a distribution-type information filtering system. For this reason, it is desirable for users to share the knowledge and know-how needed to collect useful information efficiently.
[0008]
For the reasons described above, crawlers are currently used mainly by search service sites, either to collect and index large numbers of web pages with arbitrary content or to periodically visit a limited set of known websites to check whether they have been updated. Crawlers are thus not used to actively collect information from unknown sources or to discover new information that might match the user's interests.
[0009]
On the other hand, when community members exchange information using conventional communication means such as an electronic bulletin board, information can be shared flexibly, drawing on each member's knowledge and expertise. However, this approach depends heavily on the ability and initiative of individual users. Finding useful information and informing other members takes considerable effort, and information that no member of the community knows about cannot be discovered this way at all. References 3 and 4 disclose inventions that analyze the messages exchanged in a community to determine users' interests and preferences (profiles). However, these inventions do not provide a new way to collect information relevant to the interests and preferences of community members.
[0010]
Moreover, even if individual members' efforts yield a lot of useful information, that information cannot be used effectively while it remains unsorted and scattered across separate messages. Selecting useful information from a large number of messages and organizing it into a form that community members can share takes a great deal of work. The invention of Reference 3 aims to facilitate communication by making explicit the results of categorizing users by interest and preference, but neither the invention of Reference 3 nor that of Reference 4 supports the work of organizing and maintaining useful information for community members.
[0011]
The present invention has been made to solve the above problems. Its object is to collect information satisfying users' requests efficiently, to make effective use of the collection results among a plurality of users, and to support the ongoing work of organizing and maintaining useful information.
[0012]
[Means for Solving the Problems]
To solve the above problems, the information collection system according to the present invention, which collects and presents information satisfying users' requests, comprises: community management means for managing a plurality of communities, each having a plurality of users as members; message transmission/reception means by which the members of each community send and receive messages; community information presentation means by which users browse the information shared in each of the plurality of communities; collection request editing means by which the members of each community jointly edit that community's collection request; information collection means for collecting, from a plurality of information sources on an information network, information satisfying any of the plurality of collection requests edited in the respective communities; collection result generation means for generating, from the collected information, a collection result corresponding to each of the plurality of collection requests; and collection result editing means by which the members of each community jointly edit that community's collection result. The community information presentation means presents the plurality of collection results created in the respective communities to both members and non-members of each community, in association with the community or with messages exchanged within it.
[0013]
Preferred embodiments of the information collection system according to the present invention are as follows. Each may be applied alone or in any suitable combination.
[0014]
(1) At least one of a community's collection request and collection result is automatically updated based on the messages that the community's members send and receive through the message transmission/reception means.
[0015]
(2) The collection request corresponding to a collection result is updated based on the edits that community members make to that result through the collection result editing means.
[0016]
(3) A community's collection result is presented in association with the collection results of other communities that also contain information included in it.
[0017]
(4) The system further comprises collected-information search means for searching the information collected by the information collection means for information satisfying a search condition entered by a user. The collected-information search means presents the retrieved information in association with those collection results, among the results created by the communities, that include it.
[0018]
The information collection method according to the present invention collects and presents information satisfying users' requests. In this method, the members of each community jointly edit that community's collection request; information satisfying any of the collection requests edited in the respective communities is collected from a plurality of information sources on an information network; a collection result corresponding to each collection request is generated from the collected information; the members of each community jointly edit that community's collection result; and the collection results created in the respective communities are presented, as information shared in each community, to members and non-members of the community in association with the community or with messages exchanged within it.
[0019]
The program according to the present invention causes a computer to collect and present information satisfying users' requests. It causes the computer to accept as input each community's collection request, jointly edited by the members of that community; to collect, from a plurality of information sources on an information network, information satisfying any of the collection requests edited in the respective communities; to generate, from the collected information, a collection result corresponding to each collection request; to accept as input each community's collection result as edited by its members; and to present the collection results created in the respective communities, as information shared in each community, to members and non-members of the community in association with the community or with messages exchanged within it.
[0020]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0021]
FIG. 1 is a diagram showing the configuration of an information collection system according to an embodiment of the present invention. In FIG. 1, the community management unit 1 manages a plurality of communities: it stores and manages information on the users who are members of each community and the messages exchanged among users in each community. Like conventional management means such as electronic bulletin boards and mailing lists, the community management unit 1 includes a user information storage unit 11 and a message storage unit 12. Community members and non-members usually have different access rights (for example, whether they may browse user information or send and receive messages), and the community management unit 1 controls this access. In this specification, “user” covers both members and non-members. As described in detail later, the community management unit 1 also includes a collection request storage unit 13, which stores users' information collection requests, and a collection result storage unit 14, which stores the information presented to users as the results of information collection.
[0022]
The community information presentation unit 2 presents to users basic information about the communities, such as their names and members, together with information exchanged in each community, such as messages and shared documents, so that users can browse this information.
[0023]
The message transmission/reception unit 3 is the means by which community members send messages to and receive messages from other members. Messages handled by the message transmission/reception unit 3 are organized by community and stored in the message storage unit 12.
[0024]
The collection request editing unit 4 is the means by which multiple community members jointly edit and register information collection requests; the edited results are stored in the collection request storage unit 13 for each community. Similarly, the collection result editing unit 5 is the means by which multiple community members edit information collection results into a form that is easy to use; the edited results are stored in the collection result storage unit 14 for each community.
[0025]
The information collection unit 6 takes the collection requests stored in the collection request storage unit 13 as input and collects information satisfying any of them (in this embodiment, web documents) from an information network such as the Internet or an intranet. The web documents collected by the information collection unit 6 are indexed and stored in the web document storage unit 7.
[0026]
Based on the collection request registered for each community, the collection result generation unit 8 selects and processes the collected web documents that match the request and generates a collection result for each community. This collection result is stored in the collection result storage unit 14, and users can, as needed, use the collection result editing unit 5 to edit it into a more usable form and save it.
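The division of labor between the information collection unit 6 and the collection result generation unit 8 can be sketched as follows: one shared collection pass keeps any document that satisfies at least one community's request, and each community's result is then selected from that shared store. This is a minimal Python sketch under simplifying assumptions: a request is reduced to a set of lowercase keywords that must all occur as words in the document, and documents are plain strings keyed by URL. The class, function names, and example URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CollectionRequest:
    community: str
    keywords: frozenset[str]  # assumption: lowercase keywords, all required


def matches(text: str, request: CollectionRequest) -> bool:
    """True if every keyword of the request occurs as a word in the text."""
    return request.keywords <= set(text.lower().split())


def collect_shared(documents: dict[str, str],
                   requests: list[CollectionRequest]) -> dict[str, str]:
    """Unit 6 analogue: keep a document if it satisfies ANY community's request."""
    return {url: text for url, text in documents.items()
            if any(matches(text, r) for r in requests)}


def results_per_community(store: dict[str, str],
                          requests: list[CollectionRequest]) -> dict[str, list[str]]:
    """Unit 8 analogue: select, per community, the stored URLs matching its request."""
    results: dict[str, list[str]] = {}
    for r in requests:
        hits = [url for url, text in store.items() if matches(text, r)]
        results.setdefault(r.community, []).extend(hits)
    return results


docs = {
    "http://a.example/": "Online auction site for collectors",
    "http://b.example/": "New Linux distribution released",
    "http://c.example/": "Gardening tips for spring",
}
reqs = [
    CollectionRequest("e-commerce", frozenset({"auction"})),
    CollectionRequest("linux", frozenset({"linux", "distribution"})),
]
store = collect_shared(docs, reqs)
print(results_per_community(store, reqs))
# {'e-commerce': ['http://a.example/'], 'linux': ['http://b.example/']}
```

Running one collection pass for the union of all requests, rather than one crawler per user, is in line with the load concern raised in [0006].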
[0027]
The configuration described above is the minimum configuration for carrying out the present invention. However, in addition to the above configuration, a collection request generation unit 9 may be further provided. The collection request generation unit 9 automatically generates or adds a collection request for the community based on messages transmitted and received by individual community members. Similarly, the collection result generation unit 8 may have a function of generating or adding a collection result based on a message. Furthermore, when the user edits the collection result, the collection result generation unit 8 can have a function of changing the corresponding collection request based on the edited content.
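The text does not specify how the collection request generation unit 9 derives requests from messages. One simple possibility, shown here purely as a hypothetical Python sketch, is to count non-stopword terms across recent community messages and propose the most frequent terms not yet in the request as candidate keywords. The stopword list, the function name `suggest_keywords`, and the frequency heuristic are all assumptions, not the method of this embodiment.

```python
from collections import Counter

# Hypothetical minimal stopword list; a real system would need a fuller one.
STOPWORDS = frozenset({"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"})


def suggest_keywords(messages: list[str], existing: set[str], top_n: int = 3) -> list[str]:
    """Propose up to top_n frequent message terms that are not already in the request."""
    counts = Counter(
        word
        for message in messages
        for word in message.lower().split()
        if word.isalpha() and word not in STOPWORDS and word not in existing
    )
    return [word for word, _ in counts.most_common(top_n)]


msgs = [
    "Music distribution business",
    "Rights issues in music distribution",
    "A new music site",
]
print(suggest_keywords(msgs, existing={"site"}, top_n=2))  # ['music', 'distribution']
```

In practice such suggestions would presumably be offered to members for review through the collection request editing unit 4 rather than applied blindly.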
[0028]
The web document search unit 10 is the means by which users search and use the web documents collected from the information network and stored in the web document storage unit 7. Its search function is essentially the same as that of conventional web document search means. However, the web document search unit 10 according to this embodiment also presents the collection results stored in the collection result storage unit 14 when presenting search results.
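The additional behavior of the web document search unit 10, presenting each hit together with the community collection results that contain it (embodiment (4) above), might look like the following minimal Python sketch. The conjunctive word matching, the data shapes (a URL-to-text store and a community-to-URL-set mapping), and the example data are assumptions for illustration.

```python
def search_with_collections(query: str,
                            store: dict[str, str],
                            collection_results: dict[str, set[str]]) -> list[tuple[str, list[str]]]:
    """Return (url, communities) pairs for every stored document matching the query.

    A document matches if it contains every query term as a word (an assumption).
    `communities` lists, in sorted order, the communities whose collection
    result includes that URL, so each hit is shown with its related results.
    """
    terms = set(query.lower().split())
    hits = []
    for url, text in store.items():
        if terms <= set(text.lower().split()):
            related = sorted(c for c, urls in collection_results.items() if url in urls)
            hits.append((url, related))
    return hits


store = {
    "http://k.example/": "Linux kernel news",
    "http://g.example/": "Linux on a gardening sensor",
    "http://m.example/": "Online mall opening",
}
results = {
    "Linux user group": {"http://k.example/"},
    "gardening club": {"http://g.example/"},
}
for url, communities in search_with_collections("linux", store, results):
    print(url, communities)
```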
[0029]
The differences between the configuration of the information collection system according to this embodiment and that of a conventional information collection system will now be described with reference to FIG. 2, which is a schematic block diagram of a typical conventional information collection system. The system of FIG. 2 includes components that also appear in FIG. 1: the collection request editing unit 4, the collection request storage unit 13, the information collection unit 6, the web document storage unit 7, the collection result generation unit 8, the collection result storage unit 14, the collection result editing unit 5, and, in some cases, the web document search unit 10. However, the conventional system is designed so that a single user performs every step from creating the collection request to creating and editing the collection result. It therefore cannot be used by multiple users, that is, a community, to collect information cooperatively. Nor does it provide means for discussing or exchanging information about collected information or information to be collected next, or means for multiple users to share and maintain the collection results. With such a configuration, not only is the burden on the user large, but the collection results cannot be shared and reused by multiple users.
[0030]
Hereinafter, embodiments of the present invention will be described in detail.
[0031]
FIG. 3 shows the user information and community information stored in the user information storage unit 11. FIG. 3(a) shows an example of user information 31, and FIG. 3(b) an example of community information 32. The user information 31 describes each registered user of this information collection system (a known user granted predetermined authority) and includes items such as the user ID, password, name, e-mail address, affiliated communities, and home page URL. The community information 32 describes each community managed by the system and includes items such as the community ID, community name, mailing list address, bulletin board URL, and the user IDs of participating members. The mailing list address is the destination for sending a message to all members of the community at once; the bulletin board URL is the location of a shared space where messages are posted. If either is set, members can exchange messages by that means; if both are set, each user can choose whichever is more convenient. The member item of the community information 32 is described by the user IDs of the user information 31, and conversely the affiliated-community item of the user information 31 is described by community IDs.
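The two record types of FIG. 3 and their mutual cross-references (user IDs in the community record, community IDs in the user record) can be sketched as Python dataclasses. This is a minimal sketch: the field names, storing a password hash instead of the password, the `add_member` helper, and the example addresses are assumptions; only the user ID u1, the name “aoki”, and the community name come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserInfo:
    """Sketch of user information 31 (FIG. 3(a))."""
    user_id: str
    password_hash: str  # assumption: a hash is stored rather than the raw password
    name: str
    email: str
    homepage_url: str = ""
    communities: set[str] = field(default_factory=set)  # community IDs


@dataclass
class CommunityInfo:
    """Sketch of community information 32 (FIG. 3(b))."""
    community_id: str
    name: str
    mailing_list: Optional[str] = None
    bulletin_board_url: Optional[str] = None
    members: set[str] = field(default_factory=set)  # user IDs

    def can_exchange_messages(self) -> bool:
        # Either channel is enough for members to exchange messages.
        return self.mailing_list is not None or self.bulletin_board_url is not None


def add_member(user: UserInfo, community: CommunityInfo) -> None:
    """Record membership on both sides so the two cross-references stay consistent."""
    community.members.add(user.user_id)
    user.communities.add(community.community_id)


aoki = UserInfo("u1", "<hash>", "aoki", "aoki@example.com")
ec = CommunityInfo("c1", "e-commerce research group",
                   bulletin_board_url="http://bbs.example/c1")
add_member(aoki, ec)
print(ec.members, aoki.communities, ec.can_exchange_messages())
```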
[0032]
A procedure by which a user exchanges information through a community will be described with reference to the flowchart of FIG. 4. First, if the user is a registered user (step 41), user authentication is performed (step 42). If authentication succeeds (step 43), the user can use the communities with the authority of a registered user. As in conventional methods, authentication may check the user ID and password entered by the user. If the user is unregistered and wishes to register (step 44), a user registration procedure is performed (step 45). If registration succeeds (step 46), the user can use the communities with the authority of a newly registered user. As in conventional methods, registration has the user enter required items such as a name and password, creates the user information 31 shown in FIG. 3(a), and issues a new user ID. The community management unit 1 performs the above processing.
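The text leaves authentication and registration to conventional methods. As one hypothetical illustration of steps 42 and 45, the Python sketch below issues sequential IDs such as “u1” and verifies passwords against salted PBKDF2 hashes from the standard library. The salted-hash scheme, the iteration count, and the `UserRegistry` name are choices made for this sketch, not details from the text.

```python
import hashlib
import secrets


class UserRegistry:
    """Hypothetical sketch of user registration (step 45) and authentication (step 42)."""

    _ITERATIONS = 100_000  # assumption: PBKDF2 work factor

    def __init__(self) -> None:
        self._users: dict[str, tuple[bytes, bytes, str]] = {}  # id -> (salt, digest, name)
        self._next_id = 1

    def _digest(self, salt: bytes, password: str) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, self._ITERATIONS)

    def register(self, name: str, password: str) -> str:
        """Store the required items and issue a new user ID such as 'u1'."""
        user_id = f"u{self._next_id}"
        self._next_id += 1
        salt = secrets.token_bytes(16)
        self._users[user_id] = (salt, self._digest(salt, password), name)
        return user_id

    def authenticate(self, user_id: str, password: str) -> bool:
        """Check the entered user ID and password; unknown IDs simply fail."""
        entry = self._users.get(user_id)
        if entry is None:
            return False
        salt, digest, _ = entry
        return secrets.compare_digest(digest, self._digest(salt, password))


registry = UserRegistry()
uid = registry.register("aoki", "correct horse")
print(uid, registry.authenticate(uid, "correct horse"), registry.authenticate(uid, "wrong"))
# u1 True False
```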
[0033]
Next, the community information presentation unit 2 presents a list of existing communities to the user. A list of the communities in which the user participates is presented to registered users only (step 47). Then a list of non-participating communities is presented to both registered users and unregistered anonymous users (step 48); for communities the user does not participate in, and for anonymous users, only limited information is presented. FIG. 5 shows an example of a screen presenting a list of community information: the “aoki portal page” 51, presented after the registered user “aoki” (the user whose user ID is u1 in FIG. 3) has been authenticated. In FIG. 5, the part 52 listing the communities in which the user participates (step 47 in FIG. 4) shows participating communities such as “e-commerce research group” 53 and “gathering of professional baseball fans”. “New messages” 54 and “new information” 55 are presented in association with each community. New messages are messages newly sent to the community; new information is information newly collected by the information collection processing described later. In this way, the community list screen explicitly presents, for each community, the new information that members should pay attention to. The non-participating community part 56 lists communities in which user “aoki” does not participate, such as “Linux user group” 57 and “gardening club”. The “topic” 58 displayed for each non-participating community indicates a topic that the community is interested in and is collecting information on. For example, “Linux user group” 57 is presented to non-participating users as a community interested in topics such as “Linux distribution”, but specific information such as its messages is not presented to them. The community information presentation processing described above is performed by the community information presentation unit 2 in FIG. 1.
[0034]
Next, the flow of processing by which a user selects and joins a community and carries out activities such as sending and receiving messages will be described. When the user enters the community selected in step 49 of FIG. 4, it is first checked whether the user is a member of that community (step 410). If not, and the user wishes to join (step 411), a joining procedure is performed (step 412). Only registered users may join a community; anonymous users cannot. The joining procedure (step 412) adds the new user's ID to the member item of the community information 32 shown in FIG. 3(b), and may also include an additional determination procedure. Community members can send and receive messages in the community and browse and edit its collection request and collection result (step 414). Non-members and anonymous users can use the community only with restrictions (step 415). In the example of FIG. 4, non-members may only view messages and collection results and may not edit them, but depending on the nature of the community, permissions may be granted or withheld differently. After carrying out activities in a community, the user may leave it (step 416) and then either end the session (step 417) or enter another community. Although omitted from FIG. 4, the information collection system according to this embodiment also includes the processing functions expected of a conventional community management system, such as withdrawal from a community, changes to user information, and creation of new communities.
Furthermore, although this embodiment is described mainly using screen examples similar to a conventional electronic bulletin board, processing such as user registration, community participation, and information browsing can also be performed by e-mail, using means such as a mailing list.
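The membership check and the restricted use by non-members (steps 410 to 415) amount to a small access-control rule. The Python sketch below encodes the example policy of FIG. 4, where non-members, including anonymous users, may only view messages and collection results. The `Action` names, representing anonymous users as `None`, and the function names are assumptions; as the text notes, other communities may use a different policy.

```python
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    VIEW_MESSAGES = auto()
    VIEW_RESULTS = auto()
    SEND_MESSAGE = auto()
    EDIT_REQUEST = auto()
    EDIT_RESULT = auto()


# Example policy of FIG. 4: non-members (including anonymous users) may only view.
NON_MEMBER_ACTIONS = frozenset({Action.VIEW_MESSAGES, Action.VIEW_RESULTS})


def is_allowed(user_id: Optional[str], members: set[str], action: Action) -> bool:
    """Members may do anything (step 414); others get restricted use (step 415)."""
    if user_id is not None and user_id in members:
        return True
    return action in NON_MEMBER_ACTIONS


def join(user_id: Optional[str], members: set[str]) -> None:
    """Joining procedure (step 412): only registered users (non-None IDs) may join."""
    if user_id is None:
        raise PermissionError("anonymous users cannot join a community")
    members.add(user_id)


members = {"u1"}
print(is_allowed("u2", members, Action.EDIT_RESULT))  # False
join("u2", members)
print(is_allowed("u2", members, Action.EDIT_RESULT))  # True
```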
[0035]
FIGS. 6 to 8 show the flows of processing for message transmission, collection request editing, and collection result editing as user activities in a community, and FIGS. 9 to 13 show corresponding screen examples. In this embodiment, messages are sent and received through the message transmission/reception unit 3 of FIG. 1. In the screen example of FIG. 9, when a user who has entered the community “e-commerce research group” selects the bulletin board menu 91, recently sent messages 92, 94, 95, and so on are displayed. Messages are linked by reply relationships; for example, messages 95 and 96 are replies to message 94. When the user selects a message on the screen, its content is presented: FIG. 9 shows the body text 97 of message 96, “famous auction site” (sent by user yamada on January 10), displayed after the user has selected it. Messages related to the information collection results described later are presented in association with those results. For example, in FIG. 9 the topic “content distribution” 93, on which this community is jointly collecting information, is displayed in association with message 92, “music distribution business”.
[0036]
Message transmission, in turn, follows the processing shown in FIG. 6. In step 61, the user first chooses whether the message to be sent is a reply to an existing message or a new message, by pressing button 98 or button 99 on the screen of FIG. 9. Pressing the “reply message” button 98 creates a reply to message 96, which is currently displayed in FIG. 9. FIG. 10 shows an example of a screen for creating a reply message (step 62 in FIG. 6): the user writes the title 101 and body 102 of the reply, quoting the original message if necessary. When the message is sent as a reply to the existing message (step 63), the reply relationship described above is recorded and stored in the system. The message is sent by pressing the “Send” button 103 shown in FIG. 10. A new message is likewise sent through steps 64 and 65 of FIG. 6. Sent messages are stored in the message storage unit 12 shown in FIG. 1, and community members can browse and reply to them in the manner described with reference to FIG. 9.
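The reply relationships recorded in step 63 form a tree of messages per community. The following Python sketch keeps messages with an optional parent ID and renders a thread as indented titles, much as the bulletin board of FIG. 9 groups replies under their original message. The class names, the in-memory storage, and the example titles and bodies are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    msg_id: int
    title: str
    body: str
    reply_to: Optional[int] = None  # ID of the message this one replies to


class MessageStore:
    """Hypothetical per-community message store keeping reply relationships."""

    def __init__(self) -> None:
        self._messages: dict[int, Message] = {}
        self._next_id = 1

    def send(self, title: str, body: str, reply_to: Optional[int] = None) -> int:
        """Store a new message (steps 64-65) or a reply (steps 62-63); return its ID."""
        if reply_to is not None and reply_to not in self._messages:
            raise KeyError(f"no message with ID {reply_to}")
        msg = Message(self._next_id, title, body, reply_to)
        self._messages[msg.msg_id] = msg
        self._next_id += 1
        return msg.msg_id

    def replies(self, msg_id: int) -> list[int]:
        return [m.msg_id for m in self._messages.values() if m.reply_to == msg_id]

    def thread(self, root: int, depth: int = 0) -> list[str]:
        """Render a message and its replies as an indented list of titles."""
        lines = ["  " * depth + self._messages[root].title]
        for child in self.replies(root):
            lines.extend(self.thread(child, depth + 1))
        return lines


store = MessageStore()
root = store.send("Auction sites", "Any recommendations?")
store.send("Re: Auction sites", "Try the big ones.", reply_to=root)
store.send("famous auction site", "See this one.", reply_to=root)
print("\n".join(store.thread(root)))
```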
[0037]
FIG. 7 shows the flow of processing by which a community member edits a collection request. In this specification, a collection request is data describing what the user wants collected, or the conditions for collection, and it is the input to the information collection unit 6 in FIG. 1. In this embodiment, because the collection request is edited by multiple community members, the consistency of the edits must be maintained. First, therefore, it is checked whether a collection request already exists (step 71). If none exists, a new collection request is created (step 76). If one exists, it is checked whether another user has checked it out (step 72). If not (Yes in step 72), the user first checks out the collection request to be edited (step 73), edits it (step 74), and then checks it in (step 75), after which it is registered in the system (step 77). If another user has checked it out (No in step 72), the user cannot edit the collection request, and the process ends.
[0038]
The collection request editing process described above is performed by the collection request editing unit 4 in FIG. 1, and the edited result is stored in the collection request storage unit 13. The edited collection request may replace the previous one, or past revisions may be retained and each edit stored as an additional revision.
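The check-out/check-in discipline of FIG. 7, combined with the option of retaining past revisions, can be sketched as an exclusive edit lock over a revision list. This is a minimal single-process Python sketch; the class and exception names are hypothetical, and it simplifies step 76 by treating a not-yet-existing request as an empty document that is checked out like any other. A persistent implementation would also need, for example, a way to release stale check-outs.

```python
from typing import Optional


class CheckoutError(Exception):
    """Raised when another user holds the check-out (the 'No' branch of step 72)."""


class SharedDocument:
    """Hypothetical collection request (or result) with check-out/check-in and revisions."""

    def __init__(self) -> None:
        self.revisions: list[str] = []  # past revisions are retained, one option in [0038]
        self.checked_out_by: Optional[str] = None

    def check_out(self, user: str) -> str:
        """Steps 71-73: lock the document for `user` and return its latest content.

        A document with no revisions yet is treated as new and starts empty.
        """
        if self.checked_out_by not in (None, user):
            raise CheckoutError(f"checked out by {self.checked_out_by}")
        self.checked_out_by = user
        return self.revisions[-1] if self.revisions else ""

    def check_in(self, user: str, content: str) -> None:
        """Steps 75 and 77: store the edit as a new revision and release the lock."""
        if self.checked_out_by != user:
            raise CheckoutError(f"{user} does not hold the check-out")
        self.revisions.append(content)
        self.checked_out_by = None


request = SharedDocument()
draft = request.check_out("aoki")
try:
    request.check_out("yamada")  # refused while aoki is editing
except CheckoutError as exc:
    print(exc)  # checked out by aoki
request.check_in("aoki", draft + "topic: electronic mall")
print(request.check_out("yamada"))  # topic: electronic mall
```

The same class could serve for collection results (FIG. 8), and keeping one instance per topic would give the finer, topic-level locking unit mentioned for FIG. 11.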
[0039]
FIG. 11 shows an example of a screen for editing a collection request. When the user selects the collection request menu 111, an interface for editing the collection request is displayed. Because a community generally has several topics on which it wants to collect information, the collection request created by a single community can describe multiple topics.
[0040]
In the example of FIG. 11, the topics “electronic mall”, “content distribution”, and “online trade” are shown in the collection request of the community “e-commerce research group”. In addition to editing these existing topics, the user can add new topics (button 116) and delete unnecessary ones (button 113). The unit of the check-out/check-in processing described with reference to FIG. 7 need not be the entire collection request; it may be a single topic. As shown in FIG. 11, the data described for each topic are a topic name 112, a keyword 114, and a collection start URL 115. The keyword 114 is a logical expression over keywords that should appear in the content of the collected information (in this embodiment, web documents). The collection start URL is the URL of a web document from which crawling starts. It need not always be set: even if a topic has no collection start URL, information on that topic is likely to be collected by crawling from one of the collection start URLs described in the many topics of the many communities. Alternatively, a representative directory site or the like may be used as a default collection start URL. After the user edits these items on the screen of FIG. 11 and presses the “Register” button 117, the edited collection request is registered in the system.
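A topic's keyword item is a logical expression, and its collection start URL is optional, with start URLs from all topics of all communities pooled for crawling. The Python sketch below shows both ideas. The nested-tuple encoding of expressions (`("and", ...)`, `("or", ...)`, `("not", x)`), the function names, and the default directory URL are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Union

# Keyword expression: a keyword string, or a tuple ("and" | "or", expr, ...)
# or ("not", expr). This representation is an assumption for illustration.
Expr = Union[str, tuple]


def evaluate(expr: Expr, words: set[str]) -> bool:
    """Evaluate a keyword expression against the set of lowercase words in a document."""
    if isinstance(expr, str):
        return expr.lower() in words
    op, *args = expr
    if op == "and":
        return all(evaluate(a, words) for a in args)
    if op == "or":
        return any(evaluate(a, words) for a in args)
    if op == "not":
        return not evaluate(args[0], words)
    raise ValueError(f"unknown operator: {op!r}")


@dataclass
class Topic:
    name: str                                            # topic name 112
    keywords: Expr                                       # keyword 114
    start_urls: list[str] = field(default_factory=list)  # collection start URL 115 (optional)


# Hypothetical default, standing in for "a representative directory site".
DEFAULT_START_URLS = ["http://directory.example/"]


def crawl_seeds(topics: list[Topic]) -> list[str]:
    """Pool start URLs from all topics of all communities; fall back to the default."""
    seeds = sorted({url for t in topics for url in t.start_urls})
    return seeds or list(DEFAULT_START_URLS)


mall = Topic("electronic mall",
             ("and", "mall", ("or", "online", "electronic"), ("not", "parking")))
print(evaluate(mall.keywords, set("an online shopping mall".split())))  # True
print(evaluate(mall.keywords, set("mall parking online".split())))      # False
print(crawl_seeds([mall]))  # ['http://directory.example/']
```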
[0041]
FIG. 8 shows the flow of processing by which a community member edits a collection result. A collection result is data obtained by processing the information that the system has collected in response to a collection request into a form that community members can use easily; it is mainly the output of the collection result generation unit 8 in FIG. 1. A collection result need not consist only of information collected by crawling: it may also describe information that users consider useful, and, as described later, information contained in messages exchanged among community members may be added to it. As with the collection request, the collection result is edited by multiple community members in this embodiment, so the consistency of the edits must be maintained. First, therefore, it is checked whether a collection result already exists (step 81). If none exists, a new collection result is created (step 86). If one exists, it is checked whether another user has checked it out (step 82). If not (Yes in step 82), the user first checks out the collection result to be edited (step 83), edits it (step 84), and then checks it in (step 85), after which it is registered in the system (step 87). If another user has checked it out (No in step 82), the user cannot edit the collection result, and the process ends.
[0042]
The collection result editing process described above is performed by the collection result editing unit 5 in FIG. 1, and the edited result is stored in the collection result storage unit 14. FIG. 12 shows an example of a screen displaying a collection result. When the user selects the collection result menu 121, the collection result is displayed, organized by the topics of the collection request described above. In the example of FIG. 12, information collected under topics such as “electronic mall” 122 and “content distribution” 126 is displayed as the collection result of the “e-commerce research group”. Within each topic, information is further organized by site. A site is a provider of information services on the Internet and is also the unit of an information source. In the example of FIG. 12, the site “XX mall” 123 is classified under the topic “electronic mall” 122. Text 124 is a comment describing “XX mall” 123, written jointly by one or more members so that other members can easily understand the site's content. Particularly useful or new information within each site is presented as detailed information 125, as shown in FIG. 12.
[0043]
Crawling may collect either new information from a site already known to the community (information 125 in FIG. 12) or a site not previously known (information 128 in FIG. 12). In the latter case, because no member has yet written a comment describing the new site, the text of the site's web document is presented as is (information 129 in FIG. 12), and it needs to be rewritten into a comment that is easier to understand. In general, not all the information collected by a crawler is useful, so the information worth sharing among community members must be selected and organized. The collection result editing unit 5 is provided so that multiple community members can carry out this work; FIG. 13 shows an example of a screen for editing a collection result.
[0044]
When the user presses the “edit” button 1210 on the screen of FIG. 12, a screen such as that of FIG. 13 is displayed. As described above, a collection result is organized into topics (such as “electronic mall” 131), and each topic is further organized into sites (such as “XX mall” 134). The user can add new topics and delete unnecessary ones (buttons 1311 and 133 in FIG. 13), and can likewise add new sites and delete unnecessary ones (buttons 132 and 136 in FIG. 13). The items edited for each site are the site name 134, the site URL 135, a comment 137 describing the site, and detailed information 138 within the site. Of these, the comment is the item that cannot be obtained automatically by crawling, so writing comments is one of the main editing tasks; a comment can be drafted from the text obtained from the site's web document. The other main task is removing unneeded material by discarding sites and detailed information.
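The topic, site, comment, and detail structure of a collection result, and the two main editing tasks (replacing a raw placeholder comment and discarding sites), can be sketched as follows. This is a minimal Python sketch: the `comment_is_raw` flag, the 80-character placeholder limit, the method names, and the example site “ZZ market” are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SiteEntry:
    name: str                                         # site name 134
    url: str                                          # site URL 135
    comment: str = ""                                 # comment 137, written by members
    details: list[str] = field(default_factory=list)  # detailed information 138
    comment_is_raw: bool = False  # True while the comment is still crawled page text


@dataclass
class CollectionResult:
    topics: dict[str, list[SiteEntry]] = field(default_factory=dict)

    def add_crawled_site(self, topic: str, name: str, url: str,
                         page_text: str, limit: int = 80) -> SiteEntry:
        """A newly found site gets its page text as a placeholder comment (cf. 129)."""
        entry = SiteEntry(name, url, comment=page_text[:limit], comment_is_raw=True)
        self.topics.setdefault(topic, []).append(entry)
        return entry

    def set_comment(self, topic: str, name: str, comment: str) -> None:
        """Member edit: replace the placeholder with a readable comment."""
        for site in self.topics.get(topic, []):
            if site.name == name:
                site.comment = comment
                site.comment_is_raw = False

    def remove_site(self, topic: str, name: str) -> None:
        """Member edit: discard an unneeded site."""
        self.topics[topic] = [s for s in self.topics.get(topic, []) if s.name != name]

    def sites_needing_comments(self) -> list[str]:
        return [s.name for sites in self.topics.values() for s in sites if s.comment_is_raw]


result = CollectionResult()
result.add_crawled_site("electronic mall", "ZZ market", "http://zz.example/",
                        "Welcome to ZZ market. Thousands of shops are open today.")
print(result.sites_needing_comments())  # ['ZZ market']
result.set_comment("electronic mall", "ZZ market", "A large online mall with many shops.")
print(result.sites_needing_comments())  # []
```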
[0045]
The description so far has mainly covered the activities users perform in a community and the means provided for them in the embodiment of the present invention. The following describes the processing that collects the information requested by users from the information network and generates collection results that meet those requests. FIG. 14 is a diagram illustrating the flow of processing performed by the information collection unit 6 of FIG. 1. The process of FIG. 15, which is performed by the collection result generation unit 8 of FIG. 1, is called from several steps of the process of FIG. 14.
[0046]
The information collection unit 6 holds a set of URLs that are candidates for collection. For each URL, it stores in the web document storage unit 7 of FIG. 1 whether the web document has already been acquired, the date and time of the last acquisition, the link source URLs of the URL, and the anchor text of each such link. Let this URL set be U, and let R be the set of collection requests created by all communities.
[0047]
First, U is initialized to the empty set (step 141). Thereafter, each time a new collection request r is created in R, it is checked whether a new URL has been registered as the collection start URL of any topic of r (step 142). If a new URL u (hereinafter simply “u”) has been registered, its score is calculated (step 143). The score s(u, r) of u for a collection request r is calculated by the following equation.
[0048]
[Expression 1]

[0049]
Here, α, β, and γ are constants. v (hereinafter simply “v”) is a URL included in U that is a link source of u, and s(v, r) is the score of v for the collection request r. a: v → u is the anchor text attached to the link from v to u, and sim(a, r) is the similarity between the anchor text a and the keyword set of the collection request r. du is the text of u's web document, and sim(du, r) is the similarity between the text du and the keyword set of r. The keyword set of the collection request r consists of all keywords (excluding negated ones) that appear in the keyword logical expressions described in all topics of r. The similarity between a text t and the keyword set is obtained by multiplying the weight w_k of each keyword k by the frequency f(t, k) of k in t and summing over all elements of the keyword set. That is,
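[Expression 2]
$$ \mathrm{sim}(t, r) = \frac{1}{n_r} \sum_{k \in K_r} w_k \, f(t, k) $$
where K_r denotes the keyword set of the collection request r.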
Here, n_r is the number of elements in the keyword set of the collection request r. The keyword weight w_k is generally obtained by IDF (inverse document frequency, a weight that decreases as the keyword appears in more texts). The frequency f(t, k) may simply be the number of occurrences of the keyword k in the text t, or it may be normalized by the length of t. If du, that is, the web document of u, has not yet been acquired when s(u, r) is calculated, sim(du, r) is set to 0. As the equation shows, even before du is acquired, the likelihood that u satisfies the collection request r can be estimated from the scores of the URLs v that link to u and from the anchor text of those links. In this way, the score s(u, r) of u is obtained for each collection request r. The maximum of s(u, r) over all collection requests r in R is defined as s(u, R):
$$ s(u, R) = \max_{r \in R} s(u, r) $$
The larger s(u, R) is, the higher the priority with which u should be collected when all of R is taken into account.
[0050]
The method of calculating s(u, r) and s(u, R) is not limited to the one described above. Other methods may be used as long as they determine, with sufficient accuracy, the acquisition priority of URLs whose web documents have not yet been acquired; the more accurate the priority, the higher the proportion of collected information that satisfies the collection requests relative to the cost of acquiring web documents. s(u, r) and s(u, R) are always calculated for a new URL, as in steps 143 and 1414 of FIG. 14. For known URLs as well, as in steps 145 and 1412, the scores are recalculated whenever the contents of R change, when the web document of u is acquired, or when the score of a link source of u changes. For example, when the keyword condition of a collection request r is changed in step 144 of FIG. 14, s(u, r) and s(u, R) are recalculated in step 145.
[0051]
Because s(u, r) and s(u, R) are always kept up to date in this way, in step 146 the URL u with the maximum score s(u, R) is selected from the URL set U among those whose web documents either have not been acquired yet or were last acquired at least a threshold time ago. If such a u exists (step 147), it is the URL that should be acquired from the information network with the highest priority. If no such u exists in step 147, there is no URL to acquire, so the process either ends (step 148) or waits while checking whether the collection request set R has been changed. In step 149, the web document of u is acquired; the Internet web documents targeted by this embodiment are acquired according to the HTTP protocol. If acquisition fails (step 1410), the process returns to step 146 and repeats the processing above for other URLs. If acquisition succeeds, the document is stored in the web document storage unit 7 of FIG. 1 (step 1411). Next, the term sim(du, r) is calculated from the content of u's web document, and the scores s(u, r) and s(u, R) are recalculated (step 1412). The acquired web document is then parsed (tag analysis) to extract the URLs it links to; for each such link destination v (step 1413), the scores s(v, r) and s(v, R) are calculated and v is added to the URL set U (step 1414). By repeating this processing, the information collection unit 6 collects, in a single combined pass, the web documents most likely to satisfy all the collection requests of multiple communities.
Compared with crawling independently for each collection request, this reduces the proportion of unnecessary web documents acquired, and it increases the opportunities to discover new information that crawling focused on a single topic would be unlikely to find.
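The selection and acquisition loop of steps 146 to 1414 can be sketched as a best-first crawler. This is a minimal sketch, not the embodiment itself: `fetch`, `extract_links`, and `score_fn` are hypothetical callables standing in for HTTP acquisition (step 149), tag analysis (step 1413), and the score s(u, R), and the re-acquisition threshold and step limit are assumed values.

```python
import time

REFETCH_SECONDS = 24 * 3600  # assumed threshold before a document is re-acquired


def select_next(urls, scores, last_fetched, now):
    """Step 146: the URL with maximum s(u, R) that was never fetched or is stale."""
    best, best_score = None, float("-inf")
    for u in urls:
        t = last_fetched.get(u)
        if t is not None and now - t < REFETCH_SECONDS:
            continue  # acquired recently enough; skip
        if scores[u] > best_score:
            best, best_score = u, scores[u]
    return best


def crawl(seed_urls, fetch, extract_links, score_fn, max_steps=100, clock=time.time):
    """Best-first crawl; returns the stored documents keyed by URL."""
    urls = set(seed_urls)                                # the URL set U
    scores = {u: score_fn(u, None) for u in urls}        # step 143
    last_fetched, store, failed = {}, {}, set()
    for _ in range(max_steps):
        u = select_next(urls - failed, scores, last_fetched, clock())
        if u is None:                                    # steps 147-148: nothing to acquire
            break
        doc = fetch(u)                                   # step 149
        if doc is None:                                  # step 1410: failure, try other URLs
            failed.add(u)
            continue
        store[u] = doc                                   # step 1411
        last_fetched[u] = clock()
        scores[u] = score_fn(u, doc)                     # step 1412: rescore with the document
        for v in extract_links(doc):                     # step 1413
            if v not in urls:
                urls.add(v)
                scores[v] = score_fn(v, None)            # step 1414
    return store
```

Failed URLs are excluded from later selection so the loop cannot retry them forever; a fuller version would also rescore link destinations that are already in U when a link source's score changes.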
[0052]
Among the URLs whose scores have been calculated in steps 145, 1412, and 1414 of FIG. 14, some whose web documents have already been acquired should be added to the collection results of individual communities. Conversely, URLs already included in a collection result that no longer satisfy the conditions of the collection request need to be deleted from it. The processing performed for this purpose by the collection result generation unit 8 is described below with reference to FIG. 15.
[0053]
First, if the web document of the target u has been acquired (step 151), the following processing is repeated for each collection request r in the collection request set R for which the score s(u, r) has changed (step 152). If u is already included in the collection result c corresponding to r (step 153), it is checked whether u satisfies the condition described as a keyword logical expression in any topic of r (step 154), that is, whether the text of u's web document contains keywords in a way that satisfies the logical expression. If the text of u's web document satisfies the condition of no topic in r, u must be deleted from c. However, if a user has previously judged u useful and explicitly edited c so that it includes u (step 155), u is not deleted. Here, explicit editing means adding u, or creating additional information such as a comment sentence for it, with the editing means shown in FIG. 13. If no user has explicitly edited it in this way, u is deleted from c (step 156). On the other hand, if u is not included in c in step 153 and u satisfies the condition of r (step 157), u should be added to c. However, if a user has previously judged u unnecessary and explicitly edited c so that u is excluded (step 158), u is not added. Here, explicit editing means deleting u with the editing means shown in FIG. 13. Otherwise, u is added to c (step 159). Because the collection result of this embodiment is organized by topic and site, as described with reference to FIGS. 12 and 13, u is added to the topic of c whose conditions it best satisfies. If u is a URL within a known site, it is added as detailed information of that site, in the form shown by information 125 in FIG. 12; if it belongs to an unknown site, a new site is added as shown by information 128 in FIG. 12, and the text acquired from the web document is added as its comment sentence 129.
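The add/delete decision of steps 153 to 159 can be sketched as follows. The assumptions are all hypothetical: each topic's keyword logical expression is represented as a list of OR-groups that must all be matched (an AND of ORs, with negation omitted), keywords are lowercase, the collection result c is a mapping from topic to a set of URLs (the per-site organization is left out), and the topic that u “best satisfies” is taken to be the satisfied topic with the most keyword hits.

```python
def satisfies(text, condition):
    """True if every OR-group of the keyword condition has at least one hit in text."""
    words = set(text.lower().split())
    return all(any(k in words for k in group) for group in condition)


def keyword_hits(text, condition):
    """Number of condition keywords that occur in text (used to pick the best topic)."""
    words = set(text.lower().split())
    return sum(k in words for group in condition for k in group)


def update_result(u, text, result, request, explicitly_added, explicitly_deleted):
    """Steps 153-159 for one URL u and one collection request r (a topic -> condition map)."""
    matching = [t for t, cond in request.items() if satisfies(text, cond)]
    in_result = any(u in urls for urls in result.values())
    if in_result:
        # Steps 154-156: drop u unless a member explicitly kept it.
        if not matching and u not in explicitly_added:
            for urls in result.values():
                urls.discard(u)
    elif matching and u not in explicitly_deleted:
        # Steps 157-159: add u to the topic it matches best, unless a member removed it.
        best = max(matching, key=lambda t: keyword_hits(text, request[t]))
        result.setdefault(best, set()).add(u)
    return result
```

The two explicit-edit sets are what let member judgments override the automatic keyword test in both directions.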
[0054]
In the information collection system according to the embodiment of the present invention, collection requests and collection results are not only edited explicitly by users but are also updated automatically from the messages exchanged in the community. This keeps the collection requests and collection results matched to users' dynamically changing interests.
[0055]
The flow of processing for updating the collection request and the collection result based on messages is described with reference to FIG. 16.
[0056]
For an unprocessed message m (step 161), the replies to m are first collected recursively, and the set of these messages including m is defined as Mm (step 162). In the example messages shown in FIG. 17, messages 172, 173, and so on are replies to message 171. Next, URL descriptions, that is, strings beginning with “http://” or similar, are extracted from each message in Mm, and the set of URLs collected over all messages in Mm is defined as Um (step 163). In the example of FIG. 17, 174, 176, 178, and 1712 are URLs. The text 1710 is identical to URL 174 and lies in the portion of a reply that quotes message 171, so it is not processed. Together with step 163, the comment sentence written in the message for each URL in Um is extracted, giving a comment sentence set Dm corresponding to the elements of Um (step 164). In step 164, the comment sentence for a URL can be extracted simply by taking the text of the same paragraph in the same message as the URL; a more elaborate method follows the reply relationships between messages, interprets the context including quoted text, and extracts comment sentences that span multiple messages. In the example of FIG. 17, text 175 is extracted as the comment sentence for URL 174, text 177 for URL 176, text 179 for URL 178, and text 1711 for URL 1712. Because URL 1712 is a URL under URL 1710 (that is, 174), and URL 1710 appears in the portion quoting message 171, the text 1711 and URL 1712 can be interpreted as information describing URL 174 in more detail.
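Steps 162 to 164 can be sketched as below, using the simple same-paragraph rule for comment sentences. Assumptions: messages are plain text, quoted lines start with “>” (a common e-mail convention, not stated in the source), the `replies` mapping and the function names are hypothetical, and the multi-message context analysis mentioned above is not attempted.

```python
import re

URL_RE = re.compile(r"https?://[^\s<>\"']+")


def thread(m, replies):
    """Step 162: Mm, i.e. message m plus all replies collected recursively."""
    seen, stack = [], [m]
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.append(cur)
        stack.extend(replies.get(cur, []))
    return seen


def urls_and_comments(bodies):
    """Steps 163-164: map each URL outside quoted lines to the text of its paragraph."""
    comments = {}
    for body in bodies:
        for para in re.split(r"\n\s*\n", body):
            lines = [ln for ln in para.splitlines() if not ln.lstrip().startswith(">")]
            text = " ".join(lines)
            for url in URL_RE.findall(text):
                comment = " ".join(URL_RE.sub(" ", text).split())
                comments.setdefault(url, comment)
    return comments
```

The keys of the returned mapping play the role of Um and the values the role of Dm; dropping quoted lines is what keeps a quoted copy such as text 1710 from being counted twice.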
[0057]
After the URL set Um and the comment sentence set Dm have been obtained from the message set Mm in this way, it is determined to which topic of the community's collection request r (or collection result c) they should be added.
[0058]
First, the collection start URLs described in each topic of the collection request r (or the URLs described in each topic of the collection result c) are compared with Um, and the topic with the most overlapping URLs is selected as tm (step 165). When checking for overlap, URLs are counted as overlapping not only when they match exactly but also when they belong to the same site. If tm cannot be selected in step 165 (step 166), the keyword set described in each topic of r (or text such as the site names and comment sentences described in each topic of c) is compared with the text of Dm, and the topic with the most overlap is selected as tm (step 167). If tm cannot be selected in step 167 either (step 168), a new topic is created as tm (step 169), using the title of the message as the topic name. When the collection request is being updated, important words extracted from Dm are also set as the keywords of the new topic tm (step 1610). Important words here are words that appear frequently in the comment sentences of Dm but rarely in the comment sentences of other topics; they can be obtained by conventional statistical methods. After tm has been selected or created in steps 165 to 1610, Um is added to tm (when updating the collection result, each URL is associated with its comment sentence in Dm) (step 1611).
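The topic selection of steps 165 to 1611 can be sketched as follows. This is a hypothetical illustration: a “site” is approximated by the host name of the URL, the text of other topics is approximated by their keywords, important words are ranked by a simple frequency ratio rather than a specific statistical method, and the topic layout and function names are assumptions.

```python
from collections import Counter
from urllib.parse import urlsplit


def site(url):
    """Site of a URL, approximated here by its host name."""
    return urlsplit(url).netloc.lower()


def important_words(text, other_texts, n=3):
    """Words frequent in text but rare in other topics' texts (simple frequency ratio)."""
    here = Counter(text.lower().split())
    elsewhere = Counter(" ".join(other_texts).lower().split())
    ranked = sorted(here, key=lambda w: (-here[w] / (1 + elsewhere[w]), w))
    return ranked[:n]


def assign_topic(um, dm, title, topics, n_keywords=3):
    """Steps 165-1611: pick or create topic tm for URLs um and comment sentences dm."""
    um_sites = {site(u) for u in um}
    words = set(" ".join(dm).lower().split())

    def url_overlap(t):  # step 165: same-site URLs count as overlapping
        return len(um_sites & {site(u) for u in topics[t]["urls"]})

    def keyword_overlap(t):  # step 167: topic keywords found in the comment text
        return len(words & {k.lower() for k in topics[t]["keywords"]})

    tm = None
    for overlap in (url_overlap, keyword_overlap):
        best = max(topics, key=overlap, default=None)
        if best is not None and overlap(best) > 0:
            tm = best
            break
    if tm is None:  # steps 169-1610: new topic named after the message title
        others = [" ".join(t["keywords"]) for t in topics.values()]
        tm = title
        topics.setdefault(tm, {"urls": set(), "keywords": important_words(" ".join(dm), others, n_keywords)})
    topics[tm]["urls"].update(um)  # step 1611
    return tm
```

URL overlap is tried before keyword overlap, matching the order of steps 165 and 167; a new topic is created only when both find nothing.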
[0059]
Through the processing described above, the collection request shown in FIG. 18 and the collection result shown in FIG. 19 are generated from the messages in FIG. 17. The topic name 181 in FIG. 18 is the title of message 171 in FIG. 17, and the keyword condition 182 is a logical expression consisting of the OR of important words extracted from texts 175, 177, 179, and 1711 in FIG. 17. URLs 174, 176, 178, and 1712 are set as the collection start URLs 183. The user can correct these automatically generated items as needed with the collection request editing means described above, and can thus easily create a collection request for collecting information on the topic discussed in the messages. In the collection result of FIG. 19, the title of message 171 in FIG. 17 is used as the topic name 191, and URLs 174, 176, and 178 in FIG. 17 are used for sites 192, 195, and 197, respectively. Texts 175, 177, and 179 in FIG. 17 are used as the comment sentences 193, 196, and 198 of the respective sites. Further, the portion 1711 of message 173 is embedded as detailed information of site 192, in the form shown by information 194. A collection result generated automatically in this way is not always easy for users to use; for example, it may contain extraneous text, as in comment sentence 198. In that case, the user can freely edit it into an easy-to-read form with the collection result editing means described above.
[0060]
Through the processing described above, a collection request or collection result topic tm is associated with a series of messages Mm (steps 165 and 167) or newly created (step 169). By presenting the relationship between the message and the topic to the user, it is possible to assist the user in understanding the message or accessing information related to the message. For example, as illustrated in FIG. 9, the message “music distribution business” 92 is displayed in association with the related topic “content distribution” 93.
[0061]
Collection requests can also be updated automatically to reflect the edits users make to collection results. This is realized by the same processing as that described with reference to FIG. 16. Unlike messages, which users write in free form, collection results are written in the predetermined format described for the collection result editing means (FIG. 13), so this processing is easier to realize. The keywords used as conditions of the collection request are created from the comment sentences described in the collection result.
[0062]
The processing flow of the web document search unit 10 in FIG. 1 is described with reference to FIG. 20. The web document search unit 10 allows the user to search and use the web documents collected by the information collection unit 6 of FIG. 1 and stored in the web document storage unit 7.
[0063]
In FIG. 20, when the user first inputs a search condition q (step 201), the collected web documents are searched for documents satisfying q, and the resulting URL set is defined as Uq (step 202). Next, for each element u of Uq (step 203), a collection result c related to u is searched for (step 204). The collection result c may be one that contains u itself, or one that contains a URL on the same site as u or a URL that links to u. If such a collection result c exists (step 205), the site name and comment sentence described in c are used as the headline and explanatory text for u, and u is presented to the user in association with c (step 206). If no such collection result exists, u is presented to the user with text such as the title and body of u's web document as its headline and explanatory text (step 207).
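Steps 204 to 207 can be sketched as a lookup over the site entries of collection results. The entry layout, the `link_sources` mapping, the 100-character excerpt, and the preference order (u itself, then same site, then a linking URL) are assumptions for illustration; the source lists the three kinds of match without ranking them.

```python
from urllib.parse import urlsplit


def describe_hit(u, entries, docs, link_sources):
    """Steps 204-207: headline, explanation and topic shown for search hit u.

    entries: site entries of collection results, each a dict with keys
    "url", "site_name", "comment", "community" and "topic" (hypothetical layout).
    """
    host = urlsplit(u).netloc

    def rank(e):  # 0: contains u itself, 1: same site as u, 2: links to u
        if e["url"] == u:
            return 0
        if urlsplit(e["url"]).netloc == host:
            return 1
        if e["url"] in link_sources.get(u, ()):
            return 2
        return None

    ranked = [(rank(e), i) for i, e in enumerate(entries) if rank(e) is not None]
    if ranked:  # steps 205-206: describe u with the community's text
        e = entries[min(ranked)[1]]
        return {"url": u, "headline": e["site_name"], "text": e["comment"],
                "topic": (e["community"], e["topic"])}
    title, body = docs[u]  # step 207: fall back to the document's own text
    return {"url": u, "headline": title, "text": body[:100], "topic": None}
```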
[0064]
FIG. 21 is a diagram illustrating an example of a search result screen presented to the user by the processing described in FIG. 20. For each web document retrieved for the search condition “auction” 211 entered by the user, such as the URL “http://xyz.com/” 212, the headline “XX auction” 213, the explanatory text 214, and so on are taken from the collection result obtained in step 204 (for example, the site name 192 and the comment sentence 193 shown in FIG. 19) and presented to the user. Further, as shown in FIG. 21, the topic 215 of the collection result is presented in association with the search result. If no collection result is related to a search result URL, part 217 of the text of the web document (generally the text at the beginning, or the text near where the search word appears) is presented as the explanatory text. As described above, text taken directly from a web document may be hard to understand, or may not properly describe the content of the site. In contrast, text written by community members in a collection result, such as the explanatory text 214, is often a concise and easy-to-understand description. Displaying the topic of the collection result alongside the search result, as in FIG. 21, also makes it easy to understand the field or context of the information, and lets the user reach other useful information included in that topic. Moreover, because a community that collects information on a topic is a group of people knowledgeable about and interested in that topic, the user can immediately see which communities, if any, consider an individual search result useful.
[0065]
The processing described above presents search results in association with collection results; by the same method, the collection results of one community can also be displayed in association with the collection results of another community.
[0066]
In the example of information 127 in FIG. 12, the topic “household content” 127 collected by another community, “Karaoke Tomo no Kai”, is presented in association with the information “XX Entertainment” collected by the “e-commerce research group” under the topic “content distribution”. This processing is also realized by checking whether a given URL is included in a collection result, as in step 204 of FIG. 20. Presenting search results and collection results in association with the topics and information collected by other communities not only helps users make use of them, but also gives users more opportunities to learn which topics interest communities they do not belong to. As a result, exchanges among multiple communities become more active.
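The cross-community check described above reduces to a membership test over other communities' collection results. A minimal sketch, assuming a hypothetical layout in which each community's collection result maps topic names to sets of URLs:

```python
def related_topics(url, community, results):
    """Topics of other communities whose collection results contain the same URL.

    results: {community: {topic: set_of_urls}} (hypothetical layout).
    """
    return sorted(
        (other, topic)
        for other, topics in results.items()
        if other != community
        for topic, urls in topics.items()
        if url in urls
    )
```

For the FIG. 12 example, looking up the URL of “XX Entertainment” from the “e-commerce research group” would return the “household content” topic of “Karaoke Tomo no Kai” (the URL itself is not given in the source).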
[0067]
The present invention is not limited to the above-described embodiments. Of course, various modifications can be made without departing from the scope of the present invention.
[0068]
【The invention's effect】
As described above, according to the present invention, members of a community who share a common interest can jointly edit collection requests and collection results and continuously refine and maintain them. With only a small effort from each member, information useful to the whole community can be collected, organized, and shared. Furthermore, because collection requests and collection results are updated automatically based on the everyday message exchanges within the community, the user's work of editing them is reduced, and information can be collected to match interests that change dynamically with the community's activities.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration of an information collection system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of a configuration of a conventional information collection system.
FIG. 3 is a diagram illustrating an example of user information.
FIG. 4 is a diagram showing a flow of processing for user registration, authentication, and community participation.
FIG. 5 is a diagram illustrating an example of a list display screen for community information.
FIG. 6 is a diagram showing a flow of message transmission processing.
FIG. 7 is a diagram illustrating a flow of processing for editing a collection request.
FIG. 8 is a diagram illustrating a flow of processing for editing a collection result.
FIG. 9 is a diagram illustrating an example of a message browsing screen.
FIG. 10 is a diagram illustrating an example of a message editing screen.
FIG. 11 is a diagram illustrating an example of a collection request editing screen.
FIG. 12 is a diagram illustrating an example of a collection result browsing screen.
FIG. 13 is a diagram illustrating an example of a collection result editing screen.
FIG. 14 is a diagram showing a flow of information collection processing.
FIG. 15 is a diagram illustrating a flow of processing for generating a collection result.
FIG. 16 is a diagram showing a flow of processing for generating a collection request or a collection result from a message.
FIG. 17 is a diagram illustrating an example of a message.
FIG. 18 is a diagram illustrating an example of a collection request generated from a message.
FIG. 19 is a diagram illustrating an example of a collection result generated from a message.
FIG. 20 is a diagram showing a flow of web page search processing.
FIG. 21 is a diagram illustrating an example of a search result screen for web page search.
[Explanation of symbols]
1 ... Community management unit
2 ... Community information presentation unit
3 ... Message transmission/reception unit
4 ... Collection request editing unit
5 ... Collection result editing unit
6 ... Information collection unit
7 ... Web document storage unit
8 ... Collection result generation unit
9 ... Collection request generation unit
10 ... Web document search unit
11 ... User information storage unit
12 ... Message storage unit
13 ... Collection request storage unit
14 ... Collection result storage unit

Claims

In an information collection system that collects and presents information that satisfies user requirements,
Community management means for managing multiple communities each of which has multiple users as members,
Message sending and receiving means for members belonging to each community to send and receive messages;
Community information presenting means for a user to browse information shared by each of the plurality of communities;
Collection request editing means with which members belonging to each community jointly edit a collection request of the community that describes information serving as a starting point of collection and conditions on phrases that the information should contain;
Information collecting means for collecting, from a plurality of information sources on an information network, information satisfying any one of the plurality of collection requests edited in the respective communities, by following hyperlinks from the information serving as the starting point of collection described in each collection request and searching for information satisfying the phrase conditions;
A collection result generating means for generating a collection result corresponding to each of the plurality of collection requests based on the collected information;
A collection result editing means for the members belonging to each community to jointly edit the collection results in the community,
wherein the community information presenting means presents a plurality of collection results respectively created in the plurality of communities, in association with the community or with messages transmitted and received in the community, to members of the community and to non-member users, and, if information collected by the information collecting means that constitutes the collection result of the community overlaps with information collected by the information collecting means that constitutes the collection result of another community, presents the collection result of the community and the collection result of the other community in association with each other.

The information collection system according to claim 1, wherein at least one of the collection request and the collection result of the community is automatically updated based on information usable as a starting point of information collection, and comment sentences about that information, extracted from messages that members of the community transmit and receive with the message transmission/reception means.

3. The information collection system according to claim 1 or 2, wherein the collection request corresponding to a collection result is updated based on the content of the edits that community members make to the collection result using the collection result editing means.

4. The information collection system according to claim 1, further comprising collected information retrieval means for retrieving, from the information collected by the information collecting means, information satisfying a search condition input by a user, wherein the collected information retrieval means presents the retrieved information in association with those collection results, among the collection results created by the communities, that include the retrieved information.

In an information collection method for collecting and presenting information that satisfies user requirements,
a computer edits, jointly with members belonging to each community, a collection request of the community that describes information serving as a starting point of collection and conditions on phrases that the information should contain;
a computer collects, from a plurality of information sources on an information network, information satisfying any of the plurality of collection requests edited in the respective communities, by following hyperlinks from the information serving as the starting point of collection described in each collection request and searching for information satisfying the phrase conditions;
a computer generates, based on the collected information, a collection result corresponding to each of the plurality of collection requests;
a computer edits, jointly with members belonging to each community, the collection result of that community; and
a computer presents the plurality of collection results respectively created in the plurality of communities, in association with the community or with messages transmitted and received in the community, as information shared by each of the communities, to members of the community and to non-member users, and, if information collected by the information collecting means that constitutes the collection result of the community overlaps with information collected by the information collecting means that constitutes the collection result of another community, presents the collection result of the community and the collection result of the other community in association with each other.

In a program that causes a computer to execute information collection that collects and presents information that satisfies a user's request,
causing the computer to input a collection request of each community, edited jointly by members belonging to that community, that describes information serving as a starting point of collection and conditions on phrases that the information should contain;
causing the computer to collect, from a plurality of information sources on an information network, information satisfying any of the plurality of collection requests edited in the respective communities, by following hyperlinks from the information serving as the starting point of collection described in each collection request and searching for information satisfying the phrase conditions;
causing the computer to generate, based on the collected information, a collection result corresponding to each of the plurality of collection requests;
causing the computer to input the collection result of each community, edited jointly by members belonging to that community; and
causing the computer to present the plurality of collection results respectively created in the plurality of communities, in association with the community or with messages transmitted and received in the community, as information shared by each of the communities, to members of the community and to non-member users, and, if information collected by the information collecting means that constitutes the collection result of the community overlaps with information collected by the information collecting means that constitutes the collection result of another community, to present the collection result of the community and the collection result of the other community in association with each other.