JP2000508450A

JP2000508450A - Method for organizing information retrieved from the Internet using knowledge-based representations

Info

Publication number: JP2000508450A
Application number: JP9536436A
Authority: JP
Inventors: Thomas Kirk (トーマス カーク)
Original assignee: AT&T Corp
Current assignee: AT&T Corp
Priority date: 1996-04-10
Filing date: 1997-04-08
Publication date: 2000-07-04
Also published as: WO1997038378A1; EP0976062A1; CA2251043A1

Abstract

(57) [Abstract] A system and method for organizing electronic representations of documents in a knowledge-based representation system are disclosed. The knowledge-based representation system operates in an environment in which computers and networks are interconnected and documents can be retrieved from those computers and networks. A query is generated to search for documents. The system determines which computers and networks can understand the query syntax. The query is sent to the computers and networks that can process it. The system receives results concerning the documents from those computers and networks. The results are merged into a single result set. Each result contains a reference to a document. The documents are then enhanced by comparing them against the text matching patterns of the knowledge base. The enhancement is achieved by retrieving the document for each reference and then applying the matching patterns to that document. The system determines a list of concepts that match the document. The system provides the document to the knowledge-based representation system as an instance of a concept.

Description

DETAILED DESCRIPTION OF THE INVENTION

Method for organizing information retrieved from the Internet using knowledge-based representations

Technical Field

The present invention relates to methods for accessing information on the Internet and, in particular, to a method for organizing information retrieved from the Internet using a knowledge-based representation system.

Background of the Invention

The Internet is a series of interconnected networks that facilitates the exchange of information, data, and files. Users connected to the Internet can access a vast amount of information through these networks. The usual way to gain access to the Internet is through an online service server. Referring to FIG. 1, networks 110, 112, and 114 are connected to the Internet 100 via online service servers 120, 122, and 124, respectively. Another way to access the Internet is through a dial-in Internet provider. For example, a user can access the Internet 100 from a personal computer (PC) 158 by using a modem 152 to dial in to Internet provider 150. Routers connect computers and networks and direct traffic to the networks and the Internet. Routers 160, 162, 164, and 166 inspect the packets of data traveling across the networks and the Internet and determine where the data is headed.

By using online service servers and Internet providers, users can search the World Wide Web (the Web), which is connected to networks worldwide, using software programs known as search engines 130, 132, 134, and 154. Search engines are also known as search tools and Web crawlers. These search engines move across the Web and gather documents by following the hypertext links found on Web (home) pages 140, 142, 144, and 156.

One way to search the Internet is by keyword. For example, the user types a string of keywords representing the desired information into a query. The search engine searches databases on the Internet, and the results are returned as a Hypertext Markup Language (HTML) page. The user can then view a document of interest by clicking on the link to that document. Clicking means positioning the cursor on the desired item and actuating the mouse switch.

Although current search engines make keyword searches on the Internet possible, the amount of information on the Internet is so vast that relevant information is difficult to obtain. In other words, a keyword search usually returns an enormous amount of information, and the user must browse through all of it to find what is relevant. A more efficient way of retrieving information from the Internet is therefore needed.

Summary of the Invention

The above problem of organizing information search results is alleviated by applying knowledge-based representation techniques to categorize the search results automatically. This information search and management system associates a knowledge base with search servers to improve the relevance and accuracy of search tasks. The knowledge base provides a user profile (a subject taxonomy) that represents the user's interests and preferences for organizing information. The system uses this knowledge base to organize the results of keyword searches. The system automatically categorizes and partitions the search results according to the knowledge base, making relevant information easier to find. The system displays the search results through a subset of the knowledge base subject taxonomy, partitions the results in a way that makes it easy to find the most relevant documents, and filters out irrelevant information.

Brief Description of the Drawings

FIG. 1 shows computers and networks and their connections to the Internet, to illustrate the operating environment of the invention.

FIG. 2 shows an example of a knowledge base browser displaying a graphical representation of a concept taxonomy according to the principles of the invention.

FIG. 2a shows an actual screen display of the knowledge base browser illustrated in FIG. 2.

FIG. 3 is a block diagram showing a search interface according to the principles of the invention.

FIG. 4 is a flowchart showing the steps by which a user retrieves information from the Internet and organizes that information using a knowledge-based representation.

Detailed Description of the Invention

FIG. 1 shows an environment for the present invention, which includes networks 110, 112, and 114 and PCs 158 and 159 interconnected with the Internet 100. These networks include users connected to one another by, for example, a token ring network (network 114) or an Ethernet network (networks 110 and 112). Each network further includes a server 120, 122, or 124. A server is a host computer through which users can communicate with one another on the network and, via the Internet, with users outside the network. Users on PCs 158 and 159 subscribe to Internet provider 150 and can use it to communicate with each other and with other users on the Internet.

Any user can search for information available on the Internet. Once a computer or network is connected to the Internet, the information on that computer or network becomes available to others unless it is protected. Because the Internet is a worldwide network, the amount of information that can be retrieved is vast. Many servers and providers have search engines 130, 132, 134, and 154, which let users search by keyword. These search engines are computer programs based on search applications, and they run on online service servers 120, 122, and 124 and Internet provider 150. A keyword search usually returns an enormous amount of information, and the user must browse through all of it to obtain the desired information.

There are currently two methods of searching the Internet. Both operate on the client/server model. In the client/server model, a user runs a piece of software on his or her own computer, or runs a shared program on a server acting as the client, and uses the resources of remote server computers (other servers connected to the Internet). For example, in FIG. 1, the user of PC 110a can search for information through online service servers 122 and 124 and Internet provider 150. Similarly, the user of PC 156 can search for information through online servers 120, 122, and 124. Remote servers, such as online servers 120, 122, and 124 and Internet provider 150, are also called hosts because they serve many users on many networks. A host lets many different clients access those resources at the same time; a host is not dedicated to a single user.

The first method of searching the Internet is by index. An index provides a highly structured way of finding information. Using an index, a user can browse information exhaustively by category, such as arts, computers, entertainment, and sports. In a Web browser, the user of PC 110a typically uses a mouse 110b to display a series of subcategories. For example, sports includes baseball, basketball, football, and so on. Depending on the size of the index, there may be several layers of subcategories. On reaching the desired subcategory, the user is presented with a list of relevant documents. To obtain these documents, the user clicks on the links to them. Yahoo! is the name of a popular index on the Internet. With Yahoo! and other indexes, a user can also search by typing in a word that describes the information sought. The user then obtains a set of search results, that is, links to the documents that match the search. To obtain the information, the user clicks on the link to a document.

The second method of finding information is to use a search engine, also known as a search tool. A search engine operates on an index that is essentially static and pre-built. That is, the index is built from online content and stored on a search server. Web crawlers are used by search engines to collect the online content that is indexed in the search server's database and later searched. Popular Internet search engines include Lycos, WebCrawler, and AltaVista. To start a search, the user types in keywords describing the desired information. The results that match the user's search criteria are returned to the user. From the list of results, the user can retrieve a document by clicking on the link to it.

By using both indexes and search engines, a user can obtain information on the Internet, but the information found is usually voluminous and relevant information is often hard to locate. It is therefore desirable to categorize search results found on the Internet automatically, so that the user can easily browse them and find relevant information.

According to the present invention, a knowledge-based representation system alleviates the above problem through its ability to represent objects and to reason about the relationships between them. In particular, the invention is directed to a knowledge-based information retrieval and management system that improves the quality of searches on multimedia network systems such as the Internet. Because the system gives users the means to impose a personalized conceptual organization on information found on the Internet, the usefulness of that information increases and access to it improves. As shown in FIG. 1, the system is integrated with existing Web browsers 130, 132, 134, and 154, producing a seamless environment that combines hypertext browsing with concept navigation. The system can also be stored on a personal computer, for example PC 110a, in which case only users with access to that personal computer can use it.

FIG. 2 shows an example of a knowledge base browser, which displays a graphical representation of a concept taxonomy 200 according to the invention. A taxonomy is a hierarchy that depicts the relationships between concepts. A concept is an abstract description of an object. The nodes in FIG. 2 correspond to knowledge base concepts (for example, 210, 220, 230, 212, 214), and the edges (for example, 210a, 210b, 220a) link the nodes and indicate subsumption relationships between concepts. A feature of the invention is that the system can manage subsumption relationships automatically on the basis of concepts and instances (270, 280). An instance is a particular realization of a concept: a concept is an abstract description of something, whereas an instance of that concept is an actual object that satisfies the description. For example, when a new document is added to the knowledge base browser as an instance, the system infers every position in the taxonomy to which it belongs.

As shown in FIG. 2, the most general concepts are on the left. Following the edges out of a concept node (from left to right) leads to more specialized concepts. For example, the subject "Artificial Intelligence" 228 is a specialization of "Computer Science" 220, and "Knowledge Representation" 229 is in turn a specialization of "Artificial Intelligence" 228.

Panels 270 and 280 in this display list the instances of these concepts. For example, panel 270 shows the documents that are instances of the subject "Pediatric Medicine" 212. Panel 280 shows the instances of the concept "Knowledge Representation" 229. Instances are inherited all the way up the hierarchy by parent concepts, so a document that appears under "Knowledge Representation", for example, also appears under "Computer Science". The method of organizing instances is described below in connection with the search interface. FIG. 2a is an actual screen display of the example knowledge base browser of FIG. 2, showing the concept taxonomy 200 and the subsumption relationships between concepts and instances.

The search interface works in the same way as the knowledge base browser interface. The search interface uses the knowledge base to improve the quality of search results by partitioning and categorizing the results with respect to the user's concept taxonomy. For example, after the results of a keyword search have been assembled into a result set for display, the system provides a step in which the result set can be narrowed further.

Enhancing a result set against the knowledge base involves retrieving the documents in the result set and processing them with the knowledge base pattern matchers. By using textual patterns associated with concepts in the knowledge base, the knowledge representation system can categorize and organize these documents within the concept taxonomy. Each pattern in the knowledge base is associated with a concept. In other words, each document is compared against these pattern matchers to determine whether any concept matches the document. The output of this comparison is the particular set of concepts in the knowledge base that correspond in some way to the content of the document. A record of the match between a concept and a document is created in the knowledge base by generating a temporary instance that contains a description of the matched concept.

Finally, the enhanced search results are displayed graphically through a subset of the knowledge base subject taxonomy. This subset is formed by those concepts that have one or more temporary instances generated during the matching process. This is shown in FIG. 3, which displays only those concepts that match the content of the documents.

The invention uses a knowledge-based representation system to organize data and is particularly useful when a keyword search yields a large number of documents, for example thousands. Running the pattern matchers over these documents quickly narrows them to the documents most relevant to the user. Thus, by using the knowledge-based representation system (browser and search interface) of the invention, users can find relevant information quickly.

Another feature of the taxonomy is that, because results are grouped by concept, users can zoom in on the parts they consider most relevant. This further enhances searching on the Internet by reducing browsing time.

The search interface also performs responsive, parallel access to multiple index servers in order to maximize query coverage and minimize response delay. By explicitly representing the capabilities of the individual search engines, the query system ensures that only servers capable of processing a query receive it.

Another feature of the invention is the user interface, which provides an editor for extending and reorganizing the concept hierarchy. The user interface also provides a navigation browser that maintains an interactive graphical map of the navigation history. The navigation browser is a tree-structured graphical representation of the user's browsing history.

The navigation browser works as follows. As a user browses, following links from one page to another, the user generates an ordered sequence of visited Web sites. When the user backtracks and chooses a new path, the browsing history becomes a branching tree. The navigation browser adds a new node to the tree for every site or page visited and so keeps track of these choices. The tree not only gives an overview of the browsing history but also serves as an alternative means of navigation (by clicking a node of the tree to return to the associated page).

Another feature of the invention is that the system architecture separates the knowledge base from the client, so that users can view their information space consistently regardless of the client's location. By keeping the knowledge base in one location, the environment can follow the user from one platform to another. A benefit of this separation is that it makes the continuous availability of the system server easier to ensure, because separation provides shared access to the knowledge base and allows autonomous monitoring of tasks even when the client is inactive or disconnected. In other words, the knowledge base can be stored on a separate server apart from the client.

Referring to FIG. 4, the flowchart shows the steps by which a user retrieves information from the Internet and organizes that information using a knowledge-based representation according to the invention.

In step 401, the user enters a query string of the keywords to be searched at his or her personal computer 110a, using the knowledge base Web browser 130 according to the invention. The knowledge base Web browser is software that can be installed on either the client 110a or the server 120.

In step 403, the query string is preprocessed to determine which search servers can understand the query syntax. This step is performed by examining the Universal Resource Locator (URL) of the query string to determine which servers the query should be sent to. In general, the query must be translated into the specific query syntax of the server from which the user requests information. Typically, a query translation program provides the interface to a server in order to satisfy the query request.

In step 405, the query is sent to each server that can process the expression. The queries can be sent sequentially or simultaneously. The advantage of sending the queries simultaneously is that both network delay and search processing delay are reduced. In other words, all the servers can work on the query at the same time.

In step 407, a result size threshold may require an individual server to be queried repeatedly in order to collect a specified number of matches. Most servers return results in several reasonably sized sets in order to limit the amount of resources used for a given query. For example, if there are 100 matches for a search, the server may be set to send only 10 matches at a time. If the specified number of matches has been reached, the procedure moves on. If not, the server is queried repeatedly until that number is reached.

In step 409, the results returned from the servers are merged into a single result set. The results are merged by removing duplicates. Each item in the result set consists of a reference to a document (a URL) and, where available, one line of descriptive text.

In step 411, if the user wishes to further improve the quality of the result set, the user can request that the results be compared against the knowledge base matchers. Otherwise, the result set is displayed to the user.

In step 413, the document for each reference in the result set is retrieved.

In step 415, the pattern matchers are applied to the document text to determine whether any subject concepts match the text.

In step 417, a list of the subject concepts that match the text of the document is generated.

In step 419, an instance is created for each document that matches a concept.

In step 421, the instance for the document is classified according to the subject taxonomy of the knowledge base.

The iterations of steps 413-421 are processed in parallel to minimize the effect of network delay, because a result set may contain dozens or hundreds of documents that need to be retrieved.

As documents are retrieved and classified, the system incrementally displays the post-processed results graphically through a subset of the subject taxonomy. This display serves to categorize and partition the search results in terms of concepts that are familiar and meaningful to the user. Thus, by using the knowledge-based representation system of the invention, search results can be browsed at various levels of detail, depending on how specific a partition the user wants.

The foregoing merely illustrates applications of the principles of the invention. Those skilled in the art can devise other structures and methods without departing from the spirit and scope of the invention.
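The merging and classification steps of FIG. 4 (steps 409 to 421) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Concept` class, the function names, the use of regular expressions as the text patterns, and the example.org URLs are all assumptions made for illustration, and document retrieval (step 413) is replaced by an in-memory dictionary.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in the subject taxonomy (hypothetical structure)."""
    name: str
    parent: "Concept | None" = None      # more general concept, if any
    pattern: "re.Pattern | None" = None  # text pattern tied to the concept
    instances: set = field(default_factory=set)  # URLs matched directly


def merge_results(per_server_results):
    """Step 409: merge per-server result lists, dropping duplicate URLs."""
    merged = {}
    for results in per_server_results:
        for url, description in results:
            merged.setdefault(url, description)  # first description wins
    return merged


def matching_concepts(text, concepts):
    """Steps 415-417: list the concepts whose pattern matches the text."""
    return [c for c in concepts if c.pattern and c.pattern.search(text)]


def classify(url, text, concepts):
    """Steps 419-421: record the document as an instance of each match."""
    for concept in matching_concepts(text, concepts):
        concept.instances.add(url)


def instances_under(concept, concepts):
    """Instances of a concept plus those inherited from its specializations."""
    found = set(concept.instances)
    for child in concepts:
        if child.parent is concept:
            found |= instances_under(child, concepts)
    return found


# Illustrative taxonomy, following the FIG. 2 example.
cs = Concept("Computer Science")
ai = Concept("Artificial Intelligence", parent=cs,
             pattern=re.compile(r"artificial intelligence", re.I))
kr = Concept("Knowledge Representation", parent=ai,
             pattern=re.compile(r"knowledge representation", re.I))
taxonomy = [cs, ai, kr]

# Two servers return overlapping hits; the duplicate URL is merged away.
results = merge_results([
    [("http://example.org/a", "KR survey")],
    [("http://example.org/a", "duplicate hit"),
     ("http://example.org/b", "AI notes")],
])

# Stand-in for step 413 (retrieving each referenced document).
texts = {
    "http://example.org/a": "An overview of knowledge representation systems.",
    "http://example.org/b": "Notes on artificial intelligence.",
}
for url in results:
    classify(url, texts[url], taxonomy)
```

In this sketch a document matched under "Knowledge Representation" is also reported under "Computer Science" by `instances_under`, mirroring the inheritance of instances by parent concepts described for FIG. 2. Storing only direct matches and computing inherited ones on demand is one possible design choice; the source does not say how the system stores them.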

Continuation of front page: (81) Designated countries: EP (AT, BE, CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), CA, JP, MX

Claims

1. An apparatus for classifying electronic representations of documents, the apparatus comprising: a knowledge base representation system that automatically organizes concepts and instances of concepts; means for associating each concept with a search pattern; and means for using the search patterns to determine whether each document is an instance of a concept and, if it is, providing the document data to the knowledge base representation system as an instance of that concept.
2. The apparatus according to claim 1, wherein the apparatus is used in a system having a plurality of search engines, the apparatus further comprising: means for translating the search pattern into a format compatible with the search engine and providing the format to the search engine.
3. A method of organizing electronic representations of documents in a knowledge base representation system, the knowledge base representation system operating in an environment in which computers and networks are interconnected and the documents can be retrieved from the computers and networks, the method comprising the steps of: entering a query to search for the documents; determining which computers and networks can understand the query syntax; sending the query to each of the computers and networks that can process the query; receiving results concerning the documents from the computers and networks; merging the results into a single result set; retrieving document data from the result set; and enhancing the document data by comparing it with text matching patterns of the knowledge base representation system.
4. The method according to claim 3, wherein each said result includes a reference to each said document.
5. The method according to claim 4, wherein the enhancing step further comprises the steps of: retrieving a document for each said reference; applying the matching patterns to the entire document; determining a list of concepts that match the document; and providing the document to the knowledge base representation system as an instance of the concepts.
6. The method according to claim 3, wherein the sending step sends the query simultaneously to each of the computers and networks that can process the query.
7. The method according to claim 5, wherein the retrieving step is performed in parallel for each of the references to minimize the effect of network delay on the collection of results.
8. The method according to claim 5, wherein the text matching patterns can be edited by changing the list of concepts.
9. The method according to claim 3, wherein the knowledge base representation system is stored on a client.
10. The method according to claim 3, wherein the knowledge base representation system is stored on a server.
11. An apparatus for locating electronic representations of documents that contain information about a predetermined subject, in an environment that includes means for locating documents using matching patterns, the apparatus comprising: an information retrieval system that organizes information according to subject; means, in the information retrieval system, for associating each subject with a matching pattern; and means, in the information retrieval system, for responding to a query containing the predetermined subject by providing the matching pattern associated with the predetermined subject to the means for locating documents using matching patterns, and for returning at least the location of a document located by said means using the matching pattern.
12. The apparatus according to claim 11, wherein the environment includes a plurality of means for locating documents using matching patterns, the apparatus further comprising: means for translating the matching pattern into a form appropriate for each of the plurality of means using matching patterns and for providing the translated matching pattern to each said means.
13. The apparatus according to claim 11, further comprising: means, responsive to the matching pattern for each of the predetermined subjects, for determining of which subject the document is an instance, and for associating at least the location of the document with each predetermined subject whose associated matching pattern finds a match in the document.
14. The apparatus according to any of claims 11, 12, or 13, further comprising: interactive receiving means, in the associating means, for receiving the matching pattern from a user of the apparatus.
15. The apparatus according to any of claims 11, 12, or 13, wherein the matching pattern includes an expression in a regular expression language.
16. A computer readable medium for classifying electronic representations of documents, comprising: a knowledge base representation component that automatically organizes concepts and instances of concepts; one or more search pattern components that associate each of the concepts with a search pattern; and a text matching component that uses the search patterns to determine whether each document is an instance of a concept and, if it is, provides the document data to the knowledge base representation system as an instance of that concept.
17. The computer readable medium according to claim 16, wherein the computer readable medium is used in a system having a plurality of search engines, the computer readable medium further comprising: one or more translation components that translate the search pattern into a format compatible with the search engine and provide the format to the search engine.