JP2007179490A - Information resource retrieval device, information resource retrieval method and information resource retrieval program

Info

Publication number: JP2007179490A
Application number: JP2005380311A
Authority: JP
Inventors: Noriko Kamikado; 典子神門; Terukazu Kanazawa; 輝一金沢
Original assignee: Research Organization of Information and Systems
Current assignee: Research Organization of Information and Systems
Priority date: 2005-12-28
Filing date: 2005-12-28
Publication date: 2007-07-12
Anticipated expiration: 2025-12-28
Also published as: JP4324650B2

Abstract

PROBLEM TO BE SOLVED: To always obtain a retrieval result without impairing retrieval efficiency, even if the metadata assigned to an information resource available on the Web is incomplete or a retrieval condition is inappropriate. SOLUTION: The device comprises a first score calculation part 21 that calculates a first score for information resources whose metadata contains a free word matching an input free word; a second score calculation part 22 that calculates a second score for information resources whose metadata contains a thesaurus keyword matching an input thesaurus keyword and, for information resources whose metadata contains a thesaurus keyword belonging to another node in the same facet, multiplies the second score by a weight according to the distance on the tree structure and takes the result as the second score; a score addition part 23 that obtains a third score by adding the first and second scores; and a sorting part 24 that obtains, as the retrieval result to be output, a list of information resources sorted in descending order of the third score. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to an information resource search device, an information resource search method, and an information resource search program. More specifically, it relates to a search technique for information resources and documents that carry bibliographic information such as author, title, and creation date as metadata and that can be browsed on the Internet via a web browser, the technique eliminating missed hits in searches and thereby improving the availability of information resources, and to a user interface for entering search conditions and browsing search results.

Documents such as academic papers in HTML, PDF, or plain-text form, and digital content such as photographs, maps, and rare books that has been made into digital libraries, can be browsed on the Internet via a web browser. The volume of these information resources is already enormous, and there is strong demand for organizing them into databases, distributing them smoothly, and reusing them.

To use another person's information resources, one must reach the desired resource through a search engine. Conventional search engines, however, operate exclusively on the metadata attached to information resources as bibliographic information and extract, as search results, the information resources that contain text matching the entered search text.

As a metadata description scheme, the Dublin Core scheme, for example, has been standardized (ISO 15836). The Dublin Core scheme defines 15 element types in total as the metadata elements to be described, including title (the title of the resource), creator (the creator), subject (topics and keywords covered by the content of the resource), description (an explanation, summary, table of contents, etc. of the content), date (creation or publication date), type (the nature or genre of the content), coverage (the scope or target of the resource, such as a geographic or temporal division), and rights (statements of copyright, industrial property rights, etc.). Information resource providers or librarians describe metadata according to this scheme, manually or with partial automatic generation, as tags defined by, for example, RDF (Resource Description Framework) or HTML, attach the metadata to the information resource, and publish it on the web.
JP 2000-112949 A, JP 2004-310199 A

However, describing metadata so that others can use one's information resources is laborious, and the metadata attached to existing information resources is incomplete: elements that should be described are missing, or keywords are insufficient.

In addition, and this is especially pronounced when a user unfamiliar with searching tries to reach an information resource through a search engine, if the entered search string is inappropriate and does not match the keywords in the described metadata, not a single information resource is returned (exact-match search). Conversely, when many information resources hit the entered search string, the user cannot tell which result is the desired one and must scroll through the results and examine them one by one, which makes the search inefficient and laborious.

For example, information resources, academic ones in particular, are often tagged with keywords defined for nodes on facets, which are hierarchical directory structures based on a thesaurus classification. Many inexperienced users, however, do not know the facet system based on this thesaurus classification and cannot enter appropriate search keywords.

Furthermore, when search conditions are entered multiple times, a conventional search engine extracts only the resources that satisfy the AND of all the conditions. If an inappropriate search keyword is entered even once, the desired information resource drops out of the result list and the opportunity to reach it is lost.

The present invention has been made in view of the above problems. Its object is to provide an information resource search device, an information resource search method, and an information resource search program that always return search results without impairing search efficiency, and return them in order starting from those that best fit the search conditions, even if the metadata attached as bibliographic information to information resources available on the web is incomplete or an inexperienced user enters an inappropriate search keyword.

Another object of the present invention is to make it possible to obtain a list of information resources related to an entered search keyword even when the keyword does not match the description in the metadata attached to the information resources.

Another object of the present invention is to make it possible, when search keywords are entered multiple times, to reliably obtain more accurate search results each time an additional keyword is entered, without search omissions.

Another object of the present invention is to make it possible to specify keywords belonging to multiple facets of a thesaurus system simultaneously as search conditions.

The principle of the present invention, for searching information resources on the web such as documents or digital content to which metadata is attached as bibliographic information, is to combine free word search, which searches for terms written in the content of each information resource, with thesaurus keyword search, which searches for keywords assigned to facets that are defined in the thesaurus system as hierarchical tree structures, one per concept category. Each time a search free word is entered, a first score is calculated for each information resource; each time a thesaurus keyword is specified, a second score is calculated for each information resource; the first and second scores are summed; and the information resources in the search result are sorted and displayed in descending order of the summed score.
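The principle above can be illustrated with a minimal Python sketch. The metadata layout, the resource identifiers, the importance values, and the fixed value of 1.0 for a thesaurus match are all assumptions made for illustration; the source does not fix any of them, and related (non-matching) resources are left out here for brevity.

```python
# Hypothetical metadata: per resource, a set of thesaurus keywords and
# (free word -> importance) pairs. All names and values are illustrative.
METADATA = {
    "doc1": {"thesaurus": {"Kyoto"}, "free_words": {"temple": 0.8, "garden": 0.3}},
    "doc2": {"thesaurus": {"Osaka"}, "free_words": {"temple": 0.2}},
    "doc3": {"thesaurus": {"Kyoto"}, "free_words": {}},
}


def first_score(resource, free_word):
    """First score: importance paired with the matching free word (0 if absent)."""
    return resource["free_words"].get(free_word, 0.0)


def second_score(resource, keyword):
    """Second score: assumed to be 1.0 when the thesaurus keyword matches exactly."""
    return 1.0 if keyword in resource["thesaurus"] else 0.0


def search(metadata, free_word, keyword):
    """Sum both scores per resource and sort in descending order of the sum."""
    ranked = [
        (rid, first_score(res, free_word) + second_score(res, keyword))
        for rid, res in metadata.items()
    ]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked
```

Calling `search(METADATA, "temple", "Kyoto")` ranks `doc1` first because it matches both conditions, while `doc2` still appears at the bottom rather than being dropped, which is the behavior the summed score is meant to provide.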

More specifically, as an example of a user interface for searching information resources, multiple tree-structured facets of the thesaurus system can be displayed, multiple keywords on any facets can be selected as input (multi-facet keyword specification), and search free words can also be entered. Each time a thesaurus keyword or free word is entered, the score of each information resource is calculated.

In this score calculation, for both the first score and the second score, a weighted score is added not only to information resources that match the specified search condition but also to information resources that do not match it but are related to it. This weighting is preferably performed according to the distance on the tree-structured facet of the thesaurus system.
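One way to picture a distance-dependent weight is the sketch below. The facet contents, the child-to-parent representation, and the geometric decay `decay ** distance` are assumptions for illustration only; the document requires only that the weight β be at most 1 and depend on the distance in the tree. Treating the selected node and its descendants as full matches follows the description of the second score given later in the document.

```python
# Hypothetical "Place" facet expressed as a child -> parent map.
PARENT = {
    "Japan": None,
    "Kansai": "Japan",
    "Kanto": "Japan",
    "Kyoto": "Kansai",
    "Osaka": "Kansai",
    "Tokyo": "Kanto",
}


def ancestors(node, parent):
    """Return the path from node up to the facet root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def tree_distance(a, b, parent):
    """Number of edges between nodes a and b in the facet tree."""
    depth_from_a = {n: i for i, n in enumerate(ancestors(a, parent))}
    for steps_from_b, n in enumerate(ancestors(b, parent)):
        if n in depth_from_a:  # first common ancestor
            return depth_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same facet")


def beta(query, candidate, parent, decay=0.5):
    """Weight for a resource tagged `candidate` when `query` was selected."""
    if query in ancestors(candidate, parent):
        return 1.0  # exact match or a node below the selected node
    return decay ** tree_distance(query, candidate, parent)  # assumed decay form
```

With this facet, selecting `Kyoto` gives a sibling such as `Osaka` a weight of 0.25 (distance 2) and the more remote `Tokyo` a weight of 0.0625 (distance 4), so nearer nodes contribute more to the second score.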

In the display of search results, the information resources are sorted and output in descending order of their summed scores. The displayed result list is also updated based on the score of each information resource, which is calculated each time a new keyword or free word is entered. For example, the search input screen may provide a key for clearing the accumulated scores, and score addition may be repeated as long as this clear key is not operated. As a result, search results are always obtained, and each time a further search condition is entered after results have been output, search accuracy improves and the more desired information resources appear higher in the list.
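The accumulate-until-cleared behavior can be sketched as a small session object. The class name `SearchSession`, its method names, and the score of 1.0 per thesaurus match are hypothetical; only the idea of adding to held scores on every input and resetting them with a clear operation comes from the text.

```python
from collections import defaultdict


class SearchSession:
    """Holds first and second scores across successive inputs (illustrative)."""

    def __init__(self, metadata):
        self.metadata = metadata
        self.first = defaultdict(float)   # accumulated free word scores
        self.second = defaultdict(float)  # accumulated thesaurus scores

    def add_free_word(self, word):
        for rid, res in self.metadata.items():
            self.first[rid] += res["free_words"].get(word, 0.0)
        return self.ranking()

    def add_thesaurus_keyword(self, keyword):
        for rid, res in self.metadata.items():
            if keyword in res["thesaurus"]:
                self.second[rid] += 1.0  # assumed score for an exact match
        return self.ranking()

    def clear(self):
        """Corresponds to operating the clear key: discard accumulated scores."""
        self.first.clear()
        self.second.clear()

    def ranking(self):
        totals = {rid: self.first[rid] + self.second[rid] for rid in self.metadata}
        return sorted(totals, key=lambda rid: totals[rid], reverse=True)
```

Because every input only adds to the held scores, an unhelpful keyword lowers a resource's relative rank at most; it never removes the resource from the list, in contrast to the AND-style narrowing of conventional engines.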

In the display of search results, the list of information resources that match the entered keyword or free word and the list of information resources that do not match but are related to the search condition may be displayed distinguishably, for example as separate lists in separate columns. Because of incomplete metadata and/or user inexperience, the search conditions specified by the user are not always appropriate. If a list of information resources that are "somewhat related to the search condition" is displayed together with all or part of their titles and bibliographic information and/or their facet keywords, the user can notice more appropriate search conditions (free words and/or thesaurus keywords) and easily revise the input accordingly.

According to one aspect of the present invention, there is provided an information resource search device comprising: a metadata storage unit that stores metadata attached to information resources, the metadata including, for each information resource, a thesaurus keyword describing a node name in a facet on a thesaurus, and pairs of a free word written in the information resource and its importance; an input unit for entering an input thesaurus keyword and an input free word for searching the information resources; a first score calculation unit that, for each information resource whose metadata includes a free word matching the input free word, calculates a first score based on the importance paired with the matching free word and holds the first score; a second score calculation unit that calculates a second score for each information resource whose metadata includes a thesaurus keyword matching the input thesaurus keyword, or a thesaurus keyword belonging to a node below the matching thesaurus keyword in the tree structure of the facet to which the input thesaurus keyword belongs, that, for each information resource whose metadata includes a thesaurus keyword belonging to any other node in the facet, multiplies the second score by a weight β (β ≤ 1) according to the distance on the tree structure to obtain the second score, and that holds the second score; a score summation unit that, for each information resource, sums the first score and the second score to obtain a third score; and an output unit that lists, as the search result, the information resources sorted in descending order of the third score.

When the input free word is entered following the output of the search result by the output unit, the first score calculation unit may add the first score calculated for that input free word to the held first score to obtain the first score, and the output unit may update the display with the information resources re-sorted in descending order of the third score.

When the input thesaurus keyword is entered following the output of the search result by the output unit, the second score calculation unit may add the second score calculated for that input thesaurus keyword to the held second score to obtain the second score, and the output unit may update the display with the information resources re-sorted in descending order of the third score.

The input unit may selectively display the tree structures of a plurality of the facets and allow nodes on the displayed tree structures to be selected simultaneously.

The output unit may comprise a first display output column that lists, sorted in descending order of the third score, the information resources whose metadata includes a thesaurus keyword matching the input thesaurus keyword, or a thesaurus keyword belonging to a node below the matching thesaurus keyword in the tree structure of the facet to which the input thesaurus keyword belongs; and a second display output column that lists, sorted in descending order of the third score, the information resources whose metadata includes a thesaurus keyword belonging to any other node in the facet.
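The two display output columns can be sketched as a simple partition of already-scored resources. The input tuple layout `(id, thesaurus_keyword, third_score)`, the function names, and the sample facet are assumptions for illustration; resources tagged outside the facet are simply skipped in this sketch.

```python
# Hypothetical facet as a child -> parent map.
PARENT = {"Japan": None, "Kansai": "Japan", "Kyoto": "Kansai", "Osaka": "Kansai"}


def is_self_or_descendant(node, query, parent):
    """True if node is the selected node or lies below it in the tree."""
    while node is not None:
        if node == query:
            return True
        node = parent[node]
    return False


def split_columns(resources, query, parent):
    """Split (id, thesaurus_keyword, third_score) tuples into two sorted columns."""
    first_col, second_col = [], []
    for rid, keyword, score in resources:
        if keyword not in parent:
            continue  # outside this facet; not handled in this sketch
        if is_self_or_descendant(keyword, query, parent):
            first_col.append((rid, score))   # matching or lower node
        else:
            second_col.append((rid, score))  # other node in the same facet
    first_col.sort(key=lambda item: item[1], reverse=True)
    second_col.sort(key=lambda item: item[1], reverse=True)
    return first_col, second_col
```

Keeping the related resources in their own column lets a user see which neighboring facet nodes actually hold documents, which can suggest a better keyword to select next.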

The output unit may list substantially the number of information resources that can be displayed at one time within the effective display area of the display device.

The information resource search device may further comprise a narrowing condition input unit for entering a narrowing-condition thesaurus keyword for narrowing down the search result, and a filtering processing unit that sets, as the information resources to be output, only those whose metadata includes a thesaurus keyword matching the entered narrowing-condition thesaurus keyword, or a thesaurus keyword belonging to a node below the matching thesaurus keyword in the tree structure of the facet to which the narrowing-condition thesaurus keyword belongs.
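Unlike the weighted scoring, the narrowing condition acts as a hard filter on the ranked list. The sketch below assumes a ranked list of resource identifiers and a mapping from identifier to thesaurus keywords; the names `under` and `narrow` and the sample "Era" facet are hypothetical.

```python
# Hypothetical "Era" facet as a child -> parent map.
PARENT = {"Era": None, "Edo": "Era", "Early Edo": "Edo", "Meiji": "Era"}


def under(node, root, parent):
    """True if node is root itself or lies below root in the facet tree."""
    while node is not None:
        if node == root:
            return True
        node = parent.get(node)  # unknown nodes end the walk
    return False


def narrow(ranked_ids, keywords_by_id, narrowing_keyword, parent):
    """Keep only resources tagged at or below the narrowing keyword.

    The incoming rank order (descending third score) is preserved.
    """
    return [
        rid
        for rid in ranked_ids
        if any(
            under(kw, narrowing_keyword, parent)
            for kw in keywords_by_id.get(rid, ())
        )
    ]
```

For example, narrowing a ranked list `["a", "b", "c"]` by `Edo` keeps only the resources tagged `Edo` or `Early Edo`, in their original order.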

The information resource search device may further comprise a third display output column that lists, sorted in descending order of the third score, the information resources that do not belong to any node in the facet.

According to another aspect of the present invention, there is provided an information resource search server device comprising: a metadata storage unit that stores metadata attached to information resources, the metadata including, for each information resource, a thesaurus keyword describing a node name in a facet on a thesaurus, and pairs of a free word written in the information resource and its importance; a first score calculation unit that, for each information resource whose metadata includes a free word matching an input free word for searching the information resources, calculates a first score based on the importance paired with the matching free word and holds the first score; a second score calculation unit that calculates a second score for each information resource whose metadata includes a thesaurus keyword matching an input thesaurus keyword for searching the information resources, or a thesaurus keyword belonging to a node below the matching thesaurus keyword in the tree structure of the facet to which the input thesaurus keyword belongs, that, for each information resource whose metadata includes a thesaurus keyword belonging to any other node in the facet, multiplies the second score by a weight β (β ≤ 1) according to the distance on the tree structure to obtain the second score, and that holds the second score; a score summation unit that, for each information resource, sums the first score and the second score to obtain a third score; and a sorting unit that obtains, as the search result to be output, a list of the information resources sorted in descending order of the third score.

According to another aspect of the present invention, there is provided an information resource search client device comprising: an input unit for entering an input free word for searching information resources and for entering an input thesaurus keyword by selectively displaying the tree structures of a plurality of facets on a thesaurus and having a node on a displayed tree structure selected; a processing unit that, based on the metadata attached to the information resources, calculates scores for a first information resource group, which is the set of information resources having metadata that matches the input free word or the input thesaurus keyword, and calculates scores for a second information resource group, which is the set of information resources having metadata that matches a thesaurus keyword belonging to another node in the facet to which the input thesaurus keyword belongs; and an output unit that lists the first information resource group and the second information resource group, each sorted in descending order of score, so that they are distinguishable on the display screen.

According to another aspect of the present invention, there is provided an information resource search device comprising: a metadata storage unit that stores metadata attached to information resources, the metadata including, for each information resource, a thesaurus keyword describing a node name in a facet on a thesaurus; an input unit for entering an input thesaurus keyword for searching the information resources; a score calculation unit that calculates a score for each information resource whose metadata includes a thesaurus keyword matching the input thesaurus keyword, or a thesaurus keyword belonging to a node below the matching thesaurus keyword in the tree structure of the facet to which the input thesaurus keyword belongs, that, for each information resource whose metadata includes a thesaurus keyword belonging to any other node in the facet, multiplies the score by a weight β (β ≤ 1) according to the distance on the tree structure to obtain the score, and that holds the score; and an output unit that lists, as the search result, the information resources sorted in descending order of the score.

According to another aspect of the present invention, there is provided an information resource search method comprising the steps of: storing, in a metadata storage unit, metadata attached to information resources, the metadata including, for each information resource, a thesaurus keyword describing a node name in a facet on a thesaurus, and pairs of a free word written in the information resource and its importance; entering an input thesaurus keyword and an input free word for searching the information resources; for each information resource whose metadata includes a free word matching the input free word, calculating a first score based on the importance paired with the matching free word and holding the first score; calculating a second score for each information resource whose metadata includes a thesaurus keyword matching the input thesaurus keyword, or a thesaurus keyword belonging to a node below the matching thesaurus keyword in the tree structure of the facet to which the input thesaurus keyword belongs, and, for each information resource whose metadata includes a thesaurus keyword belonging to any other node in the facet, multiplying the second score by a weight β (β ≤ 1) according to the distance on the tree structure to obtain the second score, and holding the second score; for each information resource, summing the first score and the second score to obtain a third score; and listing, as the search result, the information resources sorted in descending order of the third score.

According to another aspect of the present invention, there is provided an information resource search method comprising the steps of: storing, in a metadata storage unit, metadata attached to information resources, the metadata including, for each information resource, a thesaurus keyword describing a node name in a facet on a thesaurus, and pairs of a free word written in the information resource and its importance; for each information resource whose metadata includes a free word matching an input free word for searching the information resources, calculating a first score based on the importance paired with the matching free word and holding the first score; calculating a second score for each information resource whose metadata includes a thesaurus keyword matching an input thesaurus keyword for searching the information resources, or a thesaurus keyword belonging to a node below the matching thesaurus keyword in the tree structure of the facet to which the input thesaurus keyword belongs, and, for each information resource whose metadata includes a thesaurus keyword belonging to any other node in the facet, multiplying the second score by a weight β (β ≤ 1) according to the distance on the tree structure to obtain the second score, and holding the second score; for each information resource, summing the first score and the second score to obtain a third score; and obtaining, as the search result to be output, a list of the information resources sorted in descending order of the third score.

According to another aspect of the present invention, there is provided an information resource search method comprising the steps of: entering an input free word for searching information resources, and entering an input thesaurus keyword by selectively displaying the tree structures of a plurality of facets on a thesaurus and having a node on a displayed tree structure selected; based on the metadata attached to the information resources, calculating scores for a first information resource group, which is the set of information resources having metadata that matches the input free word or the input thesaurus keyword, and calculating scores for a second information resource group, which is the set of information resources having metadata that matches a thesaurus keyword belonging to another node in the facet to which the input thesaurus keyword belongs; and listing the first information resource group and the second information resource group, each sorted in descending order of score, so that they are distinguishable on the display screen.

本発明の他の特徴によれば、情報資源検索処理をコンピュータに実行させるための情報資源検索プログラムであって、該プログラムは、前記コンピュータに、情報資源に付与されるメタデータであって、該メタデータは、前記情報資源ごとに、シソーラス上のファセット内のノード名を記述するシソーラスキーワードと、前記情報資源内に記述されたフリーワード及びその重要度の対とを含むメタデータをメタデータ記憶部に記憶する処理と、前記情報資源を検索するための入力シソーラスキーワード及び入力フリーワードを入力する処理と、前記入力フリーワードと一致する前記フリーワードを前記メタデータに含む情報資源について、前記一致するフリーワードと対をなす重要度に基づいて、第１のスコアを算出し、該第１のスコアを保持する処理と、前記入力シソーラスキーワードと一致する前記シソーラスキーワード、又は前記入力シソーラスキーワードの属するファセットの木構造上前記一致するシソーラスキーワードの下位ノードに属するシソーラスキーワードを前記メタデータに含む情報資源について、第２のスコアを算出すると共に、前記ファセット内のその他のノードに属するシソーラスキーワードを前記メタデータに含む情報資源について、前記第２のスコアに前記木構造上の距離に応じた重みβ（β≦１）を乗じて第２のスコアとし、該第２のスコアを保持する処理と、情報資源ごとに、前記第１のスコアと前記第２のスコアとを合算して第３のスコアを得る処理と、前記第３のスコアの大きい順にソートされた情報資源を検索結果として一覧表示する処理とを含む処理を実行させるためのものであることを特徴とする情報資源検索プログラムが提供される。 According to another aspect of the present invention, there is provided an information resource search program for causing a computer to execute an information resource search process, the program being metadata attached to the information resource to the computer, Metadata stores, for each information resource, metadata including a thesaurus keyword that describes a node name in a facet on a thesaurus and a pair of free words and importance levels described in the information resource. Processing for storing the information resource, processing for inputting an input thesaurus keyword and input free word for searching the information resource, and information resource including the free word that matches the input free word in the metadata. The first score is calculated based on the importance level paired with the free word, and the first score is maintained. And the information resource including in the metadata the thesaurus keyword that matches the input thesaurus keyword, or the thesaurus keyword that belongs to the lower node of the matching thesaurus keyword on the facet tree structure to which the input thesaurus keyword belongs. 
For these information resources a second score is calculated; for information resources whose metadata includes a thesaurus keyword belonging to another node in the facet, the second score is multiplied by a weight β (β ≦ 1) according to the distance on the tree structure and the product is taken as the second score; and the second score is retained. The program further causes the computer to execute a process of adding the first score and the second score for each information resource to obtain a third score, and a process of displaying, as the search result, a list of the information resources sorted in descending order of the third score.

本発明の他の特徴によれば、情報資源検索処理をコンピュータに実行させるための情報資源検索プログラムであって、該プログラムは、前記コンピュータに、情報資源に付与されるメタデータであって、該メタデータは、前記情報資源ごとに、シソーラス上のファセット内のノード名を記述するシソーラスキーワードと、前記情報資源内に記述されたフリーワード及びその重要度の対とを含むメタデータをメタデータ記憶部に記憶する処理と、
前記情報資源を検索するための入力フリーワードと一致する前記フリーワードを前記メタデータに含む情報資源について、前記一致するフリーワードと対をなす重要度に基づいて、第１のスコアを算出し、該第１のスコアを保持する処理と、前記情報資源を検索するための入力シソーラスキーワードと一致する前記シソーラスキーワード、又は前記入力シソーラスキーワードの属するファセットの木構造上前記一致するシソーラスキーワードの下位ノードに属するシソーラスキーワードを前記メタデータに含む情報資源について、第２のスコアを算出すると共に、前記ファセット内のその他のノードに属するシソーラスキーワードを前記メタデータに含む情報資源について、前記第２のスコアに前記木構造上の距離に応じた重みβ（β≦１）を乗じて第２のスコアとし、該第２のスコアを保持する処理と、情報資源ごとに、前記第１のスコアと前記第２のスコアとを合算して第３のスコアを得る処理と、前記第３のスコアの大きい順にソートされた情報資源のリストを出力されるべき検索結果として得る処理とを含む処理を実行させるためのものであることを特徴とする情報資源検索プログラムが提供される。 According to another aspect of the present invention, there is provided an information resource search program for causing a computer to execute an information resource search process, the program being metadata attached to the information resource to the computer, Metadata stores, for each information resource, metadata including a thesaurus keyword that describes a node name in a facet on a thesaurus and a pair of free words and importance levels described in the information resource. Processing to be stored in the department,
For an information resource whose metadata includes a free word that matches the input free word for searching for the information resource, a first score is calculated based on the importance paired with the matching free word, and the first score is retained. For an information resource whose metadata includes a thesaurus keyword that matches the input thesaurus keyword for searching for the information resource, or a thesaurus keyword belonging to a lower node of the matching thesaurus keyword on the tree structure of the facet to which the input thesaurus keyword belongs, a second score is calculated; for an information resource whose metadata includes a thesaurus keyword belonging to another node in the facet, the second score is multiplied by a weight β (β ≦ 1) according to the distance on the tree structure and the product is taken as the second score; and the second score is retained. The first score and the second score are added for each information resource to obtain a third score, and a list of the information resources sorted in descending order of the third score is obtained as the search result to be output. An information resource search program that causes the computer to execute these processes is provided.

本発明の他の特徴によれば、情報資源検索処理をコンピュータに実行させるための情報資源検索プログラムであって、該プログラムは、前記コンピュータに、情報資源を検索するための入力フリーワードを入力するとともに、シソーラス上の複数のファセットの木構造を選択的に表示し、表示された前記木構造上のノードを選択させることにより、入力シソーラスキーワードを入力する処理と、前記情報資源に付与されるメタデータに基づいて、前記入力フリーワード又は入力シソーラスキーワードに一致するメタデータを有する情報資源の集合である第１の情報資源群について、スコアを算出するとともに、前記入力シソーラスキーワードが属するファセット内の他のノードに属するシソーラスキーワードに一致するメタデータを有する情報資源の集合である第２の情報資源群について、スコアを算出する処理と、スコアの大きい順にそれぞれソートされた前記第１の情報資源群と、前記第２の情報資源群とを、表示画面上区別可能に一覧表示する処理とを含む処理を実行させるためのものであることを特徴とする情報資源検索プログラムが提供される。 According to another aspect of the present invention, there is provided an information resource search program for causing a computer to execute an information resource search process, wherein the program inputs an input free word for searching the information resource to the computer. In addition, by selectively displaying a tree structure of a plurality of facets on the thesaurus and selecting nodes on the displayed tree structure, a process for inputting an input thesaurus keyword, and a meta data assigned to the information resource Based on the data, a score is calculated for the first information resource group that is a set of information resources having metadata that matches the input free word or the input thesaurus keyword, and other scores in the facet to which the input thesaurus keyword belongs are calculated. Information with metadata that matches a thesaurus keyword belonging to For the second information resource group that is a set of resources, a process for calculating a score, the first information resource group sorted in descending order of the score, and the second information resource group are displayed on the display screen. There is provided an information resource search program characterized in that the program includes a process including a process of displaying a list in a distinguishable manner.

本発明によれば、ウェブ上利用可能な情報資源に付与された書誌情報であるメタデータが不完全であっても、或いは習熟していないユーザにより不適切な検索キーワードが入力された場合であっても、検索効率を損なうことなく、常に検索結果を得ることができるとともに、より検索条件に適合する検索結果から順に得ることができる。 According to the present invention, even when metadata that is bibliographic information assigned to information resources available on the web is incomplete or an inappropriate search keyword is input by an unskilled user. However, it is possible to always obtain the search results without impairing the search efficiency, and to obtain the search results that are more suitable for the search conditions.

また、入力された検索キーワードが情報資源に付与されたメタデータ上の記述に一致しない場合であっても、入力された検索キーワードに関連性を有する情報資源のリストを得ることが可能となる。 Further, even when the input search keyword does not match the description on the metadata given to the information resource, it is possible to obtain a list of information resources having relevance to the input search keyword.

また、検索キーワード入力が複数回実行された場合に、検索漏れを生じさせることなく、追加的検索キーワードが入力されるごとに、確実に、より精度の高い検索結果を得ることが可能となる。 In addition, when a search keyword is input a plurality of times, it is possible to reliably obtain a more accurate search result each time an additional search keyword is input without causing a search omission.

さらに、シソーラス体系上の複数のファセットに属するキーワードを同時に検索条件として指定することが可能となる。 Furthermore, keywords belonging to a plurality of facets on the thesaurus system can be simultaneously specified as search conditions.

従って、利用者側におけるウェブ上の情報資源の検索効率が大幅に向上し、もってウェブ上の情報資源の流通が促進され、そのデータ可用性が向上する。 Therefore, the search efficiency of information resources on the web on the user side is greatly improved, so that the distribution of information resources on the web is promoted and the data availability is improved.

以下、図面を参照して、本発明の実施の形態を説明する。 Embodiments of the present invention will be described below with reference to the drawings.

＜本実施形態の機能構成＞ <Functional configuration of this embodiment>
図１は、本実施形態に係る情報資源検索装置の一構成例を示す。 FIG. 1 shows a configuration example of an information resource search apparatus according to the present embodiment.

本実施形態に係る情報資源検索装置は、例えば文書や、写真、地図、稀少本等のデジタルコンテンツ等、ウェブ上利用可能な情報資源を格納する外部記憶装置であるデータ記憶部３を具備する。なお、当然ながら、本実施形態は、データ記憶部３に入力される入力手段を何ら限定するものではない。また、入力手段は、直接コンテンツの入力を受け付ける手段の他、例えばＣＤ−ＲＯＭ、ＤＶＤ、ＭＯ等任意の外部記録媒体に記録された情報資源を読み込み、入力として受け付けてもよい。データ記憶部３は、各情報資源の書誌情報であるメタデータ３１と、情報資源の文書本文或いはコンテンツ自体３２とを記憶する。代替的に、検索対象とすべき情報資源の数、所望される処理速度、データ記憶部３の容量、本文或いはコンテンツ自体もフリーワード検索の対象とするか否か等に照らして、情報資源の文書本文或いはコンテンツ自体３２は、本実施形態に係る情報資源検索装置と、例えばインターネット等の通信回線を介して、ネットワーク上接続される他のサーバ装置（図示せず）に格納されてもよい。メタデータ３１の記述方式としては、任意の方式が採用されてよいが、例えば上述のダブリンコア（ＤｕｂｌｉｎＣｏｒｅ）方式が使用されてよく、この場合ｔｉｔｌｅ（資源の題名）、ｃｒｅａｔｏｒ（作成者）、ｓｕｂｊｅｃｔ（資源の内容に含まれるトピック、の性質又はジャンル）、ｃｏｖｅｒａｇｅ（資源の範囲若しくは対象。地理区分、時間区分等）、ｒｉｇｈｔｓ（著作権、産業財産権等の言明）等、全部で１５の要素タイプの全部或いは一部がメタデータの要素として定義され得る。情報資源の文書本文或いはコンテンツ自体３２は、文書であればＨＴＭＬファイル、ＰＤＦファイル、テキストファイルやＣＳＶファイル等として記憶されてよく、他の写真、地図、稀少本等のデジタルコンテンツは任意の方式のイメージデータ等として記憶されてよく、必要に応じてコンテンツの特性に応じた任意の圧縮方式により圧縮され得る。 The information resource search device according to this embodiment includes a data storage unit 3 that is an external storage device that stores information resources that can be used on the web, such as documents, digital contents such as photographs, maps, and rare books. Needless to say, this embodiment does not limit the input means input to the data storage unit 3. Further, the input means may read information resources recorded on an arbitrary external recording medium such as a CD-ROM, DVD, MO, etc., for example, in addition to means for directly accepting content input, and accept them as input. The data storage unit 3 stores metadata 31 that is bibliographic information of each information resource, and the document text of the information resource or the content 32 itself. Alternatively, in light of the number of information resources to be searched, the desired processing speed, the capacity of the data storage unit 3, whether the text or content itself is also subject to free word search, etc. 
the document text or content itself 32 may be stored on another server apparatus (not shown) that is connected to the information resource search apparatus according to the present embodiment over a network via a communication line such as the Internet. Any description method may be adopted for the metadata 31. For example, the above-described Dublin Core method may be used, in which case all or some of its 15 element types, such as title (the title of the resource), creator (the creator), subject (the topic, nature, or genre of the content of the resource), coverage (the extent or scope of the resource, such as geographic or temporal division), and rights (statements of copyright, industrial property rights, and the like), can be defined as metadata elements. The document text or content itself 32 may be stored as an HTML file, PDF file, text file, CSV file, or the like if it is a document; other digital content such as photographs, maps, and rare books may be stored as image data of any format and may be compressed as necessary by any compression method suited to the characteristics of the content.

本実施形態に係る情報資源検索装置は、ウェブ上の情報資源を検索するための検索条件を入力する入力部１と、入力部１に入力された検索条件に従い、本実施形態に係る検索処理を実行する検索実行部２と、検索実行部２により得られた検索結果を任意の出力装置に出力する出力部４と、検索実行部２において算出されたスコアに基づいて、データ記憶部３に記憶される情報資源のメタデータ３１を補完すべき新たなメタデータを自動生成するメタデータ自動生成部５とを具備する。 The information resource search apparatus according to the present embodiment includes an input unit 1 for inputting search conditions for searching for information resources on the web; a search execution unit 2 that executes the search process according to the present embodiment in accordance with the search conditions input to the input unit 1; an output unit 4 that outputs the search results obtained by the search execution unit 2 to any output device; and an automatic metadata generation unit 5 that, based on the scores calculated in the search execution unit 2, automatically generates new metadata to supplement the metadata 31 of the information resources stored in the data storage unit 3.

より詳細には、入力部１は、各情報資源の内容に記述された用語（以下、「索引語」という。）を検索するフリーワード検索用の検索フリーワードを入力する検索フリーワード入力部１１と、シソーラス体系上の木構造のディレクトリであるファセット上のノードに対応して定義されたキーワード（以下、「シソーラスキーワード」という。）であって後述するスコアリング処理において加点されるキーワードを選択入力する加点用シソーラスキーワード入力部１２と、検索結果として得られた情報資源のリストに対して絞り込み条件として与えられるシソーラスキーワードである絞込み条件シソーラスキーワードを選択入力する絞込み条件シソーラスキーワード入力部１３と、ユーザが注目する１つのキーワードを選択入力する注目キーワード入力部１４とを具備する。なお、入力部１において、例えば、検索フリーワード入力部１１、絞込み条件シソーラスキーワード入力部１３、注目キーワード入力部１４のいずれか１つ以上を省略した構成としてもよい。 More specifically, the input unit 1 includes: a search free word input unit 11 for inputting a search free word for a free word search over the terms described in the content of each information resource (hereinafter referred to as “index words”); a point-addition thesaurus keyword input unit 12 for selectively inputting keywords that are defined in correspondence with nodes on a facet, which is a tree-structured directory in the thesaurus system (hereinafter referred to as “thesaurus keywords”), and for which points are added in the scoring process described later; a narrowing condition thesaurus keyword input unit 13 for selectively inputting a narrowing condition thesaurus keyword, which is a thesaurus keyword given as a narrowing condition for the list of information resources obtained as a search result; and an attention keyword input unit 14 for selectively inputting one keyword on which the user focuses. One or more of the search free word input unit 11, the narrowing condition thesaurus keyword input unit 13, and the attention keyword input unit 14 may be omitted from the input unit 1.

検索実行部２は、検索用フリーワード入力部１１から入力された検索フリーワードに一致する、及び類似する索引語がメタデータ３１の一部として予め定義された、或いは代替的にこの索引語が検索時に抽出された情報資源を検索し、検索された情報資源のそれぞれについて、第１のスコアを算出するとともに、算出された第１のスコアを保持するフリーワード検索スコアリング部２１と、加点用シソーラスキーワード入力部１２から入力された加点用シソーラスキーワードに一致するキーワード、及び、ファセットの木構造上該キーワードに近い距離に位置するキーワードがメタデータ３１の一部として予め定義された情報資源を検索し、検索された情報資源のそれぞれについて、第２のスコアを算出するとともに、算出された第２のスコアを保持するキーワード検索スコアリング部２２と、フリーワード検索スコアリング部２１により算出された第１のスコアと、キーワード検索スコアリング部２２により算出された第２のスコアとを、各情報資源について合算するスコア合算部２３と、スコア合算部２３により算出された合算スコア（特許請求の範囲における「第３のスコア」に相当する。）の高い情報資源から順にソートするソート部２４と、絞込み条件シソーラスキーワード入力部１３から入力された絞込み条件に適合する情報資源のみを抽出してソート部２４に受け渡すフィルタリング処理部２５とを具備する。 The search execution unit 2 includes: a free word search scoring unit 21 that searches for information resources for which an index word matching or similar to the search free word input from the search free word input unit 11 is defined in advance as part of the metadata 31, or alternatively for which such an index word is extracted at search time, calculates a first score for each retrieved information resource, and holds the calculated first score; a keyword search scoring unit 22 that searches for information resources for which a keyword matching the point-addition thesaurus keyword input from the point-addition thesaurus keyword input unit 12, or a keyword located close to that keyword on the facet tree structure, is defined in advance as part of the metadata 31, calculates a second score for each retrieved information resource, and holds the calculated second score; a score summation unit 23 that adds, for each information resource, the first score calculated by the free word search scoring unit 21 and the second score calculated by the keyword search scoring unit 22; a sorting unit 24 that sorts the information resources in descending order of the summed score calculated by the score summation unit 23 (corresponding to the “third score” in the claims); and a filtering processing unit 25 that extracts only the information resources meeting the narrowing condition input from the narrowing condition thesaurus keyword input unit 13 and passes them to the sorting unit 24.

出力部４は、選択入力されたシソーラスキーワードに対応するシソーラス上のファセットの表示と、ソート部２４から出力される情報資源のメタデータに記述されたサブジェクト、作成者、タイトル等の書誌情報の全部又は一部と各情報資源へのリンク或いはそのＵＲＬ等をリスト表示するファセット対応表示部４１を具備する。 The output unit 4 includes a facet correspondence display unit 41 that displays the facets on the thesaurus corresponding to the selectively input thesaurus keywords and lists all or part of the bibliographic information, such as subject, creator, and title, described in the metadata of the information resources output from the sorting unit 24, together with a link to each information resource or its URL.

なお、本明細書において、シソーラスとは、ある概念に、どのような概念或いは単語が属しているかを体系的に示すデータベースであり、ファセットとは、このシソーラス体系上、概念のカテゴリーごとに、上位概念のノードと下位概念のノードとを階層型でリンクした木構造（ディレクトリ）をいう。図１２は、このファセットの一例を示す。最上位概念である「日本」は、その下位概念「関東」及び「関西」とリンクし、「関東」ノードは、その下位概念「東京都」及び「神奈川県」とリンクし、「東京都」ノードの下位には「中央区」ノードがリンクし、一方「関西」ノードは、その下位概念「大阪府」とリンクし、「大阪府」ノードの下位には、さらに「大阪市」ノードが、「大阪市」ノードの下位にはさらに「中央区」ノードがリンクしている。図１２に示されるこれらの各ノード名は、例えばダブリンコア方式によれば、ｃｏｖｅｒａｇｅｓｐａｔｉａｌ（資源の地理区分）の属性を有するメタデータ３１として、定義され、情報資源に付加され得る。１つの情報資源に対しては、複数のファセットが適用され得、例えば図１２に示す地理区分の他、歴史区分（ｃｏｖｅｒａｇｅｔｅｍｐｏｒａｌ）、資源タイプ（ｔｙｐｅ）、テーマ（ｓｕｂｊｅｃｔ）等の各ファセット木構造上いずれかに位置付けられる概念をメタデータ３１として定義してよい。 In this specification, a thesaurus is a database that systematically indicates what concepts or words belong to a certain concept, and a facet is a higher rank for each category of concepts in this thesaurus system. A tree structure (directory) in which concept nodes and subordinate concept nodes are linked in a hierarchical manner. FIG. 12 shows an example of this facet. The top level concept “Japan” is linked to its subordinate concepts “Kanto” and “Kansai”, and the “Kanto” node is linked to its subordinate concepts “Tokyo” and “Kanagawa”, and “Tokyo” The “Chuo-ku” node is linked to the lower level of the node, while the “Kansai” node is linked to its subordinate concept “Osaka Prefecture”, and the “Osaka City” node is further subordinate to the “Osaka Prefecture” node. A “Chuo-ku” node is further linked below the “Osaka City” node. Each of these node names shown in FIG. 12 is defined as metadata 31 having an attribute of coverage spatial (resource geographic division) and can be added to an information resource, for example, according to the Dublin Core system. A plurality of facets can be applied to one information resource. For example, in addition to the geographical divisions shown in FIG. 12, each facet tree structure such as a historical division, a resource type (type), a theme (subject), etc. A concept positioned in any of the above may be defined as the metadata 31.
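As an illustration only, the geographic facet of FIG. 12 can be sketched as a child-to-parent map, with the distance between two nodes counted in edges. The path-style names used to keep the two “Chuo-ku” nodes apart, the dictionary layout, and the helper names are assumptions of this sketch and do not appear in the embodiment.

```python
# Hypothetical child -> parent map for the geographic facet of FIG. 12.
# Path-like names keep the two "Chuo-ku" nodes distinct (an assumption).
PARENT = {
    "Japan": None,
    "Kanto": "Japan",
    "Kansai": "Japan",
    "Tokyo": "Kanto",
    "Kanagawa": "Kanto",
    "Tokyo/Chuo-ku": "Tokyo",
    "Osaka": "Kansai",
    "Osaka City": "Osaka",
    "Osaka City/Chuo-ku": "Osaka City",
}


def ancestors(node):
    """Return the path from node up to the facet root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def is_descendant_or_self(node, of):
    """True if `node` is `of` itself or lies below it in the facet."""
    return of in ancestors(node)


def tree_distance(a, b):
    """Number of edges on the path between a and b within the facet tree."""
    depth_from_a = {n: d for d, n in enumerate(ancestors(a))}
    for depth_from_b, n in enumerate(ancestors(b)):
        if n in depth_from_a:  # first common ancestor reached from b
            return depth_from_a[n] + depth_from_b
    raise ValueError("nodes are not in the same facet")
```

With such a structure, “Tokyo” and “Kanagawa” are two edges apart (via “Kanto”), while the two “Chuo-ku” nodes are far apart even though they share a display name.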

図２は、図１に示すデータ記憶部３に格納されるメタデータ３１及び各情報資源のスコア算出用に使用されるワークエリアのレイアウトを例示的に示す。図２において、例えば「施設Ｘの案内」をタイトルとする文書情報資源は、シソーラス体系のファセット上位置するノード名（キーワード）として、「テーマ」ファセット（Ｆ１）については「歴史」、「時代」ファセット（Ｆ２）については空白（未定義）、「地理」ファセット（「地理区分」ファセット）（Ｆ３）については「東京千代田区」及び「横浜中区」、「機関」ファセット（Ｆ４）については「東京都庁」及び「神奈川県庁」がそれぞれメタデータ３１として定義され、さらに、検索フリーワード入力部１１から入力される検索フリーワードについての第１のスコアを効率よく算出するために、予め情報資源中の索引語（ｗｏｒｄＡ，Ｂ）と該索引語の当該情報資源内における重要度（Ｓ_ｉ１，Ｓ_ｉ２）がメタデータ３１として定義されている。索引語は、予め各情報資源に付与されてもよく、或いは自動的に抽出されてもよい。代替的に、各情報資源についての索引語及び各索引語の重要度は、フリーワード検索スコアリング部２１における第１のスコア算出時に得られてもよい。図２において、ワークエリアは、検索フリーワード入力部１１に入力された入力フリーワードの格納領域２３１、加点用シソーラスキーワード入力部１２及び／又は絞込み条件用シソーラスキーワード入力部１３から入力されたシソーラスキーワードの格納領域２３３、フリーワード及びキーワードが入力される毎に、当該情報資源の第１のスコア及び／又は第２のスコアを算出し、両者の和を更新して得られるスコアの格納領域２３５を備える。シソーラスキーワード格納領域２３３は、メタデータ３１で予め定義されたファセットの各キーワードに対応してセルが設けられ（Ｆ１「テーマ」、Ｆ２「時代」、Ｆ３「地理」、Ｆ４「機関」）、各セル毎に加点用として入力されたか絞込み条件として入力されたかを示すフィールドを備えてよい。ワークエリアは、メタデータ３１に連続する領域に配置されてもよいが、代替的に他の一時的記憶領域例えばＲＡＭやキャッシュメモリ等に設けられてもよい。 FIG. 2 shows an example of the layout of the metadata 31 stored in the data storage unit 3 shown in FIG. 1 and of the work area used for calculating the score of each information resource. In FIG. 2, for example, for a document information resource titled “Guide to Facility X”, the following node names (keywords) located on the facets of the thesaurus system are defined as metadata 31: “history” for the “theme” facet (F1); blank (undefined) for the “era” facet (F2); “Tokyo Chiyoda-ku” and “Yokohama Naka-ku” for the “geography” facet (the “geographic division” facet) (F3); and “Tokyo Metropolitan Government Office” and “Kanagawa Prefectural Office” for the “institution” facet (F4). In addition, so that the first score for a search free word input from the search free word input unit 11 can be calculated efficiently, the index words in the information resource (word A, word B) and the importance of each index word within that information resource (S_i1, S_i2) are defined in advance as metadata 31. The index words may be assigned to each information resource in advance or may be extracted automatically. Alternatively, the index words for each information resource and the importance of each index word may be obtained when the first score is calculated in the free word search scoring unit 21. In FIG. 2, the work area includes a storage area 231 for the input free word input to the search free word input unit 11, a storage area 233 for the thesaurus keywords input from the point-addition thesaurus keyword input unit 12 and/or the narrowing condition thesaurus keyword input unit 13, and a storage area 235 for the score obtained by calculating the first score and/or the second score of the information resource each time a free word or keyword is input and updating their sum. The thesaurus keyword storage area 233 has a cell for each facet keyword defined in advance in the metadata 31 (F1 “theme”, F2 “era”, F3 “geography”, F4 “institution”), and each cell may have a field indicating whether the keyword was input for point addition or as a narrowing condition. The work area may be placed in an area contiguous with the metadata 31, or alternatively in another temporary storage area such as RAM or cache memory.

＜本実施形態に係る検索処理の処理フロー＞ <Processing flow of search processing according to this embodiment>
図３ないし図８は、本実施形態において検索実行部２が行なう情報資源検索処理の詳細処理手順を示す。 FIGS. 3 to 8 show the detailed processing procedure of the information resource search process performed by the search execution unit 2 in this embodiment.

図３は、本実施形態における情報資源検索装置が行なう検索処理及び検索結果表示処理の概略を示すフローチャートである。図３において、入力部１から検索実行部２に入力検索条件が出力されると、検索実行部２は、この入力された検索条件の種別を判断し（ステップＳ１）、検索フリーワードであれば、検索フリーワードの文字列Ｓをフリーワード検索スコアリング部２１に入力し、或いは入力されたフリーワードを変更し（ステップＳ２）、加点用シソーラスキーワードであれば、加点に用いるキーワード集合Ｗに入力されたシソーラスキーワードを追加、或いは削除し（ステップＳ３）、絞込み条件シソーラスキーワードであれば、絞込み条件に用いるキーワード集合Ｎに入力されたシソーラスキーワードを追加、或いは削除する（ステップＳ８）。入力が空でない場合（ステップＳ４Ｎ）、本実施形態に係る検索処理を実行し（ステップＳ５）、検索処理により得られた検索結果である情報資源のリストを表示出力し（ステップＳ６）、ユーザが所望する情報資源が得られたかあるいは検索処理の終了操作（例えば検索エンジンアプリケーションの終了）を行なうまで（ステップＳ７Ｙ）、ステップＳ１からステップＳ６の処理を繰り返す（ステップＳ７Ｎ）。 FIG. 3 is a flowchart showing an outline of search processing and search result display processing performed by the information resource search apparatus according to this embodiment. In FIG. 3, when an input search condition is output from the input unit 1 to the search execution unit 2, the search execution unit 2 determines the type of the input search condition (step S1), and if it is a search free word Then, the character string S of the search free word is input to the free word search scoring unit 21 or the input free word is changed (step S2), and if it is a thesaurus keyword for addition, it is input to the keyword set W used for addition The added thesaurus keyword is added or deleted (step S3), and if it is a refinement condition thesaurus keyword, the thesaurus keyword input to the keyword set N used for the refinement condition is added or deleted (step S8). When the input is not empty (step S4N), the search process according to the present embodiment is executed (step S5), and a list of information resources that are search results obtained by the search process is displayed and output (step S6). The processing from step S1 to step S6 is repeated (step S7N) until the desired information resource is obtained or until a search processing end operation (for example, end of the search engine application) is performed (step S7Y).

図４は、図３のステップＳ５において実行される検索処理の手順を示すフローチャートである。図４において、絞込み条件が入力されている場合、処理対象である情報資源（例えば文書、以下「文書」として例示する。）ｉが入力された絞込み条件に適合しているか、フィルタリング処理部２５において判断する（ステップＳ５１）。ステップＳ５１における絞込み条件の処理の詳細は、図５及び図６を参照して後述する。入力された絞込み条件に文書ｉが適合している場合（ステップＳ５２Ｙ）、フリーワード或いは加点用シソーラスキーワードのいずれかが入力されているか否か判断され、いずれかが入力された場合（ステップＳ５３Y）、文書ｉの得点を計算し（ステップＳ５４）、検索結果リストに文書ｉを追加する（ステップＳ５５）。ステップＳ５４におけるスコアリング処理の詳細は、図７を参照して後述する。ステップＳ５３において、フリーワード或いは加点用シソーラスキーワードのいずれも入力されていない場合（ステップＳ５３Ｎ）、ステップＳ５５に進む。ステップＳ５１からステップＳ５５の処理を、全文書を処理し終えるまで繰り返す（ステップＳ５６）。 FIG. 4 is a flowchart showing the procedure of the search process executed in step S5 of FIG. In FIG. 4, when the narrowing condition is input, whether the information resource (for example, document, hereinafter referred to as “document”) i to be processed is suitable for the input narrowing condition or not in the filtering processing unit 25. Judgment is made (step S51). Details of the narrowing-down condition processing in step S51 will be described later with reference to FIGS. When the document i matches the input narrowing condition (step S52Y), it is determined whether either a free word or a thesaurus keyword for addition is input, and when either is input (step S53Y). The score of the document i is calculated (step S54), and the document i is added to the search result list (step S55). Details of the scoring process in step S54 will be described later with reference to FIG. In step S53, when neither a free word nor a thesaurus keyword for adding points is input (step S53N), the process proceeds to step S55. The processing from step S51 to step S55 is repeated until all the documents are processed (step S56).
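The loop of FIG. 4 might be sketched as follows. The callables `passes_filter` and `score` stand in for steps S51 and S54 and are hypothetical placeholders; the final sort corresponds to the sorting unit 24 rather than to a step of FIG. 4, and the flag `has_scoring_input` models the branch at step S53.

```python
def search(docs, passes_filter, score, has_scoring_input):
    """Filter, score, and rank documents along the lines of FIG. 4.

    docs              -- iterable of documents (any objects)
    passes_filter     -- hypothetical callable for steps S51-S52
    score             -- hypothetical callable for step S54
    has_scoring_input -- True if a free word or a point-addition
                         thesaurus keyword was entered (step S53)
    """
    results = []
    for doc in docs:  # repeated until all documents are processed (S56)
        if not passes_filter(doc):  # S52N: drop the document
            continue
        # S53N skips scoring; the document is still listed (S55).
        s = score(doc) if has_scoring_input else 0.0
        results.append((s, doc))
    # Descending order of score; Python's sort is stable, so documents
    # with equal scores keep their original relative order.
    results.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in results]
```

When only narrowing conditions are entered, every document that passes the filter is returned unscored, which matches the S53N branch leading straight to step S55.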

図５は、図４のステップＳ５１における絞込み条件の処理の詳細を示すフローチャートである。図５において、絞込み条件シソーラスキーワードが入力されているか否かが判断され（ステップＳ５１１）、絞込み条件シソーラスキーワードが入力されている場合（ステップＳ５１１Ｙ）、文書ｉがファセットｆの絞込み条件に適合しているか否か判断され（ステップＳ５１２）、適合している場合（ステップＳ５１３Ｙ）、全てのファセットが絞込み条件に適合しているか判断され終わるまでステップＳ５１２からステップＳ５１３の処理を繰り返し（ステップＳ５１４）、文書ｉは入力された絞込み条件に適合しているものとし、検索結果候補とする（ステップＳ５１５）。ステップＳ５１１において絞込み条件キーワードが入力されていない場合（ステップＳ５１１Ｎ）、ステップＳ５１５に進む。ステップＳ５１３において文書ｉがファセットｆの絞込み条件に適合していない場合（ステップＳ５１３Ｎ）、文書ｉは入力された絞込み条件に適合していないものとし、検索結果候補から削除する（ステップＳ５１６）。 FIG. 5 is a flowchart showing details of the narrowing-down condition processing in step S51 of FIG. In FIG. 5, it is determined whether or not a narrowing condition thesaurus keyword is input (step S511). If a narrowing condition thesaurus keyword is input (step S511Y), the document i matches the narrowing condition of facet f. (Step S512), and if it matches (step S513Y), the processing from step S512 to step S513 is repeated until it is determined whether all facets meet the narrowing conditions (step S514), and the document It is assumed that i matches the input narrowing condition and is a search result candidate (step S515). If no narrowing condition keyword is input in step S511 (step S511N), the process proceeds to step S515. If the document i does not conform to the narrowing condition of facet f in step S513 (step S513N), it is determined that the document i does not conform to the input narrowing condition and is deleted from the search result candidates (step S516).

図６は、図５のステップＳ５１２における文書ｉがファセットｆの絞込み条件に適合しているか否かの判断処理の詳細を示すフローチャートである。図６において、絞込み条件キーワードＮの中にファセットｆに属するものがあるか否かが判断され（ステップＳ５１２１）、ある場合には（ステップＳ５１２１Ｙ）、絞込み条件キーワードＮの中からファセットｆに属するキーワード集合Ｘを抽出し（ステップＳ５１２２）、文書ｉがキーワードＸ_ｊに関係ないか否かが判断され、関係ない場合には（ステップＳ５１２３Ｙ）、キーワード集合Ｘに属する他のキーワードについても、文書ｉが関係ないか否かを繰り返し判断し（ステップＳ５１２４）、いずれのキーワードとも関係ない場合には（ステップＳ５１２４Ｙ）、文書ｉはファセットｆの絞込み条件に適合していないものと判断する（ステップＳ５１２５）。ステップＳ５１２１において、絞込み条件キーワードＮの中にファセットｆに属するものがない場合には（ステップＳ５１２１Ｎ）、文書ｉはファセットｆの絞込み条件に適合しているものと判断する（ステップＳ５１２６）。ステップＳ５１２３において、文書ｉがキーワードＸ_ｊに関係あると判断された場合には（ステップＳ５１２３Ｎ）、ステップＳ５１２６に進む。 FIG. 6 is a flowchart showing the details of the process, in step S512 of FIG. 5, of determining whether the document i meets the narrowing condition of facet f. In FIG. 6, it is determined whether any of the narrowing condition keywords N belongs to facet f (step S5121). If so (step S5121Y), the keyword set X belonging to facet f is extracted from the narrowing condition keywords N (step S5122), and it is determined whether the document i is unrelated to the keyword X_j. If it is unrelated (step S5123Y), the same determination is repeated for the other keywords belonging to the keyword set X (step S5124), and if the document i is unrelated to every keyword (step S5124Y), the document i is determined not to meet the narrowing condition of facet f (step S5125). If none of the narrowing condition keywords N belongs to facet f in step S5121 (step S5121N), the document i is determined to meet the narrowing condition of facet f (step S5126). If the document i is determined to be related to the keyword X_j in step S5123 (step S5123N), the process proceeds to step S5126.
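Taken together, the determinations of FIGS. 5 and 6 amount to an OR within a facet and an AND across facets. The following is a minimal sketch, assuming that “related” simply means the document's metadata contains the keyword itself (the embodiment may use a broader notion) and using hypothetical dictionaries that map a facet name to a set of keywords.

```python
def matches_facet(doc_keywords, narrowing, facet):
    """FIG. 6: does the document meet the narrowing condition of one facet?"""
    wanted = narrowing.get(facet, set())
    if not wanted:
        # S5121N: no narrowing keyword in this facet, so it conforms (S5126).
        return True
    # S5123/S5124: one related keyword is enough; otherwise S5125.
    return any(k in wanted for k in doc_keywords.get(facet, set()))


def passes_filter(doc_keywords, narrowing):
    """FIG. 5: the document must conform in every facet (S512 to S514)."""
    return all(matches_facet(doc_keywords, narrowing, f) for f in narrowing)


# Metadata of the "Guide to Facility X" example of FIG. 2 (facet F2 undefined).
facility_x = {
    "theme": {"history"},
    "geography": {"Tokyo Chiyoda-ku", "Yokohama Naka-ku"},
    "institution": {"Tokyo Metropolitan Government Office",
                    "Kanagawa Prefectural Office"},
}
```

Under these assumptions, a narrowing condition on the “era” facet excludes the Facility X document because its F2 cell is undefined, which illustrates why the embodiment separates narrowing keywords from point-addition keywords.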

図７は、図４のステップＳ５４における文書ｉのスコアリング処理の詳細を示すフローチャートである。図７において、入力された検索条件がフリーワードであるか、或いは加点用シソーラスキーワードであるかが判断され、フリーワードが入力された場合は（ステップＳ５４１Ｙ）、フリーワード文字列Ｓによる文書ｉのスコアａ_ｉを算出する（ステップＳ５４２）。ステップＳ５４２におけるフリーワードによるスコアリング処理（第１のスコア算出処理）の詳細は、図８を参照して後述する。ステップＳ５４１において、検索フリーワードが入力されていない場合には（ステップＳ５４１Ｎ）、ステップＳ５４６に進む。ステップＳ５４３において、加点用シソーラスキーワードが入力された場合は（ステップＳ５４３Ｙ）、加点用シソーラスキーワードＷ_ｊによる文書ｉのスコアｂ_ｉｊを算出し（ステップＳ５４４）、すべての入力された加点用シソーラスキーワードについてスコアを算出するまでステップＳ５４４の処理を繰り返す（ステップＳ５４５）。すべての入力された加点用シソーラスキーワードについてスコアが算出されると（ステップＳ５４５Ｙ）、ステップＳ５４６に進む。ステップＳ５４３において、加点用シソーラスキーワードが入力されていない場合には（ステップＳ５４３Ｎ）、ステップＳ５４６に進む。ステップＳ５４６において、ステップＳ５４２において算出された第１のスコアとステップＳ５４４において算出された第２のスコアが合算され、文書ｉのスコアｃ_ｉ＝ａ_ｉ＋Σｂ_ｉｊとして算出される。 FIG. 7 is a flowchart showing the details of the scoring process for the document i in step S54 of FIG. 4. In FIG. 7, it is determined whether the input search condition is a free word or a point-addition thesaurus keyword. If a free word has been input (step S541Y), the score a_i of the document i for the free word character string S is calculated (step S542). The details of the free word scoring process (first score calculation process) in step S542 are described later with reference to FIG. 8. If no search free word has been input in step S541 (step S541N), the process proceeds to step S546. If a point-addition thesaurus keyword has been input in step S543 (step S543Y), the score b_ij of the document i for the point-addition thesaurus keyword W_j is calculated (step S544), and step S544 is repeated until a score has been calculated for every input point-addition thesaurus keyword (step S545). Once scores have been calculated for all input point-addition thesaurus keywords (step S545Y), the process proceeds to step S546. If no point-addition thesaurus keyword has been input in step S543 (step S543N), the process proceeds to step S546. In step S546, the first score calculated in step S542 and the second score calculated in step S544 are added, and the score of the document i is calculated as c_i = a_i + Σ b_ij.
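The summation c_i = a_i + Σ b_ij can be illustrated with a small sketch. The text here only states that a keyword in another node of the same facet is weighted by β (β ≦ 1) according to its distance on the tree; the base score of 1.0, the form β = decay^distance, the use of the best-matching document keyword for each W_j, and all function names are assumptions made for illustration.

```python
# Hypothetical child -> parent map for part of the geographic facet.
PARENT = {"Japan": None, "Kanto": "Japan", "Kansai": "Japan",
          "Tokyo": "Kanto", "Kanagawa": "Kanto", "Osaka": "Kansai"}


def path_to_root(node):
    """Return the path from node up to the facet root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def keyword_score(doc_node, query_node, base=1.0, decay=0.5):
    """b_ij contribution of one document keyword for one keyword W_j.

    Both nodes are assumed to belong to the same facet.
    """
    up_doc = path_to_root(doc_node)
    if query_node in up_doc:
        # Exact match, or the document keyword is a lower node of W_j:
        # the second score is used without a penalty.
        return base
    up_query = path_to_root(query_node)
    common = next(n for n in up_doc if n in up_query)
    distance = up_doc.index(common) + up_query.index(common)
    beta = decay ** distance  # assumed form; only beta <= 1 is given
    return base * beta


def total_score(a_i, doc_nodes, query_nodes):
    """c_i = a_i + sum over j of b_ij (step S546)."""
    b = [max((keyword_score(d, w) for d in doc_nodes), default=0.0)
         for w in query_nodes]
    return a_i + sum(b)
```

In this sketch a document tagged “Kanagawa” still receives a reduced contribution for the query keyword “Tokyo”, so it appears in the result list below exact matches instead of being dropped.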

＜本実施形態に係る検索フリーワードに基づく第１のスコア算出処理詳細及び索引語と重要度との事前登録処理＞ <Details of first score calculation process based on search free word and pre-registration process of index word and importance>
フリーワード検索スコアリング部２１により実行される第１のスコア算出処理は、検索フリーワード入力部１１に入力された検索フリーワードと、検索対象の情報資源との関連度を数値化してスコアとする。入力される検索フリーワードは、例えば日本語や英語等の自然言語で表現された自由キーワードのリスト、又は句読点を含む自然文により指定される。 The first score calculation process executed by the free word search scoring unit 21 quantifies the degree of association between the search free word input to the search free word input unit 11 and the information resource to be searched, and uses the result as the score. The search free word is specified as a list of free keywords expressed in a natural language such as Japanese or English, or as a natural-language sentence including punctuation.

図８は、図７のステップＳ５４２におけるフリーワードによるスコアリング処理（第１のスコア算出処理）の詳細を示すフローチャートである。図８において、検索フリーワードの文字列から検索語を切り出して単語集合Ｑを作成し（ステップＳ５４２１）、文書ｉの第１のスコアａ_ｉを０と初期化した後（ステップＳ５４２２）、第１のスコアａ_ｉ＋＝文書ｉにおけるＱ_ｋの重みＳ_ｉｊとする（ステップＳ５４２３）。すべての検索語に対する重みを加算し終わるまでステップＳ５４２３の処理を繰り返す（ステップＳ５４２４）。なお、ステップＳ５４２１において、文書を処理する毎に単語集合Ｑを作成する必要はない。 FIG. 8 is a flowchart showing the details of the free word scoring process (first score calculation process) in step S542 of FIG. 7. In FIG. 8, search terms are cut out from the character string of the search free word to create a word set Q (step S5421), the first score a_i of the document i is initialized to 0 (step S5422), and the weight S_ij of Q_k in the document i is added to the first score (a_i += S_ij) (step S5423). Step S5423 is repeated until the weights for all search terms have been added (step S5424). Note that the word set Q in step S5421 need not be recreated each time a document is processed.
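The accumulation of FIG. 8 can be sketched in a few lines. Splitting the query on whitespace is only a stand-in for the term extraction of step S5421 (Japanese text would need a morphological analyzer), and the dictionary mapping each index word to its importance is a hypothetical rendering of the index word and importance pairs of FIG. 2.

```python
def first_score(query, index_words):
    """a_i: sum of the stored importance of each query term found in document i.

    query       -- search free word string
    index_words -- dict mapping index word -> importance for document i
    """
    terms = set(query.split())  # word set Q (stand-in for step S5421)
    score = 0.0                 # step S5422
    for term in terms:          # steps S5423 to S5424
        # Terms that are not index words of the document add nothing.
        score += index_words.get(term, 0.0)
    return score
```

Because Q depends only on the query, a caller scoring many documents could build the term set once and reuse it, as the paragraph above notes for step S5421.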

第１のスコア算出の基礎数値となる、メタデータ３１としての索引語の重要度の算出には、任意の手法が使用され得るが、例えばベクトル空間モデル（ＶｅｃｔｏｒＳｐａｃｅＭｏｄｅｌ：ＶＳＭ）によりスコアを算出する手法や、関連性の重ね合わせモデル（Ｒｅｌａｖａｎｃｅ−ｂａｓｅｄＳｕｐｅｒｉｍｐｏｓｉｔｉｏｎＭｏｄｅｌ：ＲＳモデル）によりスコアを算出する手法が、本実施形態に係るフリーワード検索スコアリング部２１に実装され得る。 Although any method can be used to calculate the importance of the index word as the metadata 31 that is a basic numerical value for the first score calculation, for example, the score is calculated by a vector space model (VSM). And a technique for calculating a score using a relevance-based superposition model (RS model) can be implemented in the free word search scoring unit 21 according to the present embodiment.

FIG. 9 is a flowchart showing the procedure, when the vector space model is used, for calculating the index words and their importance in an information resource for the first score calculation and registering them in the metadata 31. In FIG. 9, words that are candidates for index words are cut out of the body text of document i or of its metadata 31 to create a word set T (step S92). The importance of each word T_k in the word set T is calculated by the tf-idf (term frequency-inverse document frequency) method, taking as arguments the number of occurrences of T_k in document i and the number of documents, among all documents, that contain T_k (step S93). Step S93 is repeated until the importance of all index words has been calculated (step S94), and steps S92 to S94 are repeated until the importance has been calculated for all documents (step S95). Of the words whose importance has been calculated, one or more are taken in descending order of importance as index words and registered in the metadata 31, as shown in FIG. 2, as scoring free words paired with their importance. The tf-idf method calculates the importance of a word by considering how frequently the word appears in the document in question and how rarely it appears in other documents. The importance D_{j,i} of word t_i (i = a, …, n) in document d_j is given by the following equation (1).
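Equation (1) itself is not reproduced in this text. Assuming the standard tf-idf product of a normalized term frequency and a logarithmic inverse document frequency, one plausible form is the following; the exact idf term (logarithm base, any smoothing constant) is an assumption and may differ from the original.

```latex
D_{j,i} = tf_{j,i} \cdot \log\frac{DN}{df_i} \tag{1}
```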

Here, tc_{j,i} is the number of occurrences of word t_i in document d_j, tf_{j,i} = tc_{j,i} / max(tc_{j,i}), df_i is the number of documents, among all documents, in which word t_i appears, and DN is the total number of documents.
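The importance calculation of FIG. 9 can be sketched as below. The sketch assumes that the idf factor is log(DN / df_i) and that max(tc_{j,i}) is taken over the words of the same document; both are assumptions, as are the function names `tfidf` and `top_index_words` and the pre-tokenized input.

```python
import math
from collections import Counter


def tfidf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Return {document id: {word: importance}} for pre-tokenized documents."""
    dn = len(docs)  # DN: total number of documents
    df = Counter()  # df_i: number of documents containing word t_i
    for words in docs.values():
        df.update(set(words))
    result = {}
    for doc_id, words in docs.items():
        tc = Counter(words)  # tc_{j,i}: raw occurrence counts
        max_tc = max(tc.values(), default=1)
        result[doc_id] = {
            w: (c / max_tc) * math.log(dn / df[w])  # tf * idf (assumed idf form)
            for w, c in tc.items()
        }
    return result


def top_index_words(weights: dict[str, float], n: int = 2) -> list[tuple[str, float]]:
    """Pick the n most important words to register as scoring free words."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

With this idf form, a word that appears in every document receives an importance of 0, which is why it would not be chosen as an index word.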

FIG. 10 is a flowchart showing the procedure, when the RS model is used, for calculating the index words and their importance in an information resource for the first score calculation and registering them in the metadata 31. The RS model is a modeling technique for document retrieval in the vector space model: documents d_j are classified on the basis of relevance, such as containing the same keyword, to create document clusters C_k (k = A, …, N); a representative vector r_k (k = A, …, N) representing the features of each document cluster C_k is generated; and the document vector D_j is then corrected using the representative vectors r_k. For example, Patent Document 2 (Japanese Unexamined Patent Application Publication No. 2004-310199) discloses a document classification method using this RS model. Here, a document cluster is a collection of documents that are semantically grouped by a keyword and share the same topic.

FIG. 11(a) is a schematic diagram that illustrates the RS model concretely. FIG. 11(a) shows the case where two keywords A and B appear in documents d1, …, d5: a document d_j containing keyword A belongs to document cluster C_A, a document d_j containing keyword B belongs to document cluster C_B, and a document d_j containing both keywords A and B belongs to both C_A and C_B. That is, the RS model allows non-exclusive document classification, so that a document d_j spanning multiple keywords (topics) can be represented as belonging to multiple document clusters C_k. Then, for example, the root mean square (Root-Mean-Square: RMS, also referred to as the centrifugal-force mean) of the document vectors D_j of the documents d_j contained in each of the created document clusters C_A and C_B is calculated to generate the representative vectors r_a and r_b of C_A and C_B. The representative vector r_k of document cluster C_k is given by the following equation (2). The representative vector r_k represents the features of document cluster C_k; it is a feature vector in the same space as the document vector D_j and has the same number of dimensions.
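Equation (2) is not reproduced in this text. Since the representative vector is described as the element-wise root mean square of the document vectors in a cluster, a reconstruction consistent with that description is the following; it is offered as a sketch, not as the original formula.

```latex
r_{k,i} = \sqrt{\frac{1}{|C_k|} \sum_{d_j \in C_k} D_{j,i}^{2}} \tag{2}
```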

Here, r_{k,i} is the i-th element of the representative vector r_k, |C_k| is the number of documents contained in document cluster C_k, and D_{j,i} is the i-th element of the document vector D_j of document d_j.

Next, as shown in FIG. 11(b), the document vector D_j of each document d_j is corrected using the representative vectors r_k. The RMS of the representative vectors r_k of all document clusters C_k to which document d_j belongs is compared element by element with the document vector D_j, and where the former is larger it replaces the corresponding element of D_j. This is expressed by the following equation (3).
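Equation (3) is not reproduced in this text. Based on the description (element-wise RMS over the representative vectors of every cluster the document belongs to, then keeping the larger of that value and the original element), a consistent reconstruction is the following, where D'_{j,i} denotes the i-th element of the corrected document vector; the exact layout of the original formula is an assumption.

```latex
S_{j,i} = \sqrt{\frac{1}{|C(d_j)|} \sum_{C_k \in C(d_j)} r_{k,i}^{2}}, \qquad
D'_{j,i} = \max\left(D_{j,i},\, S_{j,i}\right) \tag{3}
```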

Here, S_{j,i} is the i-th element of the correction vector, C(d_j) is the set of document clusters to which document d_j belongs, and |C(d_j)| is the number of document clusters to which d_j belongs.

By using the RS model, the value of the corrected document vector D'_j can be calculated by taking into account not only the features that document d_j originally had but also the features of the document clusters C_k that share its keywords.

Returning to FIG. 10, the importance of all extracted index words (the document vector D_j) is calculated for all documents, for example by the tf-idf method shown in FIG. 9 (step S102). Next, the documents are classified by topic (document cluster) as shown in FIG. 11(a) (step S103). Here, one document may belong to multiple topics (clusters). The importance of the index words (the representative vector) is calculated for every topic (cluster) (step S104); the RMS described above may be used for this, for example. The document vectors are then corrected with the calculated representative vectors (step S105). Based on the corrected document vector D'_j, one or more words are taken in descending order of corrected importance as index words and registered in the metadata 31, as shown in FIG. 2, as scoring free words together with their importance.
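Steps S104 and S105 can be sketched as follows, assuming sparse vectors stored as dictionaries and the RMS-based representative vector described above. The names `representative` and `correct` are illustrative, and the handling of a document that belongs to no cluster (returned unchanged) is an assumption.

```python
import math

Vector = dict[str, float]  # word -> importance; absent words count as 0


def rms(values: list[float]) -> float:
    """Root mean square of a non-empty list of values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def representative(cluster_docs: list[Vector]) -> Vector:
    """Step S104: element-wise RMS of the document vectors in one cluster."""
    words = set().union(*cluster_docs)
    return {w: rms([d.get(w, 0.0) for d in cluster_docs]) for w in words}


def correct(doc: Vector, reps: list[Vector]) -> Vector:
    """Step S105: replace an element when the RMS of the representatives is larger."""
    if not reps:  # assumption: a document in no cluster is left unchanged
        return dict(doc)
    words = set(doc).union(*reps)
    return {
        w: max(doc.get(w, 0.0), rms([r.get(w, 0.0) for r in reps]))
        for w in words
    }
```

Because `reps` is a list, a document that belongs to several clusters is corrected with all of their representative vectors at once, which mirrors the non-exclusive classification of FIG. 11(a).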

In the index word importance calculation of FIG. 9 or FIG. 10, the documents to be processed must be natural-language text or keyword lists. Part or all of the metadata 31 assigned in advance to the information resource as bibliographic information may be processed, or alternatively the document body may be processed.

<Details of the second score calculation process based on point-addition thesaurus keywords in this embodiment>
The second score calculation process executed by the keyword search scoring unit 22 calculates the distance, on the facet tree structure of the thesaurus, between the point-addition thesaurus keyword selected and input through the point-addition thesaurus keyword input unit 12 and the scoring thesaurus keyword defined in advance for each information resource to be searched, and quantifies the degree of match or relevance, weighted according to that distance, as a score. The point-addition thesaurus keyword can be specified, for example, by selecting a desired node on the thesaurus facet tree structures, several of which may be selectively displayed on the display device.

The second score calculation process by the keyword search scoring unit 22 is executed according to the following procedure.

・First, let D₀ be the result set of the first score calculation process (the process following the procedure of FIG. 8). If no search free word is specified and the result set is empty, D₀ is the set of all information resources to which the keyword specified in the attention condition, input from the attention keyword input unit 14, is assigned.

・α is preferably set to a value suited to the characteristics of the facet (vocabulary table) in the thesaurus system, for example 2.

・The per-facet limit on the number of hops is preferably written in a configuration file when the data is loaded, and is set to 1, for example.

・Point addition does not add or remove elements (the constituent resources) of the resource set.

For example, when "Tokyo" is input as the point-addition thesaurus keyword on the facet structure shown in FIG. 12, then, as can be seen from FIG. 13, a score of α is added to documents whose metadata 31 contains the scoring thesaurus keyword "Tokyo" or "Chuo-ku", which is located below "Tokyo"; a score of αβ is added to documents whose metadata 31 contains "Kanto", "Kanagawa Prefecture" or "Yokohama City"; and a score of αβ^2 is added to documents whose metadata 31 contains "Japan", "Kansai", "Osaka Prefecture", "Osaka City" or the "Chuo-ku" positioned below "Osaka City".
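The "Tokyo" example can be reproduced with a small sketch. Several points are assumptions made for illustration: the tree in `PARENT` is inferred from the example rather than copied from FIG. 12, the score is taken to be α·β^h where h is the number of upward hops from the selected keyword to the nearest node whose subtree contains the document's keyword, β = 0.5 is an arbitrary value (the text gives only α = 2 as an example), and a document beyond the hop limit is assumed to receive 0.

```python
# child -> parent within the geographic facet (hypothetical reconstruction)
PARENT = {
    "Kanto": "Japan", "Kansai": "Japan",
    "Tokyo": "Kanto", "Kanagawa": "Kanto",
    "Chuo-ku (Tokyo)": "Tokyo", "Yokohama": "Kanagawa",
    "Osaka Prefecture": "Kansai", "Osaka City": "Osaka Prefecture",
    "Chuo-ku (Osaka)": "Osaka City",
}


def ancestors_or_self(node: str) -> list[str]:
    """Return the node followed by its ancestors up to the facet root."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain


def hops(selected: str, assigned: str):
    """Upward hops from `selected` to the nearest node whose subtree holds `assigned`."""
    above_assigned = set(ancestors_or_self(assigned))
    for h, node in enumerate(ancestors_or_self(selected)):
        if node in above_assigned:
            return h
    return None  # no common ancestor: the keywords are in different facets


def second_score(selected, assigned, alpha=2.0, beta=0.5, max_hops=None):
    """Assumed scoring rule: alpha * beta**hops, or 0 beyond the hop limit."""
    h = hops(selected, assigned)
    if h is None or (max_hops is not None and h > max_hops):
        return 0.0
    return alpha * beta ** h
```

Under these assumptions a descendant of the selected keyword counts as an exact match (h = 0), which is how "Chuo-ku" below "Tokyo" receives the full score α.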

Alternatively, the second score addition process in the keyword search scoring unit 22 may be configured so that, for each facet, the method of calculating the score from the distance on the thesaurus facet tree structure and/or whether to add points at all can be set.

<Details of the score summation process in this embodiment>
FIG. 13 shows the score that the score summation unit 23 calculates for each document when, on the facet structure shown in FIG. 12, "Tokyo" is selected and input as the point-addition thesaurus keyword and the words "A" and "C" are input as search free words. Suppose document 1 has, defined as metadata 31, the geographic division "Japan" and the period division "21st century" as scoring thesaurus keywords and "A" (importance S_11) and "C" (importance S_13) as scoring free words; the combined score of document 1 calculated by the score summation unit 23 is then αβ^2 + S_11 + S_13. Suppose document 2 has the geographic division "Tokyo" and the period division "20th century" as scoring thesaurus keywords and "B" (importance S_22) as a scoring free word; its combined score is α. Suppose document 3 has the geographic division "Chuo-ku" (located below Tokyo) and the period division "19th century" as scoring thesaurus keywords and "A" (importance S_31) and "B" (importance S_32) as scoring free words; its combined score is α + S_31. Suppose document 4 has the geographic division "Kanto" and the period division "20th century" as scoring thesaurus keywords and "B" (importance S_42) and "C" (importance S_43) as scoring free words; its combined score is αβ + S_43. Suppose document 5 has the geographic division "Yokohama City" and the period division "19th century" as scoring thesaurus keywords and "C" (importance S_53) as a scoring free word; its combined score is αβ + S_53. Suppose document 6 has the geographic division "Chuo-ku" (located below Osaka City) and the period division "21st century" as scoring thesaurus keywords and "A" (importance S_61) as a scoring free word; the combined score of document 6 is αβ^2 + S_61.
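The six combined scores of FIG. 13 can be checked numerically with the sketch below. The hop counts follow the example (0 for "Tokyo" and its "Chuo-ku", 1 for "Kanto" and "Yokohama City", 2 for "Japan" and the "Chuo-ku" under Osaka). Everything numeric is a placeholder: α = 2 is the example value from the text, β = 0.5 is assumed, and the importance values standing in for S_11, S_13 and so on are invented purely so that the arithmetic can be run.

```python
ALPHA, BETA = 2.0, 0.5  # alpha: example value from the text; beta: assumed

# document -> (hops from "Tokyo" to its geographic keyword,
#              {scoring free word: importance}); importances are made up
DOCS = {
    1: (2, {"A": 0.5, "C": 0.25}),   # Japan
    2: (0, {"B": 0.5}),              # Tokyo
    3: (0, {"A": 0.25, "B": 0.5}),   # Chuo-ku under Tokyo
    4: (1, {"B": 0.5, "C": 0.75}),   # Kanto
    5: (1, {"C": 0.5}),              # Yokohama City
    6: (2, {"A": 0.125}),            # Chuo-ku under Osaka
}


def combined_score(hops: int, importance: dict[str, float], query: list[str]) -> float:
    """Third score = first score (free words) + second score (thesaurus)."""
    first = sum(importance.get(word, 0.0) for word in query)
    second = ALPHA * BETA ** hops
    return first + second


def ranked(query: list[str]) -> list[tuple[int, float]]:
    """Score every document and sort in descending order of combined score."""
    scores = {d: combined_score(h, imp, query) for d, (h, imp) in DOCS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Document 2 contributes no first score for the query "A", "C" because only "B" is registered for it, so its combined score is exactly α, as in FIG. 13.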

<Details of the filtering process by narrowing-condition thesaurus keywords in this embodiment>
In the filtering process executed by the filtering processing unit 25, only those documents, among the documents for which a combined score has been calculated, whose metadata 31 defines a scoring thesaurus keyword matching the narrowing-condition thesaurus keyword selected and input through the narrowing-condition thesaurus keyword input unit 13 are extracted and output to the sorting unit 24. The narrowing-condition thesaurus keyword can be specified, for example, by selecting a desired node on the thesaurus facet tree structures, several of which may be selectively displayed on the display device.

The filtering process by the filtering processing unit 25 is executed according to the following procedure.

・Let K_i be the set of resources to which keyword k_i, specified as a narrowing condition, is assigned. The narrowed result set D₂ is defined by the following expression.

D₂ = D₁ ∩ (K₀ ∪ K₁ …) ∩ (K_a ∪ …) ∩ (K_b ∪ …) …
Here, each parenthesized set is the union of the resource sets of keywords belonging to the same facet.

・Matching a narrowing condition does not change the score of an information resource.

FIG. 14 shows the documents extracted by the filtering processing unit 25 when, on the facet structure shown in FIG. 12, "Tokyo" is selected and input as the point-addition thesaurus keyword, the words "A" and "C" are input as search free words, and "20th century" and "21st century" are additionally selected and input as narrowing-condition thesaurus keywords. When multiple narrowing conditions within the same facet are input, the filtering processing unit 25 treats them as an OR condition (logical sum); when narrowing conditions in different facets are input, it treats them as an AND condition (logical product). In FIG. 14, of documents 1 to 6 whose combined scores were calculated in FIG. 13, documents 3 and 5, for which the period division "19th century" is defined as a scoring thesaurus keyword in the metadata 31, have their scores cleared and are not output to the sorting unit 24. No score is added to documents 1, 2, 4 and 6, which satisfy the narrowing conditions.

FIG. 15 shows the documents extracted by the filtering processing unit 25 when, on the facet structure shown in FIG. 12, the words "A" and "C" are input as search free words and "Tokyo", "Osaka Prefecture", "20th century" and "21st century" are selected and input as narrowing-condition thesaurus keywords. The filtering processing unit 25 treats the subordinate concepts of a thesaurus keyword specified as a narrowing condition as also satisfying that condition. In FIG. 15, of documents 1 to 6 whose combined scores were calculated in FIG. 13, two are extracted as matching the narrowing conditions: document 2, whose metadata 31 defines the geographic division "Tokyo" and the period division "20th century" as scoring thesaurus keywords, and document 6, whose metadata 31 defines the geographic division "Chuo-ku" (located below Osaka Prefecture) and the period division "21st century". The other documents 1, 3, 4 and 5 are not extracted because they define no scoring thesaurus keywords that match the narrowing-condition thesaurus keywords. The combined score of document 2 is 0 because it defines no scoring free word matching the input search free words "A" and "C". The combined score of document 6 is the importance S_61 of the word "A", which is defined as one of its scoring free words.
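The filtering rule (OR within a facet, AND across facets, subordinate concepts included, scores untouched) can be sketched as follows. The `THESAURUS` table is a hypothetical slice reconstructed from the examples, the facet labels "geo" and "era" are invented names, and the document keywords are those listed for FIG. 13.

```python
from collections import defaultdict

# keyword -> (facet, parent keyword or None); hypothetical slice of the thesaurus
THESAURUS = {
    "Japan": ("geo", None),
    "Kanto": ("geo", "Japan"), "Kansai": ("geo", "Japan"),
    "Tokyo": ("geo", "Kanto"), "Kanagawa": ("geo", "Kanto"),
    "Chuo-ku (Tokyo)": ("geo", "Tokyo"), "Yokohama": ("geo", "Kanagawa"),
    "Osaka Prefecture": ("geo", "Kansai"),
    "Osaka City": ("geo", "Osaka Prefecture"),
    "Chuo-ku (Osaka)": ("geo", "Osaka City"),
    "19th century": ("era", None),
    "20th century": ("era", None),
    "21st century": ("era", None),
}

DOC_KEYWORDS = {
    1: {"Japan", "21st century"}, 2: {"Tokyo", "20th century"},
    3: {"Chuo-ku (Tokyo)", "19th century"}, 4: {"Kanto", "20th century"},
    5: {"Yokohama", "19th century"}, 6: {"Chuo-ku (Osaka)", "21st century"},
}


def is_under(keyword, condition) -> bool:
    """True if `keyword` equals `condition` or is one of its subordinate concepts."""
    while keyword is not None:
        if keyword == condition:
            return True
        keyword = THESAURUS[keyword][1]
    return False


def passes(doc_keywords: set[str], conditions: list[str]) -> bool:
    """OR inside each facet, AND across facets."""
    by_facet = defaultdict(list)
    for c in conditions:
        by_facet[THESAURUS[c][0]].append(c)
    return all(
        any(is_under(k, c) for k in doc_keywords for c in conds)
        for conds in by_facet.values()
    )


def filter_docs(scores: dict[int, float], conditions: list[str]) -> dict[int, float]:
    """Keep matching documents; their scores are passed through unchanged."""
    return {d: s for d, s in scores.items() if passes(DOC_KEYWORDS[d], conditions)}
```

With the conditions of FIG. 14 this keeps documents 1, 2, 4 and 6, and with those of FIG. 15 it keeps documents 2 and 6.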

Alternatively, the score summation process in the score summation unit 23 may be made configurable so that, when a further thesaurus keyword is input after the search results have been output, the score for the newly input keyword is either added to the currently held combined score or substituted for the currently held combined score (or a part of it). In this case, the rule is preferably defined in relation to facets. For example, when multiple thesaurus keywords are selected within the same facet, the currently held combined score may be replaced, and when a new keyword is selected in a different facet, its score may be added to the currently held combined score.
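One way to realize the facet-based rule is to keep one second-score contribution per facet, so that a new keyword in an already used facet overwrites that facet's contribution while a keyword in a new facet adds a term. This is only a sketch of that idea; the per-facet dictionary and the names `apply_keyword` and `combined` are assumptions, not structures described in the text.

```python
def apply_keyword(facet_scores: dict[str, float], facet: str, score: float) -> dict[str, float]:
    """Return updated per-facet contributions after a new keyword is selected.

    Same facet: the previous contribution is replaced.
    Different facet: a new contribution is added alongside the others.
    """
    updated = dict(facet_scores)
    updated[facet] = score
    return updated


def combined(first_score: float, facet_scores: dict[str, float]) -> float:
    """Combined score = first score + sum of the per-facet second scores."""
    return first_score + sum(facet_scores.values())
```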

<User interface configuration example in this embodiment>
FIGS. 16 to 23 show exemplary configurations of the user interface provided by the input unit 1 and the output unit 4 of the information resource search device according to this embodiment.

FIG. 16 shows an example of the initial search screen displayed by the information resource search device according to this embodiment.

In FIG. 16, the search string input field 164 displayed at the upper left is one configuration example of the search free word input unit 11, and the facet keyword selection input field 161 displayed at the middle left is one configuration example of the point-addition thesaurus keyword input unit 12, the narrowing-condition thesaurus keyword input unit 13 and the attention keyword input unit 14. On the initial search screen of FIG. 16, the facet "Theme" of the thesaurus system is selected as the attention keyword by default. A tab 165 is displayed for each of the facets "Theme", "Period division", "Geographic division", "Institution name" and "Resource type". By selecting a desired tab and then selecting a keyword belonging to that facet on the displayed hierarchy, the user can specify point-addition thesaurus keywords or narrowing-condition thesaurus keywords across multiple facets at the same time. The right-hand column in FIG. 16 is one configuration example of the facet correspondence display unit 41. The list 162 displayed at the upper right is the list of information resources related to the attention keyword "Theme" itself; since no such document exists, it is displayed as an empty list. The list 163 displayed at the lower right lists, for example, three information resources belonging to "Humanities", a subordinate concept of the attention keyword "Theme". Information resources belonging to "Social sciences" are listed after it in the same way. Within each list, information resources are preferably sorted and output in descending order of combined score, for example, and among information resources with the same score, those with a more recent update date and time are displayed first.
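The ordering within each list (descending combined score, ties broken by the more recent update) can be sketched with a single sort key. The record layout with `score` and `updated` fields and the default limit of three entries are assumptions chosen to match the example.

```python
from datetime import date


def order_for_display(resources: list[dict], limit: int = 3) -> list[dict]:
    """Sort by combined score, then by update date, both descending."""
    ordered = sorted(
        resources,
        key=lambda r: (r["score"], r["updated"]),
        reverse=True,
    )
    return ordered[:limit]


if __name__ == "__main__":
    sample = [
        {"id": "a", "score": 1.0, "updated": date(2005, 1, 1)},
        {"id": "b", "score": 2.0, "updated": date(2004, 1, 1)},
        {"id": "c", "score": 1.0, "updated": date(2005, 6, 1)},
    ]
    print([r["id"] for r in order_for_display(sample)])
```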

FIG. 17 shows an example of the screen when, in FIG. 16, the user thinks "I want to know what Hokkaido used to be like" and selects "Hokkaido" 171 as the point-addition thesaurus keyword. The "Current point-addition and narrowing conditions" field 175 shows that "Geographic division", "Japan", "Hokkaido" has been specified as the point-addition thesaurus keyword. The list 172 displayed at the upper right lists the information resources for which the point-addition thesaurus keyword "Hokkaido" or one of its subordinate concepts is defined as a scoring keyword. The list 173 displayed at the middle right is the "Related" field: it lists, as resources that do not match the search condition but are related to it, the information resources whose defined scoring keywords do not include "Hokkaido" or its subordinate concepts but do include a keyword belonging to the same facet as "Hokkaido", namely "Geographic division". Because a keyword in the same facet as "Hokkaido" is defined for each information resource in the "Related" field, a second score multiplied by a weight corresponding to the distance on the tree structure from "Hokkaido" to the defined keyword has been calculated for it. Information resources that belong to neither list 172 nor list 173 are classified as "Unclassified" and listed in the list 174 displayed at the lower right.

The number of information resources displayed in each list may be arbitrary, but it is preferably chosen so that the lists 172, 173 and 174 together fit substantially within the display area of one screen without requiring extensive scrolling.

FIG. 18 shows an example of the screen when, in FIG. 17, the user wants to choose a theme next in order to learn what Hokkaido used to be like, cannot tell from the displayed tree-structured facet list which keyword is appropriate, and therefore specifies the facet "Theme" 181 itself as the attention keyword. In FIG. 18, documents are shown for each subordinate keyword of the facet "Theme": the list 183 of information resources for which the keyword "Humanities" or a scoring keyword subordinate to it is defined, the list 184 of information resources for which the keyword "Social sciences" or a scoring keyword subordinate to it is defined, and the list 185 of information resources for which the keyword "Arts" or a scoring keyword subordinate to it is defined.

FIG. 19 shows an example of the screen when, in FIG. 18, the user looks at the bibliographic information, for example the titles, of the information resources displayed in the "Humanities" list, thinks "This looks interesting because there are photographs", and clicks the thesaurus keyword "Humanities" 182 displayed at the upper right. The clicked thesaurus keyword "Humanities" is input as a narrowing-condition thesaurus keyword, and the information resources for which the keyword "Humanities" or a scoring keyword subordinate to it is defined are listed.

FIG. 20 shows an example of the screen when, in FIG. 19, the user looks at the bibliographic information, for example the titles, of the information resources displayed in the "History" list 192, thinks "The photographs seem to be under History", and clicks the thesaurus keyword "History" 191 displayed at the middle right. The clicked thesaurus keyword "History" is input as a narrowing-condition thesaurus keyword, and only the information resources for which the keyword "History" or a scoring keyword subordinate to it is defined are listed in the list 202. In FIG. 20, the "Related" list 203 lists information resources that do not satisfy the narrowing condition "History" but for which a keyword belonging to the same facet as "History", namely "Theme", is defined as a scoring keyword, and the "Unclassified" list 204 lists the information resources that were classified into neither list 202 nor list 203. Because related information resources that do not satisfy the narrowing condition are displayed in the "Related" list 203 on the same screen as the list 202, the user can, for example, notice and view the second information resource in the list 203 (assumed here to have "Hokkaido", "Geography" and "Edo period" defined as scoring thesaurus keywords). The "Current point-addition and narrowing conditions" field shows that "Theme", "Humanities", "History" on the first facet is specified as the narrowing condition and that "Geographic division", "Japan", "Hokkaido" on the second facet is specified as the point-addition thesaurus keyword.

In FIG. 20, "History" is input as a narrowing-condition thesaurus keyword, but below the list 202 of information resources that satisfy it, on the same screen, the information resources that do not satisfy it but are related are sorted in descending order of relevance (similarity) and output in the "Related" list 203. Therefore, even when no search result satisfies the narrowing-condition thesaurus keyword, the search does not simply fail, and related information resources can always be found.

FIG. 21 shows an example of the screen displayed when the user, on the initial screen shown in FIG. 16, thinks "I want to look into environmental pollution" and, in the facet selection input field in the upper left, selects "Environmental studies" in the facet "Theme" 211 as a point-adding thesaurus keyword. List 214 in the upper right shows, together with display 213 of the input category "Theme > Natural science > Environmental studies", the information resources for which the keyword "Environmental studies" or a scoring keyword belonging to one of its subordinate concepts is defined. The "Related" list 215 in the middle right shows information resources that do not satisfy the point-adding condition "Environmental studies" but for which a keyword belonging to the same facet "Theme" as "Environmental studies" is defined as a scoring keyword.

FIG. 22 shows an example of the screen displayed when the user, in FIG. 21, thinks "specifying 'Environmental studies' on the thesaurus alone is too broad" and enters the search free word "environmental pollution" in the search string field 221. As can be seen by comparison with list 214 of FIG. 21, list 222 in the upper right shows the information resources after the first score has been added to the scores held by the information resources for which the scoring free word "environmental pollution" is defined and by information resources similar to them, and after the resources have been re-sorted in descending order of combined score. In general, keywords on the thesaurus are effective for specifying relatively broad search conditions and free words are effective for specifying specific ones, so using both kinds of condition as needed improves search efficiency.

FIG. 23 shows an example of the screen displayed when the user, in FIG. 22, thinks "I want to look into environmental pollution outside Japan" and, in the facet selection input field in the upper left, additionally inputs "Overseas" in the facet "Geographic division" 231 as a point-adding thesaurus keyword. It is difficult to express a condition equivalent to "Overseas" by adding words to the search string for free word search; in such a case, selection from the facet selection input field is effective. As can be seen by comparison with list 222 of FIG. 22, list 233 in the upper right shows the information resources after the second score has been added to the scores held by the information resources for which the keyword "Overseas" is defined and by those for which any keyword belonging to the facet "Geographic division", to which "Overseas" belongs, is defined, and after the resources have been re-sorted in descending order of combined score. As list 233 shows, information resources on environmental pollution in China and Asia, which better fit the purpose of the search, are listed. In addition, "Narrow down by resource type" is displayed at the top right of FIG. 23, so the user can directly specify a search condition in the "Resource type" facet, a classification such as "paper", "researcher's home page", or "research data" that is difficult to specify by free word search.
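The combination of a free word condition and a facet condition illustrated in FIGS. 22 and 23 can be sketched, purely as an illustration, as a per-resource addition of the two scores followed by a sort. The resource identifiers, the score values, and the function names below are hypothetical; how each score is actually calculated is left to the scoring units described elsewhere in the specification.

```python
def third_scores(first, second):
    """Add the first (free word) and second (thesaurus) scores per resource.

    A resource missing from one of the two dictionaries contributes 0 for it,
    so no resource is dropped merely because one kind of condition missed it.
    """
    ids = set(first) | set(second)
    return {rid: first.get(rid, 0.0) + second.get(rid, 0.0) for rid in ids}


def ranked(scores):
    """Resource ids in descending order of score (ties broken by id)."""
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


# Made-up values: first scores for the free word "environmental pollution",
# second scores for the point-adding keyword "Overseas".
first = {"doc_china": 0.5, "doc_japan": 0.75}
second = {"doc_china": 1.0, "doc_asia": 0.5}

total = third_scores(first, second)
order = ranked(total)
```

Before the facet keyword is added, "doc_japan" ranks first on the free word score alone. After the addition, "doc_china" moves to the top, which mirrors the reordering between list 222 and list 233.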

Alternatively, the node on a facet of the thesaurus system that is selected when a search condition is input and the node used as the reference when the output unit 4 produces the display output may be specifiable independently of each other. For example, when the "Psychology" node of the "Theme" facet is selected as the input thesaurus keyword at the time of search condition input, the search results may be classified and displayed, with the search key "Psychology" as the reference, as a list for "Psychology" and its lower nodes, a related list, and an unclassified list. Alternatively, another node in the same "Theme" facet may be selected independently and the above classified display performed with that node as the reference, or the classified display may be based on a node in a different facet, such as by period, by place, or by resource type.

<Hardware Configuration of the Information Resource Search Apparatus According to the Present Embodiment>
FIG. 24 is a block diagram showing an example of the hardware configuration of the information resource search apparatus according to the present embodiment. In the information resource search apparatus, which is the computer device 110 shown in FIG. 24, the CPU 111 controls the entire system in accordance with programs stored in the ROM 114 and/or the hard disk drive 116, using the RAM 115 as a work memory for primary storage. In accordance with user instructions input via the mouse 112a or the keyboard 112, the CPU 111 also executes the information resource search processing according to the present embodiment on the basis of a program stored in the hard disk drive 116. A display such as a CRT or an LCD is connected to the display interface 113 and displays the input standby screen of the information resource search processing executed by the CPU 111, the progress of the processing, the search results, the contents of an information resource selected from a list, and the like. The removable media drive 117 is used mainly to write files from a removable medium to the hard disk drive 116 and to write files read from the hard disk drive 116 to a removable medium. Usable removable media include a floppy disk (FD), CD-ROM, CD-R, CD-R/W, DVD-ROM, DVD-R, DVD-R/W, DVD-RAM, and MO, as well as a memory card, CF card, SmartMedia, SD card, and Memory Stick.

A printer such as a laser beam printer or an inkjet printer is connected to the printer interface 118. The network interface 119 is an interface for connecting the computer device to a network.

The input device of the information resource search apparatus according to each of the above embodiments is not limited to the mouse 112a or the keyboard 112; any pointing device, such as a trackball, a trackpad, or a tablet, may be used as appropriate. When a portable information terminal is used as the information resource search apparatus according to each of the above embodiments, the input unit may be made up of buttons, a mode dial, or the like.

The hardware configuration shown in FIG. 24 for the information resource search apparatus according to each of the above embodiments is merely an example, and any other hardware configuration may of course be used.

In particular, all or part of the information resource search processing according to each of the above embodiments may be implemented by the computer terminal device 110 or by a portable information terminal device such as a PDA. The processing may also be implemented by a network system, made up of the Internet or any well-known local area network (LAN) or wide area network (WAN), in which a computer terminal device or the like and a server device are interconnected wirelessly, for example by Bluetooth (registered trademark), or over a wired communication line such as the Internet (TCP/IP), the public switched telephone network (PSTN), or the integrated services digital network (ISDN). For example, a client device may include the input unit 1 and the output unit 4 for information resource search requests and browsing display, and may transmit the search conditions input via the input unit 1 to a server device that implements the search execution unit 2, the data storage unit 3, and the automatic metadata generation unit 5. The server device may then transmit the search result to the client device that sent the search conditions, or to a client device with another specified identifier, for output at that client device.

As described above, according to the present embodiment, the first score and the second score are added together, the information resources are sorted by the resulting combined score (third score), and the information resources are displayed as the search result in descending order of combined score. Each time a search condition is input in the subsequent rounds of input, the continuously held first score and/or second score is updated by adding the new score, and the search result list is updated and displayed on the basis of the updated scores. Therefore, even if the metadata, that is, the bibliographic information assigned to the information resources available on the Web, is incomplete, or an inexperienced user enters an inappropriate search keyword, a search result can always be obtained without impairing search efficiency. Even when a large number of information resources are returned as the search result, they are obtained in order starting from those that best match the search conditions.

Moreover, even when an input search keyword does not match any description in the metadata assigned to the information resources, a list of information resources related to the input search keyword is always displayed on the basis of distance on the facet tree structure of the thesaurus. Related information resources that do not satisfy the input search conditions can therefore be obtained reliably, and the initially input search conditions can be revised into more appropriate ones on the basis of those related information resources.
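The distance-based relatedness on the facet tree can be illustrated with the following minimal Python sketch. The claims only require a weight β (β ≦ 1) that corresponds to the distance on the tree structure; the exponential form decay ** distance, the value decay=0.5, the toy facet, and all function names are assumptions chosen for this sketch and are not taken from the embodiment.

```python
# Hypothetical toy facet: child -> parent (None marks the facet root).
PARENT = {
    "Geographic division": None,
    "Japan": "Geographic division",
    "Overseas": "Geographic division",
    "Hokkaido": "Japan",
    "Tokyo": "Japan",
    "China": "Overseas",
}


def path_to_root(node):
    """Nodes from `node` up to the facet root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a, b):
    """Number of edges on the path between two nodes of the same facet tree."""
    up_a = path_to_root(a)
    depth_in_b = {n: i for i, n in enumerate(path_to_root(b))}
    for i, n in enumerate(up_a):
        if n in depth_in_b:          # first common ancestor
            return i + depth_in_b[n]
    raise ValueError("nodes are not in the same facet")


def beta(input_keyword, resource_keyword, decay=0.5):
    """Assumed weight: 1 for the input node or its lower nodes, else decay**distance."""
    if input_keyword in path_to_root(resource_keyword):
        return 1.0                   # exact match or lower node: no reduction
    return decay ** tree_distance(input_keyword, resource_keyword)
```

Under these assumptions, a resource tagged "Tokyo" receives weight 0.25 for the input keyword "Hokkaido" (two edges apart), while a resource tagged "China" receives 0.0625 (four edges apart). Both remain in the result list with a lower rank instead of being dropped.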

Moreover, when search conditions are input more than once, the continuously held scores are updated by adding the new scores each time a search condition is input, and the search result list is updated and displayed on the basis of the updated scores. Consequently, a more accurate search result can be obtained reliably each time an additional search keyword is input, without any relevant resource being missed.
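The cumulative update across successive inputs might look like the following sketch. For brevity it keeps a single running score per resource, whereas the embodiment holds the first and second scores separately; the class name, method name, and score values are hypothetical.

```python
class HeldScores:
    """Keeps a running score per resource across successive search inputs."""

    def __init__(self):
        self.scores = {}

    def add(self, new_scores):
        """Add the scores of a new search input and return the re-sorted ids."""
        for rid, score in new_scores.items():
            self.scores[rid] = self.scores.get(rid, 0.0) + score
        # Descending order of accumulated score; ties broken by id.
        return sorted(self.scores, key=lambda rid: (-self.scores[rid], rid))


held = HeldScores()
first_list = held.add({"a": 0.5, "b": 0.25})     # first search condition
second_list = held.add({"b": 0.5, "c": 0.125})   # additional search condition
```

Because scores are only ever added, resource "a" stays in the list after the second input even though the new condition did not match it. It is merely overtaken by "b".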

Furthermore, a plurality of facet tree structures are selectively displayed, and a plurality of keywords spanning a plurality of facets can be specified at the same time by selecting arbitrary nodes on the displayed facet tree structures. Keywords belonging to different facets of the thesaurus system can therefore be specified simultaneously as search conditions, no new facet needs to be created in order to specify keywords that span several facets, and the directory structure of the facets can be kept simple. This improves agreement in classification work and gives users classification criteria that are easy to understand, while allowing multifaceted search conditions to be set. Because concepts and the characteristics of information resources can be described as combinations of several basic characteristics, new concepts that were not anticipated when the thesaurus was built can also be accommodated flexibly.

The scope of the present invention is not limited to the exemplary embodiments illustrated and described, and includes all embodiments that provide effects equivalent to those intended by the present invention. For example, the present invention is not limited to information resources such as academic information resources available on the Web, and can readily be applied to, for example, a personal file management system. Furthermore, the scope of the present invention is not limited to the combination of features of the invention defined by claim 1, and may be defined by any desired combination of specific features among all of the disclosed features.

FIG. 1 is a block diagram showing an example of the functional configuration of the information resource search apparatus according to an embodiment of the present invention.
FIG. 2 is a schematic diagram showing an example of the data structure and layout of the metadata 31 of FIG. 1.
FIG. 3 is a flowchart showing an example of the outline procedure of the information resource search processing executed by the information resource search apparatus according to the embodiment of the present invention.
FIG. 4 is a flowchart showing an example of the detailed procedure of the search processing in step S5 of FIG. 3.
FIG. 5 is a flowchart showing an example of the detailed procedure for determining, in step S51 of FIG. 4, whether a document satisfies the input narrowing conditions.
FIG. 6 is a flowchart showing an example of the detailed procedure for determining, in step S512 of FIG. 5, whether a document satisfies the narrowing condition of a facet.
FIG. 7 is a flowchart showing an example of the detailed procedure of the scoring processing in step S54 of FIG. 4.
FIG. 8 is a flowchart showing an example of the detailed procedure for calculating the score of a document from the search free words in step S542 of FIG. 7.
FIG. 9 is a flowchart showing an example of the detailed procedure of the scoring free word extraction and registration processing when the scoring free words and their weights are calculated with a vector space model.
FIG. 10 is a flowchart showing an example of the detailed procedure of the scoring free word extraction and registration processing when the scoring free words and their weights are calculated with the RS model.
FIG. 11 is a schematic diagram explaining the calculation and correction of document vectors with the RS model.
FIG. 12 is a schematic diagram showing an example of facets on the thesaurus.
FIG. 13 is a diagram explaining an example of score calculation when, in FIG. 12, "Tokyo" is the point-adding keyword and "A" and "C" are the search free words.
FIG. 14 is a diagram explaining an example of score calculation when, in FIG. 12, "Tokyo" is the point-adding keyword, "A" and "C" are the search free words, and "20th century" and "21st century" are the narrowing keywords.
FIG. 15 is a diagram explaining an example of score calculation when, in FIG. 12, "Tokyo", "Osaka Prefecture", "20th century", and "21st century" are the narrowing keywords and "A" and "C" are the search free words.
FIG. 16 is a diagram showing an example of the initial search screen output on the display of a computer that executes the information resource search program according to the embodiment of the present invention.
FIGS. 17 to 23 are diagrams each showing an example of a search result screen output on the display of a computer that executes the information resource search program according to the embodiment of the present invention.
FIG. 24 is a diagram showing an example of the hardware configuration of the information resource search apparatus according to each embodiment of the present invention.

Explanation of symbols

Input unit 1
Search execution unit 2
Data storage unit 3
Output unit 4
Automatic metadata generation unit 5
Search free word input unit 11
Point-adding thesaurus keyword input unit 12
Narrowing-condition thesaurus keyword input unit 13
Keyword-of-interest input unit 14
Free word search scoring unit 21
Keyword search scoring unit 22
Score summation unit 23
Sorting unit 24
Filtering unit 25
Metadata 31
Body text and content 32
Facet-based display unit 41

Claims

An information resource search apparatus comprising:
a metadata storage unit that stores metadata assigned to information resources, the metadata including, for each information resource, a thesaurus keyword describing a node name in a facet of a thesaurus, and a pair of a free word described in the information resource and its importance;
an input unit for inputting an input thesaurus keyword and an input free word for searching the information resources;
a first score calculation unit that, for an information resource whose metadata includes a free word matching the input free word, calculates a first score based on the importance paired with the matching free word, and holds the first score;
a second score calculation unit that calculates a second score for an information resource whose metadata includes a thesaurus keyword matching the input thesaurus keyword, or a thesaurus keyword belonging to a lower node of the matching thesaurus keyword on the tree structure of the facet to which the input thesaurus keyword belongs, that, for an information resource whose metadata includes a thesaurus keyword belonging to another node in the facet, multiplies the second score by a weight β (β ≦ 1) corresponding to the distance on the tree structure and takes the result as the second score, and that holds the second score;
a score summation unit that, for each information resource, adds the first score and the second score to obtain a third score; and
an output unit that displays, as a search result, a list of the information resources sorted in descending order of the third score.

The information resource search apparatus according to claim 1, wherein, when a further input free word is input after the output unit has output the search result, the first score calculation unit adds the first score calculated for the further input free word to the held first score and takes the sum as the first score, and
the output unit updates the display with the information resources re-sorted in descending order of the third score.

The information resource search apparatus according to claim 1, wherein, when a further input thesaurus keyword is input after the output unit has output the search result, the second score calculation unit adds the second score calculated for the further input thesaurus keyword to the held second score and takes the sum as the second score, and
the output unit updates the display with the information resources re-sorted in descending order of the third score.

The information resource search apparatus according to claim 1, wherein the input unit selectively displays the tree structures of a plurality of facets, and nodes on the displayed tree structures can be selected simultaneously.

The information resource search apparatus according to any one of claims 1 to 4, wherein the output unit includes:
a first display output field that displays a list, sorted in descending order of the third score, of the information resources whose metadata includes a thesaurus keyword matching the input thesaurus keyword, or a thesaurus keyword belonging to a lower node of the matching thesaurus keyword on the tree structure of the facet to which the input thesaurus keyword belongs; and
a second display output field that displays a list, sorted in descending order of the third score, of the information resources whose metadata includes a thesaurus keyword belonging to another node in the facet.

The information resource search apparatus according to claim 1, wherein the output unit is capable of displaying the list of information resources at one time substantially within the effective display area of a display device.

The information resource search apparatus according to claim 1, further comprising:
a narrowing condition input unit for inputting a narrowing-condition thesaurus keyword for narrowing down the search result of the information resources; and
a filtering processing unit that sets, as the information resources to be output, only the information resources whose metadata includes a thesaurus keyword matching the input narrowing-condition thesaurus keyword, or a thesaurus keyword belonging to a lower node of the matching thesaurus keyword on the tree structure of the facet to which the narrowing-condition thesaurus keyword belongs.

The information resource search apparatus according to claim 5, further comprising a third display output field that displays a list, sorted in descending order of the third score, of the information resources that belong to no node in the facet.

An information resource search server device comprising:
a metadata storage unit that stores metadata assigned to information resources, the metadata including, for each information resource, a thesaurus keyword describing a node name in a facet of a thesaurus, and a pair of a free word described in the information resource and its importance;
a first score calculation unit that, for an information resource whose metadata includes a free word matching an input free word for searching the information resources, calculates a first score based on the importance paired with the matching free word, and holds the first score;
a second score calculation unit that calculates a second score for an information resource whose metadata includes a thesaurus keyword matching an input thesaurus keyword for searching the information resources, or a thesaurus keyword belonging to a lower node of the matching thesaurus keyword on the tree structure of the facet to which the input thesaurus keyword belongs, that, for an information resource whose metadata includes a thesaurus keyword belonging to another node in the facet, multiplies the second score by a weight β (β ≦ 1) corresponding to the distance on the tree structure and takes the result as the second score, and that holds the second score;
a score summation unit that, for each information resource, adds the first score and the second score to obtain a third score; and
a sorting unit that obtains, as a search result to be output, a list of the information resources sorted in descending order of the third score.

An information resource search client device comprising:
an input unit through which an input free word for searching information resources is input, and through which an input thesaurus keyword is input by selectively displaying the tree structures of a plurality of facets on a thesaurus and selecting a node on a displayed tree structure;
a processing unit that, based on metadata assigned to the information resources, calculates a score for a first information resource group, which is a set of information resources having metadata matching the input free word or the input thesaurus keyword, and calculates a score for a second information resource group, which is a set of information resources having metadata matching a thesaurus keyword belonging to another node in the facet to which the input thesaurus keyword belongs; and
an output unit that displays lists of the first information resource group and the second information resource group, each sorted in descending order of score, so that the two groups are distinguishable on a display screen.

An information resource search apparatus comprising:
a metadata storage unit that stores metadata including, for each information resource, a thesaurus keyword describing a node name in a facet of a thesaurus;
an input unit for inputting an input thesaurus keyword for searching the information resources;
a score calculation unit that calculates a score for an information resource whose metadata includes a thesaurus keyword matching the input thesaurus keyword, or a thesaurus keyword belonging to a lower node of the matching thesaurus keyword on the tree structure of the facet to which the input thesaurus keyword belongs, that, for an information resource whose metadata includes a thesaurus keyword belonging to another node in the facet, multiplies the score by a weight β (β ≦ 1) corresponding to the distance on the tree structure, and that holds the score; and
an output unit that displays a list of the information resources sorted in descending order of the score.

An information resource search method comprising the steps of:
storing, in a metadata storage unit, metadata assigned to information resources, the metadata including, for each information resource, a thesaurus keyword describing a node name in a facet of a thesaurus, and a pair of a free word described in the information resource and its importance;
inputting an input thesaurus keyword and an input free word for searching the information resources;
calculating, for an information resource whose metadata includes a free word matching the input free word, a first score based on the importance paired with the matching free word, and holding the first score;
calculating a second score for an information resource whose metadata includes a thesaurus keyword matching the input thesaurus keyword, or a thesaurus keyword belonging to a lower node of the matching thesaurus keyword on the tree structure of the facet to which the input thesaurus keyword belongs, multiplying, for an information resource whose metadata includes a thesaurus keyword belonging to another node in the facet, the second score by a weight β (β ≦ 1) corresponding to the distance on the tree structure and taking the result as the second score, and holding the second score;
adding, for each information resource, the first score and the second score to obtain a third score; and
displaying, as a search result, a list of the information resources sorted in descending order of the third score.

An information resource search method comprising the steps of:
storing, in a metadata storage unit, metadata assigned to information resources, the metadata including, for each information resource, a thesaurus keyword describing a node name in a facet of a thesaurus, and a pair of a free word described in the information resource and its importance;
calculating, for an information resource whose metadata includes a free word matching an input free word for searching the information resources, a first score based on the importance paired with the matching free word, and holding the first score;
calculating a second score for an information resource whose metadata includes a thesaurus keyword matching an input thesaurus keyword for searching the information resources, or a thesaurus keyword belonging to a lower node of the matching thesaurus keyword on the tree structure of the facet to which the input thesaurus keyword belongs, multiplying, for an information resource whose metadata includes a thesaurus keyword belonging to another node in the facet, the second score by a weight β (β ≦ 1) corresponding to the distance on the tree structure and taking the result as the second score, and holding the second score;
adding, for each information resource, the first score and the second score to obtain a third score; and
obtaining, as a search result to be output, a list of the information resources sorted in descending order of the third score.

Inputting an input free word for searching information resources, and inputting an input thesaurus keyword by selectively displaying the tree structures of a plurality of facets of the thesaurus and selecting a node on a displayed tree structure;
Based on the metadata assigned to the information resources, calculating a score for a first information resource group, which is a set of information resources having metadata that matches the input free word or the input thesaurus keyword, and calculating a score for a second information resource group, which is a set of information resources having metadata that matches a thesaurus keyword belonging to another node in the facet to which the input thesaurus keyword belongs;
An information resource search method comprising the above steps and a step of displaying, on a display screen, a list of the first information resource group and the second information resource group, each sorted in descending order of score, such that the two groups can be distinguished from each other.
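
As a rough illustration of the distinguishable display described above, the following Python sketch splits already-scored resources into the two groups and renders them under separate headings. The record layout, the precomputed `score` values, the facet contents, and the heading text are all hypothetical; the claim does not specify how the two groups are made visually distinct.

```python
def split_groups(resources, keyword, free_word, facet_nodes):
    """Partition resources into direct matches and same-facet matches."""
    first, second = [], []
    for r in resources:
        direct = keyword in r["keywords"] or free_word in r["free_words"]
        related = any(k in facet_nodes and k != keyword for k in r["keywords"])
        if direct:
            first.append(r)        # matches the input free word or keyword
        elif related:
            second.append(r)       # matches another node in the same facet
    first.sort(key=lambda r: r["score"], reverse=True)
    second.sort(key=lambda r: r["score"], reverse=True)
    return first, second

def render(first, second):
    """Build a plain-text listing with one heading per group."""
    lines = ["[Matches]"]
    lines += [f"  {r['id']}  {r['score']:.2f}" for r in first]
    lines.append("[Related: other nodes in the same facet]")
    lines += [f"  {r['id']}  {r['score']:.2f}" for r in second]
    return "\n".join(lines)

# Hypothetical facet and precomputed scores, for illustration only.
FACET_NODES = {"art", "painting", "sculpture"}
SCORED = [
    {"id": "a", "keywords": ["painting"], "free_words": ["monet"], "score": 1.8},
    {"id": "c", "keywords": ["sculpture"], "free_words": [], "score": 0.25},
    {"id": "d", "keywords": ["art"], "free_words": ["monet"], "score": 1.0},
    {"id": "e", "keywords": ["art"], "free_words": [], "score": 0.5},
    {"id": "f", "keywords": ["piano"], "free_words": [], "score": 0.0},
]

if __name__ == "__main__":
    groups = split_groups(SCORED, "painting", "monet", FACET_NODES)
    print(render(*groups))
```

Resource "f" belongs to neither group because its only keyword lies outside the facet of the input thesaurus keyword.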

An information resource search program for causing a computer to execute an information resource search process, the program causing the computer to execute:
A process of storing, in a metadata storage unit, metadata assigned to information resources, the metadata including, for each information resource, a thesaurus keyword that describes a node name in a facet of a thesaurus, and pairs each consisting of a free word described in the information resource and the importance of that free word;
A process of inputting an input thesaurus keyword and an input free word for searching the information resource;
A process of calculating, for an information resource whose metadata includes a free word that matches the input free word, a first score based on the importance paired with the matching free word, and retaining the first score;
A process of calculating a second score for an information resource whose metadata includes a thesaurus keyword that matches the input thesaurus keyword, or a thesaurus keyword that belongs to a node below the matching thesaurus keyword in the tree structure of the facet to which the input thesaurus keyword belongs, and, for an information resource whose metadata includes a thesaurus keyword belonging to another node in that facet, multiplying the second score by a weight β (β ≦ 1) corresponding to the distance on the tree structure and taking the result as the second score, and retaining the second score;
For each information resource, a process of adding the first score and the second score to obtain a third score;
A process of displaying, as a search result, a list of the information resources sorted in descending order of the third score.

An information resource search program for causing a computer to execute an information resource search process, the program causing the computer to execute:
A process of storing, in a metadata storage unit, metadata assigned to information resources, the metadata including, for each information resource, a thesaurus keyword that describes a node name in a facet of a thesaurus, and pairs each consisting of a free word described in the information resource and the importance of that free word;
A process of calculating, for an information resource whose metadata includes a free word that matches an input free word for searching the information resources, a first score based on the importance paired with the matching free word, and holding the first score;
A process of calculating a second score for an information resource whose metadata includes a thesaurus keyword that matches an input thesaurus keyword for searching the information resources, or a thesaurus keyword that belongs to a node below the matching thesaurus keyword in the tree structure of the facet to which the input thesaurus keyword belongs, and, for an information resource whose metadata includes a thesaurus keyword belonging to another node in that facet, multiplying the second score by a weight β (β ≦ 1) corresponding to the distance on the tree structure and taking the result as the second score, and holding the second score;
For each information resource, a process of adding the first score and the second score to obtain a third score;
A process of obtaining, as a search result to be output, a list of the information resources sorted in descending order of the third score.

An information resource search program for causing a computer to execute an information resource search process, the program causing the computer to execute:
A process of inputting an input free word for searching information resources, and inputting an input thesaurus keyword by selectively displaying the tree structures of a plurality of facets of the thesaurus and selecting a node on a displayed tree structure;
A process of calculating, based on the metadata assigned to the information resources, a score for a first information resource group, which is a set of information resources having metadata that matches the input free word or the input thesaurus keyword, and calculating a score for a second information resource group, which is a set of information resources having metadata that matches a thesaurus keyword belonging to another node in the facet to which the input thesaurus keyword belongs;
A process of displaying, on a display screen, a list of the first information resource group and the second information resource group, each sorted in descending order of score, such that the two groups can be distinguished from each other.