JP2013025421A

JP2013025421A - Information retrieval system and information retrieval method

Info

Publication number: JP2013025421A
Application number: JP2011157226A
Authority: JP
Inventors: Toyokazu Akiyama; 豊和秋山; Yukiko Kawai; 由起子河合
Original assignee: Kyoto Sangyo University
Current assignee: Kyoto Sangyo University
Priority date: 2011-07-15
Filing date: 2011-07-15
Publication date: 2013-02-04

Abstract

PROBLEM TO BE SOLVED: To provide an information retrieval system and an information retrieval method capable of outputting, in a short time, a retrieval result that reflects the popularity, topicality, accuracy of description contents, and the like of a Web page.

SOLUTION: In an information retrieval system that retrieves Web pages on the basis of a keyword input at the terminal that executed the retrieval, a transition probability is computed from the relationship between the Web pages to which a Web page subject to retrieval links and the terminals connected to that Web page at present or within a fixed period. The Web pages subject to retrieval are ranked on the basis of the computed value and output to the terminal.

Description

The present invention relates to a technique for efficiently and easily acquiring target information in a WWW (World Wide Web) information search system.

When seeking target information in the vast amount of information on the WWW, it is common to use a search system to identify the relevant Web pages. Such a search system identifies and collects the Web pages that match the search keyword entered by the user, assigns an importance to each collected Web page, and presents the pages in descending order of importance.

A widely known technique for assigning importance to the collected Web pages ranks them on the basis of the number of occurrences of nouns in a Web page, its link structure, and the like. Specifically, one known method assigns high importance to Web pages in which the search keyword entered by the user is used in an important location such as the title, or in which the search keyword appears a moderately large number of times. Another known method assigns high importance to Web pages that are linked from many other Web pages, or that are linked from Web pages with highly related content.

However, when the importance of a Web page is defined only by its content and link structure, a user who cannot choose search terms well will not see the Web page disclosing the target information near the top of the results, and therefore cannot find it. Furthermore, even when the Web page disclosing the target information can be identified, the user cannot gain a deep understanding if that information is difficult.

In view of these problems, the present inventors proposed, in an earlier application (Patent Document 1), an information search system and an information search method that assign importance to Web pages on the basis of the popularity, topicality, accuracy of description contents, and the like of each Web page collected as a search result, and that present the Web pages in descending order of importance.

Specifically, the information search system proposed by the present inventors can display search results by sorting the Web pages collected as search results in descending order of the number of users browsing each Web page at the time the search is executed. A Web page that many users are paying attention to when the search is executed is a Web page with high topicality (necessity) at that time, and displaying such Web pages at the top of the search results makes it easier to obtain information that is highly needed when the search is executed.

The information search system proposed by the present inventors can also acquire, in advance, information on all users of the information search system and reflect that information in the search results. For example, when a particular user executes a search, the system determines what kind of users are browsing each Web page collected as a search result, and displays at the top the Web pages browsed by users who satisfy specific conditions. If the search was executed with the keyword “program”, Web pages browsed by users whose “occupation” is “programmer” are displayed at the top. In this way, if the collected Web pages include one browsed by a user who is an expert in a field related to the search keyword, that Web page is displayed at the top. Web pages browsed by many experts are Web pages with substantial (highly accurate) content, and displaying them at the top of the search results makes it easier for the user to obtain accurate information.

Furthermore, in the information search system proposed by the present inventors, a user browsing a Web page identified after executing a search can communicate with other users who are browsing the same Web page. Thus, even when the content of the identified Web page is difficult, the user can deepen their understanding of it by asking other users questions.

Patent Document 1: JP 2011-39625 A

As described above, an information search system that assigns importance to Web pages on the basis of the popularity, topicality, accuracy of description contents, and the like of each Web page collected as a search result, and presents them in descending order of importance, allows users to more easily obtain the information they truly need. An object of the present invention is therefore to provide an information search system and an information search method that can calculate more accurately the importance of a Web page reflecting its popularity, topicality, accuracy of description contents, and the like, and that can output search results reflecting the calculated importance in a short time.

One aspect of the present invention for solving the above problem is an information search system for searching for Web pages on the basis of a keyword input at a terminal that has executed a search and outputting the search result to the terminal for display, the system having information ordering means for ordering the Web pages to be searched, wherein the information ordering means has a function of performing a calculation that estimates a transition tendency from the relationship between the Web pages to which a Web page to be searched links and the terminals connected to the Web page to be searched at present or within a fixed period.

Here, the “calculation that estimates a transition tendency” includes, in addition to the calculation of the actual transition probability, a calculation performed after modifying the elements required for calculating the transition probability. A “transition” means, with Web pages or terminals taken as vertices, following a link from a specific Web page to display another Web page. Furthermore, when a specific Web page is output by a specific terminal, a transition equivalent to one in which a link is formed between two Web pages including that specific Web page is assumed.

The “transition equivalent to one in which a link is formed between two Web pages including the specific Web page” assumed here is desirably “a transition in which mutual links are taken to be formed between the two Web pages”. This is because, compared with assuming “a transition in which a link is taken to be formed from one of the two Web pages to the other”, the increase in the importance of a Web page due to its being viewed by users is reflected more accurately.

A second aspect of the present invention is the information search system according to claim 1, wherein the calculation of the transition probability includes setting a square matrix whose configuration includes the Web pages to be searched and the terminals connected to those Web pages at present or within a fixed period, and calculating an eigenvector of the square matrix.

A third aspect of the present invention is the information search system according to claim 2, wherein the elements of the square matrix are composed of values based on the number of link destinations to which a Web page links and values based on the number of Web pages to which a terminal is connected at present or has been connected within a fixed period.

A fourth aspect of the present invention is the information search system according to claim 2 or 3, wherein, when A is a value based on the number of link destinations to which a Web page links and B is the number of Web pages to which a terminal is connected at present or has been connected within a fixed period, each element of the square matrix is A / (A + B), B / (A + B), or one of these values multiplied by a coefficient.

A fifth aspect of the present invention is the information search system according to any one of claims 1 to 3, having information acquisition means for acquiring, at or before the time of search execution, terminal information, which is information on the terminals outputting the Web pages to be searched, wherein the information ordering means identifies, on the basis of at least the terminal information, the Web pages that satisfy a predetermined condition and the output terminals, which are the terminals that are outputting or have output those Web pages; treats each output terminal as a virtual Web page and, assuming that mutual links are formed between the virtual Web page and the Web pages output by that output terminal, sets, on the basis of the link structure formed by the Web pages and the virtual Web pages, a transition probability matrix in which a transition is a switch from the display of a link-source Web page to the display of a link-destination Web page; calculates transition probabilities on the basis of the set transition probability matrix; and orders the Web pages to be searched on the basis of the calculation result.

The link structure includes virtual links, and the link-source Web page and the link-destination Web page may each be a virtual page.
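The virtual-page model of the fifth aspect can be illustrated with a minimal Python sketch. It is only one possible reading of the text: the function name `build_transition_matrix`, the dictionary inputs, and the node labels are hypothetical, and every link (real or virtual) is assumed to be followed with equal probability.

```python
def build_transition_matrix(page_links, terminal_views):
    """Return (nodes, P) where P[i][j] is the probability of moving from node j to node i.

    page_links:     dict mapping a page to the pages it links to (real links).
    terminal_views: dict mapping an output terminal to the pages it outputs.
    Each terminal is treated as a virtual page joined to every page it outputs
    by a mutual (two-way) virtual link.
    """
    out_links = {}

    def add_edge(src, dst):
        out_links.setdefault(src, []).append(dst)
        out_links.setdefault(dst, [])

    for src, dsts in page_links.items():
        out_links.setdefault(src, [])
        for dst in dsts:
            add_edge(src, dst)

    for terminal, pages in terminal_views.items():
        for page in pages:
            add_edge(terminal, page)  # virtual page -> real page
            add_edge(page, terminal)  # real page -> virtual page

    terminals = sorted(terminal_views)
    pages = sorted(n for n in out_links if n not in terminal_views)
    nodes = pages + terminals
    index = {name: k for k, name in enumerate(nodes)}

    size = len(nodes)
    P = [[0.0] * size for _ in range(size)]
    for src in nodes:
        dsts = out_links.get(src, [])
        for dst in dsts:
            # Assumption: every outgoing link of src is equally likely.
            P[index[dst]][index[src]] += 1.0 / len(dsts)
    return nodes, P


# Hypothetical example: three pages and one terminal currently showing page C.
nodes, P = build_transition_matrix(
    {"A": ["B", "C"], "B": ["C"], "C": ["A"]},
    {"t1": ["C"]},
)
```

In this example page C has one real link (to A) and one virtual link (to terminal t1), so each of its two transitions receives probability 0.5; a page viewed by more terminals would spread its column over more virtual links while also receiving more incoming transitions.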

Such an information search system can calculate more accurately the importance of Web pages reflecting their popularity, topicality, accuracy of description contents, and the like, and can output search results reflecting the calculated importance in a short time. Search results can therefore be obtained in a short time even when the amount of calculation becomes enormous, for example because the number of items used as sorting criteria increases.

A sixth aspect of the present invention is the information search system according to any one of claims 2 to 5, wherein a user rank value, which indicates the importance of a user with respect to the knowledge of a specific item and/or the search ability of the user of the output terminal, is calculated on the basis of the terminal information, and at least one of an element of the square matrix or of the transition probability matrix, and/or the calculated eigenvector of the square matrix or of the transition probability matrix, is changed on the basis of the user rank value.

In the sixth aspect of the present invention, Web pages with substantial content that are viewed by users with extensive knowledge of a specific item or with high search ability can be displayed at the top. This allows the user to easily obtain information that is highly needed when the search is executed.

A seventh aspect of the present invention is the information search system according to any one of claims 2 to 6, having an information acquisition function capable of acquiring information on the contents of Web pages on a network, wherein a relation rank value indicating the importance of the contents of a Web page is calculated on the basis of the information acquired by the information acquisition function, and at least one of an element of the square matrix or of the transition probability matrix, and/or the calculated eigenvector of the square matrix or of the transition probability matrix, is changed on the basis of the relation rank value.

In the seventh aspect of the present invention, Web pages whose contents are highly relevant to the search condition can be displayed at the top. This allows the user to easily obtain information that is highly needed when the search is executed.

An eighth aspect of the present invention is an information search system for searching a database on the basis of a keyword input at a terminal and outputting the search result to the terminal for display, the system having information acquisition means for acquiring, at or before the time of search execution, terminal information, which is information on the terminals outputting the Web pages to be searched, and having information ordering means for ordering the Web pages to be searched, wherein the information ordering means identifies, on the basis of at least the terminal information, the Web pages that satisfy a predetermined condition and the output terminals, which are the terminals that are outputting or have output those Web pages; sets a square matrix on the basis of the Web pages and the output terminals, the square matrix being formed of row elements and column elements corresponding to the Web pages and the output terminals, with each element P(i, j) satisfying the relationship of formula (1) below; calculates an eigenvector of the set square matrix; and orders the Web pages to be searched on the basis of the magnitudes of the components of the calculated eigenvector.

P(i, j) = αX / Y ... (1)

α = a number other than 0
X = the number of links from the Web page corresponding to column “j” to the Web page corresponding to row “i”; or the number of times the Web page corresponding to column “j” is displayed on the output terminal corresponding to row “i”; or the number of times the Web page corresponding to row “i” is displayed on the output terminal corresponding to column “j”; or 0
Y = the sum of the number of times the Web page corresponding to column “j” is displayed on output terminals and the number of links formed on the Web page corresponding to column “j”
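As a worked illustration of formula (1), the following Python sketch fills a small matrix from a table of counts X and then orders the Web pages by the components of the principal eigenvector, obtained here by power iteration. Several points are assumptions rather than statements of the specification: α is taken as 1, Y for a column that corresponds to an output terminal is taken as the total number of pages that terminal displays (the text defines Y only for Web page columns), and the names `formula_one_matrix`, `principal_eigenvector`, and the sample counts are hypothetical.

```python
def formula_one_matrix(X, alpha=1.0):
    """Build P(i, j) = alpha * X(i, j) / Y(j), with Y(j) the sum of column j of X.

    For a Web page column, the column sum equals its link count plus its
    display count, which is Y in formula (1). Using the column sum for a
    terminal column as well is an assumption of this sketch.
    """
    n = len(X)
    P = [[0.0] * n for _ in range(n)]
    for j in range(n):
        y = sum(X[i][j] for i in range(n))
        if y == 0:
            continue  # no links and no displays: leave the column at zero
        for i in range(n):
            P[i][j] = alpha * X[i][j] / y
    return P


def principal_eigenvector(P, iterations=200, tol=1e-12):
    """Approximate the dominant eigenvector of P by power iteration."""
    n = len(P)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]  # normalise so the components sum to 1
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v


# Rows and columns: pages A, B, C, then output terminal t1 (sample data).
names = ["A", "B", "C", "t1"]
page_names = ["A", "B", "C"]
X = [
    [0, 0, 1, 0],  # row A: link C -> A
    [1, 0, 0, 0],  # row B: link A -> B
    [1, 1, 0, 1],  # row C: links A -> C, B -> C; t1 displays C
    [0, 0, 1, 0],  # row t1: page C is displayed on t1
]
P = formula_one_matrix(X)
v = principal_eigenvector(P)
ranking = sorted(page_names, key=lambda name: v[names.index(name)], reverse=True)
print(ranking)  # ['C', 'A', 'B']
```

Because α only rescales the matrix, it changes the eigenvalues but not the direction of the eigenvector, so the resulting order of the Web pages is unaffected by the choice α = 1. Power iteration is one convenient way to obtain the eigenvector for a small example; the specification does not state which eigen-solver is used.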

A ninth aspect of the present invention is an information search method for searching for Web pages on the basis of a keyword input at a terminal that has executed a search and outputting the search result to the terminal for display, the method comprising performing a calculation that estimates a transition tendency from the relationship between the Web pages to which a Web page to be searched links and the terminals connected to the Web page to be searched at present or within a fixed period.

A tenth aspect of the present invention is the information search method according to claim 9, comprising: acquiring, at or before the time of search execution, terminal information, which is information on the terminals outputting the Web pages to be searched; identifying, on the basis of at least the terminal information, the Web pages that satisfy a predetermined condition and the output terminals, which are the terminals that are outputting or have output those Web pages; treating each output terminal as a virtual Web page and, assuming that mutual links are formed between the virtual Web page and the Web pages output by that output terminal, setting, on the basis of the link structure formed by the Web pages and the virtual Web pages, a transition probability matrix in which a transition is a switch from the display of a link-source Web page to the display of a link-destination Web page; calculating transition probabilities on the basis of the set transition probability matrix; and ordering the Web pages to be searched on the basis of the calculation result.

In the ninth and tenth aspects of the present invention, Web pages containing information that is popular at the time of the search, and Web pages viewed by users with similar interests or by users well versed in the field of the search, are displayed at the top of the search results, so that highly needed and high-quality information is easy to obtain. Such search results can also be output in a short time.

In an information search system that searches for Web pages on the basis of a keyword input at a terminal that has executed a search and outputs the search result to the terminal for display, the present invention has the effect that search results reflecting the popularity, topicality, accuracy of description contents, and the like of Web pages can be output more quickly.

A conceptual diagram showing the configuration of the information search system according to the first embodiment of the present invention.
A diagram showing an example of the functional blocks of the information search system according to the first embodiment of the present invention.
A diagram showing an example of the hardware configuration of the information search system according to the first embodiment of the present invention.
A flowchart showing an example of the search procedure of the information search system according to the first embodiment of the present invention.
An explanatory diagram showing an example of the link structure of the Web pages to be searched, and of the browsing status of those Web pages, when a search is performed with the information search system according to the first embodiment of the present invention.
An explanatory diagram showing an example of the rank value calculation matrix set when a search is performed with the information search system according to the first embodiment of the present invention.
An explanatory diagram showing an example of the rank value calculation matrix set when a search is performed with the information search system according to the second embodiment of the present invention.
An explanatory diagram showing the matrices used to verify the speed of the eigenvalue calculation in the example of the present invention, in which (a) shows a Laplacian matrix and (b) shows a transition probability matrix.
A graph showing the processing time ratio when the number of processors is varied in the eigenvalue calculation of the Laplacian matrix in the example of the present invention.
A graph showing the processing time of the eigenvalue calculation of the Laplacian matrix in the example of the present invention.
A graph showing the processing time ratio when the number of processors is varied in the eigenvalue calculation of the transition probability matrix in the example of the present invention.
A graph showing the parallelization efficiency of the eigenvalue calculation of the transition probability matrix in the example of the present invention.
A graph showing the influence of network performance on the eigenvalue calculation of the Laplacian matrix on multiple nodes in the example of the present invention.

An information search system and an information search method according to embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a conceptual diagram showing an outline of the configuration of the information search system 1 of the first embodiment. As shown in FIG. 1, the information search system 1 is composed of a search device 2 and client terminals 4 (denoted by reference numerals 4a to 4d, ... in FIG. 1), which are connected to one another via a telecommunication network. A browser 5 (denoted by reference numerals 5a to 5d, ... in FIG. 1) is running on each client terminal 4.

The information search system 1 of the first embodiment can acquire input information. Specifically, the input information is acquired by a terminal-side information acquisition device 7 incorporated into the browser 5 by predetermined means such as a plug-in, and is transmitted to the search device 2. The input information is information on search keywords and the like that the user has entered via the browser 5 or a program that works with the browser 5 (for example, a plug-in toolbar).

In addition to the input information described above, the terminal-side information acquisition device 7 can transmit to the search device 2 usage information, which is information on the output of Web pages via the browser 5. Specifically, as usage information, the terminal-side information acquisition device 7 can transmit information on the display time of a Web page, such as the display start time and display end time of a given Web page in the browser 5. Furthermore, when another Web page is displayed by following a link from a given Web page, it can transmit, as usage information, information indicating the transition path between Web pages, such as the URL of the source Web page and the URL of the destination Web page.

In this way, the terminal-side information acquisition device 7 transmits the input information and usage information described above to the search device 2 as terminal information. This transmission is performed automatically at predetermined timings, such as when the browser 5 is started, when a predetermined operation such as the execution of a search is performed, or when a specified time has elapsed since the previous transmission ended.
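The usage information described above can be pictured as a simple record. The following Python sketch is an assumption about how such a record might look; the class name `UsageInfo`, its field names, and the helper `viewed_currently_or_recently` are hypothetical and do not appear in the specification. The helper shows one way to decide whether a terminal counts as connected to a Web page “at present or within a fixed period”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageInfo:
    """One usage-information record sent by a terminal (hypothetical layout)."""
    terminal_id: str
    url: str                             # Web page being displayed
    display_start: float                 # display start time, seconds since epoch
    display_end: Optional[float] = None  # None while the page is still displayed
    source_url: Optional[str] = None     # page from which the link was followed


def viewed_currently_or_recently(record, now, window_seconds):
    """True if the page is displayed now or its display ended within the window."""
    if record.display_end is None:
        return record.display_start <= now
    return 0 <= now - record.display_end <= window_seconds


# Sample records: one page still open, one closed 100 seconds before `now`.
now = 1_000_000.0
open_page = UsageInfo("t1", "http://example.com/a", display_start=now - 30)
closed_page = UsageInfo("t2", "http://example.com/b",
                        display_start=now - 400, display_end=now - 100,
                        source_url="http://example.com/a")
```

With a window of 60 seconds only `open_page` would count toward the ranking, while a window of 600 seconds would include both records.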

The search device 2, in turn, stores the terminal information transmitted from the terminal-side information acquisition device 7 in a search database 3. That is, the terminal information transmitted to the search device 2 from each client terminal 4 (4a to 4d, ...) on the network is stored in the search database 3 of the search device 2 (FIG. 2).

The search device 2 also has an information collection unit 12. The information collection unit 12 has the functions of a so-called crawler and indexer: it traverses the WWW, acquires information on resources, and stores it collectively in the search database 3. When storing information in the search database 3, it analyzes, selects, and classifies by feature the information on the link structure of Web pages and the information on their contents, such as the text, titles, and various tags they contain, so that the information is easy to search.

When a search is executed on any of the client terminals 4, the information search system 1 creates a search query from the entered search keyword and transmits it to the search device 2, which acts as the server.

When the search device 2 receives the search query, the search execution unit 13 of the search device 2 identifies the Web pages to be searched from among the Web pages stored in the search database 3. It also identifies the number of users who are browsing or have browsed each identified Web page. Then, as described in detail later, it calculates the eigenvalues of a rank value calculation matrix formed on the basis of this information, and uses the calculated eigenvalues as the rank values (importance values) of the Web pages to be searched.

The search execution unit 13 then sorts the Web pages in the search result on the basis of the acquired rank values. The search result output unit 15 of the search device 2 transmits the search result to the client terminal 4 on which the search was executed, and the Web pages in the search result are displayed on the client terminal 4 in the sorted order.

This makes it possible to display Web pages that are popular and topical at the time of the search at the top of the search results.

以下、本実施形態の情報検索システム１についてより詳細に説明する。 Hereinafter, the information search system 1 of the present embodiment will be described in more detail.

まず、本実施形態の情報検索システム１の機能的構成について説明する。
なお、以下に記載する本実施形態の情報検索システム１の各機能ブロックは、ハードウェア、ソフトウェア、又はハードウェア及びソフトウェアの両方として実現され得る。具体的には、コンピュータを利用するものであれば、ＣＰＵや主メモリ、バス、あるいは不揮発性メモリや、所謂外部記憶装置（ハードディスクやフロッピーディスク（登録商標）、ＭＯ、ＣＤ、ＤＶＤ、ＢＤ、磁気テープ等々の記憶装置、及びそれらメディアの読み取り装置）、印刷機器、表示装置、その他外部周辺装置等のハードウェア構成部、またその外部周辺装置用のＩ／Ｏポート、それらハードウェアを制御するためのドライバプログラムやその他アプリケーションプログラム、情報入力に利用されるユーザーインターフェース等が挙げられる。 First, a functional configuration of the information search system 1 of the present embodiment will be described.
Note that each functional block of the information search system 1 of the present embodiment described below can be realized as hardware, software, or both hardware and software. Specifically, where a computer is used, examples include hardware components such as a CPU, main memory, a bus, nonvolatile memory, so-called external storage devices (storage devices such as hard disks, floppy disks (registered trademark), MO, CD, DVD, BD, and magnetic tape, and readers for those media), printing equipment, display devices, and other external peripheral devices; I/O ports for those external peripheral devices; driver programs and other application programs for controlling that hardware; and user interfaces used for information input.

そして主メモリ上に展開したプログラムに従ったＣＰＵの演算処理によって、インターフェースを介して入力され、各種メモリやハードディスク上に保持されているデータなどが加工、蓄積されたり、上記各ハードウェアやソフトウェアを制御するための命令が生成されたりする。また、本発明はシステムとして実現できるのみでなく、方法としても実現可能である。加えて、本発明の一部をソフトウェアとして構成することができる。そして、そのようなソフトウェアをコンピュータに実行させるために用いるソフトウェア製品、及び同製品を記録媒体に固定した記録媒体も、当然に本発明の技術的な範囲に含まれる（本明細書の全体を通じて同様である）。 Then, through arithmetic processing by the CPU in accordance with programs loaded into main memory, data input through an interface and held in the various memories or on the hard disk is processed and stored, and instructions for controlling the hardware and software described above are generated. The present invention can be realized not only as a system but also as a method. In addition, part of the present invention can be configured as software. A software product used to cause a computer to execute such software, and a recording medium on which the product is fixed, are naturally included in the technical scope of the present invention (the same applies throughout this specification).

図２は、本実施形態の情報検索システム１における機能ブロックの一例を示す図である。この図にあるように、本実施形態の検索システム１は、主に端末側情報取得装置７と検索装置２から構成されている。 FIG. 2 is a diagram illustrating an example of functional blocks in the information search system 1 of the present embodiment. As shown in this figure, the search system 1 of the present embodiment is mainly composed of a terminal-side information acquisition device 7 and a search device 2.

端末側情報取得装置７は、端末情報取得部１０と端末情報送信部１１とを少なくとも有している。 The terminal-side information acquisition device 7 has at least a terminal information acquisition unit 10 and a terminal information transmission unit 11.

端末情報取得部１０は、クライアント端末４のブラウザから、少なくとも閲覧ＵＲＬを含む、端末情報を取得する機能を有する。ここで閲覧ＵＲＬとは、ブラウザを介して表示された（クライアント端末４でユーザが閲覧していた）ＷｅｂページのＷｅｂ上（ネットワーク上）での所在を示す情報であり、例えば、「ｈｔｔｐ：／／・・・／ＡＢＣ．ｊｐｇ」のように記述される情報である。
ここで、ブラウザから端末情報を取得する手段は、特に限定されるものでなく、適宜の方法を用いてよい。これらの方法は、例えば、ＡＰＩ（ＡｐｐｌｉｃａｔｉｏｎＰｒｏｇｒａｍＩｎｔｅｒｆａｃｅ）やＤＤＥ（ＤｙｎａｍｉｃＤａｔａＥｘｃｈａｎｇｅ）、ＯＬＥ（ＯｂｊｅｃｔＬｉｎｋｉｎｇａｎｄＥｍｂｅｄｄｉｎｇ）やファイルシステム等が挙げられる。 The terminal information acquisition unit 10 has a function of acquiring terminal information, including at least a browsing URL, from the browser of the client terminal 4. Here, the browsing URL is information indicating the location on the Web (on the network) of a Web page displayed via the browser (that is, viewed by the user on the client terminal 4), and is written, for example, as “http://.../ABC.jpg”.
Here, the means for acquiring the terminal information from the browser is not particularly limited, and an appropriate method may be used. Examples of these methods include API (Application Program Interface), DDE (Dynamic Data Exchange), OLE (Object Linking and Embedding), and a file system.

また端末情報取得部１０が取得する端末情報には、閲覧ＵＲＬ以外にもブラウザでのＷｅｂページの閲覧に関する各種情報が含まれてよい。これらの情報には、例えば、システム利用者のユーザＩＤ、ブラウザＩＤ、Ｗｅｂページの閲覧開始時刻、Ｗｅｂページの閲覧終了時刻、あるいは、リンクを辿って２以上のＷｅｂページを閲覧した際の「移動元Ｗｅｂページ」や「移動先Ｗｅｂページ」などの移行情報などが挙げられる。
さらに、端末情報には、特定のＷｅｂページにおいて、ユーザが入力した検索キーワード等の入力情報が含まれてよい。 In addition to the browsing URL, the terminal information acquired by the terminal information acquisition unit 10 may include various kinds of information related to the viewing of Web pages in the browser. Examples include the user ID of the system user, the browser ID, the Web page browsing start time, the Web page browsing end time, and migration information such as the “source Web page” and “destination Web page” when two or more Web pages are viewed by following links.
Further, the terminal information may include input information such as a search keyword input by the user on a specific Web page.

ユーザＩＤとはユーザを識別するための情報を示し、例えば、クライアント端末４へのログイン時のＩＤや、サービス単位又はソフトウェア単位で割り当てられるＩＤなどが挙げられる。ブラウザＩＤも同様に、ブラウザを識別するための情報であり、例えば、これらの情報は予めブラウザプログラムやツールバープログラムに記録されている製品ＩＤなどを利用することで取得することができる。
そして、これら識別情報を利用することでクライアント端末４を識別することができるので、クライアント端末４（ユーザ）単位での閲覧情報の管理、分析を行うことができる。なお、これらＩＤ情報の保持は、例えば、端末側情報取得装置７自身が保持する方法や、ブラウザが有するｃｏｏｋｉｅ機能を利用する方法等、適宜な方法で行われる。 The user ID indicates information for identifying the user, and includes, for example, an ID when logging in to the client terminal 4 and an ID assigned in units of services or software. Similarly, the browser ID is information for identifying the browser. For example, such information can be acquired by using a product ID recorded in a browser program or a toolbar program in advance.
Since the client terminal 4 can be identified by using these identification information, the browsing information can be managed and analyzed in units of the client terminal 4 (user). The ID information is held by an appropriate method such as a method held by the terminal-side information acquisition device 7 itself or a method using a cookie function of the browser.

閲覧開始時刻とは、ブラウザにてＷｅｂ上のＷｅｂページにアクセスした時刻、あるいはそのアクセスによってブラウザにＷｅｂページが取得された時刻や、表示された時刻などをいう。具体的に説明すると、当該時間は、例えば、ブラウザにてリクエストを行ったリソースのロード完了を検出した時刻や、ＷｅｂサーバがＨＴＴＰステータスコード（例えば、「２００：ＯＫ」と記述されるコード）等の信号を応答した時刻等が挙げられる。 The browsing start time refers to the time at which the browser accessed a Web page on the Web, the time at which the browser acquired the Web page through that access, the time at which the page was displayed, or the like. Specifically, examples include the time at which completion of loading of the resource requested by the browser was detected, and the time at which the Web server responded with a signal such as an HTTP status code (for example, a code written as “200: OK”).

閲覧終了時刻とは、ブラウザにてＷｅｂページの表示を終了した時刻をいい、例えば、ブラウザのＵＩウィンドウがクローズした時刻や、当該ウィンドウがアクティブでなくなった時刻、ウィンドウ領域内からマウスポインタが検出されなくなった時刻（マウスポインタがウィンドウ領域外へ移動した時刻）等が挙げられる。また、一定時間の操作がないためクライアント端末４のＯＳが自動的にログオフ処理を行ったり、スクリーンセーバ等が起動したりした場合には、それらが行われた時刻を閲覧終了時刻としてもよい。 The browsing end time refers to the time at which the browser stopped displaying the Web page. Examples include the time at which the browser's UI window was closed, the time at which that window became inactive, and the time at which the mouse pointer was no longer detected within the window area (the time at which the mouse pointer moved outside the window area). In addition, when the OS of the client terminal 4 automatically performs logoff processing, or a screen saver or the like starts, because no operation has been performed for a certain period, the time at which that occurred may be used as the browsing end time.

移行情報とは、２以上のＷｅｂページを連続で閲覧した際の「移動元Ｗｅｂページ」と「移動先Ｗｅｂページ」を示す情報のことであり、例えば、ハイパーリンクのクリックによる移行（遷移）の他、ブラウザやツールバーのＵＲＬ入力欄に連続で入力された２以上のＵＲＬを「移動先」、「移動元」とする移行情報などが挙げられる。 The migration information is information indicating the “source Web page” and the “destination Web page” when two or more Web pages are viewed in succession. Examples include, in addition to a transition made by clicking a hyperlink, migration information in which two or more URLs entered in succession in the URL input field of the browser or a toolbar are treated as the “destination” and the “source”.
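As an illustration of the terminal information items described above, the following is a minimal sketch of one browsing record. The class name, the field names, and the `is_viewing_at` helper are hypothetical and do not appear in the specification; only the browsing URL is treated as mandatory, in line with the description.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TerminalInfo:
    """One browsing record sent by a terminal (all names are hypothetical)."""
    viewed_url: str                          # browsing URL, the only required item
    user_id: Optional[str] = None            # identifies the user
    browser_id: Optional[str] = None         # identifies the browser
    view_start: Optional[datetime] = None    # browsing start time
    view_end: Optional[datetime] = None      # None while the page is still displayed
    source_url: Optional[str] = None         # migration information: page moved from
    search_keyword: Optional[str] = None     # keyword entered on the page, if any

    def is_viewing_at(self, t: datetime) -> bool:
        """Return True if the page was being displayed at time t."""
        if self.view_start is None or t < self.view_start:
            return False
        return self.view_end is None or t <= self.view_end
```

A record with no end time is treated here as still being viewed, which is one possible reading of "currently connected"; the specification leaves the exact criterion open.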

端末情報送信部１１は、取得した端末情報を所定の検索装置２に送信する機能を有する。具体的には、当該端末情報を、クライアント端末の通信Ｉ／Ｆを使用し通信回線等を介して、所定の検索装置２（サーバ装置）に自動送信する機能を有する。
なお、端末情報の送信先となる所定の検索装置２の送信先アドレスは、例えば端末側情報取得装置７のプログラム内に予め記述されることで特定する方法などが挙げられる。
また、この端末情報送信部１１から端末情報が送信されるタイミングは、適宜のタイミングに設定可能である。例えば、ブラウザにて表示されているＷｅｂページが切換わるごとに送信するといったタイミングや、一定の時間、又は時刻が到来した際にバッチ処理で送信するタイミング、あるいはツールバー起動時または終了時に送信するタイミングなどが挙げられる。 The terminal information transmission unit 11 has a function of transmitting the acquired terminal information to a predetermined search device 2. Specifically, it has a function of automatically transmitting the terminal information to a predetermined search device 2 (server device) via a communication line or the like using the communication I / F of the client terminal.
Note that the destination address of the predetermined search device 2 to which the terminal information is sent may be specified, for example, by writing it in advance in the program of the terminal-side information acquisition device 7.
The timing at which the terminal information transmission unit 11 transmits the terminal information can be set as appropriate. Examples include transmitting each time the Web page displayed in the browser changes, transmitting by batch processing when a fixed interval elapses or a set time arrives, and transmitting when the toolbar starts or exits.
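The transmission timings listed above can be sketched as a small buffering helper. This is only an illustration under stated assumptions: the class `TerminalInfoSender`, its method names, and the 60-second default interval are hypothetical, and the actual transport is abstracted as a `send` callable supplied by the caller.

```python
import time


class TerminalInfoSender:
    """Buffers records and sends them in batches (hypothetical helper)."""

    def __init__(self, send, interval_seconds=60.0, clock=time.monotonic):
        self._send = send                    # callable that takes a list of records
        self._interval = interval_seconds    # batch interval (assumed value)
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()

    def on_page_change(self, record):
        """Queue a record; flush if the batch interval has elapsed."""
        self._buffer.append(record)
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self):
        """Send everything queued so far (also usable at toolbar start or exit)."""
        if self._buffer:
            self._send(list(self._buffer))
            self._buffer.clear()
        self._last_flush = self._clock()
```

Setting `interval_seconds` to 0 gives the "send on every page change" behavior, while calling `flush()` from start-up and shutdown hooks covers the toolbar start and exit timings.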

このように、各クライアント端末４でのブラウザ５を介したＷｅｂページの出力（表示）に対する端末情報が、端末側情報取得装置７によって取得され、検索装置２に送信される。そして、検索装置２で当該端末情報が収集されることにより、クライアント端末４で閲覧している、又は閲覧されたＷｅｂページが特定できる。つまり、各クライアント端末４のブラウジング（閲覧）履歴を容易に把握することができる。 In this way, terminal information on the output (display) of Web pages via the browser 5 on each client terminal 4 is acquired by the terminal-side information acquisition device 7 and transmitted to the search device 2. By collecting this terminal information, the search device 2 can identify the Web pages that are being viewed, or have been viewed, on the client terminals 4. In other words, the browsing history of each client terminal 4 can be easily grasped.

次に検索装置２の機能的構成を説明する。 Next, the functional configuration of the search device 2 will be described.

検索装置２は、サーバ装置であって、検索用データベース３、情報収集部１２、検索実行部１３、検索結果出力部１５を有している。 The search device 2 is a server device, and includes a search database 3, an information collection unit 12, a search execution unit 13, and a search result output unit 15.

検索用データベース３は、クライアント端末４から送信される端末情報を受信、管理、蓄積する機能を有するデータベースである。 The search database 3 is a database having a function of receiving, managing, and storing terminal information transmitted from the client terminal 4.

情報収集部１２は、上記したように、ＷＷＷ上を巡回し、Ｗｅｂページの構造や内容に関する情報を取得し、取得した内容に基づいてインデックスファイルを作成することができる。また、作成したインデックスファイルを検索用データベース３に格納、保存することができる。 As described above, the information collection unit 12 can circulate on the WWW, acquire information on the structure and content of the Web page, and create an index file based on the acquired content. The created index file can be stored and saved in the search database 3.

検索実行部１３は、いずれかのクライアント端末４で検索が実行された時に、ユーザからの検索キーワード等に基づいて、検索用データベース３に格納された情報を参照し、検索を実行する。具体的には、検索用データベース３を参照することにより、検索対象となるＷｅｂページを特定し、加えて特定した各Ｗｅｂページを出力している端末数を特定する。そして、詳しくは後述するが、これらの情報に基づいて形成されるランク値演算用行列の固有値を算出し、算出した固有値を検索対象となる各Ｗｅｂページのランク値（重要度の値）とする。またさらに、取得したランク値を用いて検索結果となるＷｅｂの並び替えを実行する。 When a search is executed on any of the client terminals 4, the search execution unit 13 executes the search by referring to the information stored in the search database 3 on the basis of the search keyword or the like from the user. Specifically, by referring to the search database 3, it identifies the Web pages to be searched and, in addition, identifies the number of terminals outputting each identified Web page. Then, as described in detail later, it calculates the eigenvalues of a rank value calculation matrix formed from this information and uses the calculated eigenvalues as the rank values (importance values) of the Web pages to be searched. Furthermore, it sorts the Web pages in the search results using the obtained rank values.

検索結果出力部１５は、検索用データベース３から抽出し、並び替えを実行した検索結果をクライアント端末４のブラウザ５に表示する機能を有する。具体的には、クライアント端末の送信先アドレスを適宜の方法で取得する。ここで、適宜の方法とは、検索クエリと同時に受信する方法や、ユーザＩＤとあらかじめ登録しておいたＩＰアドレスを照合する方法等がある。そして、端末側情報取得装置７を介して、特定したクライアント端末４のブラウザ５上で検索結果画面を表示する。 The search result output unit 15 has a function of displaying the search results extracted from the search database 3 and rearranged on the browser 5 of the client terminal 4. Specifically, the transmission destination address of the client terminal is acquired by an appropriate method. Here, as an appropriate method, there are a method of receiving simultaneously with a search query, a method of collating a user ID with a pre-registered IP address, and the like. Then, a search result screen is displayed on the browser 5 of the identified client terminal 4 via the terminal-side information acquisition device 7.

次に、本実施形態における、情報検索システム１のハードウェア的構成について説明する。
図３は上記した機能的な各構成要件を、ハードウェアとして実現した際の、情報検索システム１における構成の一例を表わす概略図である。この図を利用して、本実施形態の検索システム１におけるそれぞれのハードウェア構成の働きについて説明する。図３にあるように、本実施形態の情報検索システム１のハードウェア構成は、クライアント端末４に組み込まれた端末側情報取得装置７と、サーバ装置としてネットワーク上に配置されている検索装置２により構成される。 Next, a hardware configuration of the information search system 1 in the present embodiment will be described.
FIG. 3 is a schematic diagram showing an example of the configuration of the information search system 1 when the functional components described above are realized as hardware. The operation of each hardware component in the search system 1 of the present embodiment will be described with reference to this figure. As shown in FIG. 3, the hardware configuration of the information search system 1 of the present embodiment consists of a terminal-side information acquisition device 7 incorporated in the client terminal 4 and a search device 2 arranged on the network as a server device.

端末側情報取得装置７と検索装置２は電気通信回線を介して相互に接続され、情報の送受信を行う。なお、電気通信回線はインターネットを含む。 The terminal-side information acquisition device 7 and the search device 2 are connected to each other via an electric communication line to transmit and receive information. The telecommunication line includes the Internet.

端末側情報取得装置７においては、端末情報取得部１０を実現し、またその他各種演算処理を行うＣＰＵ（中央演算装置）とＲＡＭや、端末情報送信部１１である通信Ｉ／Ｆを備える。またキーボード、マウス等の入力装置であるＵＩ（ＵｓｅｒＩｎｔｅｒｆａｃｅ）や、ブラウザプログラムにて処理されたリソースを表示するためのＶＲＡＭや、ディスプレイなどの表示装置も備える。そしてそれらがシステムバスなどのデータ通信経路によって相互に接続され、情報の送受信や処理を行う。 The terminal-side information acquisition device 7 includes a CPU (Central Processing Unit) and a RAM that realize the terminal information acquisition unit 10 and performs various other arithmetic processes, and a communication I / F that is a terminal information transmission unit 11. Also, a UI (User Interface) that is an input device such as a keyboard and a mouse, a VRAM for displaying resources processed by the browser program, and a display device such as a display are provided. Then, they are connected to each other by a data communication path such as a system bus, and perform transmission / reception and processing of information.

端末側情報取得装置７のＲＡＭ上には、ブラウザプログラムと、プラグインプログラムとが格納されており、これらプログラムに従い端末情報の取得や送信処理やその他処理に係るＣＰＵの各種演算処理が実行される。また上記プラグインプログラムによって、ＲＡＭ上の所定のアドレスには閲覧ＵＲＬやユーザＩＤ等の端末情報を格納する領域が確保されている。 A browser program and a plug-in program are stored in the RAM of the terminal-side information acquisition device 7, and in accordance with these programs the CPU executes the various arithmetic processes involved in acquiring and transmitting terminal information and in other processing. The plug-in program also reserves, at a predetermined address in the RAM, an area for storing terminal information such as the browsing URL and the user ID.

一方、検索装置２においては、各種演算処理を行うＣＰＵと、ＲＡＭと、通信Ｉ／Ｆと、大量の端末情報等を蓄積するためのハードディスクなどの外部記憶装置とを有している。そしてそれらがシステムバスなどのデータ通信経路によって相互に接続され、情報の送受信や処理を行う。 On the other hand, the search device 2 includes a CPU that performs various arithmetic processes, a RAM, a communication I / F, and an external storage device such as a hard disk for storing a large amount of terminal information. Then, they are connected to each other by a data communication path such as a system bus, and perform transmission / reception and processing of information.

また検索装置２のＲＡＭ上には、検索実行部や検索結果出力部の各機能を実現するプログラムやデータベース管理プログラム（図示せず）等の情報検索システム１の機能を実現する各プログラムが格納されており、当該プログラムに従い端末情報の管理処理やその他処理に係るＣＰＵの各種演算処理が実行される。 The RAM of the search device 2 stores programs for realizing the functions of the information search system 1 such as a program for realizing the functions of the search execution unit and the search result output unit and a database management program (not shown). In accordance with the program, various arithmetic processes of the CPU related to the terminal information management process and other processes are executed.

ここで、端末側情報取得装置７では、ユーザのＵＩを介した操作入力を受け付け、ＲＡＭ上のブラウザプログラムおよびプラグインプログラムに従い以下のような処理が実行される。 Here, the terminal-side information acquisition device 7 receives an operation input via the user's UI, and the following processing is executed according to the browser program and the plug-in program on the RAM.

例えば、ＵＩを介したブラウザ操作によってＷｅｂページへのアクセス指示が入力されると、そのアクセス指示で指定されたＷｅｂページのＵＲＬがＲＡＭの所定アドレスに格納される。すると、ブラウザプログラムに従い指定されたＵＲＬに対して、ＨＴＴＰリクエストがクライアント端末４から検索装置２へ送信される。そして、そのＨＴＴＰリクエストに対するレスポンスをアクセスリソースのコンテンツとして通信Ｉ／Ｆにて該クライアント端末４が受信する。続いてクライアント端末４が受信したコンテンツに含まれるＨＴＭＬファイル、イメージファイルなどを、ブラウザ５が有するＨＴＭＬレンダリングエンジンなどに従ったＣＰＵの演算処理よってレンダリング処理する。そしてその処理結果がＶＲＡＭに転送され、表示装置上にＷｅｂコンテンツが表示される。それから、ユーザは表示された当該ＷＷＷ上のＷｅｂページなどのリソースを閲覧する。 For example, when an instruction to access a Web page is input by a browser operation via the UI, the URL of the Web page specified by the access instruction is stored at a predetermined address in the RAM. Then, an HTTP request is transmitted from the client terminal 4 to the search device 2 for the URL specified according to the browser program. The client terminal 4 receives the response to the HTTP request as the content of the access resource through the communication I / F. Subsequently, an HTML file, an image file, and the like included in the content received by the client terminal 4 are subjected to a rendering process by a CPU arithmetic process according to an HTML rendering engine or the like included in the browser 5. The processing result is transferred to the VRAM, and the Web content is displayed on the display device. Then, the user browses resources such as the displayed Web page on the WWW.

また、それとともに前記ブラウザでのアクセス指示入力に応じてプラグインプログラムは次の処理を実行する。即ち、ＲＡＭの所定アドレスに格納されたＵＲＬを端末情報として、前述のような所定の各タイミングで通信Ｉ／Ｆから検索装置２に自動送信する、という具合である。 At the same time, the plug-in program executes the following process in response to the access instruction input from the browser. That is, the URL stored at a predetermined address in the RAM is automatically transmitted from the communication I / F to the search device 2 at the predetermined timings as described above as terminal information.

これに対して、検索装置２では、通信Ｉ／Ｆにて各クライアント端末４の端末側情報取得装置７から送信されてきた端末情報を受信し、ＲＡＭの所定アドレス等に格納する。そして、検索装置２のプログラムにしたがって、例えば、端末情報に含まれるＵＲＬやユーザＩＤなどをパラメータとして、ＣＰＵの演算処理によって検索用データベース３のファイルを作成又は、作成されている検索用データベース３のファイルに追記し、ＨＤＤ等の外部記憶装置に格納する、という具合である。 In response, the search device 2 receives, through its communication I/F, the terminal information transmitted from the terminal-side information acquisition device 7 of each client terminal 4 and stores it at a predetermined address in the RAM or the like. Then, in accordance with the program of the search device 2, and using, for example, the URL and user ID included in the terminal information as parameters, the CPU creates a file of the search database 3, or appends to an existing file of the search database 3, and stores it in an external storage device such as an HDD.

また、ＵＩによるブラウザ操作によって、検索エンジン（検索用Ｗｅｂページ）にアクセスし、検索クエリの入力、送信を行なった場合や、ツールバーに検索クエリの入力、送信を行った場合のように、クライアント端末４にて検索が実行された場合について説明する。その際、検索装置２は通信Ｉ／Ｆにて該検索クエリを含むＨＴＴＰリクエストを受信し、ＲＡＭの所定アドレスに格納する。すると、検索装置２では検索サーバプログラムに従い以下の処理を実行する。即ち、ＲＡＭに格納されている検索クエリをキーとして、外部記憶装置に保存されている、インデックスファイルデータベースの中から、検索クエリの条件を満たすＷｅｂページを抽出する。続いて、ＣＰＵの比較演算処理によって、抽出されたＷｅｂページの情報と外部記憶装置に保存されている端末情報（端末側情報取得装置７から送信された端末情報）とを比較演算することにより、抽出されたＷｅｂページを出力している端末を特定する。そして、詳しくは後述するが、抽出されたＷｅｂページのリンク構造と、抽出されたＷｅｂページが出力された端末の情報からランク値演算用行列を設定する。さらにまた、設定されたランク値演算用行列の固有値を規定のアルゴリズムに基づいて算出することにより、抽出された各Ｗｅｂページのランク値を取得し、ＣＰＵの比較演算処理によって、抽出されたＷｅｂページをランク値の高い順に並び替える。加えて、検索装置２は、Ｗｅｂページを並べて箇条形式とした検索結果をＣＰＵの演算処理によって生成し、通信Ｉ／Ｆにより検索クエリの送信元のクライアント端末４のブラウザ５に対して送信する、と言う具合である。 Next, the case where a search is executed on the client terminal 4 will be described, for example when a search engine (a Web page for searching) is accessed by a browser operation via the UI and a search query is entered and sent, or when a search query is entered and sent from a toolbar. In that case, the search device 2 receives an HTTP request containing the search query through the communication I/F and stores it at a predetermined address in the RAM. The search device 2 then executes the following processing in accordance with the search server program. Using the search query stored in the RAM as a key, it extracts the Web pages that satisfy the conditions of the search query from the index file database stored in the external storage device. Next, through comparison processing by the CPU, it compares the information on the extracted Web pages with the terminal information stored in the external storage device (the terminal information transmitted from the terminal-side information acquisition device 7) to identify the terminals outputting the extracted Web pages. Then, as described in detail later, it sets a rank value calculation matrix from the link structure of the extracted Web pages and the information on the terminals on which the extracted Web pages were output. Furthermore, it obtains the rank value of each extracted Web page by calculating the eigenvalues of the rank value calculation matrix according to a prescribed algorithm, and, through comparison processing by the CPU, sorts the extracted Web pages in descending order of rank value. Finally, the search device 2 generates, through arithmetic processing by the CPU, a search result that lists the Web pages in itemized form, and transmits it through the communication I/F to the browser 5 of the client terminal 4 that sent the search query.
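The ranking and sorting steps above can be sketched as follows. The specification refers only to "a prescribed algorithm" for the eigenvalue calculation, so this is a PageRank-style illustration under explicit assumptions: the per-page rank values are taken to be the components of the dominant eigenvector of a column-normalized link matrix, computed by power iteration, and the damping factor of 0.85 is an assumption borrowed from PageRank to guarantee convergence. The function name and parameters are hypothetical.

```python
def rank_pages(adjacency, names, damping=0.85, iterations=100):
    """Return (name, rank value) pairs sorted in descending order of rank value.

    adjacency[i][j] is nonzero if node j links to node i (column = link source).
    The damping factor and the power-iteration scheme are assumptions, not
    details taken from the specification.
    """
    n = len(names)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    # Column-normalize; a node with no outgoing links is spread uniformly.
    p = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 1.0 / n
          for j in range(n)] for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [(1.0 - damping) / n
                + damping * sum(p[i][j] * rank[j] for j in range(n))
                for i in range(n)]
    # Sort so that the highest rank value comes first, as in the search result.
    return sorted(zip(names, rank), key=lambda item: item[1], reverse=True)
```

Because users are added to the matrix as virtual pages, a real implementation would presumably drop those entries from the sorted list before building the search result; that filtering step is omitted here.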

次に、本実施形態の情報検索システム１の検索実行時の処理の流れについて説明する。
図４は、本実施形態の情報検索システム１の検索実行時の処理の流れの一例を表すフローチャートである。なお、以下に示すステップは、媒体に記録され、計算機を制御するためのプログラムを構成する処理ステップであっても構わない。 Next, the flow of processing at the time of search execution of the information search system 1 of the present embodiment will be described.
FIG. 4 is a flowchart showing an example of the flow of processing at the time of executing the search of the information search system 1 of the present embodiment. Note that the steps shown below may be processing steps that are recorded on a medium and constitute a program for controlling a computer.

いずれかのクライアント端末４で検索が実行され（ステップ１０１でＹｅｓの場合）、クライアント端末４から検索装置２に検索クエリが送信されると、検索クエリを受信した検索装置２が検索用データベース３から検索クエリの条件を満たすＷｅｂページを抽出する（ステップ１０２）。即ち、検索クエリをキーワードにして該当するインデックスファイルを検索する。 When a search is executed on any of the client terminals 4 (Yes in step 101) and a search query is transmitted from the client terminal 4 to the search device 2, the search device 2 that received the search query extracts, from the search database 3, the Web pages that satisfy the conditions of the search query (step 102). That is, it searches the corresponding index file using the search query as the keyword.

さらに検索装置２は、抽出したＷｅｂページのデータと、検索用データベース３に格納された端末情報とを比較することにより、抽出したＷｅｂページを出力している端末を特定する（ステップ１０３）。即ち、各クライアント端末４から送信された、ブラウザ５を介したＷｅｂページの出力、又は過去の出力に係る情報を参照することにより、抽出したＷｅｂページを出力している端末、又は出力した端末を特定する。 Further, the search device 2 compares the data of the extracted Web pages with the terminal information stored in the search database 3 to identify the terminals outputting the extracted Web pages (step 103). That is, by referring to the information transmitted from each client terminal 4 about current or past output of Web pages via the browser 5, it identifies the terminals that are outputting, or have output, the extracted Web pages.
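Step 103 can be illustrated with the following minimal sketch, which matches the extracted pages against stored terminal information. The function name, the `(terminal_id, url, viewed_at)` record layout, and the use of a fixed look-back window to express "currently or within a certain period" are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def terminals_per_page(extracted_urls, records, now, window):
    """Map each extracted URL to the set of terminal IDs that displayed it.

    records: iterable of (terminal_id, url, viewed_at) tuples (hypothetical layout).
    A record counts if the page was viewed within `window` before `now`.
    """
    wanted = set(extracted_urls)
    result = defaultdict(set)
    for terminal_id, url, viewed_at in records:
        if url in wanted and timedelta(0) <= now - viewed_at <= window:
            result[url].add(terminal_id)
    # Pages that nobody displayed still appear, with an empty set.
    return {url: result.get(url, set()) for url in extracted_urls}
```

The size of each returned set corresponds to the "number of terminals outputting each identified Web page" that the search execution unit 13 is described as determining.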

次に、検索装置２は、検索クエリの条件を満たすＷｅｂページに係る情報（ステップ１０２で取得した情報）と、抽出したＷｅｂページを出力している端末に係る情報（ステップ１０３で取得した情報）からランク値演算用行列を設定する（ステップ１０４）。本発明の特徴的構成であるところのランク値演算用行列の設定処理について、図５，６を参照しつつ以下で詳細に説明する。 Next, the search device 2 sets a rank value calculation matrix from the information on the Web pages that satisfy the conditions of the search query (the information acquired in step 102) and the information on the terminals outputting the extracted Web pages (the information acquired in step 103) (step 104). The process of setting the rank value calculation matrix, which is a characteristic feature of the present invention, is described in detail below with reference to FIGS. 5 and 6.

ランク値演算用行列は、ネットワーク上で公開された複数のＷｅｂページとユーザからなるリンク構造に対応して設定するものであり、その各「列」は、「列」に対応するＷｅｂページから他のＷｅｂページへのリンクを示すものとなっている。即ち、「列」に属する各要素がそれぞれ別箇の他のＷｅｂページへのリンクを示している。さらにまた、その各「行」は、他のＷｅｂページから「行」に対応するＷｅｂページへのリンクを示すものとなっている。即ち、「行」に属する各要素がそれぞれ別箇の他のＷｅｂページから「行」に対応するＷｅｂページへのリンクを示している。
具体的に説明すると、例えば、図６で示されるようなランク値演算用行列Ｐであれば、その行列Ｐの各要素は、行列の上部に示したＷｅｂページＡ，Ｂ，Ｃ，Ｄ及びユーザ（Ｗｅｂページの閲覧者）Ｅ，Ｆ，Ｇ，Ｈから、行列の左部に表示したＷｅｂページＡ，Ｂ，Ｃ，Ｄ及びユーザＥ，Ｆ，Ｇ，Ｈに対するリンク又はユーザによるＷｅｂページの閲覧の状況を示す数値である。
即ち、図６に示す正方行列（ランク値演算用行列Ｐ）は、検索対象となるＷｅｂページがリンク先となっているＷｅｂページと、検索対象となるＷｅｂページがリンクしているＷｅｂページに対して現に、あるいは一定期間内に接続された端末との関連を示すものである。 The rank value calculation matrix is set in correspondence with a link structure made up of a plurality of Web pages published on the network and users. Each “column” indicates the links from the Web page corresponding to that column to other Web pages; that is, each element belonging to a column indicates a link to a different Web page. Likewise, each “row” indicates the links from other Web pages to the Web page corresponding to that row; that is, each element belonging to a row indicates a link from a different Web page to the Web page corresponding to that row.
More specifically, in the rank value calculation matrix P shown in FIG. 6, for example, each element of the matrix P is a numerical value indicating the state of links, or of Web page viewing by users, from the Web pages A, B, C, D and the users (Web page viewers) E, F, G, H shown along the top of the matrix to the Web pages A, B, C, D and the users E, F, G, H shown along the left side of the matrix.
That is, the square matrix (rank value calculation matrix P) shown in FIG. 6 represents the association between the Web pages linked with the Web pages to be searched and the terminals that are, or within a certain period were, connected to those Web pages.

ここで、ランク値演算用行列についてモデルを用いて具体的に説明する。なお、本発明のランク値演算用行列がモデルのものに限らないことは当然である。
図５で示されるように、４つのＷｅｂページＡ，Ｂ，Ｃ，ＤがＷｅｂサーバ上に保持され、インターネットやＬＡＮ等のネットワークに接続されたコンピュータでブラウザを介して閲覧可能となっているものとする。即ち、４つのＷｅｂページＡ，Ｂ，Ｃ，Ｄがネットワーク上に公開された状態となっているものとする。ここで、１つめのＷｅｂページＡは、ＷｅｂページＢ，ＷｅｂページＣにそれぞれリンクを張った状態となっている。
ここで、「リンクを張った状態」とは、リンク元となる特定のＷｅｂページにリンク先となる他のＷｅｂページのＵＲＬが関連付けられており、例えば、リンク元のＷｅｂページ上の所定の部分をクリックするといった所定の操作によって、リンク先のＷｅｂページがブラウザで表示される構造となっている状態とする。 Here, the rank value calculation matrix will be specifically described using a model. Of course, the rank value calculation matrix of the present invention is not limited to the model.
As shown in FIG. 5, assume that four Web pages A, B, C, and D are held on a Web server and can be viewed via a browser on computers connected to a network such as the Internet or a LAN. That is, assume that the four Web pages A, B, C, and D are published on the network. Here, the first Web page A links to Web page B and Web page C.
Here, a page “links to” another page when the URL of the other, link-destination Web page is associated with the particular link-source Web page, so that a predetermined operation, such as clicking a predetermined part of the link-source Web page, causes the link-destination Web page to be displayed in the browser.

さらに、２つめのＷｅｂページＢは、ＷｅｂページＤにリンクを張った状態となっている。３つめのＷｅｂページＣは、ＷｅｂページＤにリンクを張った状態となっている。４つめのＷｅｂページＤは、ＷｅｂページＡ，ＷｅｂページＣにそれぞれリンクを張った状態となっている。 Further, the second Web page B is in a state where a link is made to the Web page D. The third Web page C is in a state of being linked to the Web page D. The fourth Web page D is in a state where links are made to Web page A and Web page C, respectively.

換言すると、１つ目のＷｅｂページＡは、４つ目のＷｅｂページＤにリンクを張られた状態となっている。
ここで、「リンクを張られた状態」とは、リンク元となる他のＷｅｂページが、リンク先となる特定のＷｅｂページにリンクを張った状態であることとする。即ち、他のＷｅｂページに特定のＷｅｂページのＵＲＬが関連付けられており、他のＷｅｂページで所定の操作を行うことによって、特定のＷｅｂページがブラウザで表示される構造となっている状態とする。 In other words, the first Web page A is linked from the fourth Web page D.
Here, a page is “linked from” another page when that other, link-source Web page links to the particular link-destination Web page. That is, the URL of the particular Web page is associated with the other Web page, so that performing a predetermined operation on the other Web page causes the particular Web page to be displayed in the browser.

同様に、２つめのＷｅｂページＢは、ＷｅｂページＡにリンクを張られた状態となっている。３つめのＷｅｂページＣは、ＷｅｂページＡとＷｅｂページＤにそれぞれリンクを張られた状態となっている。４つめのＷｅｂページＤは、ＷｅｂページＢ，ＷｅｂページＣにそれぞれリンクを張られた状態となっている。 Similarly, the second Web page B is linked from Web page A. The third Web page C is linked from Web page A and Web page D. The fourth Web page D is linked from Web page B and Web page C.

ここで、ＷｅｂページＡをユーザＥが閲覧しているものとする。即ち、閲覧者Ｅが使用しているクライアント端末４ａのブラウザ５が、インターネットやＬＡＮ等のネットワークを介してＷｅｂページＡを保持しているＷｅｂサーバからデータを取得し、ＷｅｂページＡを閲覧可能な状態に出力しているものとする。 Here, assume that user E is viewing Web page A. That is, assume that the browser 5 of the client terminal 4a used by viewer E has acquired data, via a network such as the Internet or a LAN, from the Web server holding Web page A and is outputting Web page A in a viewable state.

また同様に、ＷｅｂページＢをユーザＦ，Ｇ，Ｈがそれぞれ閲覧しているものとする。即ち、クライアント端末４ｂ，４ｃ，４ｄの各ブラウザ５が、ＷｅｂページＢを閲覧可能な状態に出力しているものとする。 Similarly, it is assumed that the users F, G, and H are browsing the web page B, respectively. That is, it is assumed that the browsers 5 of the client terminals 4b, 4c, and 4d output the Web page B in a state where it can be browsed.

以上説明した状況において、いずれかのクライアント端末４で検索が実行され、ＷｅｂページＡ，Ｂ，Ｃ，Ｄがいずれも検索クエリの条件を満たすＷｅｂページであったとする。このとき、検索装置２は、ＷｅｂページＡ，Ｂ，Ｃ，Ｄを出力しているクライアント端末４ａ，４ｂ，４ｃ，４ｄを特定し、ユーザＥ，Ｆ，Ｇ，ＨがＷｅｂページＡ，Ｂ，Ｃ，Ｄを閲覧しているものとして、図６で示されるようなランク値演算用行列Ｐを設定する。 In the situation described above, suppose that a search is executed on one of the client terminals 4 and that Web pages A, B, C, and D all satisfy the conditions of the search query. In this case, the search device 2 identifies the client terminals 4a, 4b, 4c, and 4d that are outputting Web pages A, B, C, and D and, treating users E, F, G, and H as viewing Web pages A, B, C, and D, sets the rank value calculation matrix P as shown in FIG. 6.

ランク値演算用行列Ｐの各「列」要素に注目すると、左側部分ＲＬはＷｅｂページＡ，Ｂ，Ｃ，Ｄに関する要素となっており、右側部分ＲＲはユーザＥ，Ｆ，Ｇ，Ｈに関する要素となっている。即ち、複数のＷｅｂページＡ，Ｂ，Ｃ，Ｄに関する要素から構成される要素群と、複数のユーザＥ，Ｆ，Ｇ，Ｈに関する要素から構成される要素群とが隣合わせに配されている。 When attention is paid to each “column” element of the rank value calculation matrix P, the left part RL is an element relating to the Web pages A, B, C, and D, and the right part RR is an element relating to the users E, F, G, and H. It has become. That is, an element group composed of elements related to a plurality of Web pages A, B, C, D and an element group composed of elements related to a plurality of users E, F, G, H are arranged next to each other.

このように本実施形態の検索装置２は、ユーザＥ，Ｆ，Ｇ，Ｈを仮想のＷｅｂページＥ，Ｆ，Ｇ，Ｈとしてランク値演算用行列Ｐを設定している。即ち、「特定のユーザによる特定のＷｅｂページの閲覧」を「仮想のＷｅｂページと特定のＷｅｂページの間で互いにリンクが張られている」状態としてランク値演算用行列Ｐに反映させている。換言すると、図６に示す正方行列では、ＷｅｂページＡ，Ｂ，Ｃ，Ｄと、その閲覧者Ｅ，Ｆ，Ｇ，Ｈを同格に扱い、行列の各要素に反映させている。 Thus, the search device 2 of this embodiment sets the rank value calculation matrix P with the users E, F, G, and H as virtual Web pages E, F, G, and H. That is, “browsing a specific web page by a specific user” is reflected in the rank value calculation matrix P as a state where “a virtual web page and a specific web page are linked to each other”. In other words, in the square matrix shown in FIG. 6, the Web pages A, B, C, and D and the viewers E, F, G, and H are treated equally and are reflected in each element of the matrix.

More specifically, suppose that user E is viewing Web page A as described above (see FIG. 5). The search device 2 then assumes that the link structure made up of Web pages A, B, C, and D also contains a virtual Web page E that does not actually exist, and treats the link structure as consisting of Web pages A, B, C, D, and E. In addition to the rows and columns corresponding to Web pages A, B, C, and D, it adds a new row and a new column corresponding to the virtual Web page E when setting the rank value calculation matrix P. In other words, a new row and column are added to the rank value calculation matrix P on the condition that Web page A has been output on the client terminal 4a used by user E.

Therefore, when the four users E, F, G, and H are each viewing one of the Web pages A, B, C, and D to be searched, as above, the link structure is treated as consisting of Web pages A, B, C, D, E, F, G, and H, and four new rows and four new columns corresponding to the virtual Web pages E, F, G, and H are added when setting the rank value calculation matrix P. That is, the number of rows and the number of columns of the rank value calculation matrix P each equal the number of Web pages to be searched plus the number of terminals outputting those Web pages.

Here, each column of the rank value calculation matrix P corresponds either to a Web page to be searched or to a virtual Web page defined on the condition that a Web page to be searched is being output. Each element of a column represents a link from the corresponding Web page or virtual Web page (hereinafter also simply called a Web page) to another page. Specifically, the element P(i, j) of a column satisfies the following formula (1):
P(i, j) = αX / Y ... (1)
α = a number other than 0
X = the number of links from the Web page corresponding to column j to the Web page corresponding to row i; or the number of displays of the Web page corresponding to column j on the output terminal corresponding to row i; or the number of displays, on the output terminal corresponding to column j, of the Web page corresponding to row i; or 0
Y = the sum of the number of displays on output terminals of the Web page corresponding to column j and the number of links formed on the Web page corresponding to column j

That is, when A is a value based on the number of link destinations of a Web page and B is the number of Web pages to which a terminal is connected currently or within a fixed period, each element of the rank value calculation matrix P is A / (A + B), B / (A + B), or one of these multiplied by a coefficient α.

As described above, on the condition that Web page A is being output on the client terminal 4a used by user E, the search device 2 of this embodiment sets the rank value calculation matrix P as if a link existed from Web page A to the virtual Web page E and a link existed from the virtual Web page E to Web page A. In other words, user E's viewing of Web page A is reflected in the rank value calculation matrix P as a mutual link between Web page A and a Web page E that does not actually exist.

That is, as shown in FIG. 5, suppose that a specific Web page A links to two other Web pages B and C and that user E is viewing Web page A. The search device 2 then treats Web page A as linking to the virtual Web page E in addition to the two Web pages B and C. As shown in FIG. 6, among the elements P(i, 1) forming the column of the rank value calculation matrix P that corresponds to Web page A (the leftmost column in FIG. 6), the elements P(2, 1), P(3, 1), and P(5, 1), which correspond to the links to Web pages B and C and to the virtual Web page E, are all 1/3. This indicates that Web page A has three links in total, one each to Web pages B and C and to the virtual Web page E. Consequently, the elements of each column of the rank value calculation matrix P sum to 1.
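The construction described up to this point can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the search device 2: the function name build_rank_matrix and its arguments are hypothetical, α in formula (1) is taken to be 1, and only the links out of Web page A (to B and C) and user E's view of A are taken from the text, because the rest of FIG. 5 is not reproduced here.

```python
from fractions import Fraction


def build_rank_matrix(pages, users, links, views):
    """Return (nodes, P); P[i][j] is the probability of moving from node j to node i.

    Hypothetical helper: each viewing user becomes a virtual page that is
    mutually linked with the page being viewed (alpha is assumed to be 1).
    """
    nodes = list(pages) + list(users)            # real pages first, then virtual pages
    index = {name: k for k, name in enumerate(nodes)}
    n = len(nodes)
    counts = [[0] * n for _ in range(n)]         # counts[i][j] plays the role of X
    for src, dst in links:                       # real hyperlink src -> dst
        counts[index[dst]][index[src]] += 1
    for user, page in views:                     # one view = one mutual link
        counts[index[user]][index[page]] += 1
        counts[index[page]][index[user]] += 1
    P = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        total = sum(counts[i][j] for i in range(n))   # plays the role of Y
        if total == 0:
            continue                             # no outgoing links: column stays zero
        for i in range(n):
            P[i][j] = Fraction(counts[i][j], total)
    return nodes, P


# Only the column for page A follows the text; B, C, D have no links in this toy input.
nodes, P = build_rank_matrix(
    pages=["A", "B", "C", "D"],
    users=["E"],
    links=[("A", "B"), ("A", "C")],
    views=[("E", "A")],
)
column_a = [P[i][0] for i in range(len(nodes))]
print(column_a)  # entries for B, C and E are 1/3; the others are 0
```

With 0-based indexing, P[1][0], P[2][0], and P[4][0] correspond to P(2, 1), P(3, 1), and P(5, 1) in the text. Exact fractions are used here only so that the column sums can be checked without rounding error.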

Looking next at the rows of the rank value calculation matrix P, the upper part LH consists of elements relating to Web pages A, B, C, and D, and the lower part LL consists of elements relating to users E, F, G, and H. That is, an element group for Web pages A, B, C, and D and an element group for users E, F, G, and H are arranged one above the other.

From the above, as shown in FIG. 6, the rank value calculation matrix P can be regarded as a transition probability matrix over the group of Web pages consisting of the Web pages A, B, C, and D to be searched and the virtual Web pages E, F, G, and H: each of the Web pages A, B, C, D, E, F, G, and H is a vertex, following a link is a transition, and one of the links in a Web page is assumed to be followed with uniform probability.
Here, "following a link" means that the URL of another Web page (the link destination) is associated with a specific Web page (the link source) and that a predetermined operation, such as clicking a predetermined part of the link-source Web page, causes the link-destination Web page to be displayed in the browser. Links defined between a real Web page and a virtual Web page are handled in the same way as links between real Web pages.

In the rank value calculation matrix P shown in FIG. 6, the elements in area E1 at the upper left represent links formed among the real Web pages A, B, C, and D. The elements in area E2 at the upper right represent links from one of the virtual Web pages E, F, G, and H to one of the real Web pages A, B, C, and D. The elements in area E3 at the lower left represent links from one of the real Web pages A, B, C, and D to one of the virtual Web pages E, F, G, and H. The area E4 at the lower right represents virtual links formed among the virtual Web pages E, F, G, and H.

Once the rank value calculation matrix P has been set, the process moves from step 104 to step 105, as shown in FIG. 4, and the search device 2 calculates the eigenvalues of the rank value calculation matrix P. That is, it computes the transition probability at each vertex of the transition probability matrix, namely Web pages A, B, C, and D and virtual Web pages E, F, G, and H. The search device 2 then reorders the Web pages to be searched based on the calculated eigenvalues of the rank value calculation matrix P (step 106).

More specifically, for the rank value calculation matrix P shown in FIG. 6, for example, the eigenvalues satisfy the following formula (2):
[A, B, C, D, E, F, G, H] = [0.408, 0.544, 0.408, 0.544, 0.136, 0.136, 0.136, 0.136] ... (2)
Therefore, the eigenvalues of Web pages A, B, C, and D shown in FIG. 5 are 0.408, 0.544, 0.408, and 0.544, respectively. If Web pages A, B, C, and D are sorted in descending order of eigenvalue, the order is Web page B, Web page D, Web page A, Web page C.
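Steps 105 and 106 can be illustrated with a short Python sketch. The values in formula (2) are listed one per page and have a Euclidean length of about 1 (their squares sum to roughly 0.999), which is what one would expect of a normalized eigenvector, so the sketch assumes that the ranking values are the components of the eigenvector belonging to the largest eigenvalue. It uses plain power iteration, which is an assumption made for brevity and is not stated in the text. The function name dominant_vector and the four-node graph are hypothetical; the graph is not the one in FIG. 5.

```python
import math


def dominant_vector(P, max_iterations=1000, tolerance=1e-12):
    """Power iteration for a column-stochastic matrix P (assumed method)."""
    n = len(P)
    v = [1.0 / n] * n
    for _ in range(max_iterations):
        w = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]               # keep the entries summing to 1
        converged = max(abs(a - b) for a, b in zip(v, w)) < tolerance
        v = w
        if converged:
            break
    length = math.sqrt(sum(x * x for x in v))    # scale to unit Euclidean length
    return [x / length for x in v]


# Hypothetical graph: P1 -> P2 -> P3 -> P1, and one user U viewing P2.
nodes = ["P1", "P2", "P3", "U"]
P = [
    [0.0, 0.0, 1.0, 0.0],   # into P1: from P3
    [1.0, 0.0, 0.0, 1.0],   # into P2: from P1 and from U
    [0.0, 0.5, 0.0, 0.0],   # into P3: from P2
    [0.0, 0.5, 0.0, 0.0],   # into U:  from P2
]
score = dict(zip(nodes, dominant_vector(P)))

# Only real pages are returned to the user; ties keep their original order.
real_pages = ["P1", "P2", "P3"]
ranking = sorted(real_pages, key=lambda page: -round(score[page], 9))
print(ranking)
```

In this toy graph the viewed page P2 receives the largest value, about twice that of P1 and P3. Power iteration converges here because the graph is strongly connected and contains cycles of length 2 and 3; for other graphs a different solver may be needed.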

The search device 2 then transmits the reordered Web pages to the client terminal 4 as the search result (step 107).
This concludes the description of the processing flow of the information search system 1 when a search is executed.

In the embodiment described above, the virtual Web pages E, F, G, and H were set by treating users E, F, G, and H uniformly when setting the rank value calculation matrix P. That is, whichever of the users E, F, G, and H viewed a Web page, the number of virtual links defined between the virtual Web page and the real Web page is the same: for every user, one mutual link is defined between one virtual Web page and one real Web page.

However, the information search system of the present invention is not limited to this configuration. When users E, F, G, and H differ in their depth of knowledge of a specific topic, their search skill, and so on, the differences in knowledge and skill among the viewing users may be reflected in the setting of the rank value calculation matrix. Furthermore, the content of Web pages A, B, C, and D, such as the number of occurrences of a search keyword, may be reflected in the rank value calculation matrix P.

The second and third embodiments are described below. Unless otherwise noted, parts that are the same as in the information search system 1 of the first embodiment are given the same reference numerals, and duplicate description is omitted.

The second embodiment is characterized in that, when the rank value calculation matrix is set, its elements are changed according to differences in the knowledge, search skill, and the like of users E, F, G, and H and differences in the content of Web pages A, B, C, and D.

Like the first embodiment, the information search system 20 of the second embodiment is made up of the search device 2, the client terminals 4, and the browser 5 and terminal-side information acquisition device 7 that run on each client terminal 4 (FIG. 1).

The terminal information acquisition device 7 of the second embodiment calculates a user rank value for each user from the terminal information it has acquired. In this specification, a "user rank value" is a numerical measure of a user's importance, determined from the user's depth of knowledge of a specific topic, search skill, and the like. The calculated user rank value is stored in the search database 3 as terminal information.

The user rank value may be calculated, for example, from the terminal information acquired by the terminal information acquisition device 7 and the Web page information collected by the information collection function (information collection unit 12), based on the types of Web pages the user has output through the browser 5, the time spent viewing Web pages, the history of transitions when following links, and so on.
Another method is to acquire personal information such as the user's occupation and hobbies when the terminal information acquisition device 7 is incorporated into the browser or when software is installed. A further method is to acquire the shortcuts to frequently viewed Web pages recorded in the browser 5. That is, if a user is viewing a Web page to be searched when a search is executed and that user's occupation, hobbies, or the like are related to the content of the Web page, the user's user rank value is raised.
In addition, when a user has answered a question related to a specific topic on a BBS or the like, data may be acquired from that BBS (a specific Web page) as needed. Specifically, information such as other users' ratings of the user's answers is acquired as needed, and the user rank value of a user whose answers were rated highly by other users is raised.
Yet another conceivable method is to judge the user's search skill by analyzing the search keywords, the Web pages viewed after a search, the time spent viewing them, the pages reached by links from them, and so on, and to assign a user rank value on that basis.
At least one suitable method is selected from these ways of evaluating a user based on depth of knowledge of the search topic, search skill, and the like; the terminal information acquisition device 7 and the information collection unit 12 collect the information, and the search device 2 runs the analysis to determine the user rank value.

In the terminal information acquisition device 7 of the second embodiment, a relation rank value is calculated for each Web page from the information collected by the information collection unit 12. In this specification, a "relation rank value" is a numerical measure of how strongly the content of a Web page relates to a specific topic, determined from information such as how often a specific word appears in the page and whether the word appears in a predetermined location such as the title. The calculated relation rank value is stored in the search database 3 as terminal information.
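One way such a relation rank value could be computed is sketched below. The text names only the inputs (how often a word appears and whether it appears in a location such as the title), so the scoring rule, the function name relation_rank, and the title_bonus weight are all assumptions made for illustration.

```python
import re


def relation_rank(title, body, term, title_bonus=5):
    """Hypothetical score: occurrences in the body plus a bonus for a title hit."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    occurrences = len(pattern.findall(body))     # how often the word appears
    in_title = 1 if pattern.search(title) else 0
    return occurrences + title_bonus * in_title  # title_bonus is an assumed weight


example_score = relation_rank(
    title="Kyoto travel guide",
    body="Kyoto is old. Visit Kyoto in autumn.",
    term="kyoto",
)
print(example_score)  # 2 occurrences in the body + 5 for the title = 7
```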

Next, the processing flow of the information search system 1 of this embodiment when a search is executed is described.
When a search is executed on one of the client terminals 4, the search device 2 sets a rank value calculation matrix by the same procedure as in the first embodiment.

In this embodiment, the values of the elements of the rank value calculation matrix are then changed by referring to the relation rank values of the Web pages A, B, C, and D extracted by the search and the user rank values of the users E, F, G, and H viewing those pages.

To explain concretely with a model, suppose that in the model shown in FIG. 5 user F has a higher user rank than users G and H. The element values of the rank value calculation matrix P shown in FIG. 6 are then changed to those of the rank value calculation matrix P shown in FIG. 7. That is, the values of P(4, 2), P(7, 2), and P(8, 2) are each changed from 1/4 to 1/5, and the value of P(6, 2) is changed from 1/4 to 2/5. This makes the link from Web page B to the virtual Web page F more important than each of the links from Web page B to the virtual Web pages G and H.

By changing the element values of the created rank value calculation matrix as appropriate, based on the user rank value, the relation rank value, or both, the importance of the real Web pages A, B, C, and D and of the virtual Web pages E, F, G, and H defined by users E, F, G, and H is reflected in the rank value calculation matrix. Even when element values are changed, the elements of each column of the rank value calculation matrix P still sum to 1. Accordingly, when the value of element P(6, 2) of the rank value calculation matrix P shown in FIG. 6 is changed, it goes from 1/4 to (1 + α) / (4 + α), where α is a nonzero number, and the other three nonzero elements of the column become 1 / (4 + α). With α = 1 these are the values 2/5 and 1/5 described for FIG. 7.
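The reweighting of one column can be checked with a few lines of Python. The sketch assumes that each link starts with weight 1, that a higher user rank adds a bonus α to the link toward that user's virtual page, and that the column is then renormalized so that it sums to 1; the function name weighted_column and the bonus dictionary are hypothetical.

```python
from fractions import Fraction


def weighted_column(targets, bonus):
    """Return {target: probability} for one column after adding per-target bonuses."""
    weights = {t: 1 + bonus.get(t, 0) for t in targets}   # base weight 1 per link
    total = sum(weights.values())
    return {t: Fraction(w, total) for t, w in weights.items()}


# Column for Web page B: links to D and to the virtual pages F, G, H; alpha = 1 for F.
column_b = weighted_column(["D", "F", "G", "H"], {"F": 1})
print(column_b["F"], column_b["D"])  # 2/5 and 1/5
```

The integer bonus keeps the arithmetic exact; a real-valued α would work the same way with floating-point division.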

The eigenvalues of the rank value calculation matrix, with element values changed as needed, are then calculated. That is, by changing the element values of the transition probability matrix before computing, an operation is performed that estimates the transition tendency, based on transition probabilities, at each vertex of the transition probability matrix, namely Web pages A, B, C, and D and virtual Web pages E, F, G, and H. The extracted Web pages are then reordered based on the calculated eigenvalues.

The search device 2 then transmits the reordered Web pages to the client terminal 4 as the search result. This concludes the description of the processing flow of the information search system 1 of this embodiment when a search is executed.

Next, the third embodiment is described.
The third embodiment is characterized in that, after the rank value calculation matrix has been set and the eigenvalues calculated, a value based on the user rank value or the relation rank value is added to each eigenvalue or matrix element.
More specifically, the rank value calculation matrix is set and the eigenvalues are calculated by the same procedure as in the first or second embodiment. The calculated eigenvalues are then changed, and the Web pages are reordered using the changed eigenvalues.

For example, suppose the calculated eigenvalues of Web pages A, B, C, and D are 0.408, 0.544, 0.408, and 0.544, respectively, and that Web page A is being viewed by a user with a high user rank value. The search device 2 then adds a predetermined value to the eigenvalue of Web page A, making it 0.608, and reorders Web pages A, B, C, and D based on the changed eigenvalues 0.608, 0.544, 0.408, and 0.544. In this way, the eigenvalues corresponding to Web pages viewed by users with high user rank values, or to Web pages with high relation rank values, are changed before the Web pages are reordered. The example above adds a value to such eigenvalues, but the method of changing eigenvalues is not limited to this. For example, a value may be subtracted from the eigenvalue corresponding to a Web page with a low relation rank value. The eigenvalues may be changed as appropriate according to the user rank value or the relation rank value.
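The adjustment in this example amounts to adding an offset to selected values and sorting again, as the following Python sketch shows. The offset 0.2 is inferred from the change from 0.408 to 0.608; the variable names are hypothetical, and ties are left in their original order because the sort is stable.

```python
# Values for Web pages A to D as given in the example.
scores = {"A": 0.408, "B": 0.544, "C": 0.408, "D": 0.544}

# Assumed offsets: +0.2 for page A, which a high-ranked user is viewing.
offsets = {"A": 0.2}

adjusted = {page: value + offsets.get(page, 0.0) for page, value in scores.items()}

# Descending order of the adjusted values; equal values keep their input order.
ranking = sorted(adjusted, key=lambda page: -adjusted[page])
print(ranking)  # ['A', 'B', 'D', 'C']
```

A negative offset would implement the subtraction variant mentioned for pages with a low relation rank value.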

In each of the embodiments described above, the rank value calculation matrix P is a transition probability matrix whose vertices are the Web pages A, B, C, D, E, F, G, and H in the group consisting of the Web pages A, B, C, and D to be searched and the virtual Web pages E, F, G, and H. The rank value calculation matrix P may therefore be set, for example, by creating an adjacency matrix whose vertices are the Web pages A, B, C, D, E, F, G, and H and deriving the transition probability matrix from it. The method of deriving the rank value calculation matrix P may be changed as appropriate.

In each of the embodiments described above, when a search is executed, the Web pages satisfying the search query are extracted, the users viewing the extracted Web pages are identified, and a rank value calculation matrix P corresponding to the extracted Web pages and their viewers is set. However, the processing of the search system of the present invention at search time is not limited to this.
For example, a rank value calculation matrix P2 corresponding to all Web pages on the network and the users viewing them may be derived in advance, and when a search is executed, the rank value calculation matrix P may be set by extracting the necessary row and column elements from P2. That is, by extracting and combining the necessary row and column elements from the previously derived matrix P2, a rank value calculation matrix P corresponding to the Web pages that satisfy the search query and the users viewing them may be set.

Similarly, a rank value calculation matrix P3 corresponding to the Web pages that satisfy a specific condition and the users viewing them may be derived in advance, and when a search is executed, the rank value calculation matrix P may be set by extracting the necessary row and column elements from P3.
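Extracting the rows and columns for a query from a precomputed matrix can be sketched as follows. The function name extract_submatrix and the four-node matrix are hypothetical. The text does not say whether the extracted columns are renormalized, so the sketch leaves them as they are; a column whose link targets were dropped will then sum to less than 1.

```python
def extract_submatrix(matrix, nodes, wanted):
    """Keep only the rows and columns of `matrix` that belong to `wanted` nodes."""
    index = {name: k for k, name in enumerate(nodes)}
    keep = [index[name] for name in wanted]
    return [[matrix[i][j] for j in keep] for i in keep]


# Hypothetical precomputed matrix over real pages A, B, C and virtual page E.
all_nodes = ["A", "B", "C", "E"]
P2 = [
    [0.00, 0.5, 1.0, 1.0],
    [0.50, 0.0, 0.0, 0.0],
    [0.25, 0.5, 0.0, 0.0],
    [0.25, 0.0, 0.0, 0.0],
]

# Suppose the query matches pages A and C, and user E is viewing page A.
P = extract_submatrix(P2, all_nodes, ["A", "C", "E"])
print(P)  # column for A now sums to 0.5 because the link to B was dropped
```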

Each of the embodiments described above used, as an example, a rank value calculation matrix in which the element group for Web pages A, B, C, and D and the element group for users E, F, G, and H are adjacent in both rows and columns, but the rank value calculation matrix of the present invention is not limited to this. For example, the order of the rows and columns may be changed as appropriate. Also, although the rank value calculation matrix is written in matrix form for convenience of explanation, how the search system actually handles the matrix during computation may be chosen as appropriate. That is, the system may store all elements of the matrix, or only those elements needed for the computation, and perform the necessary calculations; the data need not be handled in the matrix layout shown above.
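As one illustration of storing only the elements needed for computation, the nonzero entries can be kept in a dictionary keyed by (row, column) and used directly in a matrix-vector product. This coordinate-style layout and the function names are assumptions for the sketch; the text does not prescribe a storage format, and the small matrix is hypothetical.

```python
def sparse_from_dense(matrix):
    """Keep only nonzero entries as {(row, column): value}."""
    return {
        (i, j): value
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value != 0
    }


def sparse_matvec(entries, vector):
    """Multiply the sparsely stored matrix by `vector`."""
    result = [0.0] * len(vector)
    for (i, j), value in entries.items():
        result[i] += value * vector[j]
    return result


dense = [
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
]
entries = sparse_from_dense(dense)             # 5 stored values instead of 16
product = sparse_matvec(entries, [1.0, 2.0, 1.0, 1.0])
print(len(entries), product)  # 5 [1.0, 2.0, 1.0, 1.0]
```

The vector used here is left unchanged by the multiplication, which shows that the sparse form supports the same iteration as the full matrix.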

In each of the embodiments described above, when a search is executed, the terminal information acquired in advance by the terminal-side information acquisition device 7 and the information on Web page structure acquired in advance by the information collection unit 12 are read from the search database 3 to set the rank value calculation matrix P, but the method of setting the matrix is not limited to this. For example, the search device 2 may acquire information on Web page viewing from the client terminals 4 at the time a search is executed. The acquired information may also be used to set the rank value calculation matrix without being held in the database. That is, terminal information may be sent from the client terminals 4 to the search device 2 at any suitable time, and information may be stored in the database at any suitable time or, in some cases, not at all.

In each of the embodiments described above, the Web pages are sorted in descending order of eigenvalue, but the present invention is not limited to this; for example, the Web pages may be sorted in ascending order of eigenvalue. Sorting in descending order is preferable, however, because Web pages that are topical at search time (for example, Web pages being output on many client terminals 4) then appear near the top.

Also, in each of the embodiments described above, when the Web pages in the search result are sorted and two Web pages have the same eigenvalue, the page with more viewers is displayed higher. However, the search system of the present invention is not limited to this; for example, when Web pages have the same eigenvalue, the page with fewer viewers may be displayed higher.
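The tie-breaking rule can be expressed as a two-part sort key, as in the Python sketch below. The function name rank_pages and the flag prefer_more_viewers are hypothetical, and the viewer counts are illustrative values rather than data from the figures.

```python
def rank_pages(scores, viewers, prefer_more_viewers=True):
    """Sort by descending score; break ties by viewer count."""
    sign = -1 if prefer_more_viewers else 1
    return sorted(
        scores,
        key=lambda page: (-scores[page], sign * viewers.get(page, 0)),
    )


scores = {"A": 0.408, "B": 0.544, "C": 0.408, "D": 0.544}
viewers = {"A": 1, "B": 3, "C": 0, "D": 0}     # illustrative counts

print(rank_pages(scores, viewers))                              # ['B', 'D', 'A', 'C']
print(rank_pages(scores, viewers, prefer_more_viewers=False))   # ['D', 'B', 'C', 'A']
```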

Each of the embodiments described above takes as an example a configuration in which, as shown in FIG. 3, the search device 2 includes the search database 3 and the function of receiving and managing information sent from the client terminals 4, but the search system 1 of the present invention is not limited to this. For example, the search database 3 may be provided on a separate server on the network, and the search device 2 may acquire information from the search database 3 on that server when executing a search. That is, it suffices that the functions are realized by the system as a whole; the number of servers and the placement of each function (program) and database may be changed as appropriate according to server capacity and the like.

The programs and systems of the embodiments described above can be installed on or loaded into a computer from various recording media, such as optical disks (for example, CD-ROM), magnetic disks, and semiconductor memories, or by downloading them via a telecommunication network or the like.

The computation for obtaining the eigenvalues of the rank value calculation matrix described above may be performed with any suitable hardware configuration and any suitable algorithm. However, the following verification was carried out with the aim of obtaining the eigenvalues of the rank value calculation matrix faster. The information search system of the present invention may select its hardware configuration and algorithm with reference to this verification.

Because classical methods such as the LR method and the QR method are generally expensive, the verification focused on methods based on the "projection method", which reduces computation time by transforming the original matrix into a smaller matrix having the same eigenvalues. Projection-based methods such as Lanczos, Arnoldi, and Krylov-Schur consist of an orthogonalization phase based on the Gram-Schmidt procedure and a phase of eigenvalue computation and Schur decomposition by the multishift QR method. By reordering blocks in the decomposition phase, they can compute the desired eigenvalue and eigenvector pairs first, such as those for the largest or smallest eigenvalue. An application that needs only the largest eigenvalue and its corresponding eigenvector may therefore see a large reduction in computation time. For this reason, projection-based methods are considered suitable for computing the eigenvalues of the rank value calculation matrix described above.

Among libraries that implement these projection methods, SLEPc supports computations on large-scale matrices. SLEPc implements recent eigenvalue algorithms on top of existing numerical libraries such as PETSc, BLAS, and LAPACK, and a substantial reduction in computation time can be expected. PETSc, the numerical library used by SLEPc, can process vector and matrix operations in parallel across multiple processes using OpenMPI. PETSc is written in C and performs computations at high speed. OpenMPI supports various inter-process communication mechanisms, such as shared memory, InfiniBand, and TCP/IP, and switches between them automatically, so parallelization suited to the available environment is easy to achieve.

In contrast, general search services build their systems on the assumption that processing always takes place in a large-scale distributed environment. The middleware used in such systems can stably execute large volumes of batch processing in environments where node failures and the like occur frequently. However, the overhead of redundancy is large, and high performance cannot be obtained for processing that involves heavy communication.

In small and medium-scale computing environments, tightly coupled processing may therefore deliver higher performance, so this verification also covers cluster middleware implemented on MPI.

First, the performance of SLEPc implemented on OpenMPI was investigated. Specifically, we examined how much resource is needed to achieve the scalability required by the information search system of the present invention. More concretely, performance was examined for a computation of about 100,000 entries in total, counting Web pages and viewers together.

The evaluation environment for the verification is as follows.
Computer: PowerEdge R410 × 2 (Dell)
CPU: Intel(R) Xeon(R) L5520 (2.26 GHz, 8 MB cache, 5.86 GT/s QPI) × 2 (4 cores each)
Memory: 32 GB (4 GB × 8, 2R, 1066 MHz, DDR3 RDIMM)
Disk: RAID6 (PERC6i), 600 GB, 15,000 RPM SAS
Network: Broadcom NetXtreme II BCM5716 1000Base-T PCI Express
Switch: Cisco Catalyst 2960G-8TC-L
OS: Ubuntu Linux 10.04
Libraries: PETSc 3.0.0, SLEPc 3.0.0, LAPACK 3.2.1, BLAS 1.2

For the performance evaluation, eigenvalues and eigenvectors were computed using a Laplacian matrix as an example of a symmetric matrix and, as an example of an asymmetric matrix, a transition probability matrix generated from a model of random transitions between elements. The Laplacian matrix used in this experiment is the matrix shown in FIG. 8(a), whose diagonal elements are 2 and whose elements on either side of the diagonal are 1. The transition probability matrix is the matrix shown in FIG. 8(b), generated on the assumption of random movement on a triangular lattice. In this verification, parallel processing performance within a single node is evaluated first, followed by an evaluation on multiple nodes using TCP/IP.
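As a rough illustration of the two test matrices, the following Python sketch builds the tridiagonal matrix described for FIG. 8(a) and a column-stochastic random-walk matrix on a small triangular lattice. FIG. 8(b) is not reproduced in this text, so the lattice layout (row k holds k + 1 nodes, each linked to its neighbours in the same and adjacent rows) and all function names are assumptions made for illustration, not the construction actually used in the verification.

```python
from collections import defaultdict


def tridiagonal(n, diag=2.0, off=1.0):
    """Sparse {(row, col): value} matrix with `diag` on the diagonal and `off` beside it."""
    m = {}
    for i in range(n):
        m[(i, i)] = diag
        if i > 0:
            m[(i, i - 1)] = off
        if i < n - 1:
            m[(i, i + 1)] = off
    return m


def lattice_size(rows):
    """Number of nodes in a triangular lattice whose k-th row (0-based) has k + 1 nodes."""
    return rows * (rows + 1) // 2


def triangular_lattice_neighbors(rows):
    """Adjacency sets for the assumed triangular lattice."""
    index = {}
    for r in range(rows):
        for c in range(r + 1):
            index[(r, c)] = len(index)
    neighbors = defaultdict(set)
    for (r, c), u in index.items():
        # Right neighbour in the same row and the two neighbours in the row below;
        # adding each edge in both directions covers the remaining directions.
        for cell in ((r, c + 1), (r + 1, c), (r + 1, c + 1)):
            if cell in index:
                v = index[cell]
                neighbors[u].add(v)
                neighbors[v].add(u)
    return neighbors


def random_walk_matrix(neighbors):
    """Column-stochastic matrix: P[(i, j)] is the probability of moving from j to i."""
    p = {}
    for j, targets in neighbors.items():
        for i in targets:
            p[(i, j)] = 1.0 / len(targets)
    return p
```

Under this assumed layout, a lattice with 750 rows has 750 × 751 / 2 = 281,625 nodes, which happens to equal the matrix size reported later in this description for the transition probability matrix; the source does not state how that size arises.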

Before the evaluation on actual machines, the implementations of Arnoldi and Krylov-Schur were examined in the SLEPc source code.

PETSc, which SLEPc uses, relies on the collective communication functions of OpenMPI and implements a mechanism that partitions a matrix into specified row ranges (for example, rows 1 to n1, rows n1+1 to n2, and so on) and generates and processes a submatrix on each processor. In the orthogonalization computation, intermediate results for the submatrices and vectors distributed across the processes must be exchanged through operations such as gather, scatter, and reduce, so the communication overhead of parallelization is expected to be large.
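The row-wise partitioning can be pictured with the short Python sketch below. It splits N rows into contiguous, nearly equal ranges, one per process. The function names and the rule of giving the first N mod p processes one extra row are assumptions for illustration and are not taken from the PETSc source.

```python
def row_ranges(n_rows, n_procs):
    """Return half-open (start, end) row ranges, one per process."""
    base, extra = divmod(n_rows, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional row each.
        end = start + base + (1 if rank < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def local_matvec(local_rows, x):
    """Multiply locally owned rows (lists of (col, value) pairs) by the full vector x."""
    return [sum(value * x[col] for col, value in row) for row in local_rows]
```

Each process can compute its slice of a matrix-vector product independently, but assembling the full vector for the next step requires an exchange across processes, which is the kind of communication identified above as overhead.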

In contrast, multishift QR and Schur decomposition can be processed by looking only at the submatrix of interest, so no communication overhead is expected. These points were confirmed by actual performance measurements.

PETSc provides the -log_summary option, which reports the program's execution time together with the call counts and execution times of pre-registered code sections. The following experiments used -log_summary to measure performance.

Furthermore, collective communication in MPI such as allgather and allreduce is assumed to be designed for an even number of processors; with an odd number, communication with the leftover node cannot be performed efficiently, so an even number of processors is considered preferable for computations that rely heavily on collective communication. In practice, using an odd number n of processors gave only about the same performance as the even number n - 1 of processors. The evaluations below therefore consider the performance of even numbers of processors.

Because the code actually executed in parallel depends on the values of the matrix being computed, the parallelization rate cannot be determined directly from the source code. However, the parallelization rate for a given set of parameters can be calculated from performance measurements and used to evaluate the library. The parallelization rates obtained below are for the specified parameters. SLEPc also takes a tolerance for the eigenvalues to be computed and continues iterating until the result falls within that tolerance. In the following experiments the tolerance was set to 10^-7.

First, the computation time for the Laplacian matrix is examined.

For this purpose, the eigenvalue computation time was measured for a Laplacian matrix of size 100,000 × 100,000. Processing times were first compared at the same number of iterations: when the iteration count is limited to about 12,500, none of the methods has yet converged for a matrix of this size, so each performs the same number of iterations. Let T1 be the processing time on a single processor and Tn the processing time on n processors. The processing time ratio T1/Tn as the number of processors varies is shown in FIG. 9.

The hardware used provides 8 processor cores per node and, with Intel Hyper-Threading technology, 16 virtual cores. However, as FIG. 9 shows, performance drops when the number of processors increases from 8 to 9. When Hyper-Threading was turned off and the processing time of each method was measured up to 8 processes, the results up to 8 processors showed almost the same behavior. The degradation is therefore attributed to Hyper-Threading rather than to the use of multiple CPUs. The Ubuntu implementation may also play a role, but the detailed cause has not been investigated. Accordingly, the following experiments were conducted with Hyper-Threading turned off.

For up to 8 processors, the growth of the processing time ratio slows as the number of processors increases, as Amdahl's law predicts. Let r (0 <= r <= 1) be the fraction of the program that is parallelized (the parallelization rate, or Parallel Portion) and n the number of processors. Ignoring overhead, the processing time of the parallel part is rT1/n and that of the sequential part is (1 - r)T1, so the processing time ratio Speedup is given by [Equation 1] below.

Speedup = T1/Tn = 1 / ((1 - r) + r/n)   [Equation 1]

Estimating r from the measured values by the least squares method gives approximately 0.9653 (96.53%), 0.9720 (97.20%), and 0.9583 (95.83%) for Lanczos, Arnoldi, and Krylov-Schur, respectively. In FIG. 9, the straight line represents ideal performance and the curves represent the estimates. Because the estimate uses a formula that ignores overhead, the observed values deviate from the estimated curves. Since the measured values at 8 processors fall below the fitted curves, the parallelization rate would likely decrease if performance at 10 or more processors were measured and the fit recomputed. Next, the parallel efficiency of each method at 8 processors was examined. Parallel efficiency PE is the processing time ratio divided by the number of processors; expressed as a percentage, it is given by [Equation 2] below.

PE = (Speedup / n) × 100 [%]   [Equation 2]
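The least-squares estimate of r described above could be carried out as sketched below in Python. Under the overhead-free model, 1/Speedup = 1 - r(1 - 1/n), which is linear in r, so r has a closed-form least-squares solution. Fitting the reciprocal of the speedup is an assumption made here for simplicity; the source does not say which quantity was fitted. The sample values are synthetic, generated from the model itself, and are not data from FIG. 9.

```python
def amdahl_speedup(r, n):
    """Speedup T1/Tn predicted by Amdahl's law for parallel fraction r on n processors."""
    return 1.0 / ((1.0 - r) + r / n)


def parallel_efficiency(speedup, n):
    """Parallel efficiency in percent: speedup divided by the processor count."""
    return 100.0 * speedup / n


def fit_parallel_fraction(samples):
    """Least-squares estimate of r from (n, measured_speedup) pairs.

    Minimises sum((1/S_i - (1 - r * x_i)) ** 2) with x_i = 1 - 1/n_i.
    At least one sample must have n > 1.
    """
    num = 0.0
    den = 0.0
    for n, s in samples:
        x = 1.0 - 1.0 / n
        num += x * (1.0 - 1.0 / s)
        den += x * x
    return num / den


# Synthetic measurements generated from the model with r = 0.9 (not measured data).
samples = [(n, amdahl_speedup(0.9, n)) for n in (2, 4, 6, 8)]
r_hat = fit_parallel_fraction(samples)
```

With real measurements the fitted curve will not pass through every point, which is the deviation between observed values and the estimated curve noted above.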

The parallel efficiencies for Lanczos, Arnoldi, and Krylov-Schur were approximately 74.58%, 76.11%, and 68.87%, respectively, with Arnoldi giving the best value.

For symmetric matrices, the parallel efficiency of all three methods stays above 50% up to 8 processors, but it may fall further beyond that. In some large-scale systems, for example, jobs with a parallel efficiency below 50% are refused in order to maintain system utilization. Therefore, if it later turns out that 8 or more nodes are needed to achieve the processing performance required by the information search system of the present invention, and that parallel efficiency drops below 50% with 8 or more processors, other libraries or computation methods could be used to obtain a return on the investment in additional hardware.

FIG. 10 compares the computation times of the three methods for 12,500 iterations. For the symmetric matrix, Lanczos has the shortest computation time. Of the remaining two methods, which can also handle asymmetric matrices, Krylov-Schur is faster. Note, however, that these are computation times for an equal number of iterations; in practice the number of iterations needed for the eigenvalues to converge differs between methods. For example, in this investigation Arnoldi converged after 36,534 iterations.

Next, the computation time for the generated transition probability matrix was examined.

The generated matrix is asymmetric and, being a transition probability matrix, is closer than the Laplacian matrix to the link structure of the target Web pages. Because the matrix is generated so that the values in each column sum to 1, its construction is constrained and its size is not a round number; a matrix of size 281,625 was used here. The number of iterations was limited to 500, at which none of the methods has converged. Because Lanczos is a method for symmetric matrices, only Arnoldi and Krylov-Schur were compared for the asymmetric matrix. When the matrix is symmetric, Krylov-Schur partially switches to a Lanczos-based algorithm, so its performance characteristics differ in the asymmetric case.

FIG. 11 shows the processing time ratio T1/Tn as the number of processors varies. The parallelization rates estimated by the least squares method are 0.817591 (81.75%) for Arnoldi and 0.839298 (83.92%) for Krylov-Schur; for the asymmetric matrix, the parallelization rate of both methods is lower than in the symmetric case. FIG. 12 likewise shows the parallel efficiency as the number of processors varies. For both Arnoldi and Krylov-Schur, parallel efficiency falls below 50% at 8 processors. In light of the example above, the current implementation may not be using the CPUs effectively.
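As a consistency check, the fitted parallelization rates can be substituted into the overhead-free Amdahl model. This is a back-of-the-envelope calculation added for illustration: the model efficiency is not the measured efficiency plotted in FIG. 12, and it assumes the same speedup model that was used for the symmetric case.

```latex
\[
\mathrm{PE}(n) = \frac{100}{n}\cdot\frac{1}{(1-r) + r/n} = \frac{100}{n\,(1-r) + r}\ [\%]
\]
\[
\mathrm{PE}_{\text{Arnoldi}}(8) = \frac{100}{8 \times 0.182409 + 0.817591} \approx 43.9\%,
\qquad
\mathrm{PE}_{\text{Krylov-Schur}}(8) = \frac{100}{8 \times 0.160702 + 0.839298} \approx 47.1\%
\]
```

Both model values are under 50%, in line with the observation above. In the same model, PE stays at or above 50% only while n <= (2 - r)/(1 - r), which is about 6.5 for r = 0.817591.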

However, because the transition probability matrix is sparse, the eigenvalues converge quickly and few iterations are needed. The number of iterations actually needed for convergence varies with the method and the number of processors, but Krylov-Schur was confirmed to converge within 1,157 iterations. The processing times with 2 processors were approximately 102.2 seconds and 56.77 seconds, about one quarter of the processing time for the Laplacian matrix. The number of processors required is therefore also expected to be smaller.

Next, to investigate the effect of communication overhead, computation on a single node was compared with computation on multiple nodes. Specifically, execution on one node was compared with execution on two nodes. The evaluation used TCP/IP.

Because the transition probability matrix is sparse, its eigenvalues converge quickly and the number of iterations is small. The comparison was therefore made using the Laplacian matrix, which requires more iterations. FIG. 13 shows the processing time as the number of iterations varies, for a computation using 2 processors.

Compared with processing on a single node, the time spent in MPI-related function calls increases, but in some cases the multi-node run performs better because of the processing time of other parts. This may depend on the implementation of the OpenMPI drivers and the like. Based on these results, if it later turns out that the performance required by the information search system cannot be achieved with the processors of a single node, it may be possible to respond by adding processors via inter-node communication, provided the requirement is within the range covered by the performance gains measured here.

This concludes the description of the verification of the processing speed of the eigenvalue computation, including cluster middleware implemented on MPI.

The verification above did not cover the Power method implemented in SLEPc, but this method may also be suitable for use in the information search system of the present invention.
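For reference, the power method itself is simple enough to sketch in a few lines of Python. The version below is a generic textbook power iteration on a sparse column-stochastic matrix stored as a dictionary. It is not SLEPc's implementation, and the function name, the default tolerance, and the three-page example matrix are assumptions chosen for illustration.

```python
def power_method(matrix, size, tol=1e-7, max_iter=10000):
    """Approximate the dominant eigenvector of a sparse {(row, col): value} matrix.

    The vector is normalised so that its entries sum to 1 after each step.
    Returns (vector, iterations_used).
    """
    x = [1.0 / size] * size
    for iteration in range(1, max_iter + 1):
        y = [0.0] * size
        for (i, j), value in matrix.items():
            y[i] += value * x[j]
        total = sum(y)
        y = [v / total for v in y]
        # Stop when the vector changes by less than the tolerance (1-norm).
        change = sum(abs(a - b) for a, b in zip(x, y))
        x = y
        if change < tol:
            return x, iteration
    return x, max_iter


# Hypothetical 3-page example: column j holds the probabilities of moving from page j.
P = {
    (1, 0): 0.5, (2, 0): 0.5,
    (0, 1): 0.5, (2, 1): 0.5,
    (0, 2): 1.0,
}
rank, steps = power_method(P, 3)
order = sorted(range(3), key=lambda i: rank[i], reverse=True)
```

Sorting pages by the components of the resulting vector, as `order` does, corresponds to ranking by the magnitude of the eigenvector components.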

The verification above considered only speeding up the eigenvalue computation part of the ranking function. In practice, however, the link structure between pages must be acquired and fed to the eigenvalue computation program as a transition probability matrix. As the matrix size grows, input/output may become a bottleneck. MPI provides parallel I/O functionality, which may make it possible to avoid performance degradation due to I/O. In addition, PETSc supports data structures that store sparse matrices efficiently, which keeps memory usage down.
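To make the memory argument concrete, the Python sketch below stores a matrix in compressed sparse row (CSR) form, keeping only the nonzero entries, and multiplies it by a vector. It is a generic illustration of the idea; the function names are hypothetical and the code does not reproduce PETSc's internal data structures.

```python
def to_csr(n_rows, entries):
    """Convert {(row, col): value} into CSR arrays (values, col_indices, row_ptr)."""
    by_row = [[] for _ in range(n_rows)]
    for (i, j), v in entries.items():
        by_row[i].append((j, v))
    values, col_indices, row_ptr = [], [], [0]
    for row in by_row:
        for j, v in sorted(row):
            col_indices.append(j)
            values.append(v)
        # row_ptr[i + 1] marks where row i ends in `values`.
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


def csr_matvec(csr, x):
    """Compute y = A x using only the stored nonzeros."""
    values, col_indices, row_ptr = csr
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y
```

For a tridiagonal 100,000 × 100,000 matrix, this layout stores roughly 300,000 values rather than the 10^10 entries of a dense array.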

The verification above confirmed that, using the parallel computation library SLEPc, the eigenvalues and eigenvectors of a transition probability matrix with several hundred thousand rows can be computed in tens of seconds on a PC with about 8 cores.

1 Information search system
10 Terminal information acquisition unit (information acquisition means)
13 Search execution unit (information ordering means)

Claims

An information search system that searches for Web pages based on a keyword input at a terminal that has executed a search, and outputs the search result to the terminal for display,
the system comprising information ordering means for ranking the Web pages to be searched,
wherein the information ordering means has a function of performing a calculation that estimates a transition tendency from the relationship between the Web pages to which a Web page to be searched is linked and the terminals that are connected to the Web page to be searched, either currently or within a certain period.

The information search system according to claim 1, wherein the calculation includes setting a square matrix whose configuration includes the Web pages to be searched and the terminals connected to those Web pages, either currently or within a certain period, and calculating the transition probability by computing an eigenvector of the square matrix.

The information search system according to claim 2, wherein the elements of the square matrix are composed of values based on the number of link destinations to which a Web page is linked and values based on the number of Web pages to which a terminal is actually connected within a certain period.

The elements of the square matrix are A/(A + B) or B/(A + B), or values obtained by multiplying these by a coefficient, where A is a value based on the number of link destinations to which the Web page is linked and B is the number of Web pages to which the terminal is actually connected within a certain period.

The information search system according to any one of claims 1 to 3, further comprising information acquisition means for acquiring, at or before the time of search execution, terminal information, which is information on the terminals outputting the Web pages to be searched,
wherein the information ordering means identifies, based on at least the terminal information, Web pages that satisfy a predetermined condition and output terminals, which are terminals that are outputting or have output those Web pages,
sets a transition probability matrix in which a change from display of a link-source Web page to display of a link-destination Web page is treated as a transition, based on the link structure formed by the Web pages and virtual Web pages when each output terminal is regarded as a virtual Web page and mutual links are assumed between that virtual Web page and the Web pages output by the output terminal,
and calculates transition probabilities based on the set transition probability matrix and ranks the Web pages to be searched based on the calculation result.

The information search system according to any one of claims 2 to 5, wherein a user rank value, indicating the importance of a user in terms of the knowledge and/or search ability of the user using the output terminal, is calculated based on the terminal information,
and at least one of the elements of the square matrix or the transition probability matrix, and the calculated eigenvector of the square matrix or the transition probability matrix, is changed based on the user rank value.

The information search system according to any one of claims 2 to 6, having an information acquisition function capable of acquiring information on the content of Web pages on the network, wherein a relation rank value indicating the importance of the content of a Web page is calculated based on the information acquired by the information acquisition function,
and at least one of the elements of the square matrix or the transition probability matrix, and the calculated eigenvector of the square matrix or the transition probability matrix, is changed based on the relation rank value.

An information search system that searches a database based on a keyword input at a terminal and outputs the search result to the terminal for display, the system comprising:
information acquisition means for acquiring, at or before the time of search execution, terminal information, which is information on the terminals outputting the Web pages to be searched; and
information ordering means for ranking the Web pages to be searched,
wherein the information ordering means identifies, based on at least the terminal information, Web pages that satisfy a predetermined condition and output terminals that are outputting or have output those Web pages, and sets a square matrix based on the Web pages and the output terminals,
the square matrix is formed of row elements and column elements corresponding to the Web pages and the output terminals, and each element P(i, j) satisfies formula (1) below, and
eigenvectors of the set square matrix are calculated, and the Web pages to be searched are ranked based on the magnitudes of the components of the calculated eigenvectors.
P(i, j) = αX/Y (1)
α = a number other than 0
X = the number of links from the Web page corresponding to column j to the Web page corresponding to row i, or the number of displays of the Web page corresponding to column j on the output terminal corresponding to row i, or the number of displays of the Web page corresponding to row i on the output terminal corresponding to column j, or 0
Y = the sum of the number of Web pages displayed on the output terminal corresponding to column j and the number of links formed on the Web page corresponding to column j

An information search method for searching for Web pages based on a keyword input at a terminal that has executed a search and outputting the search result to the terminal for display,
the method comprising performing a calculation that estimates a transition tendency from the relationship between the Web pages to which a Web page to be searched is linked and the terminals that are connected to the Web page to be searched, either currently or within a certain period.

The information search method according to claim 9, wherein, at or before the time of search execution, terminal information, which is information on the terminals outputting the Web pages to be searched, is acquired,
Web pages that satisfy a predetermined condition and output terminals, which are terminals that are outputting or have output those Web pages, are identified based on at least the terminal information,
a transition probability matrix in which a change from display of a link-source Web page to display of a link-destination Web page is treated as a transition is set, based on the link structure formed by the Web pages and virtual Web pages when each output terminal is regarded as a virtual Web page and mutual links are assumed between that virtual Web page and the Web pages output by the output terminal,
and transition probabilities are calculated based on the set transition probability matrix, and the Web pages to be searched are ranked based on the calculation result.