JP2013168177A

JP2013168177A - Information provision program, information provision apparatus, and provision method of retrieval service

Info

Publication number: JP2013168177A
Application number: JP2013098004A
Authority: JP
Inventors: Tetsuro Takahashi; 哲朗高橋; Seishi Okamoto; 青史岡本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2013-05-07
Filing date: 2013-05-07
Publication date: 2013-08-29

Abstract

PROBLEM TO BE SOLVED: To improve the efficiency of a user's retrieval activity in a retrieval system. SOLUTION: Articles published in blogs are extracted from a set of blogs obtained by giving a retrieval query, and the extracted articles are classified into a plurality of clusters according to their published contents. The degree of conformity of each blog to the retrieval query is then calculated on the basis of the correlation between the blogs and the clusters, and a retrieval result for the blogs conforming to the retrieval query is presented on the basis of the calculated result. Because each blog is evaluated in consideration of the correlation between blogs and topics derived from the published contents, appropriate scoring can be performed even for a blog that has few link relationships with other blogs.

Description

The present invention relates to an information providing method, an information providing apparatus, an information providing program, and a recording medium on which the program is recorded, each of which presents, as a search result, a group of pages obtained by giving a search query received from a user.

In the field of information search services, there has long been a demand for techniques that allow a user to carry out search activities efficiently. In response to this demand, a known technique, for example, calculates a score for each page based on the link relationships between pages when presenting search results, and ranks the search results based on those scores.

Specifically, pages related to a search query are ranked in consideration of factors such as the links on a page, the anchors of those links, and the popularity of the pages that link to another page. In this case, the relationships "a page linked from many other pages is important" and "a page linked from an important page is important" hold.

Patent Document 1 below describes, as a ranking method based on link relationships, a technique for performing page importance ranking for web search using the hierarchical structure and link relationships of the World Wide Web. Specifically, link graph analysis is performed on the aggregated link relationships to determine the importance of each node. The importance of each node propagates to the pages associated with that node. Then, for each page, a page importance ranking is calculated using the importance of the page and the importance of the node associated with the page.

Patent Document 2 below describes, as a search technique using topics related to a search query, a technique for creating an index for web search, or for extracting a topic from a topic set for web search and obtaining search results based on that topic. Specifically, an index used for searching a data group is generated from a topic map held by an information network, and index data of the data that meets the search condition is displayed as a search result.

Patent Document 1: JP 2006-127529 A
Patent Document 2: JP 2006-215753 A

However, the prior art described in Patent Documents 1 and 2 has the problem that pages having few or no link relationships with other pages cannot be ranked appropriately. For example, a page that has only recently been registered on the network is not yet well known to other users and is unlikely to be linked from other pages, so it ends up being ranked lower than pages with many link relationships.

In addition, ranking based on link relationships does not take the search query or the posted content into account, so a large divergence may arise between the importance of a page and its link relationships. As a result, a page that is strongly related to the search query, or a page carrying the information the searcher wants, may be ranked low simply because it has no link relationships, which increases the time and effort the user must spend on the search activity.

To solve the above problems of the prior art, an object of the present invention is to provide an information providing method, an information providing apparatus, an information providing program, and a recording medium on which the program is recorded that optimize the ranking of search results by performing appropriate scoring in consideration of the posted content of pages, and thereby make the user's search activity more efficient.

According to one aspect of the present invention, an information providing program and an information providing apparatus are proposed that display the set name of each of a plurality of sets formed according to the posted content of each of a plurality of web pages retrieved in response to a search query, and that display title information of the web pages belonging to the set whose set name is selected from among the displayed set names.

According to another aspect of the present invention, a method of providing a search service is proposed. The method is executed by a server that outputs a search result to a client in response to a search request from the client. The server provides the client with the set name of each of a plurality of sets formed according to the posted content of each of a plurality of web pages retrieved in response to a search query transmitted from the client. When the client selects one of the set names, the server provides the client with title information of the web pages belonging to the set having the selected set name.

The information providing method, information providing apparatus, information providing program, and recording medium on which the program is recorded optimize the ranking of search results by performing appropriate scoring in consideration of the posted content of pages, and thereby make the user's search activity more efficient.

Preferred embodiments of the information providing method, the information providing apparatus, the information providing program, and the recording medium on which the program is recorded are described in detail below with reference to the accompanying drawings.

(System configuration of the search system)
First, the system configuration of the search system 100 according to the present embodiment will be described. FIG. 1 is a system configuration diagram of the search system. In FIG. 1, the search system 100 includes an information providing apparatus 101, a database server 102, and client terminals 103-1 to 103-n, which can communicate with one another via a network 110 such as the Internet, a LAN, or a WAN.

The information providing apparatus 101 is a computer apparatus having a function of searching the group of pages published on the network 110 for pages that match a search query received from the client terminals 103-1 to 103-n. Specifically, it transmits the search query received from the client terminals 103-1 to 103-n to the database server 102 and thereby retrieves the pages that match the search query.

Here, a page may be, for example, a website published on the Internet, or a group of files in a folder on a restricted network 110 such as a LAN. A specific example of a website is a weblog (hereinafter, "blog") whose creator (author) can be identified. In more detail, a page is electronic information composed of text data, HTML layout information, and images, audio, video, and the like embedded in a document.

A page contains at least one piece of posted information. Here, posted information may be, for example, an individual web page making up a website, or an individual file in the folder described above. Specifically, when the page is a blog, the single unit of information (entry) newly registered each time the blog is updated may be treated as posted information. Frame-by-frame information obtained by dividing the content displayed on one screen may also be treated as posted information.

The information providing apparatus 101 also calculates, for each retrieved page, its degree of conformity to the search query. It then generates page information representing the search result based on the degree of conformity of each page and presents that page information to the client terminals 103-1 to 103-n. The information providing apparatus 101 is applied, for example, to the search engine of an information search service.

The database server 102 is a computer apparatus that includes a page DB (database), not shown, and has a function of managing the group of pages on the network 110. The database server 102 searches the page DB for pages that match the search query received from the information providing apparatus 101 and transmits the search result to the information providing apparatus 101.

The client terminals 103-1 to 103-n are computer apparatuses having a function of transmitting a search query entered through a user operation to the information providing apparatus 101. After transmitting the search query, the client terminals 103-1 to 103-n receive the search result that matches the search query from the information providing apparatus 101 and display it on a display screen.

(Hardware configuration of the computer apparatus)
Next, the hardware configuration of the information providing apparatus 101, the database server 102, and the client terminals 103-1 to 103-n shown in FIG. 1 (here referred to simply as the "computer apparatus") will be described. FIG. 2 is an explanatory diagram showing the hardware configuration of the computer apparatus.

In FIG. 2, the computer apparatus includes a computer main body 210, an input device 220, and an output device 230, and can be connected to a network 110 such as a LAN, a WAN, or the Internet via a router or modem (not shown).

The computer main body 210 has a CPU, a memory, and an interface. The CPU governs overall control of the computer apparatus. The memory consists of a ROM, a RAM, an HD, an optical disk 211, and a flash memory, and is used as a work area for the CPU.

Various programs are stored in the memory and are loaded in response to instructions from the CPU. Reading and writing of data on the HD and the optical disk 211 are controlled by a disk drive. The optical disk 211 and the flash memory are detachable from the computer main body 210. The interface controls input from the input device 220, output to the output device 230, and transmission to and reception from the network 110.

The input device 220 includes a keyboard 221, a mouse 222, a scanner 223, and the like. The keyboard 221 has keys for entering characters, numbers, and various instructions, and is used to input data. It may also be of a touch-panel type. The mouse 222 is used to move the cursor, select a range, move a window, change its size, and so on. The scanner 223 reads an image optically. The read image is captured as image data and stored in the memory in the computer main body 210. The scanner 223 may be provided with an OCR function.

The output device 230 includes a display 231, a speaker 232, a printer 233, and the like. The display 231 displays a cursor, icons, and toolboxes, as well as data such as documents, images, and function information. The speaker 232 outputs sound such as sound effects and read-aloud speech. The printer 233 prints image data and document data.

(Outline of the present embodiment)
Next, an outline of the present embodiment will be described. FIG. 3 is an explanatory diagram showing an outline of the present embodiment. In FIG. 3, first, one of the client terminals 103-1 to 103-n (hereinafter referred to simply as the "client terminal 103") transmits to the information providing apparatus 101 a search query that the user has entered by operating the input device 220, such as the keyboard 221 or the mouse 222.

The information providing apparatus 101 then searches for pages that match the search query received from the client terminal 103. As a result, the search result table 400 shown in FIG. 4, the article link table 500 shown in FIG. 5, and the blog link table 600 shown in FIG. 6 are created. In the present embodiment, a blog carrying at least one article (posted information) is used as an example of a page.

Next, using the search result table 400 and the article link table 500, language analysis is performed on each article posted on each blog. Specifically, language analysis such as morphological analysis and dependency analysis is performed on each article. As a result, the analysis result table 700 shown in FIG. 7 is created.

Then, using the analysis result table 700, the articles are clustered into a plurality of clusters according to their posted content. As a result, the article cluster table 1000 shown in FIG. 10 and the blog cluster table 1100 shown in FIG. 11 are created.

After that, a network model 1200 (see FIG. 12) is created using the blog link table 600, the article cluster table 1000, and the blog cluster table 1100. As a result, the network table 1300 shown in FIG. 13 is created.

Next, the score of each node in the network model 1200 is calculated using the network table 1300. As a result, the score table 1400 shown in FIG. 14 is created. The search results are then ranked using the score table 1400.

Finally, HTML information about the group of blogs is generated according to the ranking and presented to the client terminal 103 as the search result. As a result, the search result is displayed on the display 231 of the client terminal 103.

In the present embodiment, scoring takes into account the content of the articles posted on each page (blog) that matches the search query, regardless of whether the page has link relationships with other pages. As a result, the degree of conformity can be evaluated appropriately even for pages that have few or no link relationships with other pages, and the ranking can be optimized.

(Stored contents of the search result table)
Next, the search result table used by the information providing apparatus 101 will be described. The search result table tabulates the search results obtained by giving a search query to the search system 100. Specifically, it is a table that associates each blog with the articles posted on that blog.

FIG. 4 is an explanatory diagram showing the stored contents of the search result table. In FIG. 4, the search result table 400 stores a blog ID and an article ID for each article posted on a blog that matches the search query. The blog ID is an identifier that identifies a blog. The article ID is an identifier that identifies an article. Taking blog B55 as an example, articles P85 and P71 are posted on blog B55.

The blog IDs and article IDs are expressed, for example, as URLs (Uniform Resource Locators) indicating the locations on the network 110 where the blogs and articles exist. The search result table 400 is implemented by a storage unit such as the ROM, RAM, or HD described with reference to FIG. 2.
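The blog-to-article association held in the search result table can be sketched as a small in-memory structure. This is only an illustration: the row layout, the function names, and the placement of article P53 in blog B23 are assumptions, and only the B55 rows follow the example in the text.

```python
from collections import defaultdict

# Hypothetical rows of a search result table: (blog ID, article ID).
# The B55 rows follow the example in the text; the B23 row is made up.
SEARCH_RESULT_ROWS = [
    ("B55", "P85"),
    ("B55", "P71"),
    ("B23", "P53"),
]


def articles_by_blog(rows):
    """Group article IDs under the blog that carries them."""
    table = defaultdict(list)
    for blog_id, article_id in rows:
        table[blog_id].append(article_id)
    return dict(table)


def blog_of_article(rows):
    """Reverse lookup: article ID -> blog ID."""
    return {article_id: blog_id for blog_id, article_id in rows}
```

In a real system the IDs would be URLs, as the text notes; short labels are used here only for readability.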

(Stored contents of the article link table)
Next, the article link table used by the information providing apparatus 101 will be described. The article link table tabulates the link relationships obtained by detecting the hyperlinks embedded in articles. Specifically, it is a table that associates each link-source (from) article with its link-destination (to) articles.

FIG. 5 is an explanatory diagram showing the stored contents of the article link table. In FIG. 5, the article link table 500 stores, for each article, the number of hyperlinks embedded in it. Specifically, for each article, the number of hyperlinks leading from that article (from) to each other article (to) is stored.

Taking article P53 as an example, two hyperlinks to article P85 are embedded in article P53. The article link table 500 is implemented by a storage unit such as the ROM, RAM, or HD described with reference to FIG. 2.
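One way to obtain such per-article link counts is to parse the article's HTML and count the anchors that point at known articles. The following is a minimal sketch using Python's standard `html.parser`; the class and function names, the example URLs, and the URL-to-article-ID mapping are assumptions made for illustration, not part of the described apparatus.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def count_article_links(html, url_to_article_id):
    """Return {destination article ID: number of hyperlinks} for one article.

    url_to_article_id is a hypothetical mapping from article URL to article ID;
    links to URLs outside that mapping are ignored.
    """
    parser = LinkCollector()
    parser.feed(html)
    return Counter(
        url_to_article_id[href] for href in parser.hrefs if href in url_to_article_id
    )


# Made-up body for article P53 with two links to P85 and one unrelated link.
P53_HTML = (
    '<p><a href="http://example.com/p85">one</a> '
    '<a href="http://example.com/p85">two</a> '
    '<a href="http://example.org/">other</a></p>'
)
P53_LINKS = count_article_links(P53_HTML, {"http://example.com/p85": "P85"})
```

With this input, `P53_LINKS` records two links from P53 to P85, matching the example row described in the text.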

(Stored contents of the blog link table)
Next, the blog link table used by the information providing apparatus 101 will be described. The blog link table represents the correlation between blogs, obtained using the search result table 400 shown in FIG. 4 and the article link table 500 shown in FIG. 5.

FIG. 6 is an explanatory diagram showing the stored contents of the blog link table. In FIG. 6, the blog link table 600 stores, for each blog, numerical values representing its correlation with each other blog with which it has a link relationship. Each value corresponds to the transition probability (described in detail later) of moving from one blog to the other.

Taking blog B25 as an example, it has link relationships with blog B23 and blog B55, and the value representing each correlation is "0.5". These values are normalized so that each row of the blog link table 600 sums to "1". The blog link table 600 is implemented by a storage unit such as the ROM, RAM, or HD described with reference to FIG. 2.
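The row normalization can be illustrated with a short sketch. It assumes that blog-level values are obtained by summing article-level hyperlink counts per blog pair and dividing by the row total; the text defers the details of the transition probability, so this aggregation rule, the function name, and the article-to-blog assignments for P40 and P53 are assumptions.

```python
from collections import defaultdict


def blog_link_table(article_to_blog, article_links):
    """Aggregate article-level link counts per blog pair and row-normalize.

    article_to_blog: {article ID: blog ID}
    article_links:   {(from article ID, to article ID): hyperlink count}
    Returns {from blog ID: {to blog ID: value}}, where each row sums to 1.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for (src, dst), n in article_links.items():
        src_blog = article_to_blog.get(src)
        dst_blog = article_to_blog.get(dst)
        # Assumption: ignore links to unknown articles and links within one blog.
        if src_blog is None or dst_blog is None or src_blog == dst_blog:
            continue
        counts[src_blog][dst_blog] += n

    table = {}
    for src_blog, row in counts.items():
        total = sum(row.values())
        table[src_blog] = {dst_blog: n / total for dst_blog, n in row.items()}
    return table


# Hypothetical inputs chosen so that blog B25 reproduces the 0.5 / 0.5 row.
ARTICLE_TO_BLOG = {"P40": "B25", "P53": "B23", "P85": "B55", "P71": "B55"}
ARTICLE_LINKS = {("P40", "P53"): 1, ("P40", "P85"): 1, ("P53", "P85"): 2}
BLOG_LINKS = blog_link_table(ARTICLE_TO_BLOG, ARTICLE_LINKS)
```

Here `BLOG_LINKS["B25"]` comes out as 0.5 for B23 and 0.5 for B55, the same row as in the example above.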

(Stored contents of the analysis result table)
Next, the analysis result table used by the information providing apparatus 101 will be described. The analysis result table tabulates the analysis results obtained by performing language analysis (for example, morphological analysis and dependency analysis) on each article.

FIG. 7 is an explanatory diagram showing the stored contents of the analysis result table. In FIG. 7, the analysis result table 700 stores an article ID and an analysis result for each article. Taking article P53 as an example, the analysis result R1 is the result of the language analysis performed on article P53. A specific example of an analysis result is described later with reference to FIG. 9. The analysis result table 700 is implemented by a storage unit such as the ROM, RAM, or HD described with reference to FIG. 2.

(Functional configuration of the information providing apparatus)
Next, the functional configuration of the information providing apparatus 101 will be described. FIG. 8 is a block diagram showing the functional configuration of the information providing apparatus. In FIG. 8, the information providing apparatus 101 includes a receiving unit 801, a search unit 802, an extraction unit 803, a classification unit 804, a calculation unit 805, a creation unit 806, a presentation unit 807, a determination unit 808, and a selection unit 809.

Each of the functions 801 to 809 is realized by causing the CPU to execute the program for that function stored in the storage unit of the information providing apparatus 101, or by the input/output I/F. Output data from each of the functions 801 to 809 is held in the storage unit. A function at the destination of an arrow in FIG. 8 reads the output data of the function at the source of the arrow from the storage unit and causes the CPU to execute the program for that function.

First, the receiving unit 801 has a function of accepting input of a search query. A search query is, for example, a character string representing an inquiry to the search system 100 (see FIG. 1). Specifically, the receiving unit 801 accepts, for example, a search query transmitted from the client terminal 103 (see FIG. 1).

The search unit 802 has a function of searching the group of pages published on the network 110 for pages that match the search query accepted by the receiving unit 801. A page is, for example, a website such as a blog published on the Internet. Specifically, a search result matching the search query can be obtained, for example, by transmitting the search query accepted by the receiving unit 801 to the database server 102.

The extraction unit 803 has a function of extracting the posted information carried on pages selected from the set of pages retrieved by the search unit 802, each of which contains at least one piece of posted information. Here, posted information is an article, image, audio, video, or the like that makes up a page; more specifically, it is, for example, an individual web page making up a website. In the present embodiment, an article (for example, the single unit of information newly registered each time a blog is updated) is used as an example of posted information.

Specifically, the extraction unit 803 extracts, for example, the articles posted on each blog from the search result table 400 shown in FIG. 4. The extraction unit 803 may extract the posted information carried on all of the retrieved pages, or only the posted information carried on some of the retrieved pages (for example, 50 blogs selected at random out of 100 retrieved blogs).
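The choice between processing every retrieved blog and a random subset can be sketched as follows. The function name and the `limit` and `seed` parameters are assumptions made for illustration; the text only states that either all pages or an arbitrary selection (for example, 50 of 100) may be used.

```python
import random


def select_blogs(blog_ids, limit=None, seed=None):
    """Return all blog IDs, or a random sample of `limit` of them.

    limit=None (or a limit not smaller than the number of blogs) keeps every blog.
    seed is optional and only makes the sample reproducible.
    """
    blog_ids = list(blog_ids)
    if limit is None or limit >= len(blog_ids):
        return blog_ids
    return random.Random(seed).sample(blog_ids, limit)


# Hypothetical search result of 100 blogs, reduced to 50 for article extraction.
ALL_BLOGS = [f"B{i}" for i in range(100)]
SELECTED_BLOGS = select_blogs(ALL_BLOGS, limit=50, seed=0)
```

Sampling reduces the cost of the later language analysis and clustering at the price of ignoring some retrieved blogs; whether that trade-off is acceptable depends on the deployment.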

The classification unit 804 has a function of classifying the posted information (articles) extracted by the extraction unit 803 into a plurality of clusters according to their posted content. Specifically, the articles may be classified into a plurality of clusters based on, for example, the frequency of each word appearing in the articles extracted by the extraction unit 803 and the similarity between words.

More specifically, clustering is performed using the analysis results obtained by performing, on each article extracted by the extraction unit 803, morphological analysis (division into morphemes and assignment of parts of speech) and dependency analysis (grouping of morphemes into phrases and identification of the dependency relationships between phrases).

The result of the language analysis is described here. FIG. 9 is an explanatory diagram showing an example of an analysis result. In FIG. 9, the analysis result 900 is the result of language analysis of an article reading "For all documents, language analysis is performed for the subsequent clustering ...". Specifically, the analysis result 900 holds, for each phrase (word) into which the analyzed article is divided, a phrase ID, the phrase ID of the phrase it modifies, and information on its part of speech.

For example, reference numeral 901 is the phrase ID that identifies the phrase "all" (すべて). Reference numeral 902 is the phrase ID of the phrase "no" (の), which is the phrase that "all" modifies. By performing language analysis on each article in this way, the article is divided into morphemes, the smallest semantic units making up each sentence in the article, and it is determined which phrase each phrase in a sentence depends on (that is, which phrase it modifies).

The specific procedure of the classification processing (clustering) performed by the classification unit 804 is described below, taking the analysis result table 700 shown in FIG. 7 as an example. The classification unit 804 first refers to the analysis result table 700 and counts the number of occurrences of each word appearing in each of the articles P53, P85, P40, and P71. It then calculates the similarity between articles by obtaining the cosine distance between their word vectors.
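The word counting and similarity step can be sketched as follows. The sketch reads the "cosine distance" of the text as the usual cosine similarity (larger means more similar), which is an assumption; the function names and the toy word lists are likewise made up, and real input would be the words produced by the language analysis.

```python
import math
from collections import Counter


def word_vector(words):
    """Count how often each word occurs in one article."""
    return Counter(words)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty article is treated as similar to nothing.
    return dot / (norm_a * norm_b)


# Toy word lists standing in for two analyzed articles.
VEC_A = word_vector(["search", "query", "search"])
VEC_B = word_vector(["search", "blog"])
SIMILARITY_AB = cosine_similarity(VEC_A, VEC_B)  # 2 / sqrt(5 * 2)
```

Because word counts are never negative, the value always lies between 0 (no shared words) and 1 (identical word distribution).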

Next, clusters are created according to a preset number of clusters, and a word vector is created for each cluster. Specifically, the word vector of each cluster is created by summing the word vectors of the documents in that cluster. The number of clusters can be set arbitrarily by the user, for example by operating the input device 220, such as the keyboard 221 or the mouse 222 shown in FIG. 2.

After that, the cosine distance between each cluster's word vector and each article's word vector is computed, and the resulting value is used as the article's affiliation probability for that cluster. Although the language analysis of the posted information extracted by the extraction unit 803 is described as being executed in the information providing apparatus 101, this is not a limitation. For example, the language analysis may be executed on an external computer device, and its output acquired as the analysis result.
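The clustering steps described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the word lists, the initial assignment of articles to clusters, and the function names `cosine` and `cluster_vector` are assumptions made for the example. The text calls the measure a "cosine distance" but uses it as a similarity, so the sketch computes the cosine of the angle between two word-count vectors.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def cluster_vector(article_vectors):
    """Sum the word vectors of the articles placed in one cluster."""
    total = Counter()
    for vec in article_vectors:
        total.update(vec)
    return total

# Hypothetical articles, already split into words by the language analysis.
articles = {
    "P53": Counter(["sound", "quality", "sound"]),
    "P85": Counter(["design", "color"]),
    "P40": Counter(["sound", "bass"]),
}

# Hypothetical assignment of articles to two clusters.
clusters = {
    "C1": cluster_vector([articles["P53"], articles["P40"]]),
    "C2": cluster_vector([articles["P85"]]),
}

# Affiliation value of every article for every cluster.
membership = {
    pid: {cid: cosine(vec, cvec) for cid, cvec in clusters.items()}
    for pid, vec in articles.items()
}
```

Values produced this way are not guaranteed to sum to 1 across clusters, which matches the article P53 row quoted in the text (0.1, 0.5, 0.2, 0.0).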

Here, a specific example of the article cluster table, which represents the result of the classification into a plurality of clusters by the classification unit 804, is described. FIG. 10 is an explanatory diagram showing the stored contents of the article cluster table. In FIG. 10, the article cluster table 1000 stores, for each article, the affiliation probability for each of the clusters C1 to C4, with the article ID associated with the cluster IDs. A cluster ID is an identifier that identifies a cluster.

The affiliation probability is a numerical value representing the strength of the correlation between an article and each of the clusters C1 to C4. Taking article P53 as an example, its affiliation probability is "0.1" for cluster C1, "0.5" for cluster C2, "0.2" for cluster C3, and "0.0" for cluster C4. The function of the article cluster table 1000 is realized by a storage unit such as the ROM, RAM, or HD described with reference to FIG. 2.

Returning to the description of FIG. 8, the calculation unit 805 has a function of calculating, for each page, the degree of conformity of the page to the search query, based on the correlation between the pages retrieved by the search unit 802 and the clusters produced by the classification unit 804. Specifically, the degree of conformity of each page can be calculated using a network model, created by the creation unit 806 described later, in which pages (for example, blogs) and clusters are the nodes of a directed graph.

The creation unit 806 has a function of creating a network model in which pages and clusters are the nodes of a directed graph. Specifically, for example, the creation unit 806 first uses the article cluster table 1000 shown in FIG. 10 to create a blog cluster table representing the affiliation probability of each blog (page) for each cluster.

More specifically, for each blog, the affiliation probabilities of all articles posted on the blog are summed per cluster, and the sum is used as the blog's affiliation probability for that cluster. Taking the affiliation probability of blog B55 for cluster C2 as an example, the article cluster table 1000 gives "0.8" for article P85 and "0.4" for article P71 with respect to cluster C2, so the blog's affiliation probability is their sum, "1.2". Because this is a plain sum, the value can exceed 1.

Here, a specific example of the blog cluster table created by the creation unit 806 is described. FIG. 11 is an explanatory diagram showing the stored contents of the blog cluster table. In FIG. 11, the blog cluster table 1100 stores, for each blog, the transition probability to each of the clusters C1 to C4, with the blog ID associated with the cluster IDs.

Taking blog B23 as an example, its transition probability is "0.1" to cluster C1, "0.5" to cluster C2, "0.2" to cluster C3, and "0.0" to cluster C4. The function of the blog cluster table 1100 is realized by a storage unit such as the ROM, RAM, or HD described with reference to FIG. 2.

After that, the creation unit 806 uses the blog cluster table 1100 and the blog link table 600 shown in FIG. 6 to create a network model in which the blogs B23, B55, and B25 and the clusters C1 to C4 are the nodes of a directed graph.

FIG. 12 is a directed graph showing an example of the network model. In FIG. 12, the network model 1200 is a directed graph whose nodes are the blogs B23, B25, and B55 and the clusters C1 to C4. Each edge connecting two nodes is labeled with the transition probability between them. The drawing shows only some of the transition probabilities.

In the network model 1200, a solid double-headed arrow represents the transitions between a node representing a blog and a node representing a cluster. Each double-headed arrow is labeled with the transition probability from the blog node to the cluster node (underlined in FIG. 12) and the transition probability from the cluster node to the blog node (double-underlined in FIG. 12).

Each double-headed arrow is also labeled with the affiliation probability (in angle brackets in FIG. 12) with which the blog belongs to the cluster, as managed in the blog cluster table 1100. In addition, a dotted arrow in the network model 1200 represents a transition between nodes representing blogs. Each dotted arrow is labeled with the transition probability between the blog nodes, as managed in the blog link table 600.

The transition probability between a node representing a blog and a node representing a cluster can be obtained from the affiliation probabilities, for each cluster, of the articles posted on the blog. More specifically, for example, the transition probability X(Bi-Cj) from blog Bi to cluster Cj can be obtained using equation (1), where Bi is a blog ID, Cj is a cluster ID, k is a natural number, and n is the number of clusters (here, n = 4).

Taking the transition probability X(B23-C1) from the node representing blog B23 to the node representing cluster C1 as an example, X(B23-C1) = 0.1/0.8 = 0.125. This is the affiliation probability "0.1" of blog B23 for cluster C1, as managed in the blog cluster table 1100, divided by the sum of blog B23's affiliation probabilities for the clusters C1 to C4, "0.1 + 0.5 + 0.2 + 0.0 = 0.8".

Likewise, the transition probability X(Cj-Bi) from the node representing cluster Cj to the node representing blog Bi can be obtained using equation (2), where m is the number of blogs (here, m = 3).

Taking the transition probability X(C1-B23) from the node representing cluster C1 to the node representing blog B23 as an example, X(C1-B23) = 0.1/0.3. This is the affiliation probability "0.1" of blog B23 for cluster C1, as managed in the blog cluster table 1100, divided by the sum of the affiliation probabilities of the blogs B23, B55, and B25 for cluster C1, "0.1 + 0.2 + 0.0 = 0.3".
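Equations (1) and (2) do not survive in this text. The forms below are a reconstruction that is consistent with the two worked examples (0.1/0.8 and 0.1/0.3), not a copy of the published equations. The symbol P(B_i, C_j), standing for the affiliation probability of blog B_i for cluster C_j in the blog cluster table 1100, is notation assumed here, and B_k in equation (2) is shorthand for the k-th of the m blogs.

```latex
% Reconstructed from the worked examples; P(B_i, C_j) is assumed notation.
\begin{align}
X(B_i - C_j) &= \frac{P(B_i, C_j)}{\sum_{k=1}^{n} P(B_i, C_k)} \tag{1} \\
X(C_j - B_i) &= \frac{P(B_i, C_j)}{\sum_{k=1}^{m} P(B_k, C_j)} \tag{2}
\end{align}
```

With n = 4, equation (1) gives X(B23-C1) = 0.1 / (0.1 + 0.5 + 0.2 + 0.0) = 0.1/0.8. With m = 3, equation (2) gives X(C1-B23) = 0.1 / (0.1 + 0.2 + 0.0) = 0.1/0.3.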

The network model 1200 shown in FIG. 12 is held in the storage unit as the network table described below. FIG. 13 is an explanatory diagram showing the stored contents of the network table. In FIG. 13, the network table 1300 stores the transition probabilities between the nodes in the network model 1200 (see FIG. 12), with cluster IDs and blog IDs associated with each other.

Here, the transition probabilities between nodes representing clusters are all "0"; that is, no transition occurs between cluster nodes. The transition probability between a node representing a blog and a node representing a cluster is the value obtained using equation (1) or (2) above. The transition probabilities between nodes representing blogs are taken from the blog link table 600 and are normalized so that each row sums to "1".
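The arithmetic behind the blog-to-cluster and cluster-to-blog transition probabilities can be checked with a short sketch. The row for blog B23 and the value 1.2 for B55/C2 come from the text. The C1 values for B55 and B25 are read from the sum "0.1 + 0.2 + 0.0", assuming it lists B23, B55, B25 in that order. Every other table value, and the function names, are placeholders assumed for illustration.

```python
def blog_to_cluster(table):
    """Normalize each blog's row over the clusters (blog -> cluster)."""
    result = {}
    for blog, row in table.items():
        total = sum(row.values())
        result[blog] = {c: (p / total if total else 0.0) for c, p in row.items()}
    return result

def cluster_to_blog(table):
    """Normalize each cluster's column over the blogs (cluster -> blog)."""
    clusters = sorted({c for row in table.values() for c in row})
    result = {}
    for c in clusters:
        total = sum(row.get(c, 0.0) for row in table.values())
        result[c] = {
            b: (row.get(c, 0.0) / total if total else 0.0)
            for b, row in table.items()
        }
    return result

blog_cluster = {
    "B23": {"C1": 0.1, "C2": 0.5, "C3": 0.2, "C4": 0.0},  # from the text
    "B55": {"C1": 0.2, "C2": 1.2, "C3": 0.3, "C4": 0.1},  # C3, C4 assumed
    "B25": {"C1": 0.0, "C2": 0.2, "C3": 0.6, "C4": 0.4},  # C2 to C4 assumed
}

b2c = blog_to_cluster(blog_cluster)
c2b = cluster_to_blog(blog_cluster)
print(b2c["B23"]["C1"])  # 0.1 / 0.8
print(c2b["C1"]["B23"])  # 0.1 / 0.3
```

Guarding against a zero total keeps a blog or cluster with no affiliation at all from causing a division by zero. The text does not say how such a case is handled, so returning 0.0 is an assumption.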

The calculation unit 805 calculates the degree of conformity of each blog based on, for example, the stored contents of the network table 1300. Specifically, the degree of conformity of each blog can be calculated by scoring each node with a random walk method that uses the transition probabilities assigned to the edges connecting the nodes in the network model 1200.

A random walk is a motion in which "the probability of what appears next" is determined irregularly, that is, each step is chosen at random. Here, the concept of the random walk is applied to the scoring of each node. An outline of the process of scoring each node by the random walk method follows.

First, an end probability Pt (for example, "0.2") with which the random walk ends is set in advance. Then, using the transition probabilities X() between nodes and the end probability Pt, the number of transitions to each node is counted, node by node, from the time one node is arbitrarily selected from the nodes in the network model 1200 until the transitions between nodes end.

This series of steps, from arbitrarily selecting one node until the transitions between nodes end, is repeated a predetermined number of times N (for example, "10000"). The number of transitions to each node accumulated over the N repetitions is then used as that node's degree of conformity to the search query. The end probability Pt and the number of repetitions N can be set arbitrarily by the user operating the input device 220.
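The scoring loop described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the toy network and its weights are hypothetical, outgoing weights are treated as relative weights (`random.choices` normalizes them), only transitions into a node are counted (the start node itself is not), and a walk also stops at a node with no outgoing edge.

```python
import random

def random_walk_scores(transitions, end_prob=0.2, runs=10000, seed=0):
    """Count transitions into each node over `runs` random walks.

    transitions maps each node to {next_node: weight}.
    """
    rng = random.Random(seed)
    nodes = list(transitions)
    visits = {node: 0 for node in nodes}
    for _ in range(runs):
        current = rng.choice(nodes)              # arbitrary start node
        while True:
            if rng.random() < end_prob:          # walk ends with probability Pt
                break
            out = transitions[current]
            targets = [t for t, w in out.items() if w > 0]
            if not targets:                      # assumed: stop at a dead end
                break
            weights = [out[t] for t in targets]
            current = rng.choices(targets, weights=weights)[0]
            visits[current] += 1                 # one more transition into node
    return visits

# Hypothetical network: three blogs and two clusters. B25 is given no
# incoming edge so that its score is visibly zero.
network = {
    "B23": {"C1": 0.2, "C2": 0.8},
    "B55": {"C1": 1.0},
    "B25": {"C1": 1.0},
    "C1": {"B23": 0.5, "B55": 0.5},
    "C2": {"B23": 1.0},
}

scores = random_walk_scores(network)
```

Fixing the seed makes the counts reproducible between runs. The text does not mention seeding, so this is a convenience of the sketch.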

Here, forming a directed graph with blogs and clusters as nodes yields a virtual network (network model 1200) whose edges carry transition probabilities representing the correlation between blogs and clusters. As a result, the random walk method can be applied to the scoring of each node.

In scoring by the random walk method, the more transitions there are to a node, the higher its score. This rests on the relationship that a node with more transitions into it has a higher likelihood (expected value) of being visited during a random walk. Using this relationship, a cluster or blog represented by a node with many transitions is treated as one that a searcher is more likely to view (although only pages are actually viewed), and its degree of conformity to the search query is raised accordingly.

Here, a specific example of the score table, which represents the degree of conformity (score) of each node calculated by the calculation unit 805, is described. FIG. 14 is an explanatory diagram showing the stored contents of the score table. In FIG. 14, the score table 1400 stores, for each node in the network model 1200, a score representing its degree of conformity to the search query.

Taking the node representing cluster C1 as an example, its degree of conformity to the search query is "2523". Taking the node representing blog B23 as an example, its degree of conformity to the search query is "524".

The presentation unit 807 has a function of presenting page information about the page group based on the result calculated by the calculation unit 805. Specifically, it generates page information about the page group based on that result and transmits the page information to the client terminal 103 (the sender of the search query).

More specifically, for example, the presentation unit 807 may refer to the score table 1400 shown in FIG. 14 and generate, as the search result, an index (index information) in which the blogs B23, B55, and B25 are sorted in descending order of their degree of conformity to the search query.

Because no websites correspond to the clusters C1 to C4, the index is generated from the blogs B23, B55, and B25 only. That is, the scores of the clusters C1 to C4 in the score table 1400 are set aside, the usefulness of the blogs B23, B55, and B25 for the search query is evaluated from their own degrees of conformity, and the blogs are sorted accordingly.
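Generating the index from blog nodes only amounts to a filtered sort, sketched below. The scores for C1 (2523) and B23 (524) are the ones quoted from the score table 1400. All other scores, and the function name `rank_blogs`, are hypothetical.

```python
def rank_blogs(scores, blog_ids):
    """Sort blog IDs by score, highest first, ignoring cluster nodes."""
    return sorted(blog_ids, key=lambda blog: scores[blog], reverse=True)

scores = {
    "C1": 2523, "C2": 3100, "C3": 1200, "C4": 300,  # C2 to C4 hypothetical
    "B23": 524, "B55": 910, "B25": 130,             # B55, B25 hypothetical
}

ranking = rank_blogs(scores, ["B23", "B55", "B25"])
print(ranking)
```

Cluster scores stay in the table and remain available for other purposes, such as ordering topics.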

The determination unit 808 has a function of determining, for each cluster, a word (topic) that characterizes the cluster, based on the appearance frequency of the words that appear in the posted information classified into the clusters by the classification unit 804. Specifically, for each cluster, a word that appears frequently in the posted information (articles) classified into that cluster is chosen as the topic representing the cluster's characteristics.

The determination unit 808 may also determine the word characterizing a cluster based on the value obtained by dividing the number of times a word appears in the cluster's posted information by the number of times that word appears in all posted information on the page group. This makes it possible to choose the characterizing word from among words other than particles (for example, 「を」) and symbols (for example, 「。」), which tend to appear frequently regardless of content.

The determination unit 808 may also determine the word characterizing a cluster based on the appearance frequency of words of a specific part of speech in the posted information. This allows the characterizing words to be limited to specific parts of speech, such as nouns or verbs.

Here, a specific example of the feature word table, which represents, for each cluster, the appearance frequency of the words that appear in the posted information, is described. FIG. 15 is an explanatory diagram showing the stored contents of the feature word table. In FIG. 15, the feature word table 1500 stores, for each of the clusters C1 to C4, numerical values representing the appearance frequency of the words that appear in the articles classified into that cluster.

Specifically, the table stores the value obtained by dividing the number of times a word appears by the number of times that word appears in all the articles (P53, P85, P40, P71) posted on the blog group (B23, B25, B55). For example, the determination unit 808 refers to the feature word table 1500 and determines "sound quality" as the word characterizing cluster C1.
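The ratio stored in the feature word table can be sketched as below. The word lists are hypothetical, each article is assumed to be assigned to a single cluster, and breaking ties on the raw in-cluster count is an assumption (the text does not say how ties between words with the same ratio are resolved).

```python
from collections import Counter

def feature_words(cluster_articles):
    """Return (ratio table, topic per cluster).

    cluster_articles maps a cluster ID to a list of articles, each a
    list of words. ratio = count in cluster / count in all articles.
    """
    overall = Counter(
        w for docs in cluster_articles.values() for doc in docs for w in doc
    )
    table, topics = {}, {}
    for cid, docs in cluster_articles.items():
        counts = Counter(w for doc in docs for w in doc)
        table[cid] = {w: c / overall[w] for w, c in counts.items()}
        if counts:
            # Highest ratio wins; ties go to the larger raw count (assumed).
            topics[cid] = max(counts, key=lambda w: (table[cid][w], counts[w]))
    return table, topics

# Hypothetical articles grouped by cluster.
cluster_articles = {
    "C1": [["sound", "quality", "the"], ["sound", "the", "bass"]],
    "C2": [["design", "the", "design"], ["the", "color"]],
}

table, topics = feature_words(cluster_articles)
print(topics)
```

The word "the" appears equally often in both clusters, so its ratio is 0.5 in each and it loses to words concentrated in one cluster, which is the effect the text describes for particles and symbols.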

Returning to the description of FIG. 8, the presentation unit 807 has a function of presenting the result determined by the determination unit 808 as topics related to the search query. Specifically, for example, the topics may be presented together with the page information about the pages that conform to the search query.

In this case, for example, each page may be presented in association with the topic characterizing the cluster for which the page has the highest affiliation probability. Specifically, for example, when page information about blog B23 is presented, the blog cluster table 1100 is consulted and the word characterizing cluster C2, for which the affiliation probability is highest, is presented as the topic. The topics of the clusters C1 to C4 may also be sorted and presented in descending order of degree of conformity to the search query, with reference to the score table 1400.

Here, an example of the screen displayed on the display 231 of the client terminal 103 as a result of the presentation unit 807 presenting the page information is described. FIG. 16 is an explanatory diagram (part 1) showing an example of a screen displayed on the display. In FIG. 16, the display 231 shows the search result 1600 obtained for the search query "FJpod".

Specifically, page titles are displayed in descending order of degree of conformity to the search query "FJpod". Each page title is displayed together with the topic characterizing the cluster that corresponds to it. When the user moves the cursor C and clicks a title, the Web page with that title is displayed on the display 231.

As another screen example, the search results may be grouped by topic. FIG. 17 is an explanatory diagram (part 2) showing an example of a screen displayed on the display. In FIG. 17, the display 231 shows the search result 1700 obtained for the search query "FJpod".

Specifically, page titles are displayed for each topic related to the search query "FJpod". When the user moves the cursor C and clicks the topic "extension", the page titles of websites carrying articles in which "extension" appears frequently are displayed.

The selection unit 809 has a function of accepting the selection of any topic from among the plurality of topics presented by the presentation unit 807. Specifically, after the plurality of topics is displayed on the display 231 of the client terminal 103, it accepts the topic selected by the user's operation input.

In this case, in the random-walk calculation of the degree of conformity, the first node of the random walk may be the node representing the cluster characterized by the topic selected via the selection unit 809, instead of an arbitrarily selected node. The calculation unit 805 then calculates the degree of conformity of each page by counting the number of transitions to each node from the time that cluster node is selected until the transitions between nodes end. This raises the scores of the pages (blogs) that carry posted information (articles) on the selected topic.
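Starting every walk from the cluster node of a selected topic can be sketched as below. The network, its weights, the topic-to-cluster mapping (apart from "sound quality" for C1, which the text states), and the function name `walk_from_topics` are hypothetical. Choosing uniformly among several selected clusters is also an assumption, since the text does not say how multiple selected topics are combined.

```python
import random

def walk_from_topics(transitions, start_nodes, end_prob=0.2, runs=10000, seed=0):
    """Random-walk visit counts where every walk starts at a selected cluster.

    Assumes all edge weights are positive.
    """
    rng = random.Random(seed)
    visits = {node: 0 for node in transitions}
    for _ in range(runs):
        current = rng.choice(start_nodes)        # start at a selected cluster
        while rng.random() >= end_prob:          # continue with probability 1 - Pt
            out = transitions[current]
            if not out:                          # assumed: stop at a dead end
                break
            targets = list(out)
            weights = [out[t] for t in targets]
            current = rng.choices(targets, weights=weights)[0]
            visits[current] += 1
    return visits

# Hypothetical network: C2 is linked mostly to blog B23.
network = {
    "C1": {"B55": 1.0},
    "C2": {"B23": 0.9, "B55": 0.1},
    "B23": {"C2": 1.0},
    "B55": {"C1": 1.0},
}

topic_to_cluster = {"sound quality": "C1", "design": "C2"}  # "design" assumed
selected_topics = ["design"]
start_nodes = [topic_to_cluster[t] for t in selected_topics]

scores = walk_from_topics(network, start_nodes)
```

Because every walk begins at C2, blogs reachable from C2 in few steps collect most of the transitions, which is how the selected topic biases the ranking.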

Here, the search procedure on the client terminal 103 is described. FIG. 18 is an explanatory diagram showing an example of the search procedure on the client terminal. In FIG. 18, when the user first enters the search query "FJpod", the search result 1810 is displayed on the display 231. Specifically, the search result 1810 shows a plurality of topics related to the search query "FJpod" (design, sound quality, accessories, dealers).

Next, in the search result 1810, the user selects topics by moving the cursor C and clicking any of the buttons B1 to B4. Assume here that the buttons B1 and B2 are clicked. When the Enter key of the keyboard 221 is then pressed, the search result 1820 is displayed on the display 231.

Specifically, the search result 1820 shows page information obtained so that pages (blogs) carrying articles about "design" and "sound quality" receive a higher degree of conformity. Letting the user select topics from among the plurality of topics related to the search query in this way makes it possible to present search results that better reflect the searcher's intent.

The functions of the reception unit 801 and the search unit 802 may be realized by an external computer device. In that case, the extraction unit 803 extracts the posted information from the page group acquired from the external computer device.

(Various processing procedures of the information providing apparatus)
Next, the various processing procedures executed in the information providing apparatus 101 according to the present embodiment are described.

(Blog link table creation procedure)
First, the procedure for creating the blog link table 600 shown in FIG. 6 is described. The blog link table 600 is a table representing the correlation among the blogs B23, B55, and B40. Here, the creation unit 806 creates the blog link table 600 using the search result table 400 shown in FIG. 4 and the article link table 500 shown in FIG. 5.

FIG. 19 is a flowchart showing an example of the procedure for creating the blog link table. In the flowchart of FIG. 19, a blog link table with blog IDs as both rows and columns is first created, and each cell is initialized to "0" (step S1901).

Next, an arbitrary cell in which a transition probability is to be stored is selected from the blog link table (step S1902), and it is determined whether the blog ID of the transition source (from) and the blog ID of the transition destination (to) of the selected cell (hereinafter, the "selected cell") are the same (step S1903).

If the transition source (from) and the transition destination (to) are determined to differ (step S1903: No), the values for the corresponding article IDs are read from the article link table 500, which represents the correlation between articles, and added to the selected cell (step S1904). Specifically, the articles posted on the blogs of the selected cell are identified by referring to the search result table 400, and the values for those articles are read from the article link table 500 and added.

If the transition source (from) and the transition destination (to) are determined to be the same in step S1903 (step S1903: Yes), the process proceeds to step S1905.

Next, it is determined whether any cell in the blog link table has not yet been selected (step S1905). If there is an unselected cell (step S1905: Yes), the process returns to step S1902, selects an unselected cell from the blog link table, and repeats the series of steps.

If there is no unselected cell (step S1905: No), an arbitrary row is selected from the blog link table (step S1906) and normalized so that the values of its cells sum to "1" (step S1907). This levels the values of the cells in the blog link table.

Next, it is determined whether any row in the blog link table has not yet been selected (step S1908). If there is an unselected row (step S1908: Yes), the process returns to step S1906, selects an unselected row from the blog link table, and repeats the series of steps.

If there is no unselected row (step S1908: No), the blog link table 600 is output (saved) to the storage unit (step S1909), and the series of processes in this flowchart ends.

In this way, the blog link table 600 representing the correlation between blogs can be created. In addition, normalizing the cell values row by row levels the cell values across the blog link table 600 as a whole.
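The flow of FIG. 19 can be sketched as below, with step numbers in the comments. The data layout (a list of blog/article pairs for the search result table 400 and a dictionary keyed by article pairs for the article link table 500), all link values, and the function name are assumptions. The text places articles P85 and P71 on blog B55. The other blog/article pairings are assumed. Leaving an all-zero row unnormalized is also an assumption, since the text does not cover that case.

```python
def build_blog_link_table(search_results, article_links):
    """Aggregate article-to-article links into a row-normalized blog table.

    search_results: list of (blog_id, article_id) pairs.
    article_links: {(from_article, to_article): value}.
    """
    blogs = sorted({blog for blog, _ in search_results})
    table = {f: {t: 0.0 for t in blogs} for f in blogs}          # S1901
    for f in blogs:                                              # S1902
        for t in blogs:
            if f == t:                                           # S1903: Yes
                continue
            for from_blog, from_article in search_results:       # S1904
                for to_blog, to_article in search_results:
                    if from_blog == f and to_blog == t:
                        table[f][t] += article_links.get(
                            (from_article, to_article), 0.0)
    for f in blogs:                                              # S1906
        total = sum(table[f].values())
        if total > 0:                                            # S1907
            for t in blogs:
                table[f][t] /= total
    return table                                                 # S1909

search_results = [("B23", "P53"), ("B55", "P85"), ("B25", "P40"), ("B55", "P71")]
article_links = {                      # hypothetical link values
    ("P53", "P85"): 1, ("P53", "P40"): 1,
    ("P85", "P53"): 2, ("P71", "P53"): 1, ("P71", "P40"): 1,
}

blog_links = build_blog_link_table(search_results, article_links)
print(blog_links["B55"])
```

Here blog B55 links to B23 with raw weight 2 + 1 = 3 and to B25 with weight 1, so its normalized row is 0.75 and 0.25.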

(Blog cluster table creation procedure)
Next, the procedure for creating the blog cluster table 1100 shown in FIG. 11 is described. The blog cluster table 1100 is a table representing the correlation between the blogs B23, B55, and B25 and the clusters C1 to C4. Here, the creation unit 806 creates the blog cluster table 1100 using the search result table 400 shown in FIG. 4 and the article cluster table 1000 shown in FIG. 10.

図２０は、ブログクラスタテーブルを作成する作成処理手順の一例を示すフローチャートである。図２０のフローチャートにおいて、まず、ブログＩＤを列に、クラスタＩＤを行に持つブログクラスタテーブルを作成して、各セルを「０」で初期化する（ステップＳ２００１）。 FIG. 20 is a flowchart illustrating an example of a creation processing procedure for creating a blog cluster table. In the flowchart of FIG. 20, first, a blog cluster table having a blog ID in a column and a cluster ID in a row is created, and each cell is initialized with “0” (step S2001).

このあと、検索結果テーブル４００の中から、ブログＩＤおよび記事ＩＤを格納する任意の行を選択して（ステップＳ２００２）、記事クラスタテーブル１０００の中から選択された行（以下、「選択行」という）の記事ＩＤの値を読み出す（ステップＳ２００３）。 Thereafter, an arbitrary row for storing the blog ID and the article ID is selected from the search result table 400 (step S2002), and the selected row from the article cluster table 1000 (hereinafter referred to as “selected row”). ) Is read (step S2003).

そして、ブログクラスタテーブルの選択行のブログＩＤに該当する行に読み出した値を加算する（ステップＳ２００４）。具体的には、選択行の記事ＩＤから特定される記事の各クラスタへの所属確率を、選択行のブログＩＤから特定されるブログの各クラスタへの遷移確率として加算する。つまり、記事とクラスタとの相関関係を用いて、ブログとクラスタとの相関関係をあらわす。 Then, the read value is added to the row corresponding to the blog ID of the selected row of the blog cluster table (step S2004). Specifically, the probability of belonging to each cluster of the article specified from the article ID of the selected row is added as the transition probability to each cluster of the blog specified from the blog ID of the selected row. That is, the correlation between the blog and the cluster is expressed using the correlation between the article and the cluster.

つぎに、検索結果テーブル４００の中から選択されていない未選択の行があるか否かを判断し（ステップＳ２００５）、未選択の行がある場合（ステップＳ２００５：Ｙｅｓ）、ステップＳ２００２に戻り、検索結果テーブル４００の中から未選択の行を選択して一連の処理を繰り返す。 Next, it is determined whether or not there is an unselected row not selected from the search result table 400 (step S2005). If there is an unselected row (step S2005: Yes), the process returns to step S2002, An unselected row is selected from the search result table 400 and a series of processing is repeated.

一方、未選択の行がない場合には（ステップＳ２００５：Ｎｏ）、ブログクラスタテーブルを記憶部に出力（保存）して（ステップＳ２００６）、本フローチャートによる一連の処理を終了する。これにより、ブログとクラスタとの相関関係をあらわすブログクラスタテーブル１１００を作成することができる。 On the other hand, if there is no unselected row (step S2005: No), the blog cluster table is output (saved) to the storage unit (step S2006), and the series of processing according to this flowchart ends. Thereby, the blog cluster table 1100 representing the correlation between the blog and the cluster can be created.
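The accumulation in steps S2002 to S2005 can be sketched as below. The data layout (a list of blog ID and article ID pairs for the search result table, a nested dictionary for the article cluster table), the function name, and the article IDs `A1` to `A3` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_blog_cluster_table(search_results, article_clusters):
    """Sum per-article cluster membership probabilities for each blog.

    search_results   : iterable of (blog_id, article_id) pairs
    article_clusters : {article_id: {cluster_id: membership probability}}
    """
    table = defaultdict(lambda: defaultdict(float))  # S2001: cells start at 0
    for blog_id, article_id in search_results:       # S2002 / S2005
        memberships = article_clusters[article_id]   # S2003
        for cluster_id, probability in memberships.items():
            table[blog_id][cluster_id] += probability  # S2004
    return {blog: dict(row) for blog, row in table.items()}


# Hypothetical input tables.
search_results = [("B23", "A1"), ("B23", "A2"), ("B55", "A3")]
article_clusters = {
    "A1": {"C1": 0.5, "C2": 0.5},
    "A2": {"C1": 0.25, "C2": 0.75},
    "A3": {"C1": 1.0, "C2": 0.0},
}
blog_clusters = build_blog_cluster_table(search_results, article_clusters)
```

Because the values are summed rather than averaged, a blog with more articles in a cluster accumulates a larger value for that cluster.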

(Network table creation procedure)
Next, the procedure for creating the network table 1300 shown in FIG. 13 will be described. The network table 1300 represents, as a directed graph (the network model 1200 shown in FIG. 12), the correlation between the blogs B23, B55, and B25 and the clusters C1 to C4. Here, the creation unit 806 creates the network table 1300 using the blog link table 600 shown in FIG. 6 and the blog cluster table 1100 shown in FIG. 11.

FIG. 21 is a flowchart showing an example of the procedure for creating the network table. In the flowchart of FIG. 21, first, a network table having cluster IDs and blog IDs as both rows and columns is created, and each cell that is to store a transition probability is initialized to "0" (step S2101).

Thereafter, an arbitrary cell that is to store a transition probability is selected from the network table (step S2102), and it is determined, for the selected cell (hereinafter, the "selected cell"), whether the blog ID or cluster ID of the transition source (from) is the same as the blog ID or cluster ID of the transition destination (to) (step S2103). If the transition source (from) and the transition destination (to) are determined to be the same (step S2103: Yes), the process proceeds to step S2113.

On the other hand, if the transition source (from) and the transition destination (to) are determined to be different (step S2103: No), it is determined whether both are cluster IDs for the selected cell (step S2104). If both are cluster IDs (step S2104: Yes), the process proceeds to step S2113.

On the other hand, if both are not cluster IDs (step S2104: No), it is determined whether both are blog IDs for the selected cell (step S2105). If both are blog IDs (step S2105: Yes), the value for the corresponding blog IDs is read from the blog link table 600 (step S2106), the value is written to the selected cell (step S2107), and the process proceeds to step S2113.

On the other hand, if both are not blog IDs (step S2105: No), it is determined whether the selected cell represents a transition from a cluster ID to a blog ID (step S2108). If it does (step S2108: Yes), the transition probability X(Cj-Bi) of a transition from the cluster Cj to the blog Bi is calculated using equation (2) above (step S2109). The transition probability X(Cj-Bi) is then written to the selected cell (step S2110), and the process proceeds to step S2113.

On the other hand, if the selected cell does not represent a transition from a cluster ID to a blog ID (step S2108: No), the transition probability X(Bi-Cj) of a transition from the blog Bi to the cluster Cj is calculated using equation (1) above (step S2111). The transition probability X(Bi-Cj) is then written to the selected cell (step S2112).

In this way, by obtaining the transition probabilities between blogs and clusters from the probabilities that the articles posted on each blog belong to each cluster, the links between the nodes representing blogs and the nodes representing clusters in the network model 1200 can be formed.

Next, it is determined whether there is an unselected cell in the network table (step S2113). If there is an unselected cell (step S2113: Yes), the process returns to step S2102, an unselected cell is selected from the network table, and the series of processing is repeated.

On the other hand, if there is no unselected cell (step S2113: No), the network table is output (saved) to the storage unit (step S2115), and the series of processing according to this flowchart ends. In this way, the network table 1300 representing the correlation between the nodes in the network model 1200 can be created.
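The per-cell branching of steps S2102 to S2113 can be sketched as follows. Equations (1) and (2) are defined elsewhere in the description and are not reproduced in this section, so the sketch takes them as caller-supplied functions. The two `placeholder_*` functions and all sample values are stand-ins invented for this illustration and are not the patent's equations. The `table[from][to]` layout and the orientation of the blog link table are also assumptions.

```python
def build_network_table(blog_ids, cluster_ids, blog_links,
                        blog_to_cluster, cluster_to_blog):
    """Fill table[source][destination] with transition probabilities."""
    nodes = list(cluster_ids) + list(blog_ids)
    clusters = set(cluster_ids)
    # S2101: every cell starts at 0
    table = {src: {dst: 0.0 for dst in nodes} for src in nodes}
    for src in nodes:                      # S2102 / S2113: visit every cell
        for dst in nodes:
            if src == dst:                 # S2103: same node, leave 0
                continue
            src_is_cluster = src in clusters
            dst_is_cluster = dst in clusters
            if src_is_cluster and dst_is_cluster:          # S2104
                continue
            if not src_is_cluster and not dst_is_cluster:  # S2105 to S2107
                table[src][dst] = blog_links.get(src, {}).get(dst, 0.0)
            elif src_is_cluster:                           # S2108 to S2110
                table[src][dst] = cluster_to_blog(src, dst)
            else:                                          # S2111, S2112
                table[src][dst] = blog_to_cluster(src, dst)
    return table


# Hypothetical inputs.
blog_links = {"B23": {"B55": 0.5}, "B55": {"B23": 1.0}}
blog_cluster = {"B23": {"C1": 0.75, "C2": 0.25},
                "B55": {"C1": 0.5, "C2": 0.5}}


def placeholder_blog_to_cluster(blog, cluster):
    # Stand-in for equation (1); NOT the formula from the description.
    return blog_cluster[blog][cluster]


def placeholder_cluster_to_blog(cluster, blog):
    # Stand-in for equation (2); NOT the formula from the description.
    total = sum(row[cluster] for row in blog_cluster.values())
    return blog_cluster[blog][cluster] / total


network = build_network_table(["B23", "B55"], ["C1", "C2"], blog_links,
                              placeholder_blog_to_cluster,
                              placeholder_cluster_to_blog)
```

Cluster-to-cluster cells and self-transitions stay at 0, so in this sketch a walk can move between topics only by passing through a blog.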

(Degree-of-conformity calculation procedure)
Next, the procedure for calculating the degree of conformity of a blog to the search query will be described. Here, the calculation unit 805 calculates the degree of conformity of each blog by applying a random walk method to the network model 1200 shown in FIG. 12 (the network table 1300 shown in FIG. 13). In the following, N is the prescribed number of random walks, n is the number of trials, and Pt is the termination probability.

FIG. 22 is a flowchart showing an example of the procedure for calculating the degree of conformity of a blog. In the flowchart of FIG. 22, first, the transition probability X(Cj-Bi) of a transition from the cluster Cj to the web Bi is initialized to "0" (step S2201).

A score table having, as columns, the cluster IDs and blog IDs representing the nodes is created, each cell that is to store a score is initialized to "0" (step S2201), and the number of trials n is initialized to "0" (step S2202).

Thereafter, the number of trials n is incremented by "1" (step S2203), one node is selected at random from the from column of the network table 1300 (step S2204), and the cell of the score table for the selected node (hereinafter, the "selected node") is incremented by "1" (step S2205).

Then, one element is selected at random from the group of elements consisting of the elements of the selected node's column plus the termination probability Pt (step S2206). Thereafter, it is determined whether termination of the random walk has been selected (step S2207). If termination has not been selected (step S2207: No), the node corresponding to the selected element is selected (step S2208), and the process returns to step S2205.

That is, the random walk within the network model 1200 continues until termination is selected, and each time a node is reached, the number of transitions to that node, which serves as its score (degree of conformity), is counted.

On the other hand, if termination of the random walk has been selected (step S2207: Yes), it is determined whether the number of trials n ≥ the prescribed number N (step S2209). If n < N (step S2209: No), the process returns to step S2203 and the series of processing is repeated.

On the other hand, if n ≥ N (step S2209: Yes), the score table is output (saved) to the storage unit (step S2210), and the series of processing according to this flowchart ends.

In this way, the score table 1400 can be created, which represents the degree of conformity of each blog to the search query by the number of transitions to the node representing that blog. As a result, a blog that posts articles on topics with high scores (degrees of conformity) receives a higher score, and a topic contained in articles posted on blogs with high scores likewise receives a higher score.
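The random walk of steps S2203 to S2209 can be sketched as follows. The text says only that one element is chosen from the selected node's transition probabilities together with the termination probability Pt. Treating Pt as one more sampling weight alongside the outgoing transitions is an interpretation made for this sketch. The function name, the `network[from][to]` layout, and the sample probabilities are likewise hypothetical.

```python
import random


def random_walk_scores(network, n_walks, p_end, rng=None):
    """Count visits to each node over `n_walks` random walks.

    network : {from_node: {to_node: transition probability}}
    p_end   : termination weight Pt, must be positive so every walk ends
    """
    if p_end <= 0:
        raise ValueError("p_end must be positive")
    if rng is None:
        rng = random.Random()
    nodes = list(network)
    scores = {node: 0 for node in nodes}       # score table starts at 0
    for _ in range(n_walks):                   # S2203 / S2209: N trials
        current = rng.choice(nodes)            # S2204: random start node
        while True:
            scores[current] += 1               # S2205: count the visit
            targets = list(network[current])
            weights = [network[current][t] for t in targets]
            # S2206: termination (None) competes with outgoing transitions
            choice = rng.choices(targets + [None],
                                 weights=weights + [p_end])[0]
            if choice is None:                 # S2207: Yes, walk ends
                break
            current = choice                   # S2208: move to next node
    return scores


# Hypothetical network of two blogs and one cluster.
network = {
    "C1": {"B23": 0.6, "B55": 0.4},
    "B23": {"C1": 0.75, "B55": 0.25},
    "B55": {"C1": 1.0},
}
scores = random_walk_scores(network, n_walks=1000, p_end=0.2,
                            rng=random.Random(0))
```

A larger Pt shortens each walk, so scores stay closer to the uniform choice of start node. A smaller Pt lets the link structure dominate.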

(Feature word table creation procedure)
Next, the procedure for creating the feature word table 1500 shown in FIG. 15 will be described. The feature word table 1500 lists the appearance frequencies of the words that appear in the articles classified into the clusters C1 to C4. Here, the creation unit 806 creates the feature word table 1500 using the analysis result table 700 shown in FIG. 7 and the article cluster table 1000 shown in FIG. 10.

FIG. 23 is a flowchart showing an example of the procedure for creating the feature word table. In FIG. 23, first, a feature word table having cluster IDs as columns and words (all words appearing in the blog group) as rows is created, and each cell that is to store a numerical value is initialized to "0" (step S2301). Thereafter, an arbitrary cluster is selected from the article cluster table 1000 (step S2302).

Next, an arbitrary article is selected from the column of the selected cluster (hereinafter, the "selected cluster") (step S2303). Then, with reference to the analysis result table 700, an arbitrary word is selected from the analysis result of the selected article (hereinafter, the "selected article") (step S2304). Thereafter, the membership probability of the selected article is added to the corresponding cell of the feature word table (column: selected cluster, row: selected word) (step S2305).

Next, it is determined whether there is an unselected word in the analysis result of the selected article (step S2306). If there is an unselected word (step S2306: Yes), the process returns to step S2304, an unselected word is selected from the analysis result, and the series of processing is repeated.

On the other hand, if there is no unselected word (step S2306: No), it is determined whether there is an unselected article in the column of the selected cluster (step S2307). If there is an unselected article (step S2307: Yes), the process returns to step S2303, an unselected article is selected from the column of the selected cluster, and the series of processing is repeated.

On the other hand, if there is no unselected article (step S2307: No), it is determined whether there is an unselected cluster in the article cluster table 1000 (step S2308). If there is an unselected cluster (step S2308: Yes), the process returns to step S2302, an unselected cluster is selected from the article cluster table 1000, and the series of processing is repeated.

On the other hand, if there is no unselected cluster (step S2308: No), a weighting process that weights the value of each cell of the feature word table is executed (step S2309). Finally, the feature word table is output (saved) to the storage unit (step S2310), and the series of processing according to this flowchart ends. In this way, the feature word table 1500, which lists the appearance frequencies of the words appearing in the articles classified into the clusters C1 to C4, can be created.
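The triple loop of steps S2302 to S2308 (cluster, article, word) can be sketched as below, before any weighting is applied. The dictionary layouts, the function name, and the sample words are assumptions for illustration. The sketch also assumes that the analysis result of an article is a list of word tokens and that every listed token contributes once, which the text does not state explicitly.

```python
from collections import defaultdict


def build_feature_word_table(article_clusters, article_words):
    """Accumulate, per word and cluster, the membership probabilities of
    the articles in which the word appears.

    article_clusters : {article_id: {cluster_id: membership probability}}
    article_words    : {article_id: [word, ...]}  (assumed token lists)
    """
    table = defaultdict(lambda: defaultdict(float))   # S2301: cells at 0
    for article_id, memberships in article_clusters.items():
        for cluster_id, probability in memberships.items():  # S2302, S2303
            for word in article_words[article_id]:            # S2304, S2306
                table[word][cluster_id] += probability        # S2305
    return {word: dict(row) for word, row in table.items()}


# Hypothetical inputs.
article_clusters = {
    "A1": {"C1": 0.75, "C2": 0.25},
    "A2": {"C1": 0.25, "C2": 0.75},
}
article_words = {
    "A1": ["camera", "lens"],
    "A2": ["camera", "travel"],
}
feature_words = build_feature_word_table(article_clusters, article_words)
```

In this example a word that occurs in both articles ends up with equal values in both clusters, so it does not distinguish the clusters, while "lens" and "travel" each lean toward one cluster.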

Next, the specific procedure of the weighting process in step S2309 of FIG. 23 will be described. This weighting process levels the values of words that appear frequently across all articles, thereby giving relative weight to the values of words that appear frequently in specific articles. FIG. 24 is a flowchart showing an example of the specific procedure of the weighting process.

In the flowchart of FIG. 24, first, the number of occurrences of every word is counted word by word based on the contents stored in the analysis result table 700 (step S2401). Thereafter, an arbitrary word is selected from the feature word table (step S2402).

Then, the value of each cell for the selected word is divided by the number of occurrences of that word counted in step S2401 (step S2403). Next, it is determined whether there is an unselected word in the feature word table (step S2404).

If there is an unselected word (step S2404: Yes), the process returns to step S2402, an unselected word is selected from the feature word table, and the series of processing is repeated. On the other hand, if there is no unselected word (step S2404: No), the process proceeds to step S2310 shown in FIG. 23.

As a result, the cell values for words such as particles and symbols, which appear frequently regardless of the content of an article, become smaller, and consequently the cell values for words that appear frequently in specific articles are weighted.
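The division in steps S2401 to S2404 can be sketched as follows. The data layout, the function name, and the sample values are hypothetical. Leaving a word untouched when its overall count is zero is a defensive choice made here, since the text does not cover that case.

```python
from collections import Counter


def weight_feature_words(feature_table, article_words):
    """Divide each word's cell values by the word's total occurrence count.

    feature_table : {word: {cluster_id: accumulated value}}
    article_words : {article_id: [word, ...]}  (assumed token lists)
    """
    # S2401: count occurrences of every word over all articles
    counts = Counter(word for words in article_words.values()
                     for word in words)
    weighted = {}
    for word, row in feature_table.items():     # S2402 / S2404
        total = counts[word]
        if total == 0:
            weighted[word] = dict(row)           # assumption: leave as is
        else:
            # S2403: divide every cell for this word by its count
            weighted[word] = {cluster: value / total
                              for cluster, value in row.items()}
    return weighted


# Hypothetical inputs: "the" is frequent everywhere, "lens" is not.
feature_table = {
    "the": {"C1": 4.0, "C2": 4.0},
    "lens": {"C1": 1.5, "C2": 0.5},
}
article_words = {
    "A1": ["the", "the", "the", "the", "lens"],
    "A2": ["the", "the", "the", "the", "lens"],
}
weighted_words = weight_feature_words(feature_table, article_words)
```

With these sample values the frequent word "the" drops from 4.0 to 0.5 in cluster C1, while "lens" drops only from 1.5 to 0.75 and so becomes the larger value in that cluster.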

(Search result presentation procedure)
FIG. 25 is a flowchart showing an example of the search result presentation procedure. Here, K is the prescribed number of blogs to be presented as search results. In the flowchart of FIG. 25, first, k, which represents the number of blogs, is initialized to "0" (step S2501). Thereafter, the blog ID with the highest score is identified with reference to the score table 1400 (step S2502), and the number of blogs k is incremented by "1" (step S2503).

Next, the identified blog ID and the article IDs associated with that blog ID are extracted from the search result table 400 (step S2504). In addition, with reference to the blog cluster table 1100, the cluster ID with the highest membership probability is identified in the row of the identified blog ID (step S2505). Thereafter, with reference to the feature word table 1500, the word with the largest value is extracted from the row of the identified cluster ID (step S2506).

Next, it is determined whether the number of blogs k ≥ the prescribed number K, or whether all blog IDs in the score table 1400 have been identified (step S2507). If it is determined that k < K (step S2507: No), the process returns to step S2502, the blog ID with the highest score among those not yet identified is identified, and the series of processing is repeated.

On the other hand, if it is determined that k ≥ K, or that all blog IDs in the score table 1400 have been identified (step S2507: Yes), page information is generated using the blog IDs, article IDs, and words extracted in steps S2504 and S2506 (step S2508).

Finally, the generated page information is presented to the client terminal 103 (step S2509), and the series of processing according to this flowchart ends. In this way, search results based on a ranking optimized by scoring that takes the posted content of each blog into account can be presented to the searcher.
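The selection of the top K blogs and their accompanying articles and feature words (steps S2502 to S2508) can be sketched as follows. The table layouts, the function name, the shape of the returned page information, and all sample values are assumptions. The sketch also assumes that cluster nodes in the score table are skipped when ranking blogs, and it sorts once instead of repeatedly searching for the highest remaining score.

```python
def build_page_info(scores, search_results, blog_clusters,
                    feature_words, max_blogs):
    """Return page information for the `max_blogs` highest-scoring blogs."""
    # The score table holds both cluster and blog nodes; keep blogs only.
    blog_scores = {node: score for node, score in scores.items()
                   if node in blog_clusters}
    # S2502, S2503, S2507: take blogs in descending score order, up to K
    ranked = sorted(blog_scores, key=blog_scores.get, reverse=True)[:max_blogs]
    pages = []
    for blog_id in ranked:
        # S2504: article IDs associated with the blog
        articles = [a for b, a in search_results if b == blog_id]
        # S2505: cluster with the largest value in the blog's row
        row = blog_clusters[blog_id]
        top_cluster = max(row, key=row.get)
        # S2506: word with the largest value for that cluster
        top_word = max(feature_words,
                       key=lambda w: feature_words[w].get(top_cluster, 0.0))
        pages.append({"blog": blog_id, "articles": articles,
                      "cluster": top_cluster, "word": top_word})
    return pages                                   # S2508


# Hypothetical tables.
scores = {"C1": 40, "C2": 10, "B23": 30, "B55": 50, "B25": 5}
search_results = [("B23", "A1"), ("B23", "A2"), ("B55", "A3"), ("B25", "A4")]
blog_clusters = {
    "B23": {"C1": 0.75, "C2": 1.25},
    "B55": {"C1": 1.0, "C2": 0.0},
    "B25": {"C1": 0.5, "C2": 0.5},
}
feature_words = {
    "lens": {"C1": 0.75, "C2": 0.25},
    "travel": {"C1": 0.25, "C2": 0.75},
}
pages = build_page_info(scores, search_results, blog_clusters,
                        feature_words, max_blogs=2)
```

If fewer than K blogs exist, the slice simply returns all of them, which corresponds to the "all blog IDs have been identified" exit of step S2507.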

As described above, according to the present embodiment, each page can be scored using the correlation between pages and clusters (topics) that is based on the posted content of each page (blog), that is, the content of the articles posted on the blog. As a result, appropriate scoring is possible even for a blog that has few or no link relationships with other blogs. Consequently, the relationships "a blog that deals with important topics is important" and "a topic dealt with by important blogs is important" hold, and the ranking takes the posted content of the blogs into account.

In addition, when each node is scored by the random walk method, using normalized transition probabilities between pages (between blogs) reduces the influence of link relationships. This prevents, for example, the problem of a blog that has many link relationships with other blogs being ranked high regardless of its posted content.

In addition, by having the searcher select an arbitrary topic from among a plurality of topics related to the search query, the scores of pages that carry posted information on the specific topic can be raised. As a result, for example, blogs carrying articles on a topic of interest to the searcher are more likely to be ranked higher, and search results that better reflect the searcher's intent can be presented.

From the above, according to the information providing method, the information providing apparatus, the information providing program, and the recording medium on which the program is recorded, appropriate scoring that takes the posted content of pages into account optimizes the ranking of search results and improves the efficiency of the user's search activity in the search system 100.

The information providing method described in the present embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. The program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer. The program may also be a transmission medium that can be distributed via a network such as the Internet.

The following supplementary notes are further disclosed with respect to the embodiment described above.

(Supplementary Note 1) An information providing method comprising:
an extraction step of extracting posted information that is posted on a page selected from a set of pages obtained by giving a search query, each page containing at least one item of posted information;
a classification step of classifying the posted information extracted in the extraction step into a plurality of clusters according to its posted content;
a calculation step of calculating, for each page, a degree of conformity of the page to the search query based on the correlation between the page and the clusters obtained in the classification step; and
a presentation step of presenting page information on the page group based on the calculation result of the calculation step.

(Supplementary Note 2) The information providing method according to Supplementary Note 1, further comprising a creation step of creating a network model that is a directed graph having the pages and the clusters as nodes,
wherein the calculation step calculates the degree of conformity of the page based on transition probabilities of transitions between the nodes representing the pages and the nodes representing the clusters in the network model created in the creation step.

(Supplementary Note 3) The information providing method according to Supplementary Note 2,
wherein the calculation step calculates the degree of conformity of the page further based on transition probabilities of transitions between the nodes representing the pages in the network model.

(Supplementary Note 4) The information providing method according to Supplementary Note 3, wherein the transition probabilities of transitions between the nodes representing the pages are normalized.

(Supplementary Note 5) The information providing method according to any one of Supplementary Notes 2 to 4,
wherein the calculation step calculates the degree of conformity of the page by counting, for each node, the number of transitions to that node during the period from when one node is arbitrarily selected from the node group until the transitions between the nodes end, using the transition probabilities of transitions between the nodes and a termination probability of ending the transitions between the nodes.

(Supplementary Note 6) The information providing method according to any one of Supplementary Notes 1 to 5,
wherein the classification step classifies the posted information into a plurality of clusters based on the appearance frequencies of the words appearing in the posted information and the similarity between the words.

(Supplementary Note 7) The information providing method according to Supplementary Note 6, further comprising a determination step of determining, for each cluster, a word that characterizes the cluster based on the appearance frequencies of the words appearing in the posted information classified into the cluster,
wherein the presentation step further presents the determination result of the determination step as a topic related to the search query.

(Supplementary Note 8) The information providing method according to Supplementary Note 7,
wherein the determination step determines the word that characterizes the cluster based on a value obtained by dividing the number of occurrences of a word in the posted information by the number of occurrences of that word in all the posted information posted on the page group.

(Supplementary Note 9) The information providing method according to Supplementary Note 7 or 8,
wherein the determination step determines the word that characterizes the cluster based on the appearance frequencies of words of a specific part of speech appearing in the posted information.

(Supplementary Note 10) The information providing method according to any one of Supplementary Notes 7 to 9, further comprising:
a topic presentation step of presenting the determination result of the determination step as the topic; and
a selection step of accepting selection of an arbitrary topic from among the plurality of topics presented in the topic presentation step,
wherein the calculation step calculates the degree of conformity of the page by counting the number of transitions to each node during the period from when the node representing the cluster characterized by the topic selected in the selection step is selected until the transitions between the nodes end.

(Supplementary Note 11) The information providing method according to any one of Supplementary Notes 1 to 10, wherein the page group includes at least one page that has no link relationship with any other page included in the page group.

(Supplementary Note 12) An information providing apparatus comprising:
extraction means for extracting posted information that is posted on a page selected from a set of pages obtained by giving a search query, each page containing at least one item of posted information;
classification means for classifying the posted information extracted by the extraction means into a plurality of clusters according to its posted content;
calculation means for calculating, for each page, a degree of conformity of the page to the search query based on the correlation between the page and the clusters obtained by the classification means; and
presentation means for presenting page information on the page group based on the calculation result of the calculation means.

(Supplementary Note 13) An information providing program that causes a computer to function as:
extraction means for extracting the posted information posted on a page selected from a set of pages, each containing at least one piece of posted information, obtained by giving a search query;
classification means for classifying the posted information extracted by the extraction means into a plurality of clusters according to its content;
calculation means for calculating, for each page, the degree of fitness of the page to the search query based on the correlation between the page and the clusters classified by the classification means; and
presentation means for presenting page information on the page group based on the calculation result calculated by the calculation means.

(Supplementary Note 14) A computer-readable recording medium on which the information providing program according to Supplementary Note 12 is recorded.

System configuration diagram of the search system.
Explanatory diagram showing the hardware configuration of a computer apparatus.
Explanatory diagram showing an overview of the present embodiment.
Explanatory diagram showing the stored contents of the search result table.
Explanatory diagram showing the stored contents of the article link table.
Explanatory diagram showing the stored contents of the blog link table.
Explanatory diagram showing the stored contents of the analysis result table.
Block diagram showing the functional configuration of the information providing apparatus.
Explanatory diagram showing an example of an analysis result.
Explanatory diagram showing the stored contents of the article cluster table.
Explanatory diagram showing the stored contents of the blog cluster table.
Directed graph showing an example of the network model.
Explanatory diagram showing the stored contents of the network table.
Explanatory diagram showing the stored contents of the score table.
Explanatory diagram showing the stored contents of the feature word table.
Explanatory diagram (part 1) showing an example of a screen displayed on the display.
Explanatory diagram (part 2) showing an example of a screen displayed on the display.
Explanatory diagram showing an example of a search procedure at a client terminal.
Flowchart showing an example of the procedure for creating the blog link table.
Flowchart showing an example of the procedure for creating the blog cluster table.
Flowchart showing an example of the procedure for creating the network table.
Flowchart showing an example of the procedure for calculating the fitness of a blog.
Flowchart showing an example of the procedure for creating the feature word table.
Flowchart showing an example of the specific procedure of the weighting process.
Flowchart showing an example of the procedure for presenting search results.

Description of Symbols
100 Search system
101 Information providing apparatus
102 Database server
103, 103-1 to 103-n Client terminal
400 Search result table
500 Article link table
600 Blog link table
700 Analysis result table
801 Reception unit
802 Search unit
803 Extraction unit
804 Classification unit
805 Calculation unit
806 Creation unit
807 Presentation unit
808 Determination unit
809 Selection unit
1000 Article cluster table
1100 Blog cluster table
1200 Network model
1300 Network table
1400 Score table
1500 Feature word table
1600, 1700, 1810, 1820 Search result

Claims

An information providing program that causes a computer to execute processing comprising:
displaying the set name of each of a plurality of sets formed according to the posted content of a plurality of web pages retrieved in response to a search query; and
displaying title information of the web pages belonging to the set whose set name is selected from among the displayed set names.

An information providing apparatus comprising a control unit that displays the set name of each of a plurality of sets formed according to the posted content of a plurality of web pages retrieved in response to a search query, and that displays title information of the web pages belonging to the set whose set name is selected from among the displayed set names.

A method of providing a search service, executed by a server that outputs search results to a client in response to a search request from the client, the method comprising:
providing the client with the set name of each of a plurality of sets formed according to the posted content of a plurality of web pages retrieved in response to a search query transmitted from the client; and
when any one of the set names is selected at the client, providing the client with title information of the web pages belonging to the set with the selected set name.
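The two-step exchange in the claims (set names first, then the titles of the selected set) can be sketched as follows. This is a hedged illustration: the set names, the titles, and the function names `list_set_names` and `titles_for` are hypothetical, and how the sets are formed from the search result is outside the sketch.

```python
# Hypothetical clustered search result: set name -> titles of its web pages.
sets = {
    "lenses": ["Best lenses of the year", "Prime vs zoom"],
    "batteries": ["Making batteries last"],
}

def list_set_names(sets):
    """First response: the name of every set formed from the search result."""
    return sorted(sets)

def titles_for(sets, selected):
    """Second response: titles of the pages in the set the client selected."""
    return list(sets.get(selected, []))

names = list_set_names(sets)        # shown to the client first
titles = titles_for(sets, "lenses")  # shown after the client picks a set
```

An unknown set name yields an empty title list in this sketch; a real service might report an error instead.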