JP2008269392A

JP2008269392A - Device, method, and program for processing web page information

Info

Publication number: JP2008269392A
Application number: JP2007112831A
Authority: JP
Inventors: Manabu Satsusano; 学颯々野; Toshiyuki Maezawa; 敏之前澤
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2007-04-23
Filing date: 2007-04-23
Publication date: 2008-11-06
Anticipated expiration: 2027-04-23
Also published as: JP4808181B2

Abstract

PROBLEM TO BE SOLVED: To provide a device, a method, and a program that avoid, as far as possible, a situation in which a user cannot acquire desired information that is presented across a plurality of web pages.

SOLUTION: A web page information processing device (2) includes: a web page acquisition means that acquires web pages containing feature expression data indicating that content is presented across a plurality of web pages; an aggregated data generation means that generates aggregated data configured to allow web pages whose contents are related to one another, among the web pages acquired by the acquisition means, to be aggregated; and a result data generation means that generates result data configured to be displayed on a user terminal (81) in a form in which the web pages that can be aggregated by the aggregated data are aggregated.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a web page information processing apparatus, a web page information processing method, and a web page information processing program.

In recent years, user terminals that can communicate via the Internet have made it possible for anyone to publish diaries, opinions, and the like on web pages, and a great many web pages, typified by blogs (short for "weblog"), now exist.

Accordingly, a user who wants to read other users' accounts of a topic of interest, such as travel journals, reports on new products, reports on newly opened shops, or movie reviews, uses a search engine to narrow down the web pages related to that topic. The narrowed-down web pages, however, include both pages that are highly relevant to the topic and pages that are only weakly relevant.

Recently, a technique has been proposed that automatically extracts only the necessary portion of a web page to be browsed (see, for example, Patent Document 1). With this technique, even when a single web page contains information other than what the user wants, the portion containing the desired information is acquired automatically, which reduces the user's burden in obtaining the desired information from that page.
Patent Document 1: JP 2006-277090 A

However, when the desired information is presented across a plurality of web pages, the above technique can efficiently acquire the part of it shown on one web page, but it remains difficult to identify the web pages on which the rest of the desired information is shown. As a result, the user may be unable to acquire the desired information. Furthermore, web page information is conventionally searched by entering keywords representing the desired information into a search engine, and the search results are ranked and displayed page by page. Consequently, even when a user finds a page whose topic is the desired information, if that topic spans several pages, the user must follow further links from that page to find the preceding and following pages.

Accordingly, an object of the present invention is to provide an apparatus, a method, and a program capable of avoiding, as far as possible, a situation in which a user cannot acquire desired information presented across a plurality of web pages.

To solve the problems described above, the web page information processing apparatus, web page information processing method, and web page information processing program of the present invention are characterized by generating result data configured so that the web pages that can be aggregated by aggregated data related to a keyword are displayed by a user terminal, and are displayed there in a form in which they are aggregated.

More specifically, the following are provided.

(1) A web page information processing apparatus that is configured to communicate with a user terminal via a network and that transmits to the user terminal result data configured so that web pages retrieved on the basis of a keyword specified by a user with the user terminal are displayed by the user terminal, the apparatus comprising:
web page acquisition means for acquiring, from a large number of web pages managed by users using the user terminal, web pages containing feature expression data indicating that content is presented across a plurality of web pages;
aggregated data generation means for generating aggregated data configured to allow web pages with related contents, among the web pages acquired by the web page acquisition means, to be aggregated;
result data generation means for generating result data configured so that the web pages that can be aggregated by the aggregated data related to the keyword are displayed by the user terminal, and are displayed in a form in which they are aggregated; and
search result transmission means for transmitting the result data generated by the result data generation means to the user terminal.

According to configuration (1), the search result transmission means transmits to the user terminal result data configured so that the web pages that can be aggregated by the aggregated data related to the keyword are displayed by the user terminal, and are displayed in aggregated form.

Therefore, even when the desired information is presented across a plurality of web pages, the web pages that can be aggregated by the aggregated data related to the keyword for that information are displayed by the user terminal in aggregated form, so the web pages showing the desired information can be identified easily.

Accordingly, this configuration can avoid, as far as possible, a situation in which the user cannot acquire desired information presented across a plurality of web pages.

(2) The web page information processing apparatus according to (1), wherein
the web page acquisition means acquires web pages containing, as the feature expression data, at least one of start data indicating that a series of stories begins, continuation data indicating that the series of stories continues, and end data indicating that the series of stories ends.

According to configuration (2), when the user wants to read the web page on which a series of stories begins, the user is spared, as far as possible, the effort of searching for that page.

(3) The web page information processing apparatus according to (2), further comprising
web page discrimination means for discriminating, on the basis of date/time information indicating a date and time and period information indicating a period contained in the web pages acquired by the web page acquisition means, a web page indicating that the series of stories continues and a web page indicating that the series of stories ends, wherein
the aggregated data generation means generates aggregated data including the discrimination information produced by the web page discrimination means, and
the result data generation means generates result data configured so that the web pages discriminated on the basis of the discrimination information included in the aggregated data related to the keyword are displayed by the user terminal in an identifiable manner.

According to configuration (3), web pages unrelated to the series of stories are excluded from the aggregated web pages displayed by the user terminal, even if they are related to the keyword. This reduces the user's burden when reading, in sequence, a series of stories presented across a plurality of web pages.

(4) The web page information processing apparatus according to any one of (1) to (3), wherein
the aggregated data generation means generates aggregated data including URL information of the web pages acquired by the web page acquisition means.

According to configuration (4), the aggregated data is generated together with the URL information of the web pages acquired by the web page acquisition means, so result data can be generated that is configured so that the web pages associated with that URL information are displayed by the user terminal.

(5) The web page information processing apparatus according to any one of (1) to (4), wherein
the aggregated data generation means generates aggregated data configured to allow web pages with a relatively high degree of similarity, among the web pages acquired by the web page acquisition means, to be aggregated.

According to configuration (5), web pages with low similarity are excluded from the aggregated web pages displayed by the user terminal, even if they are related to the keyword. This reduces the user's burden when reading, in sequence, a series of stories presented across a plurality of web pages.

(6) A web page information processing method in which a web page information processing apparatus configured to communicate with a user terminal via a network transmits to the user terminal result data configured so that web pages retrieved on the basis of a keyword specified by a user with the user terminal are displayed by the user terminal, the method comprising:
a web page acquisition step of acquiring, from a large number of web pages managed by users using the user terminal, web pages containing feature expression data indicating that content is presented across a plurality of web pages;
an aggregated data generation step of generating aggregated data configured to allow web pages with related contents, among the web pages acquired in the web page acquisition step, to be aggregated;
a result data generation step of generating result data configured so that the web pages that can be aggregated by the aggregated data related to the keyword are displayed by the user terminal, and are displayed in a form in which they are aggregated; and
a search result transmission step of transmitting the result data generated in the result data generation step to the user terminal.

According to configuration (6), even when the desired information is presented across a plurality of web pages, the web pages that can be aggregated by the aggregated data related to the keyword for that information are displayed by the user terminal in aggregated form, so the web pages showing the desired information can be identified easily. This configuration can therefore avoid, as far as possible, a situation in which the user cannot acquire desired information presented across a plurality of web pages.

(7) A web page information processing program for causing a web page information processing apparatus, configured to communicate with a user terminal via a network, to transmit to the user terminal result data configured so that web pages retrieved on the basis of a keyword specified by a user with the user terminal are displayed by the user terminal, the program causing a computer to execute:
a web page acquisition step of acquiring, from a large number of web pages managed by users using the user terminal, web pages containing feature expression data indicating that content is presented across a plurality of web pages;
an aggregated data generation step of generating aggregated data configured to allow web pages with related contents, among the web pages acquired in the web page acquisition step, to be aggregated;
a result data generation step of generating result data configured so that the web pages that can be aggregated by the aggregated data related to the keyword are displayed by the user terminal, and are displayed in a form in which they are aggregated; and
a search result transmission step of transmitting the result data generated in the result data generation step to the user terminal.

According to configuration (7), even when the desired information is presented across a plurality of web pages, the web pages that can be aggregated by the aggregated data related to the keyword for that information are displayed by the user terminal in aggregated form, so the web pages showing the desired information can be identified easily. This configuration can therefore avoid, as far as possible, a situation in which the user cannot acquire desired information presented across a plurality of web pages.

ADVANTAGE OF THE INVENTION: According to the present invention, a situation in which the user cannot acquire desired information presented across a plurality of web pages can be avoided as far as possible.

Hereinafter, an embodiment of the present invention will be described with reference to FIGS. 1 to 10.

[Configuration of Web Page Information Processing System 1]
The configuration of the web page information processing system 1 will be described with reference to FIG. 1.

As shown in FIG. 1, the web page information processing system 1 is composed of a web page information processing apparatus 2, a user terminal 81, a network 82, and a public database server 90. In the web page information processing system 1, the web page information processing apparatus 2, the user terminal 81, and the public database server 90 can communicate with one another via the network 82.

The user terminal 81 is configured so that the user can search for and browse web pages that display content, such as documents and images, published on the Internet. The public database server 90 includes a database (for example, the blog DB 91 described later). In response to a request from outside (for example, from the user terminal 81), the public database server 90 operates on the database (for example, retrieves the requested data from the data stored in it) and transmits the requested data to the requester. The configuration of the web page information processing apparatus 2 will be described later.

[Configuration of Web Page Information Processing Apparatus 2]
The web page information processing apparatus 2 is composed of a blog analysis server 10, a search server 20, and a database server 30.

The blog analysis server 10 includes an article acquisition unit 11, a feature expression extraction unit 12, a start/end article determination unit 13, a continuation article determination unit 14, a story aggregation unit 15, and an index generation unit 16.

The article acquisition unit 11 acquires, from the blog DB 91 of the public database server 90 with which it can communicate via the network 82, articles published on web pages operated and managed by individuals or small groups, that is, articles posted on so-called blogs (hereinafter "articles"). More specifically, the article acquisition unit 11 acquires articles from a plurality of blog DBs 91 held by a plurality of public database servers 90 with which it can communicate.

The feature expression extraction unit 12 acquires, from the sentence example DB 31 of the database server 30, feature expression data indicating that content is presented across a plurality of articles (more specifically, start data indicating that a series of stories begins, continuation data indicating that the series continues, and end data indicating that the series ends). Details of the feature expression data will be described later. The feature expression extraction unit 12 then extracts, from the articles acquired by the article acquisition unit 11, the articles that contain feature expression data.
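The extraction step amounts to filtering articles by the presence of marker phrases. The following is a minimal sketch, assuming plain substring matching; the marker phrases in `FEATURE_EXPRESSIONS` are hypothetical English stand-ins, since the actual contents of the sentence example DB 31 are only described later, and the function names are illustrative.

```python
# Hypothetical marker phrases standing in for the feature expression data in the
# sentence example DB 31; the real entries are described later in the patent.
FEATURE_EXPRESSIONS: dict[str, list[str]] = {
    "start": ["part 1", "first installment"],
    "continuation": ["continued from", "part 2"],
    "end": ["final part", "the end"],
}


def contains_feature_expression(text: str, expressions: dict[str, list[str]]) -> bool:
    """Return True if the text contains any start, continuation, or end marker."""
    lowered = text.lower()
    return any(
        phrase in lowered
        for phrases in expressions.values()
        for phrase in phrases
    )


def extract_articles(articles: list[str], expressions: dict[str, list[str]]) -> list[str]:
    """Keep only the articles that contain at least one feature expression."""
    return [a for a in articles if contains_feature_expression(a, expressions)]


articles = ["Trip to Kyoto, part 1", "What I had for lunch", "Kyoto trip: the end"]
print(extract_articles(articles, FEATURE_EXPRESSIONS))
```

Plain substring matching is a simplification (for example, "part 1" would also match "part 10"); a real system would more likely match against morphologically analyzed text.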

Here, the sentence example DB 31 is provided in the database server 30, but it is not limited to this. For example, it may be provided in the blog analysis server 10, or in a predetermined database server outside the web page information processing apparatus 2. The same applies to the index DB 32. Likewise, the blog DB 91 is provided in the public database server 90 outside the web page information processing apparatus 2, but it is not limited to this either; for example, the blog DB 91 may be provided inside the web page information processing apparatus 2. That is, the web page information processing apparatus 2 may be configured to include the blog DB 91.

In view of the above, the web page information processing apparatus 2 can be said to include web page acquisition means for acquiring, from a large number of web pages managed by users using the user terminal 81, web pages containing feature expression data indicating that content is presented across a plurality of web pages.

The start/end article determination unit 13 determines, among the articles extracted by the feature expression extraction unit 12, articles indicating that a series of stories begins (hereinafter "start articles") and articles indicating that a series of stories ends (hereinafter "end articles"). More specifically, the start/end article determination unit 13 determines an article containing start data to be a start article and an article containing end data to be an end article.

The continuation article determination unit 14 determines, among the articles extracted by the feature expression extraction unit 12, articles indicating that a series of stories continues (hereinafter "continuation articles"). More specifically, the continuation article determination unit 14 determines an article containing continuation data to be a continuation article.
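Units 13 and 14 together assign each extracted article one or more roles according to which kind of marker it contains. A minimal sketch, again assuming hypothetical English marker phrases and substring matching; returning a set of roles (rather than a single label) is a design choice made here so that an article containing, say, both continuation and end markers is not silently forced into one category.

```python
# Hypothetical marker phrases (illustrative stand-ins for the sentence example DB 31).
FEATURE_EXPRESSIONS: dict[str, list[str]] = {
    "start": ["part 1", "first installment"],
    "continuation": ["continued from", "part 2"],
    "end": ["final part", "the end"],
}


def classify_article(text: str, expressions: dict[str, list[str]]) -> set[str]:
    """Return every role ("start", "continuation", "end") whose marker appears in the text.

    "start" and "end" mirror the start/end article determination unit 13;
    "continuation" mirrors the continuation article determination unit 14.
    """
    lowered = text.lower()
    return {
        role
        for role, phrases in expressions.items()
        if any(phrase in lowered for phrase in phrases)
    }


for title in ["Kyoto trip, part 1", "Continued from yesterday", "Kyoto trip: the end"]:
    print(title, "->", sorted(classify_article(title, FEATURE_EXPRESSIONS)))
```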

The story aggregation unit 15 aggregates, from the articles extracted by the feature expression extraction unit 12, articles whose contents are related (that is, articles belonging to one series of stories). More specifically, the story aggregation unit 15 determines whether one article and another article are related on the basis of date/time information indicating a date and time and period information indicating a period contained in at least one of the two articles, and aggregates them when it determines that they are related. For example, if one article contains the date/time information "departing December 10" and the period information "going on a 4-day, 3-night trip", and another article contains the date/time information "the morning of December 12", the story aggregation unit 15 determines that these articles are related.
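The December example reduces to an interval check: a 4-day trip departing December 10 covers December 10 through 13, and December 12 falls inside it. A minimal sketch, assuming the date and period expressions have already been parsed into a `date` and a day count (parsing the natural-language text is out of scope here); the function names and the year are hypothetical.

```python
from datetime import date, timedelta


def trip_range(departure: date, days: int) -> tuple[date, date]:
    """First and last calendar day of a stay lasting `days` days (4 for "4 days, 3 nights")."""
    return departure, departure + timedelta(days=days - 1)


def is_related(departure: date, days: int, other_date: date) -> bool:
    """Treat another article as related if its date falls within the announced period."""
    first, last = trip_range(departure, days)
    return first <= other_date <= last


# The year is hypothetical; the example in the text gives only month and day.
print(is_related(date(2006, 12, 10), 4, date(2006, 12, 12)))  # True
print(is_related(date(2006, 12, 10), 4, date(2006, 12, 20)))  # False
```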

In addition to or instead of this, the story aggregation unit 15 aggregates articles belonging to a series of stories on the basis of article similarity. That is, the story aggregation unit 15 aggregates, among the articles extracted by the feature expression extraction unit 12, articles with a relatively high degree of similarity. For example, the similarity of each article to the start article is computed with extra weight given to the words or phrases contained in the start article's title. In this case, the story aggregation unit 15 preferably determines that articles are related when the similarity exceeds a specific value. Methods for computing similarity are well known.
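Since the text leaves the similarity measure to known techniques, the sketch below swaps in one common choice: cosine similarity over bag-of-words vectors, with terms from the start article's title counted at a higher weight. The title weight of 2.0, the threshold of 0.3, and whitespace tokenization are all hypothetical; Japanese text would first need a morphological analyzer to split it into words.

```python
import math
from collections import Counter


def weighted_terms(text: str, title_terms: set[str], title_weight: float = 2.0) -> Counter:
    """Bag-of-words counts in which terms from the start article's title count `title_weight` times."""
    vec: Counter = Counter()
    for term in text.lower().split():
        vec[term] += title_weight if term in title_terms else 1.0
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term vectors (0.0 if either is empty)."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def is_similar(start_title: str, start_body: str, other_text: str, threshold: float = 0.3) -> bool:
    """Relate an article to the start article when their similarity exceeds the threshold."""
    title_terms = set(start_title.lower().split())
    start_vec = weighted_terms(start_title + " " + start_body, title_terms)
    other_vec = weighted_terms(other_text, title_terms)
    return cosine(start_vec, other_vec) > threshold


print(is_similar("kyoto trip", "visited temples in kyoto", "second day kyoto temples"))  # True
print(is_similar("kyoto trip", "visited temples in kyoto", "stock market news"))  # False
```

Weighting title terms makes a shared title keyword (here "kyoto") dominate the score, which matches the intuition that installments of one series tend to repeat the series title.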

The index generation unit 16 generates an index containing aggregated data that indicates the result of aggregation by the story aggregation unit 15. Here, the aggregated data is configured to allow the articles with related contents, among the articles extracted by the feature expression extraction unit 12, to be aggregated. The generated index is stored in the index DB 32. Details of the index will be described later.
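Because the index format is only described later, the following is a hypothetical sketch of what an index record carrying aggregated data might hold: the article URL (cf. configuration (4)), its text, a shared story identifier serving as the aggregated data, and the role determined by units 13 and 14. All field and function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One index record; field names are illustrative, not taken from the patent."""
    url: str       # URL of the article (cf. configuration (4))
    text: str      # searchable text of the article
    story_id: int  # aggregated data: entries sharing a story_id belong to one series
    role: str      # "start", "continuation", "end", or "" if undetermined


def build_index(stories: list[list[tuple[str, str, str]]]) -> list[IndexEntry]:
    """Give each aggregated group of (url, text, role) tuples its own story_id."""
    return [
        IndexEntry(url=url, text=text, story_id=story_id, role=role)
        for story_id, group in enumerate(stories)
        for url, text, role in group
    ]


index = build_index([
    [("https://example.com/kyoto-1", "Kyoto trip, part 1", "start"),
     ("https://example.com/kyoto-2", "Day 2: temples", "continuation")],
    [("https://example.com/ramen", "Tokyo ramen, part 1", "start")],
])
for entry in index:
    print(entry.story_id, entry.role, entry.url)
```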

Therefore, the web page information processing apparatus 2 can be said to include aggregated data generation means for generating aggregated data configured to allow web pages with related contents, among the web pages acquired by the web page acquisition means, to be aggregated.

The search server 20 includes an index acquisition unit 21, a search result generation unit 22, and a search result transmission unit 23.

The index acquisition unit 21 acquires, from the index DB 32, indexes related to the keyword specified by the user with the user terminal 81. More specifically, the index acquisition unit 21 acquires indexes that contain the keyword or part of the keyword, together with indexes that contain aggregated data matching the aggregated data in those indexes.
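The two-stage lookup is what lets an installment that never mentions the keyword still appear in the results: first find entries matching the keyword, then pull in every entry sharing their aggregated data. A minimal sketch over an in-memory list, assuming a hypothetical `IndexEntry` record with a `story_id` field as the aggregated data, and approximating "part of the keyword" as any whitespace-separated token of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    url: str
    text: str
    story_id: int  # aggregated data shared by articles of the same series


def acquire_indexes(index: list[IndexEntry], keyword: str) -> list[IndexEntry]:
    """Return entries matching the keyword plus every entry sharing their aggregated data."""
    # "Part of the keyword" is approximated here as any whitespace-separated token.
    tokens = keyword.lower().split()
    hits = [e for e in index if any(t in e.text.lower() for t in tokens)]
    story_ids = {e.story_id for e in hits}
    # Second pass: include entries of the same series even if they lack the keyword.
    return [e for e in index if e.story_id in story_ids]


index = [
    IndexEntry("https://example.com/kyoto-1", "Kyoto trip, part 1", 0),
    IndexEntry("https://example.com/kyoto-2", "Day 2: temples", 0),
    IndexEntry("https://example.com/ramen", "Tokyo ramen", 1),
]
print([e.url for e in acquire_indexes(index, "kyoto")])
```

Here "Day 2: temples" is returned for the keyword "kyoto" only because it shares `story_id` 0 with the matching start article.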

On the basis of the indexes acquired by the index acquisition unit 21, the search result generation unit 22 generates result data configured so that articles related to the keyword are displayed by the user terminal 81. For example, the result data includes URL information used to display links through which the user can view the articles. Furthermore, the search result generation unit 22 generates result data configured so that articles that can be aggregated by the aggregated data related to the keyword are displayed by the user terminal 81 in aggregated form. Details of the aggregated form will be described later.
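As an illustration of result data in aggregated form, the sketch below groups the acquired entries by their aggregated data and emits one item per series, each listing its article URLs. The JSON-like layout, the field names, and ordering articles within a series by role (start first, end last, in the spirit of configuration (3)) are hypothetical; the actual display form is described later in the specification.

```python
import json
from collections import defaultdict

# Hypothetical display order within a series: start article first, end article last.
ROLE_ORDER = {"start": 0, "continuation": 1, "end": 2}


def generate_result_data(entries: list[dict]) -> dict:
    """Group matching articles by story_id so each series is shown as one aggregated item."""
    groups: dict[int, list[dict]] = defaultdict(list)
    for entry in entries:
        groups[entry["story_id"]].append(entry)
    stories = []
    for story_id in sorted(groups):
        # sorted() is stable, so articles with the same role keep their input order.
        items = sorted(groups[story_id], key=lambda e: ROLE_ORDER.get(e["role"], len(ROLE_ORDER)))
        stories.append({
            "story_id": story_id,
            "items": [{"url": e["url"], "role": e["role"]} for e in items],
        })
    return {"stories": stories}


entries = [
    {"url": "https://example.com/kyoto-3", "story_id": 0, "role": "end"},
    {"url": "https://example.com/kyoto-1", "story_id": 0, "role": "start"},
    {"url": "https://example.com/ramen", "story_id": 1, "role": ""},
]
print(json.dumps(generate_result_data(entries), indent=2))
```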

Accordingly, the web page information processing apparatus 2 can be said to include result data generation means for generating result data configured so that the web pages that can be aggregated by the aggregated data related to the keyword are displayed by the user terminal 81, and are displayed in aggregated form.

The search result transmission unit 23 transmits the result data generated by the search result generation unit 22 to the user terminal 81. More specifically, it transmits that result data to the user terminal 81 that requested the article search.

Therefore, the web page information processing apparatus 2 can be said to include search result transmission means for transmitting the result data generated by the result data generation means to the user terminal 81.

The database server 30 includes the sentence example DB 31 and the index DB 32. In response to requests from the blog analysis server 10 (more precisely, the article acquisition unit 11 and the index generation unit 16) and from the search server 20 (more precisely, the index acquisition unit 21), the database server 30 operates on the sentence example DB 31 and the index DB 32 and transmits the requested data to the requester.

As described above, the web page information processing apparatus 2 is characterized by including the web page acquisition means, the aggregated data generation means, the result data generation means, and the search result transmission means.

With this configuration, the desired information may be spread across a plurality of web pages. Even then, the user terminal displays, in aggregated form, the web pages that can be aggregated by the aggregated data related to the keyword for that information, so the user can easily identify the pages that contain it. The web page information processing apparatus 2 therefore minimizes situations in which the user cannot obtain desired information spread across a plurality of web pages.

[Main processing in the web page information processing system 1]
The main processing in the web page information processing system 1 is described with reference to FIG. 2. In the web page information processing system 1, a series of processes basically starts when the user terminal 81 detects a search request, described below.

(Search request)
The user terminal 81 detects an operation in which the user requests a keyword-based search (hereinafter referred to as a "search request"). On detecting a search request, the user terminal 81 transmits search request data to the search server 20 to obtain articles related to the keyword (search request transmission). The search request data contains information on the keyword. On receiving the search request data, the search server 20 forwards it to the database server 30 (search request transmission).

(Search execution)
On receiving the search request data, the database server 30 reads the keyword information it contains, that is, the keyword the user specified on the user terminal 81. Based on that keyword, it acquires the related indexes from the index DB 32. It then transmits index reply data containing the index information to the search server 20 (result reply).

(Result data generation)
On receiving the index reply data, the search server 20 generates result data based on the index information it contains. It then transmits the result data to the user terminal 81 (result reply).

(Result display)
On receiving the result data, the user terminal 81 displays a screen corresponding to it. More specifically, the user terminal 81 shows the articles related to the keyword in aggregated form, in a way that lets the user browse them. For example, it displays links the user can follow to view each related article. An example of the screen is described later.

(Article request)
The user terminal 81 detects an operation in which the user requests an article from the screen corresponding to the result data (hereinafter referred to as an "article request"). On detecting an article request, the user terminal 81 transmits browsing request data for the article the user specified to the public database server 90 (article browsing request transmission).

(Article acquisition)
On receiving the browsing request data, the public database server 90 retrieves the corresponding article from a database (for example, the blog DB 91). It then transmits article data to the user terminal 81 so that the terminal can display the article (result reply).

(Result display)
On receiving the article data, the user terminal 81 displays the corresponding article. This completes the series of processes performed in the web page information processing system 1.

[Example of a screen displayed on the user terminal 81]
An example of a screen displayed on the user terminal 81 is described with reference to FIG. 3. The figure shows a result display screen 40, an example of a screen corresponding to the result data. The result display screen 40 mainly consists of:
- a text box 41
- a search button 42
- a search mode switching link 43
- a title 44
- a start link 45
- continuation links 46
- an end link 47
- a page switching link 48

The text box 41 is an input field for entering a keyword related to the articles the user wants. The search button 42 is the control the user operates to request a search. The search mode switching link 43 switches between normal search (conventional search) and story search, which finds articles belonging to a series of stories. The title 44 is the title of the series of stories told in the articles found for the keyword. The start link 45 is a link to the start article; for example, when the user clicks it using the input means described later, the user terminal 81 displays the start article. The continuation links 46 are links to continuation articles, and the end link 47 is a link to the end article. The page switching link 48 lets the user view aggregated articles that do not fit on the current screen.

Suppose the user, using the input means described later, enters "first overseas trip" (初めて海外旅行) as a keyword in the text box 41 and presses the search button 42. The features of the resulting result display screen 40 are described below. Story search is selected as the search mode here. It may be selected either before or after the search button 42 is pressed.

The result display screen 40 shows the articles related to the keyword in aggregated form, for example as the start link 45, the continuation links 46, and the end link 47. These links are ordered and labeled so that the user can recognize them easily.

More specifically, the start link 45, the continuation links 46, and the end link 47 are numbered and arranged in the order in which the series of overseas travel experiences unfolds. The start link 45 carries a word (or phrase) indicating that the series begins, for example "Beginning" (はじまり), which identifies the start article. The end link 47 carries a word (or phrase) indicating that the series ends, for example "The End" (おしまい), which identifies the end article.
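The ordering and labeling described above can be sketched as follows. This is a minimal illustration, assuming each aggregated article arrives as a (state, URL) pair already in story order. The function name `render_story`, the state letters, and the example URLs are hypothetical and are not taken from the patent.

```python
from typing import List, Tuple


def render_story(title: str, links: List[Tuple[str, str]]) -> List[str]:
    """Render one aggregated story as numbered lines with start/end labels.

    links: (state, url) pairs already in story order, where state is
    "S" (start), "C" (continuation) or "E" (end) -- a hypothetical encoding.
    """
    # Hypothetical labels mirroring the "Beginning" / "The End" markers.
    labels = {"S": " (Beginning)", "E": " (The End)"}
    lines = [title]
    for n, (state, url) in enumerate(links, start=1):
        # Number every link in story order; only start and end get a label.
        lines.append(f"{n}. {url}{labels.get(state, '')}")
    return lines


for line in render_story("First overseas trip", [
        ("S", "https://example.com/1"),
        ("C", "https://example.com/2"),
        ("E", "https://example.com/3")]):
    print(line)
```

Continuation links carry only their number, so the start and end stand out from the rest of the list.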

As a result, even when a series of experiences spans multiple articles, those articles are shown aggregated, ordered, and labeled. The user no longer has to hunt through a database of miscellaneous articles for the next installment of a series while reading.

[Example of feature expression data]
Examples of feature expression data are described with reference to FIGS. 4 and 5. These figures show an ID, an attribute, and the content of each item of feature expression data (the feature expression).

The ID uniquely identifies a record (in other words, an item of feature expression data) in the sentence example DB 31. The attribute indicates the part of an article, such as the title or the body, in which the feature expression data is relatively likely to appear. Each item of feature expression data is associated with an attribute. In general, feature expression data with attribute "A" often appears in the title, and feature expression data with attribute "B" often appears in the body.

Feature expression data is explained below using three examples:
- "Episode #NUM" (第#NUM話), ID "1"
- "Middle part" (中編), ID "40"
- "The end" (おしまい), ID "193"

"#NUM" basically stands for any natural number, so "Episode #NUM" covers "Episode 1", "Episode 2", "Episode 3", and so on.

The feature expression determines the type of the associated feature expression data:
- "Episode 1" marks start data.
- "Middle part" marks continuation data.
- "The end" marks end data.

Feature expression data therefore indicates that a series of content spans multiple articles, and it also makes it possible to identify start, continuation, and end articles. The feature expression "Episode 2" means that the associated data is either continuation data or end data. Which of the two applies is determined from date information, period information, and the like.
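The sketch below shows one way "#NUM" entries might be matched and mapped to start, continuation, or end data. It assumes each expression is stored as a literal string in which "#NUM" stands for a run of digits. The class `FeatureExpression`, its `role` field, and the function `classify` are hypothetical. The attribute values in `EXAMPLES` are placeholders, since FIGS. 4 and 5 are not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeatureExpression:
    id: int
    attribute: str  # "A": usually found in titles, "B": usually found in bodies
    pattern: str    # literal text; "#NUM" stands for any natural number
    role: str       # hypothetical: "start", "continuation", "end" or "numbered"


def compile_pattern(pattern: str) -> "re.Pattern":
    # Escape the literal parts and let each "#NUM" match a run of digits
    # (\d also matches full-width digits such as "１" in Python 3).
    return re.compile(r"(\d+)".join(re.escape(p) for p in pattern.split("#NUM")))


def classify(text: str, expressions: List[FeatureExpression]) -> Optional[str]:
    """Return the role suggested by the first matching expression, or None."""
    for expr in expressions:
        match = compile_pattern(expr.pattern).search(text)
        if match is None:
            continue
        if expr.role == "numbered":
            # "Episode 1" marks a start; higher numbers are continuation or end,
            # which the text resolves later from date and period information.
            return "start" if int(match.group(1)) == 1 else "continuation_or_end"
        return expr.role
    return None


# IDs and patterns follow the text; every attribute value here is a placeholder.
EXAMPLES = [
    FeatureExpression(1, "A", "第#NUM話", "numbered"),    # "Episode #NUM"
    FeatureExpression(40, "A", "中編", "continuation"),   # "Middle part"
    FeatureExpression(193, "B", "おしまい", "end"),        # "The end"
]
```

The sketch returns "continuation_or_end" rather than guessing for higher episode numbers. This follows the text, which defers that decision to date and period information.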

Feature expression data may be added, changed, and deleted manually or by machine learning. Known machine learning methods such as support vector machines and naive Bayes can be used to determine the weights described below. Feature expressions (for example, characteristic words) can then be acquired based on those weights.

That is, the blog analysis server 10 may include a weight determination unit that determines weights by machine learning, and a feature expression acquisition unit that acquires feature expressions based on those weights. In this case, the blog analysis server 10 preferably also includes a single/continuous article discrimination unit. This unit uses the weights of the words (for example, nouns and verbs) in a given article to decide whether the article is a standalone article (hereinafter "single article") or part of a series (hereinafter "continuous article").

An example in which the blog analysis server 10 distinguishes single articles from continuous articles is described with reference to FIG. 6. First, Steps 1 to 3 below show how the blog analysis server 10 determines the weights.

(Step 1)
The blog analysis server 10 is given the articles needed for learning, for example the four articles shown in the learning example table. In the attributes shown in FIG. 6, continuous articles are labeled "I" and single articles "B".

(Step 2)
The blog analysis server 10 extracts the words contained in each article and counts how many times each word appears (the occurrence count). The result looks like the occurrence count table, where the number next to each word is its occurrence count.

(Step 3)
The blog analysis server 10 assigns a weight to each word by machine learning and builds a list (the weight list), as shown in the weight list table. For example, suppose the blog analysis server 10 misjudges article No. 1, a continuous article, as a single article. If the weight of "night out" (夜遊び) is "0" at that point, the server raises it to "1" (= 0 + 1). The same applies to "yesterday", "first time", "night", "Banff", "town", and "go out". Conversely, suppose it misjudges article No. 4, a single article, as a continuous article. If the weight of "good morning" (おはよう) is "0" at that point, the server lowers it to "−1" (= 0 + (−1)). The same applies to "morning", "wrinkle", "foot", "show", and "rude".

In other words, whenever the blog analysis server 10 misclassifies an article as single or continuous, it adjusts the weight of each word in that article by the word's occurrence count. The server repeats the classification many times to reach the final weights, for example 10 times, or ideally until every example is classified correctly. Words with large weights then serve as feature expressions. Manually created feature expressions may also be added to the weight list, for example with large weights.
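Steps 1 to 3 describe an update rule: add the occurrence counts of a misjudged continuous article, subtract those of a misjudged single article, and repeat until every example is correct or a round limit is reached. This is the classic perceptron rule. The sketch below assumes that rule. It is not presented as the patent's exact learner, which the text leaves open (support vector machines and naive Bayes are also mentioned). The labels "I" and "B" follow FIG. 6; the function name and the toy words in the usage are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def train_weights(examples: List[Tuple[Counter, str]], epochs: int = 10) -> Dict[str, int]:
    """Perceptron-style training on word occurrence counts.

    examples: (word counts, label) pairs, label "I" = continuous article,
    "B" = single article. Returns a word -> weight mapping.
    """
    weights: Dict[str, int] = {}
    for _ in range(epochs):
        mistakes = 0
        for counts, label in examples:
            score = sum(weights.get(w, 0) * c for w, c in counts.items())
            sign = 1 if label == "I" else -1
            # A zero score counts as a mistake so that training can move
            # away from the all-zero starting weights.
            if sign * score <= 0:
                mistakes += 1
                for w, c in counts.items():
                    # Shift each word's weight by its occurrence count.
                    weights[w] = weights.get(w, 0) + sign * c
        if mistakes == 0:  # every example classified correctly: stop early
            break
    return weights


print(train_weights([
    (Counter({"night_out": 1, "yesterday": 1}), "I"),
    (Counter({"good_morning": 1, "morning": 1}), "B"),
]))
```

Treating a zero score as a mistake lets the first round move the weights away from all zeros, as in the worked example.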

The blog analysis server 10 then judges an article containing many heavily weighted words to be a continuous article. As an example, suppose the blog analysis server 10 is given this article: "Yesterday I went to the souvenir shop. I asked whether I could buy more souvenirs and have my luggage shipped."

First, the blog analysis server 10 extracts the words in the article and their occurrence counts: "yesterday: 1, souvenir: 2, go: 1, buy: 1, luggage: 1, send: 1, consult: 1". Next, it scores each word by multiplying its occurrence count by the word's weight in the weight list. It then sums the scores of all words in the article; in this example, the sum is 28 × 1 + 11 × 2 + 7 × 1 = 57. Because this sum is at or above a threshold (for example, 0), the blog analysis server 10 judges the article to be a continuous article. If the sum were below the threshold, it would judge the article to be a single article.
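The scoring rule can be written out directly. The weights in this sketch are hypothetical because the weight list of FIG. 6 is not reproduced. Only "souvenir", the one word that occurs twice, is pinned down by the 11 × 2 term. Which count-1 words carry 28 and 7 is an illustrative assumption, chosen to reproduce the sum of 57.

```python
from collections import Counter
from typing import Dict


def article_score(counts: Counter, weights: Dict[str, int]) -> int:
    # Occurrence count x weight, summed over the article's words;
    # words missing from the weight list contribute 0.
    return sum(c * weights.get(w, 0) for w, c in counts.items())


def is_continuous(counts: Counter, weights: Dict[str, int], threshold: int = 0) -> bool:
    # At or above the threshold: continuous article; below it: single article.
    return article_score(counts, weights) >= threshold


counts = Counter({"yesterday": 1, "souvenir": 2, "go": 1, "buy": 1,
                  "luggage": 1, "send": 1, "consult": 1})
# Hypothetical weights chosen only to match 28 x 1 + 11 x 2 + 7 x 1.
weights = {"yesterday": 28, "souvenir": 11, "consult": 7}
print(article_score(counts, weights), is_continuous(counts, weights))
```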

[Example of an index]
An example of an index is described with reference to FIG. 7. The figure shows an ID, a category, a state identifier, a URL, a title, and a body excerpt for each index. Entries marked "..." in FIG. 7 are omitted.

The fields of an index are as follows:
- The ID uniquely identifies a record (in other words, an index) in the index DB 32.
- The category identifies the aggregated data.
- The state identifier distinguishes start, continuation, and end articles.
- The URL gives the location of the article.
- The title is the article's title.
- The body excerpt is part of the article's body: a predetermined number of characters taken from the portion of the body most closely related to the title. The entire body may be used instead of an excerpt.

The index described here consists of an ID, a category, a state identifier, a URL, a title, and a body excerpt, but it is not limited to these fields. For example, the index may also include the date and time the article was created. Alternatively, it may consist only of the category, the state identifier, and the URL. In that case, instead of transmitting the search request, the search server 20 acquires the URLs of articles related to the keyword from the public database server 90. It then searches the index DB 32 for indexes based on those URLs.
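The sketch below shows one way to model an index record, using Python dataclasses. The field names and the "C1", "C2", ... encoding of numbered continuation states are hypothetical. The optional fields cover both variants described above: the minimal index (category, state identifier, URL) and the index that adds a creation date.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class IndexRecord:
    id: int
    category: int                       # identifies the aggregated data (one story)
    state: str                          # hypothetical encoding: "S", "C1", "C2", ..., "E"
    url: str
    title: Optional[str] = None         # None in the minimal (category, state, URL) variant
    excerpt: Optional[str] = None
    created: Optional[datetime] = None  # optional creation date and time


def state_rank(state: str) -> float:
    """Position of an article inside its story: start, numbered continuations, end."""
    if state == "S":
        return 0
    if state == "E":
        return float("inf")
    if state.startswith("C"):
        return int(state[1:] or 1)  # a bare "C" is treated as the first continuation
    raise ValueError(f"unknown state identifier: {state!r}")
```

Sorting a story's records with `key=lambda r: state_rank(r.state)` puts them in reading order.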

[Operation of the blog analysis server 10]
The main processing executed by the blog analysis server 10 is described with reference to FIG. 8.

First, the blog analysis server 10 refers to the blog DB 91 and acquires articles (S1). Preferably, the server acquires articles from one blog DB 91, completes S2 to S9 for them, and only then acquires articles from another blog DB 91. That is, S1 to S9 may be performed once per blog DB 91.

Next, the blog analysis server 10 refers to the sentence example DB 31 and acquires feature expression data (S2). S2 follows S1 here, but the order is not fixed: S1 may follow S2, or S1 and S2 may run in parallel.

Next, the blog analysis server 10 extracts articles whose titles contain a feature expression (S3), that is, articles whose titles contain feature expression data with attribute "A".

Next, the blog analysis server 10 extracts articles whose bodies contain a feature expression (S4), that is, articles whose bodies contain feature expression data with attribute "B". Articles already extracted in S3 are preferably excluded. Matching only attribute "B" data in the body is one option. The blog analysis server 10 may instead extract articles whose bodies contain feature expression data with attribute "A" or "B".

Next, the blog analysis server 10 identifies start and end articles among the extracted articles (S5). For each extracted article, it checks whether the article contains start data and, if so, generates determination data with the state identifier "S". It likewise checks whether the article contains end data and, if so, generates determination data with the state identifier "E".

Next, the blog analysis server 10 identifies continuation articles among the extracted articles (S6). For each extracted article, it checks whether the article contains continuation data and, if so, generates determination data with the state identifier "C".
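Steps S3 to S6 can be sketched as below. The sketch assumes feature expressions are literal strings, each mapped to a state identifier ("S", "C", or "E"). The "#NUM" placeholder and the learned weights are left out for brevity. The function names and sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def find_state(text: str, expressions: Dict[str, str]) -> Optional[str]:
    # Return the state ("S", "C" or "E") of the first expression found in text.
    for expression, state in expressions.items():
        if expression in text:
            return state
    return None


def extract_and_label(articles: List[Tuple[str, str]],
                      title_exprs: Dict[str, str],
                      body_exprs: Dict[str, str]) -> List[Tuple[int, str]]:
    """Return (article index, state identifier) for every extracted article.

    articles: (title, body) pairs; title_exprs holds attribute-"A" expressions,
    body_exprs holds attribute-"B" expressions.
    """
    labelled = []
    for i, (title, body) in enumerate(articles):
        state = find_state(title, title_exprs)    # S3: attribute-A data in the title
        if state is None:
            # S4: attribute-B data in the body, only for articles not hit in S3
            state = find_state(body, body_exprs)
        if state is not None:
            labelled.append((i, state))           # S5/S6: determination data
    return labelled
```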

Next, the blog analysis server 10 aggregates articles whose content is related (S7). It uses date information and period information to decide whether one article is related to another. If they are related, it generates aggregated data that allows the two articles to be aggregated. The generated aggregated data includes category information, URL information, determination data, and the like.

In addition, the blog analysis server 10 uses date information and period information to identify continuation and end articles among the aggregated articles. For example, suppose one article contains the date information "departing December 10" and the period information "going on a 4-day, 3-night trip". Another article containing the date "morning of December 12" is then judged to be a continuation article. An article containing "morning of December 13", the last day of the period, is basically judged to be the end article. However, the aggregated articles may include a relatively similar article with a later date (for example, "morning of December 18"). In that case, the December 13 article may be judged a continuation article even though its date marks the last day (or last time) of the period.

Accordingly, the web page information processing apparatus 2 can be said to include web page determination means. This means uses the date information (indicating dates and times) and period information (indicating periods) in the web pages acquired by the web page acquisition means. From these, it identifies web pages indicating that a series of stories continues and web pages indicating that the series ends.

This is explained further with reference to FIG. 7. Suppose that, while identifying continuation articles from date and period information, the blog analysis server 10 determines that the story progresses through the continuation articles with IDs "2", "3", and "4" in that order. It then appends "1", "2", and "3" to their respective state identifiers. The state identifier therefore identifies each start, continuation, and end article, and also the order in which the series of stories progresses. In other words, the blog analysis server 10 can be said to include story linking means that links the aggregated articles together into a single series of stories.
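The sketch below simplifies the date-based labeling under three stated assumptions. Article dates have already been extracted from the text. The year is arbitrary, because the example gives only month and day. The similarity check mentioned above is not modeled.

Only the latest article can be the end article, and only if it falls on or after the last day of the period. Every earlier article becomes a numbered continuation, matching the "1", "2", "3" identifiers. Labeling an article dated after the period as the end is this sketch's own assumption. The text only says that such an article turns the last-day article into a continuation.

```python
from datetime import date, timedelta
from typing import List, Tuple


def label_by_dates(departure: date, days: int,
                   article_dates: List[date]) -> List[Tuple[date, str]]:
    """Label follow-up articles "C1", "C2", ... or "E" from their dates."""
    # A trip of `days` days departing on day D ends on day D + (days - 1),
    # e.g. a 4-day trip from December 10 ends on December 13.
    last_day = departure + timedelta(days=days - 1)
    ordered = sorted(article_dates)
    labels = []
    for i, d in enumerate(ordered):
        # Only the latest article can be the end, and only once the period's
        # last day is reached; an even later article pushes it back to "C".
        if i == len(ordered) - 1 and d >= last_day:
            labels.append((d, "E"))
        else:
            labels.append((d, f"C{i + 1}"))
    return labels


print(label_by_dates(date(2007, 12, 10), 4, [date(2007, 12, 13), date(2007, 12, 12)]))
```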

Next, the blog analysis server 10 generates indexes from the determination and aggregation results (S8). These indexes include the aggregated data.

Next, the blog analysis server 10 transmits registration request data for the generated indexes to the database server 30 (S9). On receiving the registration request data, the database server 30 stores the corresponding indexes in the index DB 32.

Although not illustrated, these processes basically start automatically at a predetermined interval, preferably once a day (batch processing). Alternatively, they may start when the search button 42 is pressed (real-time processing). With real-time processing, S1 preferably acquires only articles related to the keyword. Then, after S9 or in place of it, the blog analysis server 10 transmits the generated indexes (that is, index data containing the index information) to the search server 20. The search server 20 generates result data from the index data it receives.

[Operation of the search server 20]
The main processing executed by the search server 20 is described with reference to FIG. 9.

First, the search server 20 refers to the index DB 32 and generates result data (S11). For example, when the keyword is “first overseas trip”, the search server 20 searches for indexes whose title and body (excerpt) contain the keyword or part of the keyword (characters related to the keyword may also be used). If an index whose ID is “1” is found, the search server 20 acquires all indexes whose category is “1”. Specifically, the search server 20 generates result data configured so that each aggregated article is displayed by the user terminal 81 based on the URLs contained in the acquired indexes, so that the articles with IDs “1” to “5”, which can be aggregated based on the category “1” contained in the acquired indexes, are displayed by the user terminal 81 in aggregated form, and so that the start article, the continuation articles, and the end article are displayed by the user terminal 81 in a distinguishable manner based on the state identifiers “S”, “C”, and “E” contained in the acquired indexes. When a plurality of result data can be generated, each result data may be generated so that the above contents are displayed by the user terminal 81 in descending order of the date and time at which the start article was created, or in descending order of relevance to the keyword. Furthermore, a configuration in which the user can choose between these orderings is preferable.
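Step S11 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the record fields, the rule that a hit is any whitespace-separated part of the keyword appearing in the title or excerpt, and the relevance measure (number of matching articles in a story) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical index record; field names are illustrative, not from the source.
@dataclass
class IndexEntry:
    article_id: int
    category: int   # articles sharing a category belong to one aggregated story
    state: str      # state identifier: "S" start, "C" continuation, "E" end
    title: str
    excerpt: str
    url: str
    created: datetime

STATE_ORDER = {"S": 0, "C": 1, "E": 2}

def matches(entry: IndexEntry, keyword: str) -> bool:
    """Assumed rule: hit if any whitespace-separated part of the keyword
    appears in the title or excerpt (case-insensitive)."""
    text = (entry.title + " " + entry.excerpt).lower()
    return any(part in text for part in keyword.lower().split())

def build_result_data(index_db: list[IndexEntry], keyword: str,
                      order: str = "date") -> list[dict]:
    # 1. Collect the categories of all entries hit by the keyword.
    hit_categories = {e.category for e in index_db if matches(e, keyword)}
    stories = []
    for cat in hit_categories:
        # 2. Pull every entry in that category: start article first,
        #    continuation articles by date, end article last.
        articles = sorted(
            (e for e in index_db if e.category == cat),
            key=lambda e: (STATE_ORDER.get(e.state, 1), e.created),
        )
        start = next((a for a in articles if a.state == "S"), articles[0])
        stories.append({
            "category": cat,
            "start_created": start.created,
            # Assumed relevance measure: number of matching articles in the story.
            "relevance": sum(matches(a, keyword) for a in articles),
            "articles": [(a.article_id, a.state, a.url) for a in articles],
        })
    # 3. Order stories by start-article date or by relevance, both descending;
    #    the `order` parameter stands in for the user-selectable ordering.
    if order == "date":
        stories.sort(key=lambda s: s["start_created"], reverse=True)
    else:
        stories.sort(key=lambda s: s["relevance"], reverse=True)
    return stories
```

Keying stories by category mirrors the example above, in which a hit on ID “1” pulls in every index whose category is “1”; the URL and state identifier are carried through so that the terminal can link each article and mark the start, continuation, and end articles.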

Subsequently, the search server 20 transmits the result data to the user terminal 81 (S12).

[Server hardware configuration]
FIG. 10 is a diagram showing the hardware configuration of the blog analysis server 10, the search server 20, the database server 30, and the public database server 90 (hereinafter each referred to as the “server”). The server includes a CPU 110 constituting a control unit (in a multiprocessor configuration, a plurality of CPUs such as a CPU 120 may be added), a bus line 105, a communication I/F 140, a main memory 150, a BIOS 160, a USB port 190, an I/O controller 170, input means such as a keyboard and mouse 180, and a display device 122.

Storage means such as a tape drive 172, a hard disk 174, an optical disk drive 176, and a semiconductor memory 178 can be connected to the I/O controller 170.

The BIOS 160 stores a boot program executed by the CPU 110 when the server starts up, programs that depend on the server hardware, and the like.

The hard disk 174 stores various programs that allow the computer to function as a server and programs for executing the functions of the present embodiment; various databases can also be configured on it as needed.

As the optical disk drive 176, for example, a DVD-ROM drive, a CD-ROM drive, a DVD-RAM drive, or a CD-RAM drive can be used. In this case, an optical disk 177 compatible with the drive is used. A program or data may also be read from the optical disk 177 by the optical disk drive 176 and provided to the main memory 150 or the hard disk 174 via the I/O controller 170. Similarly, a tape medium 171 compatible with the tape drive 172 can be used, mainly for backup.

The program provided to the server is stored on a recording medium such as the hard disk 174, the optical disk 177, or a memory card. The program may be installed on the server and executed after being read from the recording medium via the I/O controller 170 or downloaded via the communication I/F 140.

The above program may be stored in an internal or external storage medium. Besides the hard disk 174, the optical disk 177, or a memory card, a magneto-optical recording medium such as an MD, or the tape medium 171, can be used as the storage medium. Alternatively, a storage device such as a hard disk 174 or an optical disk library provided in a server system connected to a dedicated communication line or to a communication line such as the Internet may be used as the recording medium, and the program may be provided to the server via the Internet.

The display device 122 displays various screens, including screens showing the results of arithmetic processing by the server, and includes a display such as a cathode-ray tube (CRT) display or a liquid crystal display (LCD).

The communication I/F 140 is a network adapter that allows the server to connect to the user terminal 81 via a network 82 (for example, a dedicated network or a public network). The communication I/F 140 may include a modem, a cable modem, and an Ethernet (registered trademark) adapter.

Although the above example has mainly described the server, the functions described above can also be realized by installing a program on a computer and operating that computer as a server. Accordingly, the functions realized by the server described in the present embodiment can be realized by having the computer execute the above-described method, or by installing the above-described program on the computer and executing it.

Although an embodiment has been described above, the present invention is not limited to it. Moreover, the effects of the present invention are not limited to those described in the embodiment.

FIG. 1 is a diagram showing the configuration of a web page information processing system.
FIG. 2 is a diagram showing the main processes in the web page information processing system.
FIG. 3 is a diagram showing an example of a screen displayed by a user terminal.
FIG. 4 is a diagram showing an example of feature expression data.
FIG. 5 is a diagram showing an example of feature expression data.
FIG. 6 is a diagram showing an example in which the blog analysis server distinguishes single articles from continuous articles.
FIG. 7 is a diagram showing an example of an index.
FIG. 8 is a flowchart showing the main processing executed by the blog analysis server.
FIG. 9 is a flowchart showing the main processing executed by the search server.
FIG. 10 is a diagram showing the hardware configuration of a server.

Explanation of symbols

1 Web page information processing system
2 Web page information processing apparatus
10 Blog analysis server
11 Article acquisition unit
12 Feature expression extraction unit
13 Start/end article discrimination unit
14 Continuation article discrimination unit
15 Story aggregation unit
16 Index generation unit
20 Search server
21 Index acquisition unit
22 Search result generation unit
23 Search result transmission unit
30 Database server
31 Example sentence DB
32 Index DB

Claims

1. A web page information processing apparatus that is configured to be communicable with a user terminal via a network and that transmits, to the user terminal, result data configured so that web pages retrieved on the basis of a keyword specified by a user of the user terminal are displayed by the user terminal, the apparatus comprising:
web page acquisition means for acquiring, from a large number of web pages managed by users of user terminals, web pages including feature expression data representing that content is shown across a plurality of web pages;
aggregated data generating means for generating aggregated data configured so that, among the web pages acquired by the web page acquisition means, web pages whose contents are related to one another can be aggregated;
result data generating means for generating result data configured so that web pages that can be aggregated by the aggregated data related to the keyword are displayed by the user terminal in aggregated form, and so that each web page that can be aggregated by the aggregated data related to the keyword is displayed by the user terminal; and
search result transmitting means for transmitting the result data generated by the result data generating means to the user terminal.

2. The web page information processing apparatus according to claim 1, wherein
the web page acquisition means acquires web pages including, as the feature expression data, at least one of start data indicating that a series of stories begins, continuation data indicating that the series of stories continues, and end data indicating that the series of stories ends.

3. The web page information processing apparatus according to claim 2, further comprising
web page discrimination means for discriminating, based on date and time information included in the web pages acquired by the web page acquisition means and on period information indicating a period, between a web page indicating that the series of stories continues and a web page indicating that the series of stories has ended, wherein
the aggregated data generating means generates aggregated data including discrimination information determined by the web page discrimination means, and
the result data generating means generates result data configured so that the web pages discriminated on the basis of the discrimination information included in the aggregated data related to the keyword are displayed by the user terminal in a distinguishable manner.

4. The web page information processing apparatus according to any one of claims 1 to 3, wherein
the aggregated data generating means generates aggregated data including URL information of the web pages acquired by the web page acquisition means.

5. The web page information processing apparatus according to claim 1, wherein
the aggregated data generating means generates aggregated data configured so that, among the web pages acquired by the web page acquisition means, web pages having relatively high similarity to one another can be aggregated.

6. A web page information processing method by which a web page information processing apparatus, configured to be communicable with a user terminal via a network, transmits to the user terminal result data configured so that web pages retrieved on the basis of a keyword specified by a user of the user terminal are displayed by the user terminal, the method comprising:
a web page acquisition step of acquiring, from a large number of web pages managed by users of user terminals, web pages including feature expression data representing that content is shown across a plurality of web pages;
an aggregated data generating step of generating aggregated data configured so that, among the web pages acquired in the web page acquisition step, web pages whose contents are related to one another can be aggregated;
a result data generating step of generating result data configured so that web pages that can be aggregated by the aggregated data related to the keyword are displayed by the user terminal in aggregated form, and so that each web page that can be aggregated by the aggregated data related to the keyword is displayed by the user terminal; and
a search result transmission step of transmitting the result data generated in the result data generating step to the user terminal.

7. A web page information processing program for causing a web page information processing apparatus, configured to be communicable with a user terminal via a network, to transmit to the user terminal result data configured so that web pages retrieved on the basis of a keyword specified by a user of the user terminal are displayed by the user terminal, the program comprising:
a web page acquisition step of acquiring, from a large number of web pages managed by users of user terminals, web pages including feature expression data representing that content is shown across a plurality of web pages;
an aggregated data generating step of generating aggregated data configured so that, among the web pages acquired in the web page acquisition step, web pages whose contents are related to one another can be aggregated;
a result data generating step of generating result data configured so that web pages that can be aggregated by the aggregated data related to the keyword are displayed by the user terminal in aggregated form, and so that each web page that can be aggregated by the aggregated data related to the keyword is displayed by the user terminal; and
a search result transmission step of transmitting the result data generated in the result data generating step to the user terminal.
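Claim 5 aggregates web pages of “relatively high similarity” without fixing the measure. The sketch below is one hedged illustration, not the applicant's method: it assumes a bag-of-words cosine similarity over whitespace-separated tokens, a hypothetical threshold of 0.5, and single-link grouping via union-find, so that any chain of sufficiently similar pages ends up in one aggregate.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def aggregate_by_similarity(pages: dict[str, str],
                            threshold: float = 0.5) -> list[list[str]]:
    """Group pages (URL -> text) whose pairwise cosine similarity is at least
    `threshold` (hypothetical value); grouping is single-link via union-find."""
    vecs = {url: Counter(text.lower().split()) for url, text in pages.items()}
    parent = {url: url for url in pages}

    def find(u: str) -> str:
        # Follow parent links to the root, compressing the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    urls = list(pages)
    for i, u in enumerate(urls):
        for v in urls[i + 1:]:
            if cosine(vecs[u], vecs[v]) >= threshold:
                parent[find(u)] = find(v)   # merge the two groups

    groups: dict[str, list[str]] = {}
    for u in urls:
        groups.setdefault(find(u), []).append(u)
    return list(groups.values())
```

The pairwise loop is quadratic in the number of pages, which is acceptable for a sketch; a production system would more likely compare only candidate pages, for example pages from the same blog.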