JP2014085862A

JP2014085862A - Prediction server, program, and method for predicting number of future comments on prediction target content

Info

Publication number: JP2014085862A
Application number: JP2012234600A
Authority: JP
Inventors: Kazufumi Ikeda; Hajime Hattori; Toshihiro Ono
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2012-10-24
Filing date: 2012-10-24
Publication date: 2014-05-12
Anticipated expiration: 2032-10-24
Also published as: JP5952711B2

Abstract

PROBLEM TO BE SOLVED: To provide a prediction server that can communicate with a blog site server and predicts the future number of comments on prediction target content.
SOLUTION: A prediction server includes: learning information storage means for storing in advance, as learning information for each content, the transition state of the number of comments in each unit time over the passage of time; prediction target comment search means for counting, for comments acquired from the server that correspond to the prediction target content, the number of comments in each unit time over the passage of time; determination time search means for searching the learning information storage means for content whose transition state of the number of comments is similar to the counted transition state; and comment number prediction means for deriving the transition state of the number of comments after the determination time for the retrieved content as the transition state of the future number of comments on the prediction target content.

Description

The present invention relates to a technique for analyzing the trends in interest that an unspecified number of users show toward prediction target content.

In recent years, a wide variety of content has been released to an unspecified number of third parties via the Internet. Public content refers to media content accessible to an unspecified number of third parties, such as news articles, Web pages, music, electronic books, and television broadcast content.

Meanwhile, comment texts from an unspecified number of third parties are actively posted via the Internet to sites such as blogs (Web logs) and mini-blogs (for example, Twitter (registered trademark)). Such comments often discuss a common topic, and one such common topic is the public content distributed to an unspecified number of third parties described above.

Conventionally, there is a technique that searches for comment texts related to such public content and publishes a ranking of the content according to the number of comments (see, for example, Non-Patent Document 1). This technique extracts keywords from a large number of posted comments, analyzes in real time the topics and trends attracting attention on Twitter, and presents the resulting ranking to the user.

As a more specific service technique, there is also a technique that predicts the box office revenue of a movie based on, for example, the number and content of Twitter comments about the movie (see, for example, Non-Patent Document 2). The information used for the prediction is the number of tweets and the positive/negative ratio (the ratio of positive to negative tweet content) before the movie's release, together with the same two quantities after its release. A movie with many tweets and many positive comments is predicted to earn more at the box office; conversely, a movie with few tweets and many negative comments is predicted to earn less.

Non-Patent Document 1: NEC BIGLOBE, "Twipple Trend", [online], [retrieved August 27, 2012], Internet <URL: http://tr.twipple.jp/>
Non-Patent Document 2: Sitaram Asur and Bernardo A. Huberman, HP Labs, "Predicting the Future With Social Media", Proc. of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT '10), vol. 1, pp. 492-499

The technique described in Non-Patent Document 1 can identify the content in which an unspecified number of users are interested at the current time. However, it cannot identify the content in which such users will become interested in the future.

The technique described in Non-Patent Document 2 targets a relatively limited kind of prediction target content, namely movies, and can therefore predict with relatively high accuracy how interested an unspecified number of users will become in the future. However, even with this technique, sufficient prediction accuracy is difficult to obtain when the prediction target is content that is published in large numbers and covers diverse subjects, such as news articles. This is because the tendencies in comment content differ according to the diverse subject matter of the content. Moreover, comments on current-affairs news articles are especially numerous, which makes it difficult to detect the trend in the future number of comments on a specific prediction target content.

Accordingly, an object of the present invention is to provide a prediction server, program, and method that can analyze the future interest trends of an unspecified number of users by predicting the future number of comments, even for prediction target content such as general news articles.

According to the present invention, there is provided a prediction server that can communicate with a site server through which a plurality of contributors exchange text comments, and that predicts the future number of comments on prediction target content, the prediction server comprising:
learning information storage means for storing in advance, as learning information for each content, the transition state of the number of comments in each unit time over the passage of time;
prediction target comment search means for counting, for comments acquired from the server that correspond to the prediction target content, the number of comments in each unit time over the passage of time;
determination time search means for searching the learning information storage means for content whose transition state of the number of comments is similar to the counted transition state of the number of comments in each unit time; and
comment number prediction means for deriving the transition state of the number of comments after the determination time for the retrieved content as the transition state of the future number of comments on the prediction target content.

According to another embodiment of the prediction server of the present invention, it is also preferable that:
the site server is a blog site server;
the prediction server further comprises prediction target keyword extraction means for extracting a keyword group contained in the prediction target content; and
the prediction target comment search means searches the blog site server for a plurality of comments using the keyword group extracted by the prediction target keyword extraction means as a key, and counts the number of comments in each unit time over the passage of time.

According to another embodiment of the prediction server of the present invention, it is also preferable that:
the prediction server can further communicate with a content publishing server that publishes content to an unspecified number of third parties, or the site server itself also has a content publishing function;
the prediction server comprises:
content clustering means for clustering a large number of contents based on the similarity of the keyword groups contained in each content;
learning target comment search means for searching, using the site server, for the number of comments in each unit time over the passage of time for each content included in each cluster; and
cluster search means for searching for a cluster containing a keyword group similar to the keyword group extracted from the prediction target content; and
the determination time search means searches, among the contents included in the cluster found by the cluster search means, for content whose transition state of the number of comments is similar to the transition state of the number of comments in each unit time.
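The clustering and cluster search steps above can be sketched as follows. The specification only says that contents are clustered by "the similarity of keyword groups"; Jaccard similarity, single-pass leader clustering, the `threshold` value, and all function names below are assumptions chosen for illustration, not the method disclosed in the patent.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_contents(keywords_by_content, threshold=0.3):
    """Single-pass leader clustering on keyword sets (hypothetical choice).

    Each content joins the first cluster whose accumulated keyword set is at
    least `threshold`-similar to its own; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"keywords": set, "members": list}
    for cid, keywords in keywords_by_content.items():
        for cluster in clusters:
            if jaccard(keywords, cluster["keywords"]) >= threshold:
                cluster["members"].append(cid)
                cluster["keywords"] |= keywords
                break
        else:  # no sufficiently similar cluster was found
            clusters.append({"keywords": set(keywords), "members": [cid]})
    return clusters


def find_cluster(query_keywords, clusters):
    """Return the cluster most similar to the prediction target's keywords."""
    return max(clusters, key=lambda c: jaccard(query_keywords, c["keywords"]))


contents = {  # hypothetical keyword groups of previously published content
    "A": {"baseball", "victory", "japan"},
    "B": {"softball", "victory", "japan"},
    "C": {"election", "vote", "japan"},
    "D": {"election", "vote", "turnout"},
}
clusters = cluster_contents(contents)
print([c["members"] for c in clusters])  # [['A', 'B'], ['C', 'D']]
print(find_cluster({"softball", "victory", "consecutive"}, clusters)["members"])  # ['A', 'B']
```

Restricting the similarity search to one cluster keeps the number of stored transition states that must be compared small, which is the practical point of this embodiment.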

According to another embodiment of the prediction server of the present invention, it is also preferable that the learning target comment search means either:
searches in advance, using the blog site server, for the number of comments in each unit time over the passage of time for each content included in all clusters; or
searches, using the blog site server, for the number of comments in each unit time over the passage of time for each content included in the cluster found by the cluster search means.

According to another embodiment of the prediction server of the present invention, it is also preferable that:
the prediction server further comprises positive/negative determination means for determining, from the text of each comment, whether its content is positive or negative;
the learning information storage means stores, together with the number of comments in each unit time over the passage of time in the determination time range, the positive/negative ratio across all comments; and
the comment number prediction means derives the future number of positive comments or negative comments on the prediction target content by multiplying the number of comments in the prediction time range for the retrieved content by the positive/negative ratio.
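The multiplication described above is simple arithmetic; the sketch below shows it under the assumption that the stored polarity labels are plain `"pos"`/`"neg"` strings and that the predicted counts are a list with one value per unit time. The function name and label format are hypothetical.

```python
def split_by_polarity(predicted_counts, stored_labels):
    """Split predicted per-unit-time counts into positive and negative parts.

    stored_labels: hypothetical 'pos' / 'neg' labels of the retrieved
        content's comments, assumed to be kept in the learning information.
    """
    n = len(stored_labels)
    pos_ratio = stored_labels.count("pos") / n
    neg_ratio = stored_labels.count("neg") / n
    positive = [c * pos_ratio for c in predicted_counts]
    negative = [c * neg_ratio for c in predicted_counts]
    return positive, negative


pos, neg = split_by_polarity([40, 20, 8], ["pos", "pos", "pos", "neg"])
print(pos, neg)  # [30.0, 15.0, 6.0] [10.0, 5.0, 2.0]
```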

According to another embodiment of the prediction server of the present invention, it is also preferable that:
the prediction server further comprises profile information extraction means for extracting, from the text of each comment, attribute information about the profile of the user who posted the comment;
the learning information storage means stores, together with the number of comments in each unit time over the passage of time in the determination time range, the attribute ratio for each attribute type across all comments; and
the comment number prediction means derives the future number of comments for each attribute type on the prediction target content by multiplying the number of comments in the prediction time range for the retrieved content by the attribute ratio.

According to another embodiment of the prediction server of the present invention, it is also preferable that:
the prediction server further comprises:
positive/negative determination means for determining, from the text of each comment, whether its content is positive or negative; and
profile information extraction means for extracting, from the text of each comment, attribute information about the profile of the user who posted the comment;
the learning information storage means stores, together with the number of comments in each unit time over the passage of time in the determination time range, the positive/negative attribute ratio for each combination of "positive or negative" and "attribute type" across all comments; and
the comment number prediction means derives the future number of comments for each positive/negative attribute type on the prediction target content by multiplying the number of comments in the prediction time range for the retrieved content by the positive/negative attribute ratio.
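For the combined embodiment, the ratio is taken over each (polarity, attribute type) pair rather than over polarity alone. A minimal sketch, assuming each stored comment of the retrieved content is labeled with a hypothetical `(polarity, attribute)` tuple and that the prediction is a single total for the prediction time range:

```python
from collections import Counter


def split_by_polarity_and_attribute(predicted_total, stored_labels):
    """Distribute a predicted comment count over (polarity, attribute) pairs.

    stored_labels: one hypothetical (polarity, attribute) tuple per stored
        comment of the retrieved content; each pair's joint ratio is its
        share of all comments.
    """
    counts = Counter(stored_labels)
    n = len(stored_labels)
    return {pair: predicted_total * c / n for pair, c in counts.items()}


labels = [("pos", "20s"), ("pos", "20s"), ("neg", "30s"), ("pos", "30s")]
print(split_by_polarity_and_attribute(100, labels))
# {('pos', '20s'): 50.0, ('neg', '30s'): 25.0, ('pos', '30s'): 25.0}
```

The attribute-only embodiment is the same computation with the attribute alone as the key.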

According to another embodiment of the prediction server of the present invention, it is also preferable that the prediction server further comprises ranking publication means for publishing to clients, as page information, ranking information in which a plurality of prediction target contents are sorted in descending order of the future number of comments derived by the comment number prediction means.

According to another embodiment of the prediction server of the present invention, it is also preferable that the determination time search means derives, using a regression model, the similarity between the transition state (temporal change) of the number of comments in each unit time stored in the learning information storage means and the transition state (temporal change) of the number of comments in the determination time range of the prediction target content.

According to the present invention, there is provided a prediction program that causes a computer mounted in a server, which can communicate with a site server through which a plurality of contributors exchange text comments and which predicts the future number of comments on prediction target content, to function as:
learning information storage means for storing in advance, as learning information for each content, the transition state of the number of comments in each unit time over the passage of time;
prediction target comment search means for counting, for comments acquired from the server that correspond to the prediction target content, the number of comments in each unit time over the passage of time;
determination time search means for searching the learning information storage means for content whose transition state of the number of comments is similar to the counted transition state of the number of comments in each unit time; and
comment number prediction means for deriving the transition state of the number of comments after the determination time for the retrieved content as the transition state of the future number of comments on the prediction target content.

According to the present invention, there is provided a comment number prediction method in a server that can communicate with a site server through which a plurality of contributors exchange text comments and that predicts the future number of comments on prediction target content, wherein
the server has a learning information storage unit that stores in advance, as learning information for each content, the transition state of the number of comments in each unit time over the passage of time, and
the method comprises:
a first step of counting, for comments acquired from the server that correspond to the prediction target content, the number of comments in each unit time over the passage of time;
a second step of searching the learning information storage unit for content whose transition state of the number of comments is similar to the counted transition state of the number of comments in each unit time; and
a third step of deriving the transition state of the number of comments after the determination time for the retrieved content as the transition state of the future number of comments on the prediction target content.

According to the prediction server, program, and method of the present invention, the future interest trends of an unspecified number of users can be analyzed by predicting the future number of comments, even for prediction target content such as general news articles.

FIG. 1 is a system configuration diagram according to the present invention.
FIG. 2 is a functional configuration diagram of the prediction server according to the present invention.
FIG. 3 is an explanatory diagram showing the information stored in the learning information storage unit of the prediction server.
FIG. 4 is an explanatory diagram showing the processing in the determination time search unit of the prediction server.
FIG. 5 is an explanatory diagram showing the processing in the ranking publication unit of the prediction server.
FIG. 6 is a functional configuration diagram of the learning unit of the prediction server according to the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a system configuration diagram according to the present invention.

As shown in FIG. 1, a content publishing server 3 is connected to the Internet. The content publishing server 3 distributes public content to an unspecified number of third parties. The public content may be any of various kinds of media content, for example news articles, Web pages, music, electronic books, or television broadcast content.

As also shown in FIG. 1, a blog site server 2 is connected to the Internet and allows a plurality of contributors to exchange text comments. The blog site server 2 may be, for example, the Twitter (registered trademark) site.

In the following description, the blog site server 2 and the content publishing server 3 are assumed to be installed separately on the Internet (see FIG. 1); however, a single site server that integrates both functions may be used instead. Such a site server publishes comments in association with each content item; YouTube (registered trademark) is one example.

The terminal 4 is a personal computer, mobile terminal, smartphone, television, or the like, and can access the content publishing server 3 and the blog site server 2. Using terminals 4, an unspecified number of users can post comments to the blog site server 2 while viewing content published by the content publishing server 3, and can also read other users' comments.

According to the present invention, a prediction server 1 is also connected to the Internet and can communicate with the content publishing server 3 and the blog site server 2. The prediction server 1 predicts the future number of comments on prediction target content. By accessing the prediction server 1 from a terminal 4, a user can therefore learn which content is likely to attract attention in the future.

FIG. 2 is a functional configuration diagram of the prediction server according to the present invention.

As shown in FIG. 2, the prediction server 1 includes a communication interface unit 10, a prediction target keyword extraction unit 11, a prediction target comment search unit 12, a learning information storage unit 13, a determination time search unit 14, a comment number prediction unit 15, a ranking publication unit 16, and a learning unit 17. All of these functional units except the communication interface unit 10 are implemented by executing a program that causes a computer mounted in the server to function as them.

[Prediction target keyword extraction unit 11]
The prediction target keyword extraction unit 11 receives the prediction target content as input and extracts the keyword group contained in it. The prediction target content may be a Web page or only a URL (Uniform Resource Locator). If only a URL is given, the prediction target keyword extraction unit 11 acquires the Web page at that URL from the content publishing server 3. The content is assumed to contain at least some text.

Next, the prediction target keyword extraction unit 11 extracts words from the text of the content by morphological analysis. Morphological analysis is a technique that divides sentences into meaningful words and uses a dictionary to determine each word's part of speech and meaning. A morpheme is the smallest meaningful unit of a sentence.

Next, the prediction target keyword extraction unit 11 extracts characteristic words as keywords using TF-IDF (Term Frequency - Inverse Document Frequency). TF-IDF weights each word by how often it occurs in the document and by how rare it is across documents; the same weighting is used to represent documents and queries as vectors in a vector space and to rank documents by their similarity to a query. The higher a word's weight, the more strongly it is recognized as an important keyword.
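The keyword selection step can be sketched as below. This is a minimal illustration, not the patent's implementation: it assumes the text has already been split into words by a morphological analyzer (not shown), uses a smoothed IDF variant, `log((1 + N) / (1 + df)) + 1`, which is one common choice among several, and uses a hypothetical romanized corpus for readability.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus, top_k=3):
    """Return the top_k words of doc_tokens ranked by TF-IDF weight.

    doc_tokens: words of the prediction target content, assumed to be the
        output of a morphological analyzer (not shown here).
    corpus: token lists of other documents, used only for document frequency.
    """
    corpus_sets = [set(tokens) for tokens in corpus]
    n_docs = len(corpus_sets)
    tf = Counter(doc_tokens)
    scores = {}
    for word, count in tf.items():
        # Document frequency: number of corpus documents containing the word.
        df = sum(1 for tokens in corpus_sets if word in tokens)
        # Smoothed IDF (one common variant); avoids division by zero for unseen words.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = (count / len(doc_tokens)) * idf
    # Highest weight first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Hypothetical, already-tokenized example (romanized for readability).
corpus = [
    ["japan", "economy", "yen"],
    ["japan", "election", "vote"],
    ["japan", "baseball", "victory"],
]
doc = ["japan", "softball", "softball", "victory", "victory",
       "consecutive", "consecutive"]
print(tfidf_keywords(doc, corpus))  # ['consecutive', 'softball', 'victory']
```

"japan" appears in every reference document, so its IDF is low and it drops out even though it occurs in the target text.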

For example, suppose the prediction target content is the following news article:
"[Japan wins first title in 42 years, stops U.S. bid for an eighth straight title: women's softball] The final day of the Women's Softball World Championship was held on the 22nd in Whitehorse, Canada, where Japan defeated the United States, which was aiming for its eighth consecutive title, 2-1 in ten innings in the final, <distributed Monday, July 23, 11:30>"
From this news article, a keyword group such as the following is extracted:
"victory" (優勝), "consecutive titles" (連覇), "softball" (ソフトボール)
Of course, the URL of the prediction target content itself may also be extracted as a keyword.

[Prediction target comment search unit 12]
The prediction target comment search unit 12 searches the blog site server 2 for a plurality of comments using the extracted keyword group as a key. With the keyword group above as the key, for example, the following comment is retrieved:
> Contributor ID: xxxyyy
> Attributes: 30s, male, office worker, sports
> Followers: 200
> Content: Well done! Japan wins first title in 42 years, stops U.S. bid for an eighth straight title: women's softball
http://www.news.jp/xxx
> Posting time: Monday, July 23, 12:30

The prediction target comment search unit 12 retrieves the many comments related to the prediction target content and counts, for that content, the number of comments in each unit time over the passage of time. The resulting transition state of the number of comments over elapsed time (history information that can be plotted as a graph) is output to the determination time search unit 14.
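The counting step amounts to bucketing posting times into fixed-length bins. A minimal sketch, assuming posting times have already been parsed into `datetime` objects; the one-hour unit time and eight-bin window are hypothetical values chosen to mirror the 8-hour determination time range of FIG. 4(a).

```python
from datetime import datetime, timedelta


def count_per_unit_time(post_times, start, unit, n_units):
    """Count comments in n_units consecutive bins of length `unit`.

    Bin i covers [start + i * unit, start + (i + 1) * unit); posting times
    outside the overall range are ignored.
    """
    counts = [0] * n_units
    for t in post_times:
        offset = t - start
        if offset < timedelta(0):
            continue  # posted before the first bin
        index = offset // unit  # timedelta // timedelta yields an int
        if index < n_units:
            counts[index] += 1
    return counts


# Hypothetical posting times; 8 one-hour bins from 1:00 up to (not including) 9:00.
start = datetime(2012, 7, 23, 1, 0)
posts = [datetime(2012, 7, 23, h, m) for h, m in
         [(0, 30), (1, 10), (1, 50), (2, 30), (8, 59), (9, 0)]]
print(count_per_unit_time(posts, start, timedelta(hours=1), 8))
# [2, 1, 0, 0, 0, 0, 0, 1]
```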

When the prediction server can communicate with a site server that integrates the functions of a blog site server and a content publishing server, the prediction target keyword extraction unit 11 is not an essential component, and the prediction target comment search unit 12 need not search the blog site server 2 for comments using an extracted keyword group as a key. This is because the integrated site server already publishes each group of comments in association with its content.

[Learning information storage unit 13]
The learning information storage unit 13 stores in advance, as learning information for each content, the transition state of the number of comments in each unit time over the passage of time.

FIG. 3 is an explanatory diagram showing the information stored in the learning information storage unit 13 of the prediction server.

As shown in FIG. 3, a graph is stored for each content, with elapsed time on the horizontal axis and the number of comments on the vertical axis. FIG. 3 shows graphs for three contents A, B, and C; content A, for example, represents the transition state for an article about Japan winning a baseball championship. In this way, the number of comments per unit time (for example, 4 hours) is stored for each content.

[Determination time search unit 14]
The determination time search unit 14 searches the learning information storage unit 13 for content whose transition state of the number of comments is similar to the transition state of the number of comments in each unit time counted for the prediction target content.

FIG. 4 is an explanatory diagram showing the processing in the determination time search unit 14 of the prediction server.

FIG. 4(a) shows the transition state of the number of comments in each unit time counted for the prediction target content. Suppose, for example, that the future number of comments on the prediction target content is to be predicted at the current time of 9:00. The transition state over a fixed period going back from the current time (for example, 24 hours or 8 hours) is taken as the determination time range. In FIG. 4(a), the determination time range is the 8 hours leading up to the current time of 9:00.

As shown in FIG. 4(b), among the transition states stored for each content in the learning information storage unit 13, content whose transition state is similar to that of the prediction target content in the determination time range is searched for. In FIG. 4(b), the first 8 hours of the transition state of content A are similar to the transition state of the prediction target content.

The determination time search unit 14 preferably derives, using a regression model, the similarity between the transition state (temporal change) of the number of comments in each unit time stored in the learning information storage unit 13 and the transition state (temporal change) of the number of comments in the determination time range of the prediction target content. A typical regression model is the least-squares method. When a set of measured values is approximated by a specific function assumed from an appropriate model (a linear function, a logarithmic curve, or the like), the least-squares method determines the coefficients that minimize the sum of squared residuals, so that the assumed function approximates the measured values well. In other words, according to the present invention, content whose transition state is strongly correlated with the transition state of the prediction target content is retrieved from the learning information storage unit 13.
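One way to realize this search is sketched below: slide a window the length of the determination time range over every stored transition state, fit the target counts as a linear function `a * x + b` of the window by ordinary least squares, and keep the window with the smallest residual sum of squares (RSS). The sliding window, the linear model, the rule that skips non-positive slopes, and all names are assumptions for illustration; the specification does not fix these details.

```python
def linear_fit(x, y):
    """Fit y ~ a * x + b by ordinary least squares; return (a, b, rss)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx if sxx else 0.0  # flat reference segment: slope undefined, use 0
    b = mean_y - a * mean_x
    rss = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss


def most_similar_segment(target, learned, horizon=1):
    """Search stored series for the window that best explains `target`.

    learned: dict mapping a content ID to its per-unit-time comment counts.
    Only windows followed by at least `horizon` bins are considered, so a
    future can be read off afterwards. Windows with a non-positive slope
    are skipped, because minimizing RSS alone would also reward a segment
    whose shape is inverted. Returns (content_id, start, a, b, rss) or None.
    """
    k = len(target)
    best = None
    for cid, series in learned.items():
        for start in range(len(series) - k - horizon + 1):
            a, b, rss = linear_fit(series[start:start + k], target)
            if a <= 0:
                continue
            if best is None or rss < best[4]:
                best = (cid, start, a, b, rss)
    return best


learned = {  # hypothetical learning information
    "A": [1, 3, 9, 20, 14, 8, 4, 2, 1, 1],
    "B": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "C": [10, 8, 6, 4, 2, 1, 1, 1, 1, 1],
}
target = [2, 6, 18, 40]  # counts in the determination time range
print(most_similar_segment(target, learned))  # ('A', 0, 2.0, 0.0, 0.0)
```

Fitting a slope and intercept makes the comparison depend on the shape of the transition rather than its absolute volume, which matches the "strong correlation" criterion above; the fitted `a` and `b` can also be reused to rescale the prediction.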

Such a comparison of transition states (temporal changes in the number of comments) is not limited to regression models; a Poisson distribution can also be used. Likewise, the regression model is not limited to the least squares method described above; SVR (Support Vector Regression) can also be used. An epidemic model of infectious diseases (the SIR model) can also be used.
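The Poisson alternative could be sketched as follows, assuming each stored count in a candidate window is used directly as the Poisson mean for the corresponding unit time and that unit times are independent; the candidate under which the observed target counts are most likely is taken as the most similar. The floor on the mean (to avoid log 0) and the function names are illustrative assumptions, not details from the source.

```python
import math
from typing import Dict, List, Tuple


def poisson_log_likelihood(observed: List[int], means: List[float], floor: float = 0.5) -> float:
    """Sum of log P(k | lambda) over aligned unit times, treating counts as independent Poisson."""
    total = 0.0
    for k, lam in zip(observed, means):
        lam = max(lam, floor)  # avoid log(0) when a stored count is zero
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total


def best_poisson_match(target: List[int], learned: Dict[str, List[int]]) -> Tuple[str, int]:
    """Return (content_id, window_start) of the window maximizing the log-likelihood.

    Raises ValueError if no stored series is at least as long as target.
    """
    w = len(target)
    candidates = [
        (poisson_log_likelihood(target, series[s:s + w]), cid, s)
        for cid, series in learned.items()
        for s in range(len(series) - w + 1)
    ]
    _, cid, start = max(candidates)
    return cid, start
```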

[Comment number prediction unit 15]
The comment number prediction unit 15 derives the number of comments in the prediction time range of the retrieved content as the future number of comments on the prediction target content. In this way, by comparing the trend (transition state) of the number of comments on the prediction target content from the past to the present with the transition states of many other contents collected in the past, the future increase or decrease in the number of comments can be predicted.

As shown in FIG. 4(c), the transition state of content A in the prediction time range following its determination time range is predicted to be the transition state of the prediction target content in its prediction time range.
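Once a matching window has been found, the forecast in FIG. 4(c) amounts to reading off the counts that followed that window in the stored series. A minimal sketch, assuming the stored series is a plain list of per-unit-time counts and that the counts are copied without rescaling (the source mentions no scaling). forecast_from_match is a hypothetical name.

```python
from typing import List


def forecast_from_match(series: List[int], start: int, window: int, horizon: int) -> List[int]:
    """Return the counts that followed the matched window in the stored series.

    series  : stored comment counts per unit time for the retrieved content (e.g. content A)
    start   : index at which the matched determination-time window begins
    window  : length of the determination time range, in unit times
    horizon : length of the prediction time range, in unit times
    """
    begin = start + window
    if begin + horizon > len(series):
        raise ValueError("stored series is too short for the requested prediction time range")
    return series[begin:begin + horizon]
```

With hourly units and the 8-hour window of FIG. 4, window would be 8 and the returned list would be taken as the forecast for the hours after 9:00.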

[Ranking publication unit 16]
The ranking publication unit 16 publishes to clients, as page information, ranking information in which a plurality of prediction target contents are sorted in descending order of the future number of comments derived by the comment number prediction unit 15. The plurality of prediction target contents may be selected by the user, or may be selected in advance by the operator of the prediction server. For example, the user can learn of content that is not attracting much attention at present but whose number of comments will later increase rapidly.
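A sketch of the ranking step, assuming each content's predicted counts for the prediction time range are summed into a single total and that ties keep their input order; the source says only that contents are sorted by the predicted future number of comments. The dictionary layout and build_ranking are hypothetical.

```python
from typing import Dict, List, Tuple


def build_ranking(predicted: Dict[str, List[int]], top_n: int = 5) -> List[Tuple[int, str, int]]:
    """Rank contents by total predicted comments over the prediction time range.

    Returns (rank, content_id, predicted_total) tuples, highest total first.
    """
    totals = {cid: sum(counts) for cid, counts in predicted.items()}
    # sorted() is stable, so equal totals keep their original order.
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(rank, cid, total) for rank, (cid, total) in enumerate(ordered[:top_n], start=1)]
```

top_n = 5 mirrors the five-place ranking of FIG. 5.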

FIG. 5 is an explanatory diagram illustrating processing in the ranking publication unit of the prediction server.

As shown in FIG. 5, contents are presented in a ranking from first to fifth place, ordered by how much their number of comments is expected to increase from the current time onward. For example, the first-place content already has many comments at the current time, but its number of comments is expected to increase further. The fourth-place content, by contrast, does not have many comments at the current time but is expected to increase rapidly. By browsing the ranking page, the user can thus learn which contents will receive more comments in the future; that is, the user can learn in advance of content that has not yet become a major topic on the Internet.

[Learning unit 17]
The learning unit 17 generates the learning information to be stored in the learning information storage unit 13 by communicating with the blog site server 2 and the content publishing server 3.

FIG. 6 is a functional configuration diagram of the learning unit of the prediction server according to the present invention.

As shown in FIG. 6, the learning unit 17 includes a content clustering unit 171, a cluster search unit 172, and a learning target comment search unit 173.

The content clustering unit 171 clusters a large number of contents based on the similarity of the keyword groups contained in each content. A method such as k-means is used for the clustering, so that mutually similar contents fall into the same cluster. The learning information storage unit 13 thereby stores the contents as clusters, each containing a plurality of contents.
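The clustering step could look like the following plain k-means sketch. It assumes each content has already been reduced to a list of keywords, represents each content as a raw term-count vector (TF-IDF weights could be substituted), and seeds the centroids with the first k contents for determinism. None of these choices comes from the source, and the function names are hypothetical.

```python
from typing import Dict, List


def to_vectors(docs: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """Turn each content's keyword list into a term-count vector over a shared vocabulary."""
    vocab = sorted({kw for kws in docs.values() for kw in kws})
    index = {kw: i for i, kw in enumerate(vocab)}
    vectors = {}
    for cid, kws in docs.items():
        vec = [0.0] * len(vocab)
        for kw in kws:
            vec[index[kw]] += 1.0
        vectors[cid] = vec
    return vectors


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: Dict[str, List[float]], k: int, iterations: int = 20) -> Dict[str, int]:
    """Plain k-means returning content_id -> cluster index; seeds are the first k contents."""
    ids = list(vectors)
    centroids = [list(vectors[cid]) for cid in ids[:k]]
    labels: Dict[str, int] = {}
    for _ in range(iterations):
        # Assignment step: each content goes to its nearest centroid.
        new_labels = {
            cid: min(range(k), key=lambda c: sq_dist(vectors[cid], centroids[c]))
            for cid in ids
        }
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[cid] for cid in ids if labels[cid] == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```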

Alternatively, the clustering may use cosine similarity, which measures the similarity between texts, to derive a group of contents that are similar to one another at or above a predetermined threshold. Cosine similarity is a measure used to compare documents in a vector space model. Because it directly expresses how close the angle between two vectors is, a value close to 1 indicates that the documents are similar and a value close to 0 indicates that they are dissimilar, just as with the trigonometric cosine. TF-IDF values are generally used as the vector components.
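For the threshold-based alternative, the sketch below computes TF-IDF weights (tf as relative frequency and idf as log(N/df), one common variant among several) and returns every content whose cosine similarity to a chosen seed content meets the threshold. The seed-based grouping, the weighting variant, the threshold value, and the function names are assumptions for illustration.

```python
import math
from typing import Dict, List


def tfidf(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """TF-IDF weights per content: tf = count / len(doc), idf = log(N / df)."""
    n = len(docs)
    df: Dict[str, int] = {}
    for words in docs.values():
        for w in set(words):
            df[w] = df.get(w, 0) + 1
    weights = {}
    for cid, words in docs.items():
        tf: Dict[str, float] = {}
        for w in words:
            tf[w] = tf.get(w, 0.0) + 1.0 / len(words)
        weights[cid] = {w: t * math.log(n / df[w]) for w, t in tf.items()}
    return weights


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def group_similar(docs: Dict[str, List[str]], seed: str, threshold: float = 0.25) -> List[str]:
    """IDs of contents whose TF-IDF cosine similarity to the seed is at least threshold."""
    weights = tfidf(docs)
    return sorted(cid for cid in docs if cosine(weights[seed], weights[cid]) >= threshold)
```

A term that appears in every content gets an idf of zero here, so shared boilerplate words do not inflate the similarity.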

The cluster search unit 172 searches for a cluster containing a keyword group similar to the keyword group extracted from the prediction target content.
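One way the cluster search unit 172 might compare keyword groups is sketched below: the keywords of all contents in a cluster are pooled into a single count vector and compared with the target's keywords by cosine similarity. Pooling and the use of cosine similarity here are assumptions, since the source says only that a cluster with a similar keyword group is retrieved. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_counts(a: Counter, b: Counter) -> float:
    """Cosine similarity of two keyword count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)  # a Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search_cluster(target_keywords: List[str], clusters: Dict[str, List[List[str]]]) -> str:
    """Return the id of the cluster whose pooled keywords are most similar to the target's."""
    target = Counter(target_keywords)
    scores = {
        cluster_id: cosine_counts(target, Counter(kw for doc in docs for kw in doc))
        for cluster_id, docs in clusters.items()
    }
    return max(scores, key=lambda cid: scores[cid])
```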

The learning target comment search unit 173 uses the blog site server 2 to search, for each content included in each cluster, for the number of comments per unit time over the passage of time. The learning target comment search unit 173 can collect the number of comments for each content by either of the following methods.
(1) For each content included in all clusters, the number of comments per unit time over the passage of time is searched in advance using the blog site server 2. In this case, learning information on a large number of contents is stored statically in the learning information storage unit 13.
(2) For each content included in the cluster retrieved by the cluster search unit 172, the number of comments per unit time over the passage of time is searched using the blog site server 2. In this case, the learning target comment search unit 173 operates dynamically based on the result of the cluster search unit 172.

The determination time search unit 14 then searches the contents included in the cluster retrieved by the cluster search unit 172 for content whose transition state of the number of comments is similar to the transition state of the number of comments per unit time in the determination time range of the prediction target content. This makes it possible to find content with a similar transition state within a group of contents that are relatively similar in subject matter to the prediction target content.
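Restricting the search to the retrieved cluster changes only the candidate set, as in this sketch. For brevity it scores windows by a plain sum of squared differences rather than the regression model described earlier, and it skips cluster members whose counts have not yet been collected, which can happen under collection method (2). Both choices and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def ssd(a: List[float], b: List[float]) -> float:
    """Sum of squared differences between two equal-length count windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_within_cluster(
    target: List[float],
    learned: Dict[str, List[float]],
    cluster_members: List[str],
) -> Optional[Tuple[str, int]]:
    """Window search limited to contents in the retrieved cluster; returns (content_id, start)."""
    w = len(target)
    best: Optional[Tuple[float, str, int]] = None
    for cid in cluster_members:
        series = learned.get(cid)
        if series is None:
            continue  # hypothetical case: member's comment counts not collected yet
        for s in range(len(series) - w + 1):
            score = ssd(series[s:s + w], target)
            if best is None or score < best[0]:
                best = (score, cid, s)
    return None if best is None else (best[1], best[2])
```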

As also shown in FIG. 6, the prediction server 1 further includes, in addition to the learning unit 17, a positive/negative determination unit 18 and a profile information extraction unit 19.

The positive/negative determination unit 18 determines, from the text of each comment, whether its content is positive or negative.

In this case, the learning information storage unit 13 also stores, together with the number of comments per unit time over the determination time range, the positive/negative ratios over all comments:
Comments in the determination time range for content A: positive ratio R_AP + negative ratio R_AN
Comments in the determination time range for content B: positive ratio R_BP + negative ratio R_BN
Comments in the determination time range for content C: positive ratio R_CP + negative ratio R_CN
The comment number prediction unit 15 then multiplies the number of comments in the prediction time range of the retrieved content by the positive/negative ratios to derive the future number of positive comments and negative comments on the prediction target content.
For comments in the future prediction time range of content A:
Number of positive comments = positive ratio R_AP × number of comments in the prediction time range
Number of negative comments = negative ratio R_AN × number of comments in the prediction time range
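The multiplication above is simple arithmetic; the sketch below also shows how the ratios might be computed from per-comment labels. It assumes the positive/negative determination unit 18 has already labeled each comment in the determination time range as "positive" or "negative"; how those labels are produced is not shown. The function names are hypothetical.

```python
from typing import Dict, List


def polarity_ratios(labels: List[str]) -> Dict[str, float]:
    """Shares of 'positive' and 'negative' labels among comments in the determination time range."""
    n = len(labels)
    if n == 0:
        return {"positive": 0.0, "negative": 0.0}
    return {
        "positive": labels.count("positive") / n,
        "negative": labels.count("negative") / n,
    }


def split_forecast(predicted_total: float, ratios: Dict[str, float]) -> Dict[str, float]:
    """Comments of each type = ratio x number of comments in the prediction time range."""
    return {kind: r * predicted_total for kind, r in ratios.items()}
```

The same split_forecast call applies unchanged to attribute ratios such as R_AM and R_AF, passed as {"male": ..., "female": ...}.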

The profile information extraction unit 19 extracts, from the text of each comment, attribute information about the profile of the user who posted that comment.

In this case, the learning information storage unit 13 also stores, together with the number of comments per unit time over the determination time range, the attribute ratio for each attribute type over all comments:
Comments in the determination time range for content A: male ratio R_AM + female ratio R_AF
Comments in the determination time range for content B: male ratio R_BM + female ratio R_BF
Comments in the determination time range for content C: male ratio R_CM + female ratio R_CF
The comment number prediction unit 15 then multiplies the number of comments in the prediction time range of the retrieved content by the attribute ratios to derive the future number of comments for each attribute type on the prediction target content.
For comments in the future prediction time range of content A:
Number of comments from males = male ratio R_AM × number of comments in the prediction time range
Number of comments from females = female ratio R_AF × number of comments in the prediction time range

It is also preferable to use both the positive/negative determination unit 18 and the profile information extraction unit 19. In this case, the learning information storage unit 13 also stores, together with the number of comments per unit time over the determination time range, positive/negative attribute ratios corresponding to combinations of "positive or negative" and "attribute type" over all comments:
Comments in the determination time range for content A: positive ratio R_AP + negative ratio R_AN; male ratio R_AM + female ratio R_AF
Comments in the determination time range for content B: positive ratio R_BP + negative ratio R_BN; male ratio R_BM + female ratio R_BF
Comments in the determination time range for content C: positive ratio R_CP + negative ratio R_CN; male ratio R_CM + female ratio R_CF
The comment number prediction unit 15 then multiplies the number of comments in the prediction time range of the retrieved content by the positive/negative attribute ratios to derive the future number of comments for each positive/negative attribute type on the prediction target content.
For comments in the future prediction time range of content A:
Number of positive comments from males = positive ratio R_AP × male ratio R_AM × number of comments in the prediction time range
Number of negative comments from females = negative ratio R_AN × female ratio R_AF × number of comments in the prediction time range
......
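For the combined case, the formulas above multiply a polarity ratio, an attribute ratio, and the predicted count. The sketch below enumerates every combination, assuming the ratios are supplied as dictionaries. Multiplying the two marginal ratios in this way treats polarity and attribute as independent. joint_forecast is a hypothetical name.

```python
from itertools import product
from typing import Dict, Tuple


def joint_forecast(
    predicted_total: float,
    polarity: Dict[str, float],
    attribute: Dict[str, float],
) -> Dict[Tuple[str, str], float]:
    """Comments per (polarity, attribute) = polarity ratio x attribute ratio x predicted total.

    Using the product of the two marginal ratios assumes polarity and attribute are
    independent; a directly measured joint ratio could be substituted.
    """
    return {
        (p, a): polarity[p] * attribute[a] * predicted_total
        for p, a in product(polarity, attribute)
    }
```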

As described in detail above, the prediction server, program, and method of the present invention predict the future number of comments even for prediction target content such as an ordinary news article, and can thereby analyze the future trend of interest among an unspecified number of users.

Those skilled in the art can easily make various changes, modifications, and omissions to the various embodiments of the present invention described above within the scope of its technical idea and viewpoint. The foregoing description is merely an example and is not intended to be limiting in any way. The present invention is limited only by the claims and their equivalents.

DESCRIPTION OF SYMBOLS
1 Prediction server
10 Communication interface unit
11 Prediction target keyword extraction unit
12 Prediction target comment search unit
13 Learning information storage unit
14 Determination time search unit
15 Comment number prediction unit
16 Ranking publication unit
17 Learning unit
171 Content clustering unit
172 Cluster search unit
173 Learning target comment search unit
18 Positive/negative determination unit
19 Profile information extraction unit
2 Blog site server
3 Content publishing server
4 Terminal

Claims

A prediction server that can communicate with a site server through which a plurality of contributors exchange text comments, and that predicts the future number of comments on prediction target content, the prediction server comprising:
learning information storage means for storing in advance, as learning information for each content, the transition state of the number of comments per unit time over the passage of time;
prediction target comment search means for counting, for comments corresponding to the prediction target content acquired from the site server, the number of comments per unit time over the passage of time;
determination time search means for searching the learning information storage means for content whose transition state of the number of comments is similar to the counted transition state of the number of comments per unit time; and
comment number prediction means for deriving the transition state of the number of comments after the determination time of the retrieved content as the transition state of the future number of comments on the prediction target content.

The prediction server according to claim 1, wherein
the site server is a blog site server,
the prediction server further comprises prediction target keyword extraction means for extracting a keyword group contained in the prediction target content, and
the prediction target comment search means searches the blog site server for a plurality of comments using the keyword group extracted by the prediction target keyword extraction means as keys, and counts the number of comments per unit time over the passage of time.

The prediction server according to claim 1 or 2, wherein
the prediction server can further communicate with a content publishing server that publishes content to an unspecified number of third parties, or the site server itself has a content publishing function,
the prediction server further comprises:
content clustering means for clustering a large number of contents based on the similarity of the keyword groups contained in each content;
learning target comment search means for searching, using the site server, for the number of comments per unit time over the passage of time for each content included in each cluster; and
cluster search means for searching for a cluster containing a keyword group similar to the keyword group extracted from the prediction target content, and
the determination time search means searches the contents included in the cluster retrieved by the cluster search means for content whose transition state of the number of comments is similar to the counted transition state of the number of comments per unit time.

The prediction server, wherein the learning target comment search means either:
searches in advance, using the blog site server, for the number of comments per unit time over the passage of time for each content included in all clusters; or
searches, using the blog site server, for the number of comments per unit time over the passage of time for each content included in the cluster retrieved by the cluster search means.

The prediction server according to any one of claims 1 to 4, further comprising positive/negative determination means for determining, from the text of each comment, whether its content is positive or negative, wherein
the learning information storage means also stores, together with the number of comments per unit time over the passage of time in the determination time range, the positive/negative ratio over all comments, and
the comment number prediction means derives the future number of positive comments or negative comments on the prediction target content by multiplying the number of comments in the prediction time range of the retrieved content by the positive/negative ratio.

The prediction server according to any one of claims 1 to 4, further comprising profile information extraction means for extracting, from the text of each comment, attribute information about the profile of the user who posted that comment, wherein
the learning information storage means also stores, together with the number of comments per unit time over the passage of time in the determination time range, the attribute ratio for each attribute type over all comments, and
the comment number prediction means derives the future number of comments for each attribute type on the prediction target content by multiplying the number of comments in the prediction time range of the retrieved content by the attribute ratio.

The prediction server according to any one of claims 1 to 4, further comprising:
positive/negative determination means for determining, from the text of each comment, whether its content is positive or negative; and
profile information extraction means for extracting, from the text of each comment, attribute information about the profile of the user who posted that comment, wherein
the learning information storage means also stores, together with the number of comments per unit time over the passage of time in the determination time range, positive/negative attribute ratios corresponding to combinations of "positive or negative" and "attribute type" over all comments, and
the comment number prediction means derives the future number of comments for each positive/negative attribute type on the prediction target content by multiplying the number of comments in the prediction time range of the retrieved content by the positive/negative attribute ratios.

The prediction server according to any one of claims 1 to 7, further comprising ranking publication means for publishing, as page information, ranking information in which a plurality of prediction target contents are sorted in descending order of the future number of comments derived by the comment number prediction means.

The prediction server according to any one of claims 1 to 8, wherein the determination time search means uses a regression model to derive the similarity between the transition state (temporal change) of the number of comments per unit time in the learning information storage means and the transition state (temporal change) of the number of comments in the determination time range of the prediction target content.

A prediction program that causes a computer mounted on a server, which can communicate with a site server through which a plurality of contributors exchange text comments and which predicts the future number of comments on prediction target content, to function as:
learning information storage means for storing in advance, as learning information for each content, the transition state of the number of comments per unit time over the passage of time;
prediction target comment search means for counting, for comments corresponding to the prediction target content acquired from the site server, the number of comments per unit time over the passage of time;
determination time search means for searching the learning information storage means for content whose transition state of the number of comments is similar to the counted transition state of the number of comments per unit time; and
comment number prediction means for deriving the transition state of the number of comments after the determination time of the retrieved content as the transition state of the future number of comments on the prediction target content.

A comment number prediction method for a server that can communicate with a site server through which a plurality of contributors exchange text comments and that predicts the future number of comments on prediction target content,
the server having learning information storage means that stores in advance, as learning information for each content, the transition state of the number of comments per unit time over the passage of time, the method comprising:
a first step of counting, for comments corresponding to the prediction target content acquired from the site server, the number of comments per unit time over the passage of time;
a second step of searching the learning information storage means for content whose transition state of the number of comments is similar to the counted transition state of the number of comments per unit time; and
a third step of deriving the transition state of the number of comments after the determination time of the retrieved content as the transition state of the future number of comments on the prediction target content.