WO2013073377A1 - Information diffusion scale prediction apparatus, information diffusion scale prediction method, and information diffusion scale prediction program - Google Patents
- Publication number
- WO2013073377A1 (application PCT/JP2012/078292)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text data
- learning
- prediction
- topic
- posts
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
Definitions
- The present invention relates to an information diffusion scale prediction apparatus, an information diffusion scale prediction method, and an information diffusion scale prediction program, and more particularly to an information diffusion scale prediction apparatus that predicts the future number of posts on a specific topic on a specific website.
- SNS: Social Network Service
- As described in Non-Patent Document 1, there are already many technologies and services for analyzing the posting status on SNS. Moreover, as disclosed in Patent Document 1, a technology is already known that estimates, for each type of website, its influence on other media using methods such as machine learning and mathematical statistics, and predicts the future posting situation based on that information.
- An object of the present invention is to provide an information diffusion scale prediction apparatus, an information diffusion scale prediction method, and an information diffusion scale prediction program that can accurately predict the influence of each poster and the future number of posts on a specific topic on a website such as an SNS.
- The information diffusion scale prediction apparatus according to the present invention acquires text data from a specific website via the Internet, predicts the future number of posts to the website based on this text data, and outputs the prediction result. It includes: a learning text data input unit that acquires text data from the specific website as learning text data; a node influence learning unit that classifies the learning text data by topic, calculates, from the number of posts per classified topic, the influence on the number of posts of each group to which a node representing a specific user belongs for that topic, and stores the result as learning data in a storage means provided in advance; a prediction text data input unit that acquires text data from the specific website as prediction text data after the learning data has been stored; and a future posting number prediction unit that classifies the prediction text data by topic, predicts the number of posts on the topic at a specific future time from the number of posts per classified topic and the learning data, and outputs the result to an output means provided in advance.
- The information diffusion scale prediction method according to the present invention is used in an information diffusion scale prediction apparatus that acquires text data from a specific website via the Internet, predicts the future number of posts to the website based on this text data, and outputs the prediction result. In this method, a learning text data input unit acquires text data from the specific website as learning text data; a node influence learning unit classifies the learning text data by topic, calculates, from the number of posts per classified topic, the influence on the number of posts of each group to which a node representing a specific user belongs for that topic, and stores the result as learning data in a storage means provided in advance; after the learning data has been stored, a prediction text data input unit acquires text data from the specific website as prediction text data; and a future posting number prediction unit classifies the prediction text data by topic, predicts the number of posts on the topic at a specific future time from the number of posts per classified topic and the learning data, and outputs the result to an output means provided in advance.
- The information diffusion scale prediction program according to the present invention is used in an information diffusion scale prediction apparatus that acquires text data from a specific website via the Internet, predicts the future number of posts to the website based on this text data, and outputs the prediction result. It causes a computer provided in the apparatus to execute: a procedure for acquiring text data from the specific website as learning text data; a procedure for classifying the learning text data by topic; a procedure for calculating, from the number of posts per classified topic, the influence on the number of posts of each group to which a node representing a specific user belongs for that topic, and storing the result as learning data in a storage means provided in advance; a procedure for acquiring text data from the specific website as prediction text data after the learning data has been stored; a procedure for classifying the prediction text data by topic; a procedure for predicting the number of posts on the topic at a specific future time from the number of posts per classified topic and the learning data; and a procedure for outputting the result to an output means provided in advance.
- The present invention calculates the influence of a specific user on a specific topic from learning text data acquired from a specific website, stores it as learning data, and predicts the future number of posts on that topic from this learning data and newly acquired prediction text data. The prediction can therefore be performed with an amount of data that is realistic to compute. As a result, an information diffusion scale prediction apparatus, an information diffusion scale prediction method, and an information diffusion scale prediction program can be provided.
- The information diffusion scale prediction apparatus 10 according to this embodiment acquires text data from a specific website via the Internet 20, predicts the future number of posts to the website based on this text data, and outputs the prediction result.
- The apparatus 10 includes: a learning text data input unit 101 that acquires text data from the specific website as learning text data; a node influence learning unit 102 that classifies the learning text data by topic, calculates, from the number of posts per classified topic, the influence on the number of posts of each group to which a node representing a specific user belongs for that topic, and stores the result as learning data 110 in the storage means 12 provided in advance; a prediction text data input unit 106 that acquires text data from the specific website as prediction text data after the learning data has been stored; and a future posting number prediction unit 107 that classifies the prediction text data by topic, predicts the number of posts on the topic at a specific future time from the number of posts per classified topic and the learning data, and outputs the result to the output means 14 provided in advance.
- The apparatus 10 further includes a group creation unit 104 that classifies nodes into groups based on attribute information of each node, and a group/time information totaling unit 103 that cross-tabulates the number of posts in the learning text data and the prediction text data by time and group and outputs the result to the node influence learning unit and the future posting number prediction unit.
- The node influence learning unit 102 represents the number of posts cross-tabulated by time and group as a matrix X, whose element xij is the number of posts of group j at time i. Let Xs be the submatrix consisting of rows 1 through s of X, ys the sum of the number of posts at each time, and P0(x, λ) the value at x of the density function of the Poisson distribution with mean λ. The group influence βs at time s is then obtained as the value that minimizes f(ys, Xs, βs).
- Alternatively, the node influence learning unit 102 obtains the group influence βs at time s as the value that minimizes the sum of f(ys, Xs, βs) and the product of the L1-regularized or L2-regularized βs and a regularization parameter given in advance.
- With this configuration, the information diffusion scale prediction apparatus 10 can accurately predict the influence of each contributor and the future number of posts on a specific topic. This is described in more detail below.
- FIG. 1 is an explanatory diagram showing a configuration of an information diffusion scale prediction apparatus 10 according to an embodiment of the present invention.
- The information diffusion scale prediction apparatus 10 has the basic configuration of a computer apparatus. That is, it includes a processor 11 (main arithmetic control means) that executes computer programs, a storage means 12 that stores data, a communication means 13 that exchanges data with other apparatuses via the Internet 20, and a display means 14 that presents processing results to the user.
- By executing a computer program, the processor 11 operates as the learning text data input unit 101, the node influence learning unit 102, the group/time information totaling unit 103, the group creation unit 104, the attribute value input unit 105, the prediction text data input unit 106, and the future posting number prediction unit 107, each of which is described later.
- Each of these units can be configured to be executed by a separate computer device.
- The learning text data input unit 101 acquires text data and accompanying attribute data from the acquisition target website through the communication means 13 and the Internet 20 according to a learning period and a learning interval given in advance. For example, when Twitter is the acquisition target, information on the time of each tweet, the node that tweeted, and the topic to which the tweet belongs is acquired together with the tweeted text data. The acquired data are passed to the node influence learning unit 102.
- A node here is defined as the "unit for which influence is estimated" in the present invention. More specifically, it may be an individual "contributor", or the "media" or "operating organization" to which the poster belongs.
- For example, the number of tweets (posts), the number of accounts followed, the number of followers, the number of replies per tweet, and the number of retweets per tweet can be used as attributes of a node.
- The content of the posted articles themselves, for example the "word types" and their "frequency of appearance" in the articles, may also be used as attributes of a node.
- The number of tweets, the number of accounts followed, and the number of followers may change during the learning data acquisition period (referred to as the learning period). Values such as the "number of tweets within the learning period", the "maximum number of accounts followed", and the "maximum number of followers" can therefore be calculated for each contributor and used as attributes of the node.
- The node influence learning unit 102 classifies each post by topic and, for each topic, outputs the node information, time information, and text data of the posts belonging to that topic to the group/time information totaling unit 103. The group/time information totaling unit 103 cross-tabulates the group and time information for each topic and returns it to the node influence learning unit 102.
- Upon receiving the group/time information cross-tabulated for each topic from the group/time information totaling unit 103, the node influence learning unit 102 calculates the influence of each group and then the influence of each node. The calculated node influence is output to the future posting number prediction unit 107.
- The group/time information totaling unit 103 creates time × group cross-tabulation data of the number of posts from two inputs: the attributes of each post received from the node influence learning unit 102 (in the case of Twitter, the node information, time information, and text data of the tweets belonging to a single topic), and the information on the group to which each node belongs received from the group creation unit 104. It outputs this cross-tabulation data to the node influence learning unit 102.
- The group creation unit 104 groups the nodes based on the node attribute values received from the attribute value input unit 105, and outputs the group information to the group/time information totaling unit 103.
- The attribute value input unit 105 outputs node attribute values received from outside the apparatus to the group creation unit 104.
- The prediction text data input unit 106 acquires text data and accompanying attribute data from the acquisition target website via the communication means 13 and the Internet 20 at a prediction interval given in advance. The acquired data are passed to the future posting number prediction unit 107.
- The future posting number prediction unit 107 receives the data from the prediction text data input unit 106 and the influence of each node from the node influence learning unit 102. It classifies each post by topic and, for each topic, outputs the node information, time information, and text data of the posts belonging to that topic to the group/time information totaling unit 103. The group/time information totaling unit 103 cross-tabulates the group and time information for each topic and returns it to the future posting number prediction unit 107.
- The future posting number prediction unit 107 then calculates a predicted value of the number of future posts and displays it on the display means 14.
- The display means 14 may be a computer separate from the information diffusion scale prediction apparatus 10.
- The "future posting status" refers to how many articles (the information diffusion scale) on a specific topic designated in advance (for example, a topic the observer is watching) will exist on the target website how many hours ahead (the information diffusion rate). It is assumed that, for each posted article, information on the posting node, the posting time, and the topic or topics to which the article belongs is available.
- The operation of the information diffusion scale prediction apparatus 10 described above is roughly divided into two stages, a "learning phase" and a "prediction phase". Each is described below. In the following example, the monitored website is assumed to be Twitter.
- FIG. 2 is a flowchart showing the operation in the learning phase of the information diffusion scale prediction apparatus 10 shown in FIG.
- The learning text data input unit 101 operates via the communication means 13 and the Internet 20 according to a learning period and a learning interval given in advance, and acquires text data tweeted on Twitter for learning. At the same time, it acquires information on the time of each tweet, the node that tweeted, and the topic to which the tweet belongs. The acquired data are passed to the node influence learning unit 102 (step S201).
- The node influence learning unit 102 classifies each tweet by topic and, for each topic, outputs the node information, time information, and text data of the tweets belonging to that topic to the group/time information totaling unit 103 (step S202).
- The group/time information totaling unit 103 uses the information on the group to which each node belongs, received from the group creation unit 104, to cross-tabulate the group and time information for each topic, and returns it to the node influence learning unit 102 (step S203).
- The node influence learning unit 102, having received the cross-tabulated group/time information, calculates the influence of each group, calculates the influence of each node from that value, and saves it in the storage means 12 as learning data 110 (step S204).
- FIG. 3 is a flowchart showing the operation in the prediction phase of the information diffusion scale prediction apparatus 10 shown in FIG.
- The prediction text data input unit 106 operates at a prediction interval given in advance and acquires text data tweeted on Twitter for prediction. At the same time, it acquires information on the time of each tweet, the node that tweeted, and the topic to which the tweet belongs. The acquired data are passed to the future posting number prediction unit 107 (step S251).
- The future posting number prediction unit 107 receives these data, classifies each tweet by topic, and, for each topic, outputs the node information, time information, and text data of the tweets belonging to that topic to the group/time information totaling unit 103 (step S252).
- The group/time information totaling unit 103 uses the information on the group to which each node belongs, received from the group creation unit 104, to cross-tabulate the group and time information for each topic, and returns it to the future posting number prediction unit 107 (step S253).
- The group/time information totaling unit 103, the group creation unit 104, and the attribute value input unit 105 can be shared between the learning phase and the prediction phase.
- On receiving the cross-tabulated group/time information, the future posting number prediction unit 107 reads the learning data on node influence from the storage means 12 (step S254), calculates the predicted number of future posts from it, and displays the result on the display means 14 (step S255).
- The premises of the processing example shown here are as follows.
- The analysis target is text data tweeted (posted) on Twitter.
- The topic whose future number of posts (tweets) is to be predicted is specified in advance.
- For each tweet, information on the "user who tweeted", the "date and time of the tweet", and the "topic to which the tweet belongs" is available.
- For each of the learning phase and the prediction phase, the period and time interval for acquiring text data are specified in advance. The prediction phase is performed after the learning phase has ended.
- One user (contributor) is treated as one node.
- For each node (user), the "client software", the "number of tweets within the learning period", the "average numbers of comments, trackbacks, replies, and retweets within the learning period", and the "maximum numbers of accounts followed and of followers within the learning period" are obtained in advance.
- Under these premises, estimation and prediction may be performed for each topic by the method described below.
- In step S202 of FIG. 2, the node influence learning unit 102 groups the nodes based on the data acquired by the learning text data input unit 101 in step S201 of FIG. 2. For example, the viewpoints listed below can be used for the grouping. The product set (intersection) of the grouping results for several attributes can also be used as the final grouping result.
- The type of "client software" being used. The type of "OS (operating system)" that runs the client software. The category to which the "number of tweets within the learning period" belongs (for example, which of the categories "1 to 100 times", "101 to 1000 times", or "1001 times or more" the node falls into). The category to which the "maximum number of followers within the learning period" belongs (for example, which of the categories "1 to 1000 people" or "1001 or more people" the node falls into).
- A node that has a certain number of tweets can also be defined as a group by itself.
- Each node belongs to one or more groups. This grouping substantially reduces the number of units to be estimated, which helps stabilize the estimate of node influence.
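The grouping by attribute categories can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names (`client`, `tweets`, `max_followers`) and the helper names are hypothetical, the bin edges are simply the examples quoted above, and each node is assigned to exactly one group (the intersection of its per-attribute categories), although the text also allows a node to belong to several groups.

```python
from typing import Any, Dict, List, Tuple


def tweet_count_bin(tweets: int) -> str:
    """Category of the number of tweets within the learning period."""
    if tweets <= 100:
        return "1-100"
    if tweets <= 1000:
        return "101-1000"
    return "1001+"


def follower_bin(max_followers: int) -> str:
    """Category of the maximum number of followers within the learning period."""
    return "1-1000" if max_followers <= 1000 else "1001+"


def group_key(attrs: Dict[str, Any]) -> Tuple[str, str, str]:
    """Product set of the per-attribute groupings, expressed as a tuple key."""
    return (
        str(attrs["client"]),
        tweet_count_bin(int(attrs["tweets"])),
        follower_bin(int(attrs["max_followers"])),
    )


def build_groups(nodes: Dict[str, Dict[str, Any]]) -> Dict[Tuple[str, str, str], List[str]]:
    """Map each group key to the list of node names that fall into it."""
    groups: Dict[Tuple[str, str, str], List[str]] = {}
    for name, attrs in nodes.items():
        groups.setdefault(group_key(attrs), []).append(name)
    return groups
```

Using a tuple of category labels as the key means that adding another viewpoint (for example the OS type) only requires one more tuple element.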
- In step S203 of FIG. 2, the group/time information totaling unit 103 tallies which group tweeted at which time and creates a time × group cross-tabulation table of the number of tweets.
- This tabulation result (cross-tabulation table) can be expressed as the matrix X (Equation 1). In this matrix, rows indicate time and columns indicate group, and the element xij is "the number of tweets of group j at time i".
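As an illustration of the time × group tabulation, the sketch below builds X as a list of rows from a list of posts. The function name, the `(time_slot, node)` input format, and the `node_group` mapping are assumptions made for this example, and each node is assumed to belong to a single group.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def cross_tabulate(
    posts: Iterable[Tuple[Hashable, str]],
    node_group: Dict[str, str],
    times: Sequence[Hashable],
    groups: Sequence[str],
) -> List[List[int]]:
    """Return X where X[i][j] is the number of posts of groups[j] at times[i]."""
    # Count posts per (time slot, group) pair; absent pairs count as 0.
    counts = Counter((t, node_group[node]) for t, node in posts)
    return [[counts[(t, g)] for g in groups] for t in times]
```

For example, with posts `[(0, "a"), (0, "b"), (1, "a")]` and nodes `a` and `b` in groups `g1` and `g2`, the first row of X counts one post per group and the second row counts one post for `g1` only.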
- In step S204 of FIG. 2, the node influence learning unit 102 estimates the influence of each node.
- The influence of a node is calculated from the influence of its group.
- The influence of the groups is given by the matrix β (Equation 2).
- The rows of β indicate how many aggregation unit times ahead, the columns indicate the group (as in Equation 1), and the element βij is defined as the sum of the influence of group j on the future i unit times ahead.
- One method is to take as the group influence the value of βs that minimizes Equation 3, which is the sum of f(ys, Xs, βs) and a regularization term.
- Here ys (Equation 4) is the sum over all nodes of the number of tweets at each time, and Xs is the submatrix consisting of rows 1 through s of the matrix X of Equation 1.
- The coefficient that multiplies P(βs) in the second term of Equation 3 is called the regularization parameter and adjusts the stability of the estimation result. A more specific definition of P(βs) is given below.
- f(ys, Xs, βs) in Equation 3 is calculated as in Equation 5.
- P0(x, λ) is the value at x of the density function of the Poisson distribution with mean λ.
- P(βs) in the second term of Equation 3 is calculated as in Equation 6 or Equation 7.
- The calculation of Equation 6 is the method called L1 regularization, and the calculation of Equation 7 is the method called L2 regularization.
- Regularization here is a technique used in machine learning and mathematical statistics to obtain a stable estimation result when a sufficient amount of learning data cannot be obtained.
- The second term of Equation 3 may be omitted, in which case the calculation is performed without any regularization element.
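The objective described above can be sketched in code. Equations 3 to 7 are not reproduced in this text, so the sketch rests on stated assumptions: f is taken to be the negative Poisson log-likelihood of ys, with the mean at each time equal to the corresponding row of Xs multiplied by βs, and the L1 and L2 penalties take their usual forms (sum of absolute values and sum of squares). All function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def poisson_log_pmf(x: int, lam: float) -> float:
    """log P0(x, lam), where P0(x, lam) = lam**x * exp(-lam) / x!."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def neg_log_likelihood(
    y: Sequence[int], X: Sequence[Sequence[float]], beta: Sequence[float]
) -> float:
    """Assumed form of f(ys, Xs, beta_s): negative Poisson log-likelihood."""
    total = 0.0
    for y_i, row in zip(y, X):
        lam = sum(x * b for x, b in zip(row, beta))  # assumed mean for time i
        if lam <= 0:
            return math.inf  # a Poisson mean must be positive
        total -= poisson_log_pmf(y_i, lam)
    return total


def penalty(beta: Sequence[float], kind: Optional[str]) -> float:
    """P(beta_s): standard L1 or L2 penalty, or 0 when regularization is omitted."""
    if kind is None:
        return 0.0
    if kind == "l1":
        return sum(abs(b) for b in beta)
    if kind == "l2":
        return sum(b * b for b in beta)
    raise ValueError(f"unknown penalty: {kind}")


def objective(y, X, beta, reg: float = 0.0, kind: Optional[str] = None) -> float:
    """f(ys, Xs, beta_s) plus the regularization parameter times P(beta_s)."""
    return neg_log_likelihood(y, X, beta) + reg * penalty(beta, kind)
```

The group influence would then be the β that minimizes `objective`; any general-purpose optimizer could be used for that step, and the choice is outside what this text specifies.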
- The information diffusion scale prediction apparatus 10 performs the prediction phase shown in FIG. 3 using the learning data created by the processing above (the learning phase).
- The processing of steps S251 to S254 in the prediction phase shown in FIG. 3 is the same as the processing of steps S201 to S204 in the learning phase shown in FIG. 2.
- The future posting number prediction unit 107 obtains the group/time information cross-tabulated for each topic. The number of tweets z for each group obtained by this cross-tabulation is expressed as in Equation 8.
- In step S255 of FIG. 3, the future posting number prediction unit 107 calculates the predicted number of posts from the influence by the process shown in Equation 9.
- The matrix Z representing the number of posts in the most recent time slots 1 to A, including time s, is defined as in Equation 10.
- The rows of Z indicate how many aggregation unit times ahead, the columns indicate the group (as in Equation 1), and the element zij, which corresponds to group j at i unit times ahead, is actually calculated by the formula shown in Equation 9.
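Equation 9 is not reproduced in this text, so the following is only a sketch of how a prediction could combine the learned influence with the latest counts. It assumes that the predicted number of posts i unit times ahead is the sum over groups of βij multiplied by the current number of tweets of group j; the function name and this linear form are assumptions, not the patent's stated formula.

```python
from typing import List, Sequence


def predict_future_posts(
    z: Sequence[float], beta: Sequence[Sequence[float]]
) -> List[float]:
    """Return one predicted count per horizon.

    z[j] is the latest number of tweets of group j, and beta[i][j] is the
    learned influence of group j on the future (i + 1) unit times ahead.
    """
    return [sum(b * count for b, count in zip(row, z)) for row in beta]
```

For instance, with `z = [3, 1]` and two horizons whose influence rows are `[2.0, 1.0]` and `[0.5, 0.0]`, the sketch yields 7.0 posts for the first horizon and 1.5 for the second.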
- The information diffusion scale prediction method according to this embodiment is used in the information diffusion scale prediction apparatus 10, which acquires text data from a specific website via the Internet, predicts the future number of posts to the website based on the text data, and outputs the prediction result. The learning text data input unit acquires text data from the specific website as learning text data (FIG. 2, step S201). The node influence learning unit classifies the learning text data by topic (FIG. 2, step S202), calculates, from the number of posts per classified topic, the influence on the number of posts of each group to which a node representing a specific user belongs for that topic, and stores the result as learning data in a storage means provided in advance (FIG. 2, steps S203 to S204). After the learning data has been stored, the prediction text data input unit acquires text data from the specific website as prediction text data (FIG. 3, step S251). The future posting number prediction unit classifies the prediction text data by topic (FIG. 3, step S252), predicts the number of posts on the topic at a specific future time from the number of posts per classified topic and the learning data, and outputs the result to an output means provided in advance (FIG. 3, steps S253 to S255).
- Each of the operation steps described above may be implemented as a computer-executable program and executed by the processor 11 of the information diffusion scale prediction apparatus 10, which directly executes each step.
- The program may be recorded on a non-transitory recording medium such as a DVD, a CD, or a flash memory. In that case, the program is read from the recording medium by a computer and executed.
- In this embodiment, the influence of a specific user on a specific topic is calculated in the learning phase and stored as learning data, and in the prediction phase the future number of posts on that topic is predicted from this learning data and newly acquired prediction text data.
- As long as the processing period and time interval are set for each of the learning phase and the prediction phase, this processing can be performed routinely and close to real time.
- In the example above the aggregation target is limited to Twitter, but the same processing can be applied to other sites such as Facebook, Mixi, or corporate weblogs by setting the topics and node attributes appropriately for the nature of each site.
- An information diffusion scale prediction apparatus that acquires text data from a specific website via the Internet, predicts the future number of posts to the website based on this text data, and outputs the prediction result, the apparatus comprising:
- a learning text data input unit that acquires the text data from the specific website as learning text data;
- a node influence learning unit that classifies the learning text data by topic, calculates, from the number of posts per classified topic, the influence on the number of posts of each group to which a node representing a specific user belongs for that topic, and stores the result as learning data in a storage means provided in advance;
- a prediction text data input unit that acquires the text data from the specific website as prediction text data after the learning data has been stored; and
- a future posting number prediction unit that classifies the prediction text data by topic, predicts the number of posts on the topic at a specific future time from the number of posts per classified topic and the learning data, and outputs the result to an output means provided in advance.
- The information diffusion scale prediction apparatus according to appendix 2, wherein, letting X be the matrix of the number of posts cross-tabulated by time and group, xij the element of X giving the number of posts of group j at time i, Xs the submatrix consisting of rows 1 through s of X, ys the sum over all nodes of the number of posts at each time, and P0(x, λ) the value at x of the density function of the Poisson distribution with mean λ, the node influence learning unit obtains the group influence βs at time s as the value that minimizes f(ys, Xs, βs).
- The information diffusion scale prediction apparatus wherein the node influence learning unit obtains the group influence βs at time s as the value that minimizes the sum of f(ys, Xs, βs) and the product of the L1-regularized or L2-regularized βs and a regularization parameter given in advance.
- An information diffusion scale prediction method for an information diffusion scale prediction apparatus that acquires text data from a specific website via the Internet, predicts the future number of posts to the website based on the text data, and outputs the prediction result, wherein:
- a learning text data input unit acquires the text data from the specific website as learning text data;
- a node influence learning unit classifies the learning text data by topic;
- the node influence learning unit calculates, from the number of posts per classified topic, the influence on the number of posts of each group to which a node representing a specific user belongs for that topic, and stores the result as learning data in a storage means provided in advance;
- a prediction text data input unit acquires the text data from the specific website as prediction text data after the learning data has been stored;
- a future posting number prediction unit classifies the prediction text data by topic and predicts the number of posts on the topic at a specific future time from the number of posts per classified topic and the learning data; and
- the future posting number prediction unit outputs the result to an output means provided in advance.
- An information diffusion scale prediction program for an information diffusion scale prediction apparatus that acquires text data from a specific website via the Internet, predicts the future number of posts to the website based on the text data, and outputs the prediction result, the program causing a computer provided in the apparatus to execute:
- a procedure for acquiring the text data from the specific website as learning text data, and a procedure for classifying the learning text data by topic;
- a procedure for calculating, from the number of posts per classified topic, the influence on the number of posts of each group to which a node representing a specific user belongs for that topic, and storing the result as learning data in a storage means provided in advance;
- a procedure for acquiring the text data from the specific website as prediction text data after the learning data has been stored, and a procedure for classifying the prediction text data by topic; and
- a procedure for predicting the number of posts on the topic at a specific future time from the number of posts per classified topic and the learning data, and a procedure for outputting the result to an output means provided in advance.
- The present invention can be applied to information diffusion scale prediction technology used for corporate crisis management and marketing research.
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Strategic Management (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
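The configuration of an embodiment of the present invention is described below with reference to the attached FIG. 1.
First, the basic content of this embodiment is described, followed by more specific content. The information diffusion scale prediction device 10 according to this embodiment acquires text data from a specific website via the Internet 20, predicts the number of future posts to that website based on the text data, and outputs the prediction result. The information diffusion scale prediction device 10 has: a learning text data input unit 101 that acquires text data from the specific website as learning text data; a node influence learning unit 102 that classifies the learning text data by topic, calculates from the classified number of posts by topic the influence on the number of posts for each group to which a node indicating a specific user for the topic belongs, and stores the result as learning data 110 in a storage means 12 provided in advance; a prediction text data input unit 106 that acquires text data from the specific website as prediction text data after the learning data has been stored; and a future posting number prediction unit 107 that classifies the prediction text data by topic, predicts the number of posts for the topic at a specific future time from the classified number of posts by topic and the learning data, and outputs the result to an output means 14 provided in advance.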
The influence βs of the groups at time s is obtained as the value that minimizes f(ys,Xs,βs).
This is described in more detail below.
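As a rough illustration of the minimization of f(ys,Xs,βs) described above, the following Python sketch fits the group influence with an L2 penalty. The expression that defines f is not reproduced in this text, so the sketch assumes a squared-error form, f = ||ys − Xs βs||², where each row of Xs holds per-group post counts for one observation and ys holds the post counts to be explained. The names `fit_group_influence`, `solve_linear`, and `lam` are illustrative and do not come from the patent.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix copy so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger regularization parameter")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_group_influence(X_s, y_s, lam=1.0):
    """Return beta_s minimizing ||y_s - X_s beta_s||^2 + lam * ||beta_s||^2.

    ASSUMPTION: the squared-error form of f is a stand-in, because the
    patent's own expression for f is missing from this text.
    """
    n_groups = len(X_s[0])
    # Normal equations with an L2 penalty: (X^T X + lam * I) beta = X^T y
    gram = [
        [sum(row[i] * row[j] for row in X_s) + (lam if i == j else 0.0)
         for j in range(n_groups)]
        for i in range(n_groups)
    ]
    rhs = [sum(row[i] * y for row, y in zip(X_s, y_s)) for i in range(n_groups)]
    return solve_linear(gram, rhs)
```

The L2 penalty is used here only because it has a closed-form solution; an L1 penalty would need an iterative solver such as coordinate descent.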
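The processing performed by the information diffusion scale prediction device 10 in the learning phase and the prediction phase shown in FIGS. 2 and 3 is described in more detail below. The example that follows rests on these assumptions:
- The analysis target is text data tweeted (posted) on Twitter.
- The topic whose future number of posts (tweets) is to be predicted is specified in advance.
- For each tweet, information is available on the user who tweeted it, the date and time it was tweeted, and the topic it belongs to.
- For both the learning phase and the prediction phase, the period and time interval for acquiring text data are specified in advance. The prediction phase runs after the learning phase has finished.
- One user (poster) is treated as one node.
- For each node (user), the following are obtained in advance: the client software, the number of tweets within the learning period, the averages of the numbers of comments, trackbacks, replies, and retweets within the learning period, and the maximum numbers of follows and followers within the learning period.
- The type of client software in use
- The type of OS (operating system) on which that client software runs
- The bracket containing the number of tweets within the learning period (for example, which of "1 to 100", "101 to 1000", or "1001 or more" the node falls into)
- The bracket containing the maximum number of followers within the learning period (for example, which of "1 to 1000" or "1001 or more" the node falls into)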
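The attribute brackets listed above can be turned into group keys along the following lines. This is a minimal sketch, assuming that nodes sharing the same client software, OS, tweet-count bracket, and follower bracket form one group. The bracket boundaries are the examples given in the text, while the function names, the dictionary fields, and the handling of counts below 1 (folded into the lowest bracket) are assumptions made for illustration.

```python
from collections import defaultdict


def tweet_count_bracket(tweets):
    """Bracket for the number of tweets within the learning period."""
    if tweets <= 100:
        return "1-100"  # assumption: counts below 1 also land here
    if tweets <= 1000:
        return "101-1000"
    return "1001+"


def follower_bracket(max_followers):
    """Bracket for the maximum follower count within the learning period."""
    if max_followers <= 1000:
        return "1-1000"  # assumption: counts below 1 also land here
    return "1001+"


def group_key(attrs):
    """Combine the four attributes into one hashable group key."""
    return (
        attrs["client"],
        attrs["os"],
        tweet_count_bracket(attrs["tweets"]),
        follower_bracket(attrs["max_followers"]),
    )


def build_groups(nodes):
    """Map each group key to the list of node (user) ids that share it."""
    groups = defaultdict(list)
    for user_id, attrs in nodes.items():
        groups[group_key(attrs)].append(user_id)
    return dict(groups)


# Hypothetical example data; none of these values come from the patent.
nodes = {
    "alice": {"client": "web", "os": "Windows", "tweets": 50, "max_followers": 200},
    "bob": {"client": "web", "os": "Windows", "tweets": 100, "max_followers": 1000},
    "carol": {"client": "mobile_app", "os": "Android", "tweets": 1500, "max_followers": 5000},
}
groups = build_groups(nodes)
```

Using a tuple of attribute values as the key keeps the grouping rule explicit and makes it easy to add or drop an attribute for a different site.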
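Next, the overall operation of the above embodiment is described.
The information diffusion scale prediction method according to this embodiment runs on the information diffusion scale prediction device 10, which acquires text data from a specific website via the Internet, predicts the number of future posts to that website based on the text data, and outputs the prediction result. In this method, the learning text data input unit acquires text data from the specific website as learning text data (FIG. 2, step S201); the node influence learning unit classifies the learning text data by topic (FIG. 2, step S202); the node influence learning unit calculates, from the classified number of posts by topic, the influence on the number of posts for each group to which a node indicating a specific user for the topic belongs, and stores the result as learning data in the storage means provided in advance (FIG. 2, steps S203 to S204); after the learning data has been stored, the prediction text data input unit acquires text data from the specific website as prediction text data (FIG. 3, step S251); the future posting number prediction unit classifies the prediction text data by topic (FIG. 3, step S252); and the future posting number prediction unit predicts the number of posts for the topic at a specific future time from the classified number of posts by topic and the learning data, and outputs the result to the output means provided in advance (FIG. 3, steps S253 to S255).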
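The aggregation and prediction steps can be sketched as follows. This is only an illustration under stated assumptions: posts for one topic arrive as (user_id, unix_timestamp) pairs, time is cut into fixed-width buckets, and the predicted number of posts is a weighted sum of per-group counts using previously learned influence values. The weighted-sum form and all names (`cross_tabulate`, `predict_future_posts`, `bucket_seconds`) are hypothetical; the source text does not specify the prediction formula.

```python
from collections import Counter


def cross_tabulate(posts, node_to_group, bucket_seconds=3600):
    """Count posts per (time bucket, group).

    posts: iterable of (user_id, unix_timestamp) pairs for one topic.
    node_to_group: mapping from user id to a group label.
    Posts from users with no known group are skipped (an assumption).
    """
    table = Counter()
    for user_id, timestamp in posts:
        group = node_to_group.get(user_id)
        if group is None:
            continue
        table[(timestamp // bucket_seconds, group)] += 1
    return table


def predict_future_posts(table, time_bucket, influence):
    """Weighted sum of the group counts observed in one time bucket.

    ASSUMPTION: a linear combination of per-group counts and learned
    influence values stands in for the patent's prediction step.
    """
    return sum(
        influence.get(group, 0.0) * count
        for (bucket, group), count in table.items()
        if bucket == time_bucket
    )


# Hypothetical example data; none of these values come from the patent.
posts = [("a", 10), ("a", 20), ("b", 30), ("c", 4000), ("z", 5)]
node_to_group = {"a": "g1", "b": "g2", "c": "g1"}
table = cross_tabulate(posts, node_to_group)
estimate = predict_future_posts(table, 0, {"g1": 1.5, "g2": 4.0})
```

Keeping the cross-tabulation separate from the prediction lets the same table feed both the learning phase and the prediction phase.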
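Through this operation, this embodiment provides the following effects.
The example above limited the data collected to Twitter, but the same method can also be applied to Facebook, mixi, or companies' weblogs by setting the attributes of each topic and each node appropriately for the nature of each site.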
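An information diffusion scale prediction device characterized by having:
a learning text data input unit that acquires the text data from the specific website as learning text data;
a node influence learning unit that classifies the learning text data by topic, calculates from the classified number of posts by topic the influence on the number of posts for each group to which a node indicating a specific user for the topic belongs, and stores the result as learning data in a storage means provided in advance;
a prediction text data input unit that acquires the text data from the specific website as prediction text data after the learning data has been stored; and
a future posting number prediction unit that classifies the prediction text data by topic, predicts the number of posts for the topic at a specific future time from the classified number of posts by topic and the learning data, and outputs the result to an output means provided in advance.
The information diffusion scale prediction device according to Supplementary Note 1, characterized by having a group and time information aggregation unit that cross-tabulates the number of posts in the learning text data and the prediction text data by time and by group, and outputs the result to the node influence learning unit and the future posting number prediction unit.
The information diffusion scale prediction device according to Supplementary Note 2, characterized in that the influence is obtained as the value that minimizes f(ys,Xs,βs).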
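An information diffusion scale prediction method characterized in that:
the learning text data input unit acquires the text data from the specific website as learning text data;
the node influence learning unit classifies the learning text data by topic;
the node influence learning unit calculates, from the classified number of posts by topic, the influence on the number of posts for each group to which a node indicating a specific user for the topic belongs, and stores the result as learning data in a storage means provided in advance;
after the learning data has been stored, the prediction text data input unit acquires the text data from the specific website as prediction text data;
the future posting number prediction unit classifies the prediction text data by topic;
the future posting number prediction unit predicts the number of posts for the topic at a specific future time from the classified number of posts by topic and the learning data; and
the future posting number prediction unit outputs the result to an output means provided in advance.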
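An information diffusion scale prediction program characterized by causing a computer provided in the information diffusion scale prediction device to execute:
a procedure for acquiring the text data from the specific website as learning text data;
a procedure for classifying the learning text data by topic;
a procedure for calculating, from the classified number of posts by topic, the influence on the number of posts for each group to which a node indicating a specific user for the topic belongs, and storing the result as learning data in a storage means provided in advance;
a procedure for acquiring the text data from the specific website as prediction text data after the learning data has been stored;
a procedure for classifying the prediction text data by topic;
a procedure for predicting the number of posts for the topic at a specific future time from the classified number of posts by topic and the learning data; and
a procedure for outputting the result to an output means provided in advance.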
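11 Processor
12 Storage means
13 Communication means
14 Display means
20 Internet
101 Learning text data input unit
102 Node influence learning unit
103 Group and time information aggregation unit
104 Group creation unit
105 Attribute value input unit
106 Prediction text data input unit
107 Future posting number prediction unit
110 Learning data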
Claims (6)
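- An information diffusion scale prediction device that acquires text data from a specific website via the Internet, predicts the number of future posts to the website based on the text data, and outputs the prediction result, the device being characterized by having:
a learning text data input unit that acquires the text data from the specific website as learning text data;
a node influence learning unit that classifies the learning text data by topic, calculates from the classified number of posts by topic the influence on the number of posts for each group to which a node indicating a specific user for the topic belongs, and stores the result as learning data in a storage means provided in advance;
a prediction text data input unit that acquires the text data from the specific website as prediction text data after the learning data has been stored; and
a future posting number prediction unit that classifies the prediction text data by topic, predicts the number of posts for the topic at a specific future time from the classified number of posts by topic and the learning data, and outputs the result to an output means provided in advance.
- The information diffusion scale prediction device according to claim 1, characterized by having:
a group creation unit that classifies the nodes into the groups based on information on the attributes of each node; and
a group and time information aggregation unit that cross-tabulates the number of posts in the learning text data and the prediction text data by time and by group, and outputs the result to the node influence learning unit and the future posting number prediction unit.
- The information diffusion scale prediction device according to claim 3, characterized in that the node influence learning unit obtains the influence βs of the groups at time s as the value that minimizes the sum of f(ys,Xs,βs) and the product of the L1-regularized or L2-regularized βs and a regularization parameter given in advance.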
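- An information diffusion scale prediction method for an information diffusion scale prediction device that acquires text data from a specific website via the Internet, predicts the number of future posts to the website based on the text data, and outputs the prediction result, the method being characterized in that:
the learning text data input unit acquires the text data from the specific website as learning text data;
the node influence learning unit classifies the learning text data by topic;
the node influence learning unit calculates, from the classified number of posts by topic, the influence on the number of posts for each group to which a node indicating a specific user for the topic belongs, and stores the result as learning data in a storage means provided in advance;
after the learning data has been stored, the prediction text data input unit acquires the text data from the specific website as prediction text data;
the future posting number prediction unit classifies the prediction text data by topic;
the future posting number prediction unit predicts the number of posts for the topic at a specific future time from the classified number of posts by topic and the learning data; and
the future posting number prediction unit outputs the result to an output means provided in advance.
- An information diffusion scale prediction program for an information diffusion scale prediction device that acquires text data from a specific website via the Internet, predicts the number of future posts to the website based on the text data, and outputs the prediction result, the program being characterized by causing a computer provided in the information diffusion scale prediction device to execute:
a procedure for acquiring the text data from the specific website as learning text data;
a procedure for classifying the learning text data by topic;
a procedure for calculating, from the classified number of posts by topic, the influence on the number of posts for each group to which a node indicating a specific user for the topic belongs, and storing the result as learning data in a storage means provided in advance;
a procedure for acquiring the text data from the specific website as prediction text data after the learning data has been stored;
a procedure for classifying the prediction text data by topic;
a procedure for predicting the number of posts for the topic at a specific future time from the classified number of posts by topic and the learning data; and
a procedure for outputting the result to an output means provided in advance.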
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/824,122 US8983880B2 (en) | 2011-11-18 | 2012-11-01 | Information spread scale prediction device, information spread scale prediction method, and information spread scale prediction program |
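JP2013511458A JP5282857B1 (ja) | 2011-11-18 | 2012-11-01 | Information diffusion scale prediction device, information diffusion scale prediction method, and information diffusion scale prediction program |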
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011252311 | 2011-11-18 | ||
JP2011-252311 | 2011-11-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013073377A1 (ja) | 2013-05-23 |
Family
ID=48429444
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
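PCT/JP2012/078292 WO2013073377A1 (ja) | 2011-11-18 | 2012-11-01 | Information diffusion scale prediction device, information diffusion scale prediction method, and information diffusion scale prediction program |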
Country Status (3)
Country | Link |
---|---|
US (1) | US8983880B2 (ja) |
JP (1) | JP5282857B1 (ja) |
WO (1) | WO2013073377A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016535344A (ja) * | 2013-08-09 | 2016-11-10 | フェイスブック,インク. | User experience interface or user interface based on interaction history |
JP2019079474A (ja) * | 2017-10-27 | 2019-05-23 | エヌ・ティ・ティ・コミュニケーションズ株式会社 | Site improvement device, site improvement method, and site improvement program |
JP7061328B1 (ja) | 2021-07-30 | 2022-04-28 | 株式会社Jx通信社 | Information processing device, information processing system, and program |
JP7182819B1 (ja) | 2021-07-30 | 2022-12-05 | 株式会社Jx通信社 | Information processing device, information processing system, and program |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150309965A1 (en) * | 2014-04-28 | 2015-10-29 | Elwha Llc | Methods, systems, and devices for outcome prediction of text submission to network based on corpora analysis |
KR101628738B1 (ko) * | 2014-10-29 | 2016-06-09 | (주)타파크로스 | Method and system for detecting negative issues using a learning rule-based approach |
WO2017023322A1 (en) * | 2015-08-06 | 2017-02-09 | Hewlett Packard Enterprise Development Lp | Influence spread maximization in social networks |
US10430451B2 (en) * | 2016-02-22 | 2019-10-01 | Arie Rota | System and method for aggregating and sharing accumulated information |
CN106845022A (zh) * | 2017-03-01 | 2017-06-13 | 邯郸市气象局 | Meteorological disaster risk assessment method based on a risk diffusion mechanism |
US10687206B2 (en) * | 2018-01-30 | 2020-06-16 | Hewlett Packard Enterprise Development Lp | Response messages including information elements not indicated as requested |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009116342A1 (ja) | 2008-03-18 | 2009-09-24 | 日本電気株式会社 | Dynamic topic analysis system, dynamic topic analysis method, and medium recording a dynamic topic analysis program |
-
2012
- 2012-11-01 WO PCT/JP2012/078292 patent/WO2013073377A1/ja active Application Filing
- 2012-11-01 JP JP2013511458A patent/JP5282857B1/ja not_active Expired - Fee Related
- 2012-11-01 US US13/824,122 patent/US8983880B2/en not_active Expired - Fee Related
Non-Patent Citations (4)
Title |
---|
FUMI YAMAZAKI: "Twitter Marketing Katsuyo Kotohajime", NIKKEI NETMARKETING, 25 May 2010 (2010-05-25), pages 44 - 47 * |
KAZUKI YOSHIMOTO ET AL.: "Micro Blog ni Okeru Tasha eno Eikyo o Koryo shita Tokosha no Juyodo Suitei Shuho", DAI 2 KAI FORUM ON DATA ENGINEERING AND INFORMATION MANAGEMENT -DEIM 2010- RONBUNSHU, 9 June 2010 (2010-06-09), pages 1 - 8 * |
KYOSUKE NISHIDA ET AL.: "Tweet-Topic Classification using Data Compression", DBSJ JOURNAL, vol. 10, no. 1, 24 June 2011 (2011-06-24), pages 1 - 6 * |
YUYA YOSHIKAWA ET AL.: "Estimating Method of Expected Influence Curve from Single Diffusion Sequence on Social Networks", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. J94-D, no. 11, 1 November 2011 (2011-11-01), pages 1899 - 1908 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
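JP2016535344A (ja) * | 2013-08-09 | 2016-11-10 | フェイスブック,インク. | User experience interface or user interface based on interaction history |
JP2019079474A (ja) * | 2017-10-27 | 2019-05-23 | エヌ・ティ・ティ・コミュニケーションズ株式会社 | Site improvement device, site improvement method, and site improvement program |
JP7009160B2 (ja) | 2017-10-27 | 2022-01-25 | エヌ・ティ・ティ・コミュニケーションズ株式会社 | Site improvement device, site improvement method, and site improvement program |
JP7061328B1 (ja) | 2021-07-30 | 2022-04-28 | 株式会社Jx通信社 | Information processing device, information processing system, and program |
JP7182819B1 (ja) | 2021-07-30 | 2022-12-05 | 株式会社Jx通信社 | Information processing device, information processing system, and program |
JP2023020366A (ja) * | 2021-07-30 | 2023-02-09 | 株式会社Jx通信社 | Information processing device, information processing system, and program |
JP2023020864A (ja) * | 2021-07-30 | 2023-02-09 | 株式会社Jx通信社 | Information processing device, information processing system, and program |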
Also Published As
Publication number | Publication date |
---|---|
JP5282857B1 (ja) | 2013-09-04 |
US20140244551A1 (en) | 2014-08-28 |
US8983880B2 (en) | 2015-03-17 |
JPWO2013073377A1 (ja) | 2015-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
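JP5282857B1 (ja) | Information diffusion scale prediction device, information diffusion scale prediction method, and information diffusion scale prediction program | |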
Ye et al. | Closed-form estimators for the gamma distribution derived from likelihood equations | |
Miller et al. | Extensions of the Johnson-Neyman technique to linear models with curvilinear effects: Derivations and analytical tools | |
US9123055B2 (en) | Generating and displaying customer commitment framework data | |
US10846613B2 (en) | System and method for measuring and predicting content dissemination in social networks | |
Zhang et al. | Robust estimation and variable selection for semiparametric partially linear varying coefficient model based on modal regression | |
JP2017142796A (ja) | 情報の特定及び抽出 | |
Alizadeh Noughabi et al. | Monte Carlo comparison of five exponentiality tests using different entropy estimates | |
US20160034553A1 (en) | Hybrid aggregation of data sets | |
TW201719569A (zh) | 社交業務特徵用戶的識別方法和裝置 | |
Chakrabarty et al. | Compounded inverse Weibull distributions: Properties, inference and applications | |
JP5814303B2 (ja) | 収益指標値生成システム及び収益指標値生成方法 | |
US20170286975A1 (en) | Data Infrastructure and Method for Estimating Influence Spread in Social Networks | |
Yılancı et al. | The causality relationship between trade and environment in G7 countries: evidence from dynamic symmetric and asymmetric bootstrap panel causality tests | |
Gaidai et al. | Singapore COVID-19 data cross-validation by the Gaidai reliability method | |
Chávez et al. | A threshold GARCH model for Chilean economic uncertainty | |
Chen et al. | Forecasting tourism demand of tourist attractions during the COVID-19 pandemic | |
Xiao et al. | Convergence and stability of numerical methods with variable step size for stochastic pantograph differential equations | |
CN110209944B (zh) | 一种股票分析师推荐方法、装置、计算机设备和存储介质 | |
Kirichenko et al. | Probabilistic Machine Learning Methods for Fractional Brownian Motion Time Series Forecasting | |
JP2018077671A (ja) | 情報処理装置、情報処理方法、予測モデルの生成装置、予測モデルの生成方法、およびプログラム | |
JP6062514B2 (ja) | 収益指標値生成システム及び収益指標値生成方法 | |
CN114925275A (zh) | 产品推荐方法、装置、计算机设备及存储介质 | |
Biswas et al. | Spatial estimation and rescaled spatial bootstrap approach for finite population | |
Fedorova et al. | Queueing System with Two Phases of Service and Service Rate Degradation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | ENP | Entry into the national phase | Ref document number: 2013511458; Country of ref document: JP; Kind code of ref document: A |
 | WWE | WIPO information: entry into national phase | Ref document number: 13824122; Country of ref document: US |
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 12850598; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 12850598; Country of ref document: EP; Kind code of ref document: A1 |