JP7187865B2

JP7187865B2 - Content evaluation device

Info

Publication number: JP7187865B2
Application number: JP2018139295A
Authority: JP
Inventors: 竜示狩野; 康秀三浦; 智子大熊
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2018-07-25
Filing date: 2018-07-25
Publication date: 2022-12-13
Anticipated expiration: 2038-07-25
Also published as: JP2020017054A

Description

The present invention relates to a content evaluation device.

With the development of social media, a huge amount of information is exchanged on the Internet, and finding useful information within it is becoming increasingly important.

Patent Literature 1 describes a method for determining influential or popular participants in a communication network. The method includes representing the participants in a group as nodes and the messages exchanged between participants as links; analyzing the messages to generate message-related data for each node; and propagating each node's message-related data to other nodes in the network to generate, for each node, a corresponding influence value that indicates the relative size and strength of the social activity or connections of the participant corresponding to that node.

Patent Literature 2 describes a device that accurately predicts the future number of posts on a specific topic on an SNS or similar service. The device has: a learning text data input unit that acquires text data for learning from a specific website; a node influence learning unit that, from the number of comments per topic, calculates the influence on the number of comments of each group to which a node representing a single specific user for that topic belongs, and stores this as learning data; a prediction text data input unit that, after the learning data has been stored, acquires text data for prediction from the specific website; and a future post number prediction unit that predicts and outputs the number of posts on the topic at a specific future time from the number of comments per topic and the learning data.

Patent Literature 3 describes an apparatus that can appropriately evaluate an arbitrary image in terms of whether it attracts users' attention. The apparatus includes: a first counting means that receives texts for a first image and counts them; a first feature acquisition means that acquires information indicating feature amounts of the first image; a learning model generation means that performs machine learning on learning data in which the feature amounts acquired by the first feature acquisition means serve as explanatory variables and information based on the count from the first counting means serves as the objective variable, thereby generating a learning model that takes the feature amounts of an arbitrary image as input and outputs a score; an image acquisition means that acquires a plurality of second images; a second feature acquisition means that acquires information indicating feature amounts of the second images; a score calculation means that inputs the feature amounts acquired by the second feature acquisition means as explanatory variables into the learning model and calculates the score of each second image from the model's output; a text acquisition means that acquires texts for the second images; a similarity determination means that determines, based on the second images and the acquired texts, whether second images are similar to one another; and a representative image determination means that, for a plurality of second images determined to be similar, determines a representative image of those images based on the scores calculated by the score calculation means.

Patent Literature 4 describes a server that, by predicting the future number of comments even for prediction target content such as general news articles, can analyze the future interest trends of an unspecified large number of users. The server has: a learning information storage means that stores in advance, as learning information for each piece of content, how the number of comments per unit time changes over time; a prediction target comment search means that counts, for comments acquired from a server that correspond to the prediction target content, the number of comments per unit time over time; a determination time search means that searches the learning information storage means for content whose comment-count transition is similar to the counted transition; and a comment number prediction means that derives the comment-count transition of the retrieved content after the determination time as the future comment-count transition of the prediction target content.

Patent Literature 1: Japanese Patent No. 5035642
Patent Literature 2: International Publication No. WO 2013/073377
Patent Literature 3: Japanese Patent No. 6033697
Patent Literature 4: Japanese Patent No. 5952711

To extract useful information from the vast amount of information on social media, it is effective to evaluate users' reactions (popularity) to the messages, images, and other items posted there. However, users' reactions to such posts are affected by information about the posting user, the posting time, the posting speed, and so on, which makes it difficult to determine what kind of content itself drives reactions.

An object of the present invention is to provide a technique capable of evaluating the degree of reaction (popularity) attributable to the content alone of a post on social media.

The invention according to claim 1 is a content evaluation device comprising: an attribute feature extraction means that extracts attribute features from a post on social media; an attribute score calculation means that calculates an attribute score from the attribute features; a content feature extraction means that extracts content features from the post; a content score calculation means that calculates a content score from the content features and outputs it independently of the attribute score; and a reactivity calculation means that calculates and outputs the reactivity of the post using the attribute score and the content score. During learning, the attribute feature extraction means and the content feature extraction means are trained using a data set of posts on the social media and the reactivity to those posts; during evaluation, content to be evaluated is supplied to the trained content feature extraction means, and the content score is calculated and output.

The invention according to claim 2 is the content evaluation device according to claim 1, wherein, during evaluation, evaluation is performed using annotate scores of posts on the social media.

The invention according to claim 3 is the content evaluation device according to claim 1 or 2, wherein the content feature extraction means extracts the content features using an LSTM or an FNN.

The invention according to claim 4 is the content evaluation device according to any one of claims 1 to 3, wherein the attribute feature extraction means extracts the attribute features using a multilayer perceptron.

The invention according to claim 5 is the content evaluation device according to any one of claims 1 to 4, wherein the reactivity calculation means calculates the reactivity of the post by multiplying the attribute score by the content score.

The invention according to claim 6 is the content evaluation device according to any one of claims 1 to 4, wherein the reactivity calculation means calculates the reactivity of the post by multiplying the attribute score by the content score and further adding the attribute feature as a bias.
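The two combinations in claims 5 and 6 can be illustrated with a minimal Python sketch. It assumes both scores are plain floats. Because claim 6 does not say how a vector-valued attribute feature becomes a scalar bias, the sketch uses a hypothetical weight vector `bias_weights` for that projection; this is an illustrative assumption, not the claimed implementation.

```python
def reactivity_product(attribute_score: float, content_score: float) -> float:
    """Claim 5: reactivity = attribute score x content score."""
    return attribute_score * content_score


def reactivity_product_with_bias(
    attribute_score: float,
    content_score: float,
    attribute_feature: list[float],
    bias_weights: list[float],
) -> float:
    """Claim 6: product of the two scores plus a bias derived from the attribute feature.

    `bias_weights` is a hypothetical projection that turns the attribute
    feature vector into a scalar bias (assumption, not stated in the claims).
    """
    if len(attribute_feature) != len(bias_weights):
        raise ValueError("feature and bias weights must have the same length")
    bias = sum(f * w for f, w in zip(attribute_feature, bias_weights))
    return attribute_score * content_score + bias
```

For example, `reactivity_product(2.0, 3.0)` gives 6.0, and adding the attribute feature `[1.0, 0.5]` through weights `[0.2, 0.4]` raises it to about 6.4. Keeping the two scores as separate inputs mirrors the point of the claims: the content score exists on its own before it is combined.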

The invention according to claim 7 is a content evaluation device comprising: an attribute score calculation means that calculates an attribute score from attributes of a post on social media; a content score calculation means that calculates a content score from the content of the post independently of the attribute score calculation means; and an output means that outputs at least one of the content score and the reactivity of the post calculated using the attribute score and the content score.

According to the inventions of claims 1 and 7, the reactivity (popularity) attributable to the content alone of a post on social media can be evaluated. In addition, a content score can be calculated and output at evaluation time.

According to the invention of claim 2, evaluation can additionally be performed using annotate scores.

According to the inventions of claims 3 and 4, learning can additionally be performed using a neural network such as an LSTM.

According to the inventions of claims 5 and 6, the reactivity of a post can additionally be calculated by multiplying the independently calculated attribute score and content score.

FIG. 1 is a configuration block diagram of the embodiment.
FIG. 2A is a schematic diagram (part 1) showing the relationship between the attributes and content of a post and its reactivity.
FIG. 2B is a schematic diagram (part 2) showing the relationship between the attributes and content of a post and its reactivity.
FIG. 3 is a functional block diagram of the embodiment.
FIG. 4 is a table showing the accuracy of the annotate score of the embodiment.
The remaining drawings are, in order: a table showing the accuracy of the KarmaScore of the embodiment; an explanatory diagram of content score output in the embodiment; schematic diagrams (parts 1 and 2) showing the relationship between post attributes, content, and reactivity in the embodiment; a processing flowchart of the embodiment; an explanatory diagram of learning and evaluation in the embodiment; tables (parts 1 and 2) showing the correspondence between content and content scores in the embodiment; and a functional block diagram of a modification.

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a configuration block diagram of the content evaluation device according to this embodiment. The content evaluation device comprises one or more processors 10, a program memory 12, a storage device 14, an input/output interface (I/F) 16, a communication I/F 18, an input device 20, and an output device 22. The content evaluation device may be implemented as a general-purpose computer with one or more processors and memory.

The one or more processors 10 execute various functions by reading and executing processing programs stored in the program memory 12. As functional blocks, the one or more processors 10 include an attribute feature extraction unit 10a, an attribute score calculation unit 10b, a content feature extraction unit 10c, a content score calculation unit 10d, and a reactivity calculation unit 10e.

The attribute feature extraction unit 10a extracts features of the attributes (context) of messages and images (including still images and moving images) posted on social media.

The attribute score calculation unit 10b calculates an attribute score using the attribute features extracted by the attribute feature extraction unit 10a.

The content feature extraction unit 10c extracts content features of messages and images posted on social media.

The content score calculation unit 10d calculates a content score using the content features extracted by the content feature extraction unit 10c.

The reactivity calculation unit 10e calculates the reactivity of a posted message or image using the attribute score calculated by the attribute score calculation unit 10b and the content score calculated by the content score calculation unit 10d.

The program memory 12 stores the processing programs to be executed by the processor 10. The program memory 12 also stores various parameters obtained through learning.

The storage device 14 stores the posted messages and images used for learning, test data, and the like.

The input/output I/F 16 is connected to an input device 20 such as a keyboard and mouse, and to an output device 22 such as a display.

The communication I/F 18 is connected to a cloud 24 via the Internet. The cloud 24 implements the social media; messages and images posted there are stored in the storage device 14 via the communication I/F 18 and then supplied to the processor 10 for analysis.

The model implemented by the processor 10, specifically the attribute feature extraction unit 10a and the content feature extraction unit 10c, is trained using messages and images actually posted on social media together with their reactivity (popularity). The trained processor 10 analyzes a posted message or image whose reactivity is to be predicted, calculates and outputs an attribute score and a content score, and further calculates the reactivity and outputs it to the output device 22. Because the processor 10 calculates the attribute score and the content score independently, it can output these scores to the output device 22 separately from the final reactivity. That is, the output of the processor 10 is one of the following:
- attribute score only
- content score only
- reactivity only
- content score and reactivity
- attribute score and reactivity
- attribute score, content score, and reactivity
The content score calculated and output by the processor 10 corresponds to the reactivity attributable to the content alone of the posted message or image.

Here, reactivity is an index indicating the degree of reaction (popularity) to a post on social media. Specific examples include:
- the number of retweets (RTs) on Twitter (registered trademark)
- the number of likes on Facebook
- the number of views on YouTube (registered trademark)
- KarmaScore on Reddit

Among these, Reddit is relatively large and its data is publicly available, so it is often used in research on user reactions. Known studies using Reddit include, for example:
Hao Fang, Hao Cheng, Mari Ostendorf. Learning Latent Local Conversation Modes for Predicting Community Endorsement in Online Discussions. ACL 2016.
Hao Cheng, Hao Fang, Mari Ostendorf. A Factored Neural Network Model for Characterizing Online Discussions in Vector Space. EMNLP 2017.

Reddit hosts text posts on a wide variety of topics. Posts are organized by topic into subreddits, and each subreddit contains units of posts called threads. A user first posts on a new topic (a submission); others can reply to the submission, and replies can themselves be replied to. A submission and all posts beneath it form one thread. Because each post can receive multiple replies, a thread has a tree structure. Users can also vote to support or not support a post. KarmaScore is the number of support votes minus the number of disapproval votes, that is,
KarmaScore = (number of support votes) - (number of disapproval votes)
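The thread tree and the KarmaScore definition above can be sketched in Python as a small recursive structure. The `Post` class, its field names, and the assumption that raw vote counts are available as integers are all hypothetical; they do not correspond to any Reddit API.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """One post in a Reddit-style thread (hypothetical illustrative structure)."""
    post_id: str
    upvotes: int    # support votes
    downvotes: int  # disapproval votes
    replies: list["Post"] = field(default_factory=list)

    @property
    def karma_score(self) -> int:
        # KarmaScore = support votes - disapproval votes
        return self.upvotes - self.downvotes


def iter_posts(root: Post, depth: int = 0):
    """Depth-first walk over the thread tree, yielding (post, depth).

    The submission at the root has depth 0; each reply is one level deeper
    than the post it answers.
    """
    yield root, depth
    for reply in root.replies:
        yield from iter_posts(reply, depth + 1)
```

For a submission with 120 support and 20 disapproval votes, replies `a` (5, 7) and `b` (30, 2), and a reply `c` (1, 0) under `b`, the walk yields KarmaScores 100, -2, 28, and 1 at depths 0, 1, 1, and 2. Note that KarmaScore can be negative, which matters for the transform applied to it later in this description.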

In this embodiment, Reddit, which is large and whose data is publicly available, is used as an example of social media, and KarmaScore is used as the reactivity. However, the present invention is not limited to any particular social media and is applicable to messages and images posted on any social media.

FIGS. 2A and 2B show the general relationship between the attributes and content of messages posted on social media and their reactivity.

FIG. 2A shows a case where poster A posted the message "It's cold today" at 22:00. In this case, the attributes of the message are
Posting time: 22:00
Poster: A
and the content of the message is
"It's cold today"
Assume that the reactivity of this posted message, for example its Reddit KarmaScore, was 1000.
FIG. 2B, on the other hand, shows a case where a different poster posted the same message at a different time. In this case, the attributes of the message are
Posting time: 10:00
Poster: B
and the content of the message is
"It's cold today"
Assume that the reactivity of this posted message, for example its KarmaScore, was 10.

Thus, even when the content is identical, the reactivity can differ if the attributes differ. In other words, because reactivity is strongly influenced by attributes such as the poster, posting time, and posting speed, the influence of these attributes must be eliminated in order to evaluate the reactivity of the content itself.

Therefore, in this embodiment, when predicting the reactivity of a message or image posted on social media, a model is used that separates the reactivity of the attributes from that of the content and calculates each independently.

FIG. 3 shows the functional blocks of the processor 10 in more detail. They are separated into an attribute block that calculates the attribute score and a content block that calculates the content score.

The attribute block includes the attribute feature extraction unit 10a and the attribute score calculation unit 10b.

The attribute feature extraction unit 10a is composed of, for example, a multilayer perceptron (MLP) and extracts attribute features of posted messages and images. Attributes are the context of a posted message, that is, its surrounding circumstances and background facts. The attribute feature extraction unit 10a uses as attributes the poster, the depth in the thread, the time since the previous post, the rank of the posting time among all replies to the previous post, the total number of replies to the previous post, and so on. More specifically, the following may be used as attributes:
(1) whether the poster is the same person who started the thread
(2) the number of messages posted by the poster
(3) the number of replies to the post
(4) the number of posts preceding the post
(5) the number of posts following the post
(6) the number of sibling posts of the post
(7) the number of posts in the subtree derived from the post
(8) the degree of the subtree derived from the post
(9) the depth of the thread
(10) the time elapsed from the first post to the post
(11) the time elapsed from the parent post to the post
The multilayer perceptron may have, for example, three layers, each with a dimension of, for example, 64. The attribute score calculation unit 10b calculates the attribute score as a scalar value by multiplying the attribute feature extracted by the attribute feature extraction unit 10a by a parameter vector.
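A rough, dependency-free Python sketch of the attribute block follows, using the example sizes above (three layers of 64 units) and reducing the final hidden vector to a scalar with a parameter vector. The ReLU activation, the initialisation scheme, and the assumption that the 11 attributes are already scaled to comparable ranges are not stated in the text; the random weights merely stand in for trained parameters.

```python
import math
import random

HIDDEN = 64    # example layer dimension from the text
N_LAYERS = 3   # example number of MLP layers from the text


def init_layer(n_in, n_out, rng):
    """Uniform initialisation (assumption); trained weights would replace these."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def dense_relu(x, weights, bias):
    """One fully connected layer followed by ReLU (activation is an assumption)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


class AttributeScorer:
    """Hypothetical attribute block: MLP feature extractor plus scalar score head."""

    def __init__(self, n_attributes, seed=0):
        rng = random.Random(seed)
        sizes = [n_attributes] + [HIDDEN] * N_LAYERS
        self.layers = [init_layer(sizes[i], sizes[i + 1], rng) for i in range(N_LAYERS)]
        # Parameter vector that turns the attribute feature into a scalar score.
        self.score_vector = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]

    def feature(self, attributes):
        """Attribute feature: output of the last MLP layer (length HIDDEN)."""
        h = list(attributes)
        for weights, bias in self.layers:
            h = dense_relu(h, weights, bias)
        return h

    def score(self, attributes):
        """Attribute score: inner product of the feature with the parameter vector."""
        h = self.feature(attributes)
        return sum(w * hi for w, hi in zip(self.score_vector, h))
```

Usage: `AttributeScorer(n_attributes=11).score(x)` takes an 11-element list `x` holding attributes (1) to (11) and returns one float. The same "feature vector times parameter vector" head is what the text describes for the content score as well.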

The content block includes the content feature extraction unit 10c and the content score calculation unit 10d. The content feature extraction unit 10c is composed of an LSTM (long short-term memory) or an FNN (factored neural network) and extracts content features. The LSTM is used as a basic language model, whereas the FNN is used as a language model that uses an attention mechanism to predict the next word of a comment and the replies that follow it.

The LSTM is an extension of the RNN (recurrent neural network) and is a model for sequential data. An LSTM is realized by replacing the intermediate-layer units of an RNN with blocks, called LSTM blocks, each having a memory cell and three gates. Text is divided into words, and each word is assigned an identifier (ID). The resulting ID sequences are fed into a word embedding layer to obtain a sequence of word vectors. Let the sequence of word vectors be
X = x_1, x_2, ..., x_T
where T is the sequence length. These word vectors are then processed in time order by the LSTM. After the word vector x_t is input to the LSTM, the hidden state h_t is calculated as follows, in the standard LSTM formulation:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

where c_t is the memory cell, i_t the input gate, f_t the forget gate, o_t the output gate, g_t the candidate state, σ the sigmoid function, and ⊙ the element-wise product. W, U, and b are parameters.
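The equations above can be traced with a dependency-free Python sketch of one LSTM step, plus a loop over x_1, ..., x_T that returns the final hidden state h_T. The tiny dimensions and random parameters are placeholders for illustration only, not trained values, and `random_params` and `encode` are hypothetical helper names.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, U, b, x, h):
    """Compute W x + U h + b for one gate; W and U are lists of rows."""
    return [
        sum(w * xi for w, xi in zip(W_row, x)) + sum(u * hi for u, hi in zip(U_row, h)) + bi
        for W_row, U_row, bi in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps gate name -> (W, U, b)."""
    i_t = [sigmoid(z) for z in affine(*params["i"], x_t, h_prev)]    # input gate
    f_t = [sigmoid(z) for z in affine(*params["f"], x_t, h_prev)]    # forget gate
    o_t = [sigmoid(z) for z in affine(*params["o"], x_t, h_prev)]    # output gate
    g_t = [math.tanh(z) for z in affine(*params["g"], x_t, h_prev)]  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]  # memory cell
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                  # hidden state
    return h_t, c_t


def random_params(input_dim, hidden_dim, seed=0):
    """Small random placeholder parameters for the four gates."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for gate in ("i", "f", "o", "g")}


def encode(word_vectors, params, hidden_dim):
    """Run the LSTM over x_1..x_T from zero states and return h_T."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x_t in word_vectors:
        h, c = lstm_step(x_t, h, c, params)
    return h
```

Because h_t is an output gate value in (0, 1) times a tanh, every component of the returned state lies strictly between -1 and 1, whatever the input length.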

The LSTM may have, for example, one layer, and its final hidden state can be used as the content feature. The hidden-layer dimension may be, for example, 64, and the word embedding dimension 256. If a message contains 50 or more words, only the first 50 words may be used. IDs are assigned to the 32,000 most frequent words, and words that appear fewer than five times are replaced with the token <unk>. The squared error may be used as the loss function and Adam as the optimizer. The mini-batch size may be 64 and the dropout ratio 0.5.
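The text preprocessing just described can be sketched in Python with the standard library. The example values (first 50 words, 32,000-word vocabulary, minimum frequency 5) come from the text; lower-cased whitespace tokenisation, reserving ID 0 for `<unk>`, and mapping words outside the top 32,000 to `<unk>` are assumptions made for the sketch.

```python
from collections import Counter

MAX_WORDS = 50       # keep only the first 50 words of a message
VOCAB_SIZE = 32000   # IDs for the 32,000 most frequent words
MIN_COUNT = 5        # words seen fewer than 5 times become <unk>
UNK = "<unk>"


def tokenize(text):
    """Lower-cased whitespace tokenisation (assumption)."""
    return text.lower().split()


def build_vocab(texts, vocab_size=VOCAB_SIZE, min_count=MIN_COUNT):
    """Map frequent words to IDs; ID 0 is reserved for <unk> (assumption)."""
    counts = Counter(word for text in texts for word in tokenize(text))
    vocab = {UNK: 0}
    for word, n in counts.most_common(vocab_size):
        if n < min_count:
            break  # most_common is sorted by count, so all remaining words are rarer
        vocab[word] = len(vocab)
    return vocab


def encode_text(text, vocab, max_words=MAX_WORDS):
    """Truncate to max_words and convert words to IDs, unknown words -> <unk>."""
    words = tokenize(text)[:max_words]
    return [vocab.get(word, vocab[UNK]) for word in words]
```

The resulting ID sequences are what the word embedding layer would turn into the word vectors x_1, ..., x_T.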

The FNN is described, for example, in Hao Cheng, Hao Fang, Mari Ostendorf. 2017. "A Factored Neural Network Model for Characterizing Online Discussions in Vector Space." In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2296-2306, Copenhagen, Denmark. That paper describes a language model that uses 13 attributes, including those listed above, represents topics as a linear combination of multiple latent variable vectors, and feeds them sequentially into an LSTM. That is, each post (comment) is embedded into a local vector and a content vector. The local vector represents the context of the post and is computed from global mode vectors; the content vector represents the content features of the post.

The content score calculation unit 10d calculates the content score as a scalar value by multiplying the content feature extracted by the content feature extraction unit 10c by a parameter vector.

The attribute feature extraction unit 10a and the content feature extraction unit 10c shown in FIG. 3 are trained using KarmaScore as the label. The content score calculated by the trained model is then evaluated using an annotated data set.

The distribution of KarmaScore, Reddit's reactivity measure, is known to follow Zipf's law. In this embodiment, therefore, KarmaScore is transformed with the following formula to smooth this skewed distribution:
f(k) = log(k + 1)  (k ≥ 0)
f(k) = 0  (k < 0)
where k is the KarmaScore.
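A one-function Python sketch of the transform follows. The text does not state the base of the logarithm; the natural logarithm is assumed here.

```python
import math


def smooth_karma(k: int) -> float:
    """f(k) = log(k + 1) for k >= 0 and 0 for k < 0 (natural log assumed)."""
    return math.log(k + 1) if k >= 0 else 0.0
```

With this transform, the KarmaScores of 1000 and 10 from the FIG. 2A and 2B example become about 6.91 and 2.40, so a hundredfold gap in raw score shrinks to less than a threefold gap in the training label, while all negative scores collapse to 0.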

For training, 420,598, 247,012, and 644,034 posts were collected from the three Reddit subreddits AskMen, AskWomen, and AskReddit, respectively, and each set was split at a 4:1 ratio into training and evaluation data sets. All posts were actually posted between June 1, 2016 and June 1, 2017.
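The 4:1 split can be sketched per subreddit in Python. The text gives only the ratio; the random shuffle with a fixed seed is an assumption (a chronological split would be an equally plausible reading), and `split_sizes` and `split_4_to_1` are hypothetical helper names.

```python
import random


def split_sizes(n):
    """Sizes of the training and evaluation sets for a 4:1 split of n posts."""
    n_train = n * 4 // 5
    return n_train, n - n_train


def split_4_to_1(posts, seed=0):
    """Shuffle one subreddit's posts (assumption) and split them 4:1."""
    shuffled = list(posts)
    random.Random(seed).shuffle(shuffled)
    n_train, _ = split_sizes(len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]
```

Applied to the collected counts, `split_sizes` gives (336478, 84120) for AskMen, (197609, 49403) for AskWomen, and (515227, 128807) for AskReddit.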

When posts are used as test data for evaluation, they are annotated. The annotation may be crowdsourced, for example. An annotator selects, from ten posts, the three posts that he or she considers preferable. Each of the ten posts is a reply to the same submission. For submissions with ten or more replies, the ten posts with the highest KarmaScore are selected.

For each subreddit, 130 subsets of ten posts are annotated, for a total of 1,300 annotated posts. Because each of the ten annotators votes for three different posts, any given post receives between 0 and 10 votes. This vote count, in the range 0 to 10, is called the annotate score. In short, ten annotators each perform the task of choosing three preferable replies out of ten replies to a post, and the total number of times a reply is chosen by the ten annotators is its annotate score.
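The candidate selection and vote counting can be sketched in Python. Representing replies as `(post_id, karma_score)` pairs and ballots as lists of chosen post IDs is an assumption made for illustration; `select_candidates` and `annotate_scores` are hypothetical names.

```python
from collections import Counter


def select_candidates(replies, k=10):
    """replies: list of (post_id, karma_score); keep the k with the highest KarmaScore."""
    return [pid for pid, _ in sorted(replies, key=lambda r: r[1], reverse=True)[:k]]


def annotate_scores(candidates, ballots, picks_per_annotator=3):
    """ballots: one list of chosen post IDs per annotator.

    Returns post_id -> number of annotators who chose it, so with ten
    annotators every score lies between 0 and 10.
    """
    allowed = set(candidates)
    counts = Counter()
    for ballot in ballots:
        chosen = set(ballot)
        if len(chosen) != picks_per_annotator or not chosen <= allowed:
            raise ValueError("each annotator must pick distinct posts from the candidates")
        counts.update(chosen)
    return {pid: counts[pid] for pid in candidates}
```

With ten annotators picking three posts each, the annotate scores of one subset always sum to 30.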

In this task, attributes such as the poster, posting time, and posting speed are excluded, so the annotate score can serve as an indicator of the reactivity of the content alone, with the influence of attributes removed. In the present embodiment, after training with the KarmaScore as the reactivity, the content score calculated by the trained content feature extraction unit 10c and content score calculation unit 10d is compared with the annotate score. This makes it possible to evaluate whether the content score reflects the reactivity of the content itself, that is, whether the influence of attributes has been eliminated.

FIG. 4 shows the results of comparing the content scores calculated by the trained device with the annotate scores. The case in which an LSTM is used as the content feature extraction unit 10c is denoted LSTM_Disjunctive, and the case in which an FNN is used is denoted FNN_Disjunctive. MAP (Mean Average Precision), MRR (Mean Reciprocal Rank), and prec@3 (precision at 3) are calculated for each of the three subreddits AskMen, AskWomen, and AskReddit. These metrics are described in detail in Brian McFee and Gert Lanckriet, 2010, "Metric Learning to Rank," In Proceedings of the 27th Annual International Conference on Machine Learning. For comparison, reactivity prediction models that concatenate attributes and content without separating them are denoted LSTM_Concat and FNN_Concat, and reactivity prediction models that use only the content are denoted LSTM_Text and FNN_Text.
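The three ranking metrics can be sketched as follows, with replies ranked by predicted score within each subset. Which replies count as relevant for a subset is not stated in the text; here the relevant set is passed in explicitly (for example, it could be derived from the replies with the highest annotate scores), and that choice is an assumption. The predicted scores in the usage example are hypothetical.

```python
def average_precision(ranked, relevant):
    """Mean of precision@i over the ranks i at which relevant items appear."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is ranked."""
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0

def precision_at_k(ranked, relevant, k=3):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def evaluate(subsets, k=3):
    """subsets: list of (predicted_scores: dict id -> score, relevant: set of ids)."""
    aps, rrs, precs = [], [], []
    for scores, relevant in subsets:
        ranked = sorted(scores, key=scores.get, reverse=True)
        aps.append(average_precision(ranked, relevant))
        rrs.append(reciprocal_rank(ranked, relevant))
        precs.append(precision_at_k(ranked, relevant, k))
    n = len(subsets)
    return {"MAP": sum(aps) / n, "MRR": sum(rrs) / n, f"prec@{k}": sum(precs) / n}

subsets = [
    ({"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.1}, {"a", "c", "d"}),
    ({"e": 0.2, "f": 0.8, "g": 0.5}, {"g"}),
]
result = evaluate(subsets)
print(result)
```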

LSTM_Disjunctive and FNN_Disjunctive of the present embodiment are evaluated on the content score they calculate, whereas the comparative models LSTM_Concat, FNN_Concat, LSTM_Text, and FNN_Text are evaluated on the KarmaScore they calculate. This is because, unlike the model of the present embodiment, which separates attributes from content, these comparative models do not separate the two and therefore cannot calculate a content score indicating the reactivity of the content alone in the first place.

As shown in FIG. 4, for example, prec@3 on AskMen is as follows.
LSTM models:
LSTM_Concat (comparative example): 0.306
LSTM_Text (comparative example): 0.335
LSTM_Disjunctive (embodiment): 0.348
FNN models:
FNN_Concat (comparative example): 0.292
FNN_Text (comparative example): 0.342
FNN_Disjunctive (embodiment): 0.388
Both models of the present embodiment, LSTM_Disjunctive and FNN_Disjunctive, achieve higher accuracy than the comparative models, which shows that the reactivity of the content alone can be predicted with high accuracy. LSTM_Text and FNN_Text are less accurate because the KarmaScore is affected by attributes even for identical content. The lower accuracy of LSTM_Concat and FNN_Concat is likewise attributed to the influence of attributes. In short, the results in FIG. 4 show that the KarmaScore is affected by attributes whereas the annotate score excludes their influence, so the two differ. The comparative models calculate a KarmaScore and therefore diverge from the annotate score, while the model of the present embodiment can calculate a content score close to the annotate score, which is why its accuracy is higher.

FIG. 5 shows the KarmaScore results of the model of the present embodiment and the comparative models. That is, in the model of the present embodiment, the KarmaScore is calculated as the reactivity by multiplying the attribute score by the content score, and the result is compared with the actual KarmaScore.
As shown in FIG. 5, for example, prec@3 on AskMen is as follows.
LSTM models:
LSTM_Concat (comparative example): 0.348
LSTM_Text (comparative example): 0.300
LSTM_Disjunctive (embodiment): 0.346
FNN models:
FNN_Concat (comparative example): 0.453
FNN_Text (comparative example): 0.320
FNN_Disjunctive (embodiment): 0.441
LSTM_Text and FNN_Text are less accurate than the other models. As described above, this is because LSTM_Text and FNN_Text calculate the KarmaScore from the content alone, without using attributes.

In addition, LSTM_Disjunctive and FNN_Disjunctive of the present embodiment, which separate attributes from content, achieve roughly the same accuracy as LSTM_Concat and FNN_Concat, which concatenate attributes and content without separating them. This indicates that the reactivity can be evaluated accurately even when attributes and content are separated as in the present embodiment; in other words, the model's assumption that attributes and content are mutually independent poses no problem.

FIG. 6 schematically shows a case in which the trained device outputs the reactivity of the content alone. The content of messages or images posted on social media is supplied to the content feature extraction unit 10c.

The content feature extraction unit 10c extracts a content feature vector and outputs it to the content score calculation unit 10d.

The content score calculation unit 10d aggregates the content feature vector into a content score and outputs it.

Note that in FIG. 6 the content score is calculated from the content alone and is output as an indicator of the reactivity of the content alone.
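The content-only path of FIG. 6 can be illustrated with the toy sketch below. It is not the trained model: mean-pooled word embeddings stand in for the LSTM or FNN of unit 10c, a linear map stands in for unit 10d, and the embedding table, weights, and bias are hypothetical values chosen for illustration.

```python
# Toy, hypothetical embedding table; in the device this representation is learned.
EMBEDDINGS = {
    "it's": [0.1, 0.0, 0.2],
    "cold": [0.4, 0.3, -0.1],
    "today": [0.0, 0.5, 0.1],
}
UNK_VECTOR = [0.0, 0.0, 0.0]
CONTENT_WEIGHTS = [1.0, 2.0, -0.5]  # hypothetical output-layer weights
CONTENT_BIAS = 0.1                  # hypothetical output-layer bias

def content_features(text):
    """Mean-pooled word embeddings: a stand-in for the LSTM/FNN extractor 10c."""
    vectors = [EMBEDDINGS.get(w, UNK_VECTOR) for w in text.lower().split()]
    if not vectors:
        return UNK_VECTOR[:]
    dim = len(UNK_VECTOR)
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def content_score(features):
    """Reduce the feature vector to one scalar, as unit 10d does (linear map assumed)."""
    return CONTENT_BIAS + sum(w * x for w, x in zip(CONTENT_WEIGHTS, features))

# No attribute (poster, posting time) enters this path at all.
print(round(content_score(content_features("It's cold today")), 3))
```

The point of the sketch is structural: the score is a function of the text only, so identical messages always receive identical content scores.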

FIGS. 7A and 7B schematically show a case in which the trained device outputs the final reactivity.

FIG. 7A shows a case in which poster A posts the message "It's cold today" at 22:00. In this case, the attributes of the message are
Posting time: 22:00
Poster: A
and the content of the message is
"It's cold today".

The attributes of the message are supplied to the attribute feature extraction unit 10a. The attribute feature extraction unit 10a extracts an attribute feature vector and outputs it to the attribute score calculation unit 10b. The attribute score calculation unit 10b aggregates the attribute feature vector into an attribute score. Assume, for example, that the attribute score is 100.

The content of the message is supplied to the content feature extraction unit 10c. The content feature extraction unit 10c extracts a content feature vector and outputs it to the content score calculation unit 10d. The content score calculation unit 10d aggregates the content feature vector into a content score. Assume, for example, that the content score is 10.

The reactivity calculation unit 10e calculates the reactivity by multiplying the attribute score calculated by the attribute score calculation unit 10b by the content score calculated by the content score calculation unit 10d. That is, reactivity = attribute score × content score = 100 × 10 = 1000.

FIG. 7B shows a case in which poster B posts the message "It's cold today" at 10:00. In this case, the attributes of the message are
Posting time: 10:00
Poster: B
and the content of the message is
"It's cold today".

The attributes of the message are supplied to the attribute feature extraction unit 10a. The attribute feature extraction unit 10a extracts an attribute feature vector and outputs it to the attribute score calculation unit 10b. The attribute score calculation unit 10b aggregates the attribute feature vector into an attribute score. Assume, for example, that the attribute score is 1.

The reactivity calculation unit 10e calculates the reactivity by multiplying the attribute score calculated by the attribute score calculation unit 10b by the content score calculated by the content score calculation unit 10d. That is, reactivity = attribute score × content score = 1 × 10 = 10.

Thus, the attribute scores differ because the message attributes differ, but the content scores are identical because the message content is the same; as a result, the final reactivities differ.
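The arithmetic of FIGS. 7A and 7B can be reproduced with the short sketch below. The attribute scores 100 and 1 and the content score 10 are the example values from the text; how the trained model would actually produce them is not modeled here.

```python
def reactivity(attribute_score: float, content_score: float) -> float:
    """Reactivity as the product of the attribute score and the content score."""
    return attribute_score * content_score

content_score = 10.0  # same message "It's cold today" in both cases
fig_7a = reactivity(attribute_score=100.0, content_score=content_score)  # poster A, 22:00
fig_7b = reactivity(attribute_score=1.0, content_score=content_score)    # poster B, 10:00
print(fig_7a, fig_7b)  # 1000.0 10.0
```

Because the product factorizes, the content score stays fixed while only the attribute factor changes the final reactivity.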

FIG. 8 shows a processing flowchart of the present embodiment.

First, posts made on social media are collected as training data. The data consists of pairs of a message (or image) and its reactivity. For example, posted messages and their KarmaScores, as an index of reactivity, are extracted from three subreddits of Reddit (S101). The training data is stored in the storage device 14.

Next, a model in which attributes and content are independent and separated, as shown in FIG. 3, is implemented on one or more processors 10 of a computer, using, for example, an MLP as the model of the attribute feature extraction unit 10a and an LSTM (or FNN) as the model of the content feature extraction unit 10c, and the model is trained on the data collected in S101 (S102). If a message is 50 words or longer, only its first 50 words are used, and IDs are assigned to the 32,000 most frequent words. Words occurring fewer than five times are replaced with the token <unk>. Squared error is used as the loss function, and Adam is used for optimization.
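The text preprocessing in S102 can be sketched as follows. The 50-word limit, the 32,000-word vocabulary, the frequency threshold of 5, and the <unk> token come from the text; whitespace tokenization, counting word frequencies over the truncated messages, and giving <unk> the extra ID 0 are assumptions.

```python
from collections import Counter

MAX_WORDS = 50      # only the first 50 words of a message are used
VOCAB_SIZE = 32_000  # IDs go to the most frequent words
MIN_FREQ = 5        # rarer words become <unk>
UNK = "<unk>"

def build_vocab(messages):
    """Map the most frequent words (seen at least MIN_FREQ times) to integer IDs."""
    counts = Counter(w for m in messages for w in m.split()[:MAX_WORDS])
    frequent = [w for w, c in counts.most_common(VOCAB_SIZE) if c >= MIN_FREQ]
    vocab = {UNK: 0}  # assumption: <unk> gets ID 0 in addition to the 32,000 words
    for w in frequent:
        vocab[w] = len(vocab)
    return vocab

def encode(message, vocab):
    """Truncate to MAX_WORDS and replace out-of-vocabulary words with <unk>."""
    return [vocab.get(w, vocab[UNK]) for w in message.split()[:MAX_WORDS]]

messages = ["it is cold today"] * 5 + ["a rare word"]
vocab = build_vocab(messages)
print(encode("it is very cold today", vocab))
```

The encoded ID sequences would then feed the LSTM or FNN, which is trained with a squared-error loss against the (smoothed) KarmaScore using Adam.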

After training, the content score and the reactivity are calculated using the trained model (S103). The reactivity is calculated as
reactivity = attribute score × content score.
The content score corresponds to the reactivity of the content alone. The reactivity, by contrast, takes both attributes and content into account and corresponds to Reddit's KarmaScore. In S103, only the content score may be calculated and output. The content evaluation device may also be configured so that its user can select which score to output.

Next, to evaluate the content score, annotate scores are calculated for posted messages. That is, 10 replies to a post on Reddit are extracted (S104), and the 3 replies considered most favorable are selected from those 10 (S105). This selection of 3 replies is performed by a total of 10 annotators (S106). If a post has 10 or more replies, the 10 replies with the highest KarmaScore are used. The processing of S104 to S106 is executed for each subreddit.

Then, for each of the 10 replies, the number of times it was selected by the 10 annotators in total is counted (S107), and this count is used as the annotate score of that reply (S108). A reply selected by none of the 10 annotators has an annotate score of 0, and a reply selected by all 10 annotators has an annotate score of 10. Because the selection is made by 10 annotators, attributes such as posting time, posting speed, and poster are excluded.

After the annotate scores have been calculated as described above, the posts for which annotate scores were calculated are evaluated by comparing the content score calculated by the trained device with the annotate score calculated in S108 (S109). If the content score agrees with the annotate score to a predetermined accuracy, the trained device can output the content score as the reactivity of the content alone, with the influence of attributes excluded.

FIG. 9 shows the training and evaluation configuration of the device of the present embodiment. Training uses messages posted to social media together with their reactivity, for example the KarmaScore, and adjusts the model parameters of the attribute feature extraction unit 10a and the content feature extraction unit 10c. Evaluation uses messages annotated by a plurality of annotators together with their reactivity, that is, the annotate score, and is performed by comparing the content score with the annotate score. In other words, training and evaluation use different functional blocks and different indices.

FIGS. 10A and 10B show specific examples of posted messages and their calculated content scores. FIG. 10A is an example of a posted message with a relatively high content score, and FIG. 10B is an example of a posted message with a relatively low content score. For example, the message
"my neighbor on the other side if the house were having a fire pit . we heard a fox in the woods before we knew what it was we"
has a calculated content score of 1.062, whereas the message
"yeah but somebodys opinion could be that climate change isnt real . up-voting that gives visibility to misinformation that had the potential to hurt people . im sorry but if somebody is trying to tell me that china invented climate change im downvoting that because its factually incorrect and contributes"
has a calculated content score of -0.457. In general, messages that are contextually self-contained tend to have high content scores, while strongly context-dependent messages tend to have low content scores.

In S102, an LSTM and an FNN were given as examples of the model for the content feature extraction unit 10c, but other options include BoW (Bag of Words), BoW weighted by tf-idf (Term Frequency-Inverse Document Frequency), and the linear sum or max pooling of embedded word vectors.
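As an illustration of one of the alternatives named above, tf-idf-weighted BoW vectors can be computed as sketched below. This uses one common variant (term frequency normalized by document length, idf = log(N / df)); the text does not specify which tf-idf variant would be used, and whitespace tokenization is an assumption.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one sparse {term: weight} dict per document, weight = tf * idf."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()
        })
    return vectors

docs = ["it is cold today", "it is warm today", "cold night"]
for vec in tfidf_vectors(docs):
    print(vec)
```

Terms that occur in every document get weight 0, so such vectors emphasize distinctive words; they would replace the LSTM/FNN output as the content feature vector.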

Although a message is used as the content in the present embodiment, for images a BoF (Bag of Features) representation, a CNN (Convolutional Neural Network), or the like may be used.

Although an embodiment of the present invention has been described above, the present invention is not limited to it, and various modifications are possible. Modifications are described below.

<Modification>
In the embodiment, as shown in FIG. 3, the reactivity is calculated as
reactivity = attribute score × content score,
but the reactivity may also be calculated by additionally applying the influence of attributes as a bias.

FIG. 11 shows a functional block diagram of the processor 10 in the modification. In addition to the configuration of FIG. 3, an adder 10f adds, as a bias, a scalar value calculated from the attribute feature vector extracted by the attribute feature extraction unit 10a to the value calculated by the reactivity calculation unit 10e, yielding the final reactivity. That is, the reactivity is calculated as
reactivity = attribute bias + attribute score × content score.
In the configuration of FIG. 11 as well, the content score is calculated independently of, and separately from, the attribute score. The configuration of FIG. 3 can be regarded as the special case of FIG. 11 in which the bias is always zero.
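The modified reactivity can be sketched as below. Mapping the attribute feature vector to a scalar bias with a linear function is an assumption made for illustration; the text says only that a scalar is calculated from the attribute feature vector, and the weights used here are hypothetical.

```python
def attribute_bias(attribute_features, bias_weights, offset=0.0):
    """Hypothetical linear map from the attribute feature vector to a scalar bias."""
    return offset + sum(w * x for w, x in zip(bias_weights, attribute_features))

def reactivity(attribute_score, content_score, bias=0.0):
    """reactivity = attribute bias + attribute score * content score."""
    return bias + attribute_score * content_score

b = attribute_bias([1.0, 2.0], [0.5, 0.25], offset=1.0)
print(reactivity(100.0, 10.0, bias=b))
# With bias = 0 this reduces to the FIG. 3 configuration.
print(reactivity(100.0, 10.0))
```

The content score is still computed without any attribute input; only the final combination step changes.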

10 processor, 12 program memory, 14 storage device, 16 input/output interface (I/F), 18 communication interface (I/F), 20 input device, 22 output device.

Claims

1. A content evaluation device comprising:
attribute feature extraction means for extracting attribute features from a post on social media;
attribute score calculation means for calculating an attribute score from the attribute features;
content feature extraction means for extracting content features from the post;
content score calculation means for calculating a content score from the content features and outputting the content score independently of the attribute score; and
reaction degree calculation means for calculating and outputting the reaction degree of the post using the attribute score and the content score,
wherein, during learning, the attribute feature extraction means and the content feature extraction means are trained using a data set of posts on the social media and the degree of reaction to those posts, and
at the time of evaluation, content to be evaluated is supplied to the trained content feature extraction means, and the content score is calculated and output.

2. The content evaluation device according to claim 1, wherein the evaluation is performed using an annotated score of a post on the social media.

3. The content evaluation device according to claim 1, wherein said content feature extraction means extracts said content features using an LSTM or an FNN.

4. The content evaluation device according to any one of claims 1 to 3, wherein said attribute feature extracting means extracts said attribute feature using a multi-layer perceptron.

5. The content evaluation device according to any one of claims 1 to 4, wherein the reaction degree calculation means calculates the reaction degree of the post by multiplying the attribute score by the content score.

6. The content evaluation device according to any one of claims 1 to 4, wherein the reaction degree calculation means calculates the reaction degree of the post by multiplying the attribute score by the content score and adding the attribute feature as a bias.

7. A content evaluation device comprising:
attribute score calculation means for calculating an attribute score from attributes of a post on social media;
content score calculation means for extracting content features from the content of the post independently of the attribute score calculation means and calculating a content score from the content features; and
output means for outputting at least one of the content score and the reaction degree of the post calculated using the attribute score and the content score,
wherein, during learning, content feature extraction means for extracting the content features is trained using a data set of posts on the social media and the degree of reaction to those posts, and
at the time of evaluation, content to be evaluated is supplied to the trained content feature extraction means to calculate the content score.