JP6623186B2 - Content evaluation prediction system and content evaluation prediction method

Info

Publication number: JP6623186B2
Application number: JP2017037525A
Authority: JP
Inventors: 矢野　亮; 亮矢野; 直哉前田; 理沙松元; 拓也茨木
Original assignee: NTT Data Corp; NTT Data Institute of Management Consulting Inc
Current assignee: NTT Data Corp; NTT Data Institute of Management Consulting Inc
Priority date: 2017-02-28
Filing date: 2017-02-28
Publication date: 2019-12-18
Anticipated expiration: 2037-02-28
Also published as: JP2018142272A

Description

The present invention relates to a content evaluation prediction system and a content evaluation prediction method that predict how viewers who have watched a piece of content will evaluate it.

In recent years, advertising content for mail-order products, consisting of images and sound, has been distributed over communication media such as television, radio and the Internet (see, for example, Patent Document 1).
When products are sold by mail order through such television shopping, the program producer (director) creates, from a creative viewpoint and on the basis of past experience, a program intended to make viewers want to buy.

Viewer reaction to the content is measured by the number of telephone or fax inquiries received after the advertisement has been shown to viewers, or by the number of clicks in a web browser. From the number of inquiries or clicks (the number of accesses), the program producer judges whether the program was effective in selling the product. In this way the producer gains experience and accumulates, as know-how, an understanding of which program structures increase the number of accesses that lead to sales.

Patent Document 1: JP 2009-139977 A

However, even when a program producer creates advertising content on the basis of know-how, the number of accesses is an evaluation of the content as a whole, and the producer cannot tell which parts of the content contributed to sales. As a result, viewer evaluation varies from one piece of content to the next, and a high number of accesses is not always obtained.
Moreover, a program producer creating advertising content for the first time has no way of knowing how many accesses to the product that content will draw from viewers.

The present invention has been made in view of these circumstances. Its object is to provide a content evaluation prediction system and a content evaluation prediction method that supply a program producer with a predicted number of accesses for content the producer has created, so that content can be produced regardless of how much know-how the producer has accumulated.

The present invention has been made to solve the problems described above. The content evaluation prediction system of the present invention comprises: a feature extraction unit that extracts, at every predetermined period, information features, which are features of the information of content whose information changes in time series; and a prediction model generation unit that generates, by machine learning using the information features and the sum of the numbers of accesses, serving as evaluation values of the content, in the predetermined period and in the period immediately following it, a content evaluation prediction model that predicts the number of accesses for an information feature when the information feature corresponding to the information is input.

In the content evaluation prediction system of the present invention, the information includes at least one of moving images and dialogue.

In the content evaluation prediction system of the present invention, when extracting features of the moving images, the feature extraction unit uses the feature extraction function of another convolutional neural network (CNN), already trained for image feature extraction, up to the first-stage fully connected layer.

In the content evaluation prediction system of the present invention, when extracting features of the dialogue, the feature extraction unit registers in a dictionary decomposition restriction words, which are words whose decomposition in morphological analysis is restricted, and performs morphological analysis with reference to that dictionary.

In the content evaluation prediction system of the present invention, the prediction model generation unit uses a sparse modeling method when generating the content evaluation prediction model.

The content evaluation prediction method of the present invention includes: a feature extraction step in which a feature extraction unit extracts, at every predetermined period, information features, which are features of the information of content whose information changes in time series; and a prediction model generation step in which a prediction model generation unit generates, by machine learning using the information features and the sum of the numbers of accesses, serving as evaluation values of the content, in the predetermined period and in the period immediately following it, a content evaluation prediction model that predicts the number of accesses for an information feature when the information feature corresponding to the information is input.

According to the present invention, it is possible to provide a content evaluation prediction system and a content evaluation prediction method that supply a program producer with a predicted number of accesses for content the producer has created, so that content can be produced regardless of how much know-how the producer has accumulated.

FIG. 1 is a diagram showing a configuration example of a content evaluation prediction system according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram explaining how the image feature extraction function is obtained from the AlexNet CNN.
FIG. 3 is a diagram showing a configuration example of the image feature vector table stored in the extracted feature storage unit 18.
FIG. 4 is a diagram showing a configuration example of the dialogue feature vector table stored in the extracted feature storage unit 18.
FIG. 5 is a diagram showing a configuration example of the access number table stored in the extracted feature storage unit 18.
FIG. 6 is a conceptual diagram of integrating an image feature vector and a dialogue feature vector into a content comprehensive feature vector.
FIG. 7 is a diagram showing a configuration example of the learning content data set table stored in the extracted feature storage unit 18.
FIG. 8 is a flowchart showing an operation example of the content evaluation prediction model generation processing in the content evaluation prediction system 1 of this embodiment.
FIG. 9 is a graph comparing, for target content that is a product advertisement, the number of incoming calls predicted for each evaluation period by the content evaluation prediction model of this embodiment with the number of incoming calls actually received.
FIG. 10 is a graph comparing the predicted and actually received numbers of incoming calls in each evaluation period for target content selected from the linked content in FIG. 9.

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows a configuration example of a content evaluation prediction system according to an embodiment of the present invention. In FIG. 1, the content evaluation prediction system 1 of this embodiment includes a content data input unit 11, a feature extraction unit 12, an access number accumulation unit 13, a prediction model generation unit 14, an access number prediction unit 15, a database 16, a decomposition restriction word dictionary storage unit 17, an extracted feature storage unit 18 and an evaluation result storage unit 19.

The content data input unit 11 reads the data of the learning content used to generate the content evaluation prediction model described later, and the data of the target content for which the number of accesses, as the evaluation value of the content, is to be predicted, and writes them to the database 16 for storage. Here, when the content is a video advertising a product sold by mail order, for example through television shopping, the number of accesses is the number of incoming calls, that is, the number of telephone calls placed in each evaluation period (for example, every minute) by viewers of the content to the call center of the company selling the product. The more incoming calls there are, the more viewers watched the content and became interested in the product, and hence the more the content contributed to sales of the product. In the following description of this embodiment, the number of accesses is the number of incoming telephone calls from viewers.

The feature extraction unit 12 extracts a feature (information feature) for each evaluation period from the information in the content that changes in time series (the video, dialogue and so on mentioned above). In this embodiment the content is described as a video, so image features and dialogue features are extracted from the images and the dialogue respectively.
The feature extraction unit 12 therefore includes an image feature extraction unit 121, which extracts image features, and a dialogue feature extraction unit 122, which extracts dialogue features.
The image feature extraction unit 121 is formed by, for example, a CNN (convolutional neural network) trained by deep learning and has an image feature extraction function; from an input image it extracts an image feature vector with a predetermined number of dimensions.

In this embodiment, a scene is sampled from the video of the content every second, the pixels of the scene are divided into a grid of 227 × 227 blocks, and the average RGB gradation value of the pixels in each block is used as the data sequence from which features are extracted. Since 227 × 227 = 51529 and each of the 51529 blocks has three values, one each for R, G and B, the data sequence of an input image consists of 154587 values.
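The block averaging described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name `frame_to_block_means`, the use of NumPy, and the handling of frame sizes that are not exact multiples of 227 (cells then differ in size by at most one pixel) are all assumptions.

```python
import numpy as np

GRID = 227  # blocks per side, as stated in the description


def frame_to_block_means(frame: np.ndarray, grid: int = GRID) -> np.ndarray:
    """Average the R, G and B values inside each cell of a grid x grid lattice.

    frame: array of shape (height, width, 3) with height, width >= grid.
    Returns a flat vector of length grid * grid * 3.
    """
    height, width, channels = frame.shape
    if channels != 3 or height < grid or width < grid:
        raise ValueError("expected an RGB frame of at least grid x grid pixels")
    # Integer cell boundaries (assumption: near-equal cells when the frame
    # size is not an exact multiple of the grid).
    row_edges = (np.arange(grid + 1) * height) // grid
    col_edges = (np.arange(grid + 1) * width) // grid
    # reduceat sums the slices [edge_i, edge_{i+1}) along the given axis.
    sums = np.add.reduceat(frame.astype(np.float64), row_edges[:-1], axis=0)
    sums = np.add.reduceat(sums, col_edges[:-1], axis=1)
    pixels_per_cell = np.outer(np.diff(row_edges), np.diff(col_edges))
    means = sums / pixels_per_cell[:, :, None]
    return means.reshape(-1)
```

For any frame of at least 227 × 227 pixels the returned vector has 227 × 227 × 3 = 154587 entries, matching the count given in the text.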

In this embodiment, the image feature extraction unit 121 extracts, from the 154587-value data sequence described above, an image feature vector consisting of 4096-dimensional image features.
Also in this embodiment, the CNN that provides the image feature extraction function of the image feature extraction unit 121 is obtained by transfer learning from another CNN that has already been trained for image feature extraction. Building a CNN from scratch requires a very large number of images and is not realistic when the amount of learning content data available for training is limited. For this reason, this embodiment uses, as the image feature extraction function, the portion of the AlexNet CNN that remains after the final fully connected layer is removed.

FIG. 2 is a conceptual diagram explaining how the image feature extraction function is obtained from the AlexNet CNN. The CNN consists of repeated convolution, max pooling and normalization layers, followed at the end by several fully connected layers. In this embodiment, the network up to the first-stage fully connected layer is used as the image feature extractor that extracts image features. Usually only the last fully connected layer is removed, but here the network is used only up to the first-stage fully connected layer so that the amount of information passed on to the later stage is not reduced.
The output of the first-stage fully connected layer is then connected to the predictor of this embodiment (sparse modeling or the like), which is the content evaluation prediction model described later.

Returning to FIG. 1, the image feature extraction unit 121 uses the CNN with the transferred image feature extraction function shown in FIG. 2 to extract, from the images of the learning content or of the content to be evaluated, image feature vectors representing the features of each image, and writes them to the extracted feature storage unit 18 for storage. In doing so, the image feature extraction unit 121 averages the image feature vectors obtained at each predetermined interval (for example, 1 second) over each evaluation period (for example, 1 minute), dimension by dimension across the 4096 dimensions, to obtain a final 4096-dimensional image feature vector. The image feature extraction unit 121 then writes the resulting image feature vector for each evaluation period to the extracted feature storage unit 18.
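The per-period averaging can be sketched as below, assuming the per-second feature vectors are already available as a NumPy array with one row per second. The function name `average_per_period` is hypothetical, and dropping an incomplete trailing period is an assumption of this sketch; the description does not say how a partial period is handled.

```python
import numpy as np


def average_per_period(per_second: np.ndarray, seconds_per_period: int = 60) -> np.ndarray:
    """Average per-second feature vectors over each evaluation period.

    per_second: array of shape (n_seconds, dim), one row per sampled second.
    Returns an array of shape (n_periods, dim).
    """
    n_seconds, dim = per_second.shape
    # Assumption: an incomplete trailing period is discarded.
    n_periods = n_seconds // seconds_per_period
    trimmed = per_second[: n_periods * seconds_per_period]
    # Group the rows by period, then average within each period per dimension.
    return trimmed.reshape(n_periods, seconds_per_period, dim).mean(axis=1)
```

With 1-second sampling and a 1-minute evaluation period, each output row is the mean of 60 feature vectors, which is how the feature value X1_1 is described for FIG. 3.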

FIG. 3 shows a configuration example of the image feature vector table stored in the extracted feature storage unit 18. An image feature vector table is provided for each piece of content. Each record corresponds to one time (evaluation period) and holds the content name (in this embodiment, for example, video A), the time, a description of the video frames, and the feature values of dimensions 1 to m (m = 4096 in this embodiment), so that each record holds one image feature vector. The image feature vectors for the successive evaluation periods are written P1 (X1_1, X2_1, ..., Xm_1), P2 (X1_2, X2_2, ..., Xm_2), and so on. Each of X1_1, X2_1, ..., Xm_1 is the image feature value in one dimension. For example, the feature value X1_1 is the average of the first-dimension feature values of the 60 4096-dimensional feature vectors obtained in the evaluation period.

Returning to FIG. 1, the dialogue feature extraction unit 122 performs morphological analysis on the text data of the spoken dialogue that accompanies the images and extracts features from each of the resulting words. In this embodiment, words that should not be split into smaller units than necessary, such as coined words formed by joining several common words (a product name in the content, for example), are written in advance to the decomposition restriction word dictionary storage unit 17 as decomposition restriction words (that is, they are registered in the dictionary).

Accordingly, when performing morphological analysis on the input dialogue, the dialogue feature extraction unit 122 refers to this dictionary. A compound word that would normally be decomposed is, if registered in the dictionary as a decomposition restriction word, treated in its registered form as an indivisible word and is not split further. The dialogue feature extraction unit 122 then uses Word2vec or Doc2vec to output vectors representing the features of the morphologically analyzed words. Specifically, it extracts a 100-dimensional feature vector for every word contained in the dialogue of each evaluation period (for example, 1 minute). Instead of Word2vec or Doc2vec, any other algorithm that extracts word features by unsupervised learning may be used.
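The effect of the decomposition restriction dictionary can be illustrated with a toy tokenizer. This is a sketch under stated assumptions only: a real morphological analyzer would be configured through its own user dictionary, whereas here `tokenize_with_protected` (a hypothetical name) simply emits registered words whole and passes the remaining text to an arbitrary `base_tokenizer` supplied by the caller. The product name in the usage example is invented.

```python
from typing import Callable, Iterable, List


def tokenize_with_protected(
    text: str,
    protected: Iterable[str],
    base_tokenizer: Callable[[str], List[str]],
) -> List[str]:
    """Tokenize text, emitting every registered (protected) word as one token."""
    # Try longer entries first so a long product name wins over a shorter
    # registered word that it happens to contain.
    ordered = sorted((w for w in protected if w), key=len, reverse=True)
    tokens: List[str] = []
    pending_start = 0  # start of text not yet handed to the base tokenizer
    i = 0
    while i < len(text):
        match = next((w for w in ordered if text.startswith(w, i)), None)
        if match is None:
            i += 1
            continue
        if pending_start < i:
            tokens.extend(base_tokenizer(text[pending_start:i]))
        tokens.append(match)  # registered word is never split further
        i += len(match)
        pending_start = i
    if pending_start < len(text):
        tokens.extend(base_tokenizer(text[pending_start:]))
    return tokens


# Usage with whitespace splitting standing in for morphological analysis.
example = tokenize_with_protected(
    "try the new Aqua Fresh Pro today", ["Aqua Fresh Pro"], str.split
)
```

Here `example` keeps the invented product name as a single token, while the surrounding words are split by the base tokenizer.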

For each evaluation period, the dialogue feature extraction unit 122 then extracts, in each of the 100 dimensions of these feature vectors, the maximum and the minimum feature value over all words, and also calculates the average feature value over all words. For example, the maximum and minimum are extracted from the first-dimension feature values of the 100-dimensional feature vectors of all words; this aggregates the first-dimension values of all words and at the same time expands the first dimension into two dimensions. The average of the first-dimension feature values of all words is also calculated, which adds one more dimension. As a result, one dimension of the feature is expanded to three dimensions.
In this way, the dialogue feature extraction unit 122 aggregates, for each of the 100 dimensions, the feature values of all words and expands each dimension into three values (maximum, minimum and average), and writes the resulting dialogue feature vector of 300 feature values to the extracted feature storage unit 18 for storage.
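The aggregation of word vectors into a single 300-dimensional dialogue feature vector can be sketched as follows. The function name `pool_word_vectors` is hypothetical, and two details are assumptions not fixed by the description: the three statistics of each dimension are stored next to each other in the order maximum, minimum, average, and an evaluation period containing no words yields an all-zero vector.

```python
import numpy as np


def pool_word_vectors(word_vectors, dim: int = 100) -> np.ndarray:
    """Collapse the word vectors of one evaluation period into 3 * dim values.

    word_vectors: sequence of shape (n_words, dim).
    Returns [max_1, min_1, mean_1, max_2, min_2, mean_2, ...].
    """
    vectors = np.asarray(word_vectors, dtype=np.float64)
    if vectors.size == 0:
        # Assumption: a period with no dialogue is represented by zeros.
        return np.zeros(3 * dim)
    if vectors.ndim != 2 or vectors.shape[1] != dim:
        raise ValueError("expected an array of shape (n_words, dim)")
    stats = np.stack(
        [vectors.max(axis=0), vectors.min(axis=0), vectors.mean(axis=0)],
        axis=1,
    )  # shape (dim, 3): one row of statistics per original dimension
    return stats.reshape(-1)
```

Because the output length does not depend on the number of words, periods with very different amounts of dialogue all map to vectors of the same size.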

FIG. 4 shows a configuration example of the dialogue feature vector table stored in the extracted feature storage unit 18. A dialogue feature vector table is provided for each piece of content. Each record corresponds to one time (evaluation period) and holds the content name (in this embodiment, for example, video A), the time, a description of the dialogue in the scene, and the feature values of dimensions 1 to n (n = 300 in this embodiment), so that each record holds one dialogue feature vector. The dialogue feature vectors for the successive evaluation periods are written Q1 (Y1_1, Y2_1, Y3_1, ..., Yn_1), Q2 (Y1_2, Y2_2, ..., Yn_2), and so on. Here, a dialogue feature vector is the feature vector of the dialogue used in the frames corresponding to the image feature vector in FIG. 3. Each of Y1_1, Y2_1, Y3_1, ..., Yn_1 is the dialogue feature value in one dimension. For example, the feature value Y1_1 is the maximum of the first-dimension feature values of the feature vectors of all words, the feature value Y1_2 is the minimum of those first-dimension feature values, and the feature value Y1_3 is their average.

Returning to FIG. 1, the access number accumulation unit 13 receives the number of accesses in each evaluation period of the learning content from an external device (for example, a server that tallies the calls taken by each operator) and writes it to the extracted feature storage unit 18 in association with the evaluation period.

FIG. 5 shows a configuration example of the access number table stored in the extracted feature storage unit 18. Each record associates a time (evaluation period) with the number of incoming calls, which is the number of accesses in that evaluation period.

Returning to FIG. 1, the prediction model generation unit 14 obtains, by machine learning, a content evaluation prediction model consisting of a predetermined multiple regression model. To do so, the prediction model generation unit 14 combines the image feature vector and the dialogue feature vector described above into a content comprehensive feature vector. Combining the 4096-dimensional image feature vector with the 300-dimensional dialogue feature vector yields a content comprehensive feature vector of 4396 dimensions. Instead of the multiple regression model, the content evaluation prediction model may be built by implementing multiple regression (linear regression) as a neural network and having that network learn the simple linear regression problem.

FIG. 6 is a conceptual diagram of combining an image feature vector and a dialogue feature vector into a content comprehensive feature vector. The vector space 100 of image feature vectors (for example, an image-based shopping program vector space) and the vector space 102 of dialogue feature vectors (for example, a scenario vector space) are combined to produce the vector space 104 of content comprehensive feature vectors (for example, a comprehensive feature vector space) that are input to the content evaluation prediction model. In FIG. 6 the vector spaces 100, 102 and 104 are each drawn in three dimensions, but they actually have 4096, 300 and 4396 dimensions respectively.
For example, in the vector space 100 of image feature vectors, the feature vectors of the content named video A in the image feature vector table of FIG. 3 at times 00:01 and 00:02 are labeled video A_00:01 and video A_00:02. Likewise, in the vector space 102 of dialogue feature vectors, the feature vectors of the content named video A in the dialogue feature vector table of FIG. 4 at times 00:01 and 00:02 are labeled dialogue A_00:01 and dialogue A_00:02.

That is, the prediction model generation unit 14 integrates the image feature vector table of FIG. 3, the dialogue feature vector table of FIG. 4 and the access number table of FIG. 5 to create the learning content data sets with which the content evaluation prediction model is trained. The prediction model generation unit 14 then writes the learning content data set for each evaluation period to the extracted feature storage unit 18 for storage.

FIG. 7 shows a configuration example of the learning content data set table stored in the extracted feature storage unit 18. In the learning content data set table, each record corresponds to one time (evaluation period) and holds, as a learning content data set, the image feature values of dimensions 1 to m (m = 4096 in this embodiment), the dialogue feature values of dimensions m+1 to m+n (n = 300 in this embodiment), and the number of incoming calls for the evaluation period. The number of incoming calls associated with each record allows for the delay between a viewer watching content that introduces a product, becoming interested in it, moving to where the telephone is and calling the call center: the incoming calls in the evaluation period being predicted and in the evaluation period immediately after it are together taken as the evaluation (number of incoming calls) of the images and dialogue of the period being predicted. That is, the numbers of incoming calls in the period being predicted and in the immediately following period are added together to form the learning content data set. The content evaluation prediction model described later therefore predicts, as the predicted number of incoming calls, the sum of the incoming calls in the period being predicted and in the immediately following period.
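Building one training row per evaluation period, with the target formed from the current and the immediately following period, can be sketched as follows. The function name `build_training_rows` and the plain-list data layout are hypothetical. The description does not say how the final evaluation period, which has no successor, is treated; this sketch assumes a missing successor contributes zero calls.

```python
from typing import List, Sequence, Tuple


def build_training_rows(
    image_vecs: Sequence[Sequence[float]],
    dialogue_vecs: Sequence[Sequence[float]],
    calls: Sequence[int],
) -> List[Tuple[List[float], int]]:
    """Pair each period's combined feature vector with its call-count target."""
    if not (len(image_vecs) == len(dialogue_vecs) == len(calls)):
        raise ValueError("all inputs must cover the same evaluation periods")
    rows = []
    for n in range(len(calls)):
        # Image features first, then dialogue features (dimensions 1..m, m+1..m+n).
        features = list(image_vecs[n]) + list(dialogue_vecs[n])
        # Assumption: the last period has no following period, so add 0.
        following = calls[n + 1] if n + 1 < len(calls) else 0
        rows.append((features, calls[n] + following))
    return rows
```

With 4096 image features and 300 dialogue features, each row carries a 4396-dimensional input and a single target value.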

Returning to FIG. 1, the prediction model generation unit 14 feeds the content comprehensive feature vectors of the learning content data sets into the content evaluation prediction model one after another, in the time order of the learning content data set table, and has the model output the number of incoming calls predicted from each input vector. The prediction model generation unit 14 adjusts the weights of the connection layer of the neural network in the content evaluation prediction model so that, for the learning content data sets of all evaluation periods of the content, the predicted number of incoming calls approaches the number of incoming calls recorded in the learning content data set for the same evaluation period. The prediction model generation unit 14 performs the same processing on the learning content data set tables of multiple pieces of learning content.

Viewed as a regression problem, predicting one variable (the number of incoming calls) from multiple variables (the content comprehensive feature vector) is multiple regression. As a rule, a multiple regression model (here, the content evaluation prediction model) requires a number of data points equal to twice the number of dimensions used for prediction. Since the content comprehensive feature vector in this embodiment has 4396 dimensions, 8792 data points would simply be required. Where the number of data points (learning content data sets) is insufficient, or in order to simplify the calculation, a sparse modeling technique is used to calculate the coefficients. Specifically, the L1-minimization algorithm known as LASSO (least absolute shrinkage and selection operator) is used. In this embodiment, the parameter is determined by 10-fold cross-validation to prevent overfitting. As already mentioned, the same processing can be applied when the content evaluation prediction model is generated as a neural network model rather than a multiple regression model.

The prediction model generation unit 14 obtains the regression coefficients w(p), the weights of the multiple regression model constituting the content evaluation prediction model, from equation (1). In equation (1), n is the evaluation period number indicating the position of an evaluation period in the sequence. N is the total number of evaluation periods in the content. Z_n is the number of accesses (number of incoming calls) in the n-th evaluation period. w(p) is the weighting coefficient (regression coefficient) applied to the p-th feature of the content comprehensive feature vector. x_n(p) is the p-th feature of the content comprehensive feature vector in the n-th evaluation period. b is the intercept. λ is the regularization coefficient.
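
Equation (1) itself does not appear in this text. Assuming it is the standard LASSO objective, which is consistent with the symbols defined above, it would take a form such as the following. Here P, the number of dimensions of the content comprehensive feature vector, is a symbol introduced only for this illustration, and some formulations scale the squared-error term by a constant such as 1/(2N).

```latex
\min_{w,\,b}\;
\sum_{n=1}^{N} \Bigl( Z_n - \sum_{p=1}^{P} w(p)\, x_n(p) - b \Bigr)^{2}
\;+\; \lambda \sum_{p=1}^{P} \bigl\lvert w(p) \bigr\rvert
```

The first term is the squared prediction error over all evaluation periods; the second is the first-order (L1) regularization term weighted by λ.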

In equation (1), the coefficients w(p) must be determined in order to estimate the predicted number of incoming calls z from x_n(p) by multiple regression. Because the regularization term enters equation (1) to the first order, the coefficients w(p) for features x_n(p) that are not needed to estimate z are set to 0 (that is, the model is made sparse). As a result, a content evaluation prediction model that predicts z can be generated even when the number of learning content data sets is smaller than the rule stated above requires.
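
The sparsifying effect of the first-order regularization term can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes the objective is the squared error plus λ times the sum of absolute coefficients, solves it by coordinate descent with soft-thresholding, and uses a tiny made-up data set. The function names and the data are hypothetical, and the 10-fold cross-validation used to choose λ is not shown.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; values inside the band become exactly 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(rows, targets, lam, n_iter=100):
    """Coordinate descent for: sum of squared errors + lam * sum of |w(p)|."""
    n, dims = len(rows), len(rows[0])
    w = [0.0] * dims
    b = 0.0
    for _ in range(n_iter):
        # Intercept update: mean residual under the current weights.
        b = sum(t - sum(wj * xj for wj, xj in zip(w, x))
                for x, t in zip(rows, targets)) / n
        for j in range(dims):
            rho = 0.0   # inner product of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for x, t in zip(rows, targets):
                partial = b + sum(w[k] * x[k] for k in range(dims) if k != j)
                rho += x[j] * (t - partial)
                norm += x[j] ** 2
            w[j] = soft_threshold(rho, lam / 2.0) / norm if norm > 0 else 0.0
    return w, b


# Hypothetical data: 4 samples, 3 features; only feature 0 drives the target.
rows = [[-1.5, 1.0, -1.0], [-0.5, -1.0, 3.0], [0.5, -1.0, -3.0], [1.5, 1.0, 1.0]]
targets = [2.0 * x[0] + 1.0 for x in rows]
weights, intercept = lasso_fit(rows, targets, lam=1.0)
```

In this toy case the two features unrelated to the target receive a coefficient of exactly 0, while the coefficient of the informative feature is shrunk slightly below its true value of 2, even though there are fewer samples than twice the number of features.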

The access number prediction unit 15 uses the content evaluation prediction model generated by the prediction model generation unit 14 to obtain the predicted number of incoming calls for each evaluation period of the target content, that is, the content to be evaluated. To do so, the access number prediction unit 15 causes the image feature extraction unit 121 to extract an image feature vector for each evaluation period of the target content, and causes the dialogue feature extraction unit 122 to extract a dialogue feature vector for each evaluation period of the target content.

The access number prediction unit 15 then combines the image feature vector and the dialogue feature vector supplied from the feature extraction unit 12 to generate the content comprehensive feature vector. It inputs the content comprehensive feature vectors to the content evaluation prediction model in time series, one evaluation period at a time, and obtains the predicted number of incoming calls z.
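
The combination and prediction described above can be sketched as follows. This is an illustrative sketch only: it assumes the two vectors are combined by simple concatenation (4096 image dimensions plus 300 dialogue dimensions giving 4396) and that the model is the linear multiple regression form with weights w(p) and intercept b. The function names are hypothetical.

```python
def build_content_feature_vector(image_features, dialogue_features):
    """Concatenate image and dialogue features for one evaluation period."""
    return list(image_features) + list(dialogue_features)


def predict_incoming_calls(weights, intercept, feature_vector):
    """Linear prediction z = b + sum over p of w(p) * x(p)."""
    if len(weights) != len(feature_vector):
        raise ValueError("weights and feature vector differ in length")
    return intercept + sum(w * x for w, x in zip(weights, feature_vector))


def predict_per_period(weights, intercept, image_by_period, dialogue_by_period):
    """Return one predicted call count per evaluation period, in time order."""
    return [
        predict_incoming_calls(
            weights, intercept, build_content_feature_vector(img, dlg)
        )
        for img, dlg in zip(image_by_period, dialogue_by_period)
    ]
```

With sparse weights, most terms of the sum are zero, so only the selected features influence the predicted value for each evaluation period.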

As described above, according to this embodiment, the predicted number of incoming calls indicates in advance how the created content will be rated by viewers, so even a program creator without know-how in producing advertising programs can create content with viewer evaluation in mind.
Consequently, a program creator (even one without know-how in producing advertising programs) can predict viewer evaluation while creating content, which avoids wastefully broadcasting content, for example as a television shopping program, while its viewer evaluation is entirely unknown.

Further, according to this embodiment, when the expected predicted number of incoming calls is not obtained for a given evaluation period of the created content, the part of the content corresponding to that evaluation period can be replaced with images and dialogue created with reference to past cases, and the content can then be evaluated again. Even a program producer without know-how in creating advertising programs can therefore readily create content that contributes to increasing the predicted number of incoming calls.

Next, the flow of the processing that generates the content evaluation prediction model in the content evaluation prediction system 1 of this embodiment is described with reference to FIG. 8. FIG. 8 is a flowchart showing an example of the content evaluation prediction model generation processing in the content evaluation prediction system 1 of this embodiment.

Step S1:
The content data input unit 11 receives, from an external device, the learning content data used to generate the content evaluation prediction model, and writes and stores the data in the database 16.

Step S2:
The dialogue feature extraction unit 122 sequentially reads text sentences from the database 16, one evaluation period of the learning content at a time. It then extracts words from the text sentences within the evaluation period by performing morphological analysis with reference to the dictionary described earlier.

Step S3:
The dialogue feature extraction unit 122 uses Word2vec or Doc2vec to extract a 100-dimensional feature vector for each extracted word. It then extracts, for each dimension, the maximum and minimum feature values across the feature vectors of all the words, and also computes the per-dimension average across the feature vectors of all the words. Because each of the 100 dimensions thus yields the maximum, the minimum, and the average of the feature values over all words in the evaluation period, the dialogue feature extraction unit 122 generates a dialogue feature vector with 300 dimensions.
The dialogue feature extraction unit 122 then writes and stores the generated dialogue feature vector in the dialogue feature vector table of the extracted feature storage unit 18, in association with the evaluation period.
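
The pooling in step S3 can be sketched as below. This is a minimal illustration that assumes the word vectors have already been obtained (for example from Word2vec) and that the three per-dimension statistics are concatenated in the order maximum, minimum, average. The text does not state the ordering, and the function name is hypothetical.

```python
def pool_word_vectors(word_vectors):
    """Collapse the word vectors of one evaluation period into one vector.

    For D-dimensional inputs the result has 3 * D entries:
    per-dimension maxima, then minima, then averages.
    """
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    columns = list(zip(*word_vectors))  # one tuple of values per dimension
    maxima = [max(col) for col in columns]
    minima = [min(col) for col in columns]
    averages = [sum(col) / len(col) for col in columns]
    return maxima + minima + averages
```

With 100-dimensional word vectors this yields the 300-dimensional dialogue feature vector regardless of how many words the evaluation period contains.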

Step S4:
The dialogue feature extraction unit 122 determines whether the dialogue feature vectors have been extracted for all evaluation periods of the learning content. If extraction has finished for all evaluation periods, the dialogue feature extraction unit 122 advances the processing to step S5. If it has not, the processing returns to step S2 so that the dialogue feature vector can be extracted from the part of the learning content corresponding to the next evaluation period.

Step S5:
The image feature extraction unit 121 sequentially reads video from the database 16, one evaluation period of the learning content at a time. In television shopping, video runs at 30 frames per second (one second being the predetermined period). The image feature extraction unit 121 therefore reads, for example, the first frame of each second as a sample.
Then, using a feature extraction function built from a CNN transferred from another CNN, the image feature extraction unit 121 extracts the image feature vector of the sampled image.

Step S6:
The image feature extraction unit 121 determines whether the image feature vectors for every predetermined period (1 second) within the evaluation period (1 minute), that is, 60 seconds' worth of feature vectors, have been extracted. If they have, the image feature extraction unit 121 advances the processing to step S7. If they have not, the processing returns to step S5.

Step S7:
The image feature extraction unit 121 calculates, for each of the 4096 dimensions, the average of the feature values over the 60 image feature vectors (60 seconds' worth), and takes the vector of these per-dimension averages as the image feature vector for the evaluation period. The dialogue feature extraction unit 122 then writes and stores the obtained image feature vector in the image feature vector table of the extracted feature storage unit 18, in association with the evaluation period.
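
Steps S5 to S7 amount to sampling one frame per second and averaging the resulting feature vectors per dimension. The sketch below illustrates only that arithmetic; it assumes the per-frame feature vectors are already available, leaves out the CNN itself, and uses hypothetical function names.

```python
FRAMES_PER_SECOND = 30  # frame rate stated for television shopping video


def sample_frame_indices(num_seconds, fps=FRAMES_PER_SECOND):
    """Index of the first frame of each second within one evaluation period."""
    return [second * fps for second in range(num_seconds)]


def average_feature_vectors(vectors):
    """Per-dimension mean of equally sized feature vectors."""
    if not vectors:
        raise ValueError("at least one feature vector is required")
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]
```

For a one-minute evaluation period, `sample_frame_indices(60)` gives 60 frame indices, and averaging the 60 corresponding 4096-dimensional vectors gives a single 4096-dimensional vector for that period.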

Step S8:
The image feature extraction unit 121 determines whether the image feature vectors have been extracted for all evaluation periods of the learning content. If extraction has finished for all evaluation periods, the image feature extraction unit 121 advances the processing to step S9. If it has not, the processing returns to step S5 so that the image feature vector can be extracted from the part of the learning content corresponding to the next evaluation period.

Step S9:
The access number accumulation unit 13 receives the number of accesses for each evaluation period of the learning content from an external device, and writes and stores it in the access number table of the extracted feature storage unit 18, in association with the evaluation period.
Next, the prediction model generation unit 14 creates the learning content data sets used to generate the content evaluation prediction model, which is a multiple regression model. To do so, it reads the image feature vector table, the dialogue feature vector table, and the access number table from the extracted feature storage unit 18, integrates them into a learning content data set table, and writes and stores that table in the extracted feature storage unit 18.
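
The integration of the three tables in step S9 is essentially a join on the evaluation period. A minimal sketch follows, assuming each table is a mapping from evaluation period number to its value and that periods missing from any table are skipped; the table layout, the skip rule, and the function name are assumptions made for illustration.

```python
def build_learning_data_sets(image_table, dialogue_table, access_table):
    """Join the three per-period tables into time-ordered training rows."""
    rows = []
    for period in sorted(image_table):
        if period not in dialogue_table or period not in access_table:
            continue  # assumed rule: drop periods without a complete record
        rows.append({
            "period": period,
            # content comprehensive feature vector: image then dialogue
            "features": list(image_table[period]) + list(dialogue_table[period]),
            "calls": access_table[period],
        })
    return rows
```

Each resulting row pairs one content comprehensive feature vector with the number of incoming calls for the same evaluation period, which is the form the regression in step S10 consumes.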

Step S10:
The prediction model generation unit 14 refers to the learning content data set table in the extracted feature storage unit 18, inputs the content comprehensive feature vectors of the learning content data sets to the multiple regression model, and adjusts the regression coefficients of the multiple regression model so that the predicted number of incoming calls it outputs approaches the number of incoming calls in the learning content data set table. In doing so, the prediction model generation unit 14 makes the regression coefficients for the individual features sparse by machine learning with the L1-minimization algorithm called LASSO, and generates the adjusted content evaluation prediction model based on the multiple regression model. The prediction model generation unit 14 then writes and stores the generated content evaluation prediction model in the extracted feature storage unit 18.

Through the processing described above, the content evaluation prediction model is generated, and the access number prediction unit 15 uses it to obtain the predicted number of incoming calls as the viewer evaluation of the target content being evaluated.
Specifically, the access number prediction unit 15 reads the content evaluation prediction model from the extracted feature storage unit 18. It then inputs to the model the content comprehensive feature vectors generated from the image feature vector and the dialogue feature vector of each evaluation period of the target content, and obtains the predicted number of incoming calls for each evaluation period.
The access number prediction unit 15 then attaches information identifying the evaluated target content to the predicted numbers of incoming calls, and writes and stores the pairs of evaluation period and corresponding predicted number of incoming calls in the evaluation result storage unit 19 as the evaluation result for the target content.

FIG. 9 is a graph comparing the predicted number of incoming calls for each evaluation period of target content (product advertisements), as predicted by the content evaluation prediction model of this embodiment, with the number of incoming calls actually obtained. In the graph of FIG. 9, the horizontal axis shows time (in evaluation periods of 1 minute), and the vertical axis shows the predicted and actual numbers of incoming calls (number of calls placed). The graph was generated by concatenating the predicted and actual numbers of incoming calls of a plurality of contents.
In this graph, the predicted number of incoming calls is drawn as a broken line and the actual number as a solid line. The error between the predicted and actual numbers of incoming calls in FIG. 9 (RMSE: root mean squared error), taken as the average over all evaluation periods, is within a range of ±1.0.
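
For reference, the RMSE between the predicted and actual series can be computed as in the sketch below. The function name and the sample values are hypothetical; the figures' underlying data are not given in the text.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long series."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical per-minute call counts for a short stretch of content.
example_error = rmse([2.0, 4.0, 3.0], [1.0, 3.0, 4.0])
```

Because every prediction in this made-up example is off by exactly one call, `example_error` equals 1.0, which is the order of magnitude the text reports.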

FIG. 10 shows graphs comparing the predicted number of incoming calls for each evaluation period with the number actually obtained, for target contents selected from those concatenated in FIG. 9. FIG. 10(a) is for target content advertising product A in television shopping. FIG. 10(b) is for target content advertising product B in television shopping.
In both FIG. 10(a) and FIG. 10(b), the horizontal axis shows time (in evaluation periods of 1 minute), and the vertical axis shows the predicted and actual numbers of incoming calls (number of calls placed). The content is 30 minutes long in both cases. In both graphs, the predicted number of incoming calls is drawn as a broken line and the actual number as a solid line.

The error (RMSE) between the predicted and actual numbers of incoming calls for product A in FIG. 10(a), taken as the average over all evaluation periods in the 30 minutes, is 0.96477. The error (RMSE) between the predicted and actual numbers of incoming calls for product B in FIG. 10(b), taken the same way, is 0.77674.
These results show that the predicted number of incoming calls produced by the content evaluation prediction model of this embodiment is close to the actual number of incoming calls, which reflects the evaluation of viewers who actually watched the content (the trend in how the number changes is also similar).

Therefore, according to this embodiment, by taking newly created content as the target content and obtaining the predicted number of incoming calls for each evaluation period with the content evaluation prediction model, advertising content can be divided into segments and evaluated. The part of the content that introduces the advertised product (evaluation periods) and the part that raises viewers' willingness to buy the product (evaluation periods) can be evaluated separately, and the relationship between the two parts in viewer evaluation can also be estimated.
Further, according to this embodiment, when the predicted number of incoming calls for an evaluation period that was designed to make an impact on viewers, and so increase incoming calls, does not rise as much as expected, the images and dialogue in that evaluation period can be changed and tried again before broadcast. This supports the work of creating content that contributes to increasing the predicted number of incoming calls.

In the embodiment above, the content is television shopping, and the content features input to the content evaluation prediction model are image features and dialogue features. When the content is a product advertisement on radio or the like, however, features such as the frequency and intensity (the amplitude of the audio signal) of the voice of the speaker delivering the dialogue may be extracted in addition to the dialogue features and input to the content evaluation prediction model.
Image features may likewise be extracted and used for television shopping content in which still images, rather than video, change in time series.

In the embodiment above, features extracted from words by Word2vec or Doc2vec are used as the word features of the dialogue. However, to learn the long-term dependencies in the time-series arrangement of the words given in sequence within an evaluation period, an LSTM (long short-term memory) may be used. The LSTM is an extension of the RNN (recurrent neural network), a neural network that predicts what comes next by taking past information into account.

In this configuration, the feature vectors extracted from the words by Word2vec or Doc2vec as described above are input to the LSTM, which produces for each feature vector an arrangement feature vector that takes the time-series order of the words into account (forward and backward vectors in the time series, distinct from the word feature vector). This reflects the position of each word in its context, which the word feature vectors alone do not capture as a parameter, so the contribution of the dialogue to the evaluation of the content can be predicted more accurately.

Next, for each of the arrangement vectors output by the LSTM for each word, an importance (attention) corresponding to the time-series arrangement of the words in the dialogue (the word order, that is, the context of the dialogue) is calculated. Calculating the importance of the arrangement vectors determines which of them matter for predicting the number of incoming calls (number of accesses). The content evaluation prediction model then predicts the number of incoming calls by computing the weighted average of the arrangement vectors based on this importance.
In this embodiment, the number of incoming telephone calls is used as the response (number of accesses) indicating viewer evaluation. In addition, the number of faxes received, or the number of clicks selecting the product in a web browser (internet browser), may be used as the response indicating viewer evaluation.
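
The importance-weighted averaging of the LSTM output vectors described above can be illustrated with a small sketch. The text does not say how the importance is computed, so this example assumes a common choice: a dot-product score against a query vector, normalized with softmax. The LSTM itself is omitted, and all names here are hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(sequence_vectors, query):
    """Weight each per-word vector by its importance and average them."""
    scores = [sum(v * q for v, q in zip(vec, query)) for vec in sequence_vectors]
    weights = softmax(scores)
    dims = len(sequence_vectors[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, sequence_vectors))
        for d in range(dims)
    ]
    return weights, pooled
```

The pooled vector has a fixed size however many words the evaluation period contains, and vectors with higher importance contribute more to it.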

A program that implements the function of the content evaluation prediction system shown in FIG. 1 for predicting the number of accesses from the feature vectors of the input content information may be recorded on a computer-readable recording medium, and the processing that predicts the number of accesses from those feature vectors may be performed by having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices.

Where a WWW system is used, the "computer system" also includes the environment for providing (or displaying) home pages.
A "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The term also covers media that hold a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold a program for a certain period, such as the volatile memory inside a computer system acting as the server or client in that case. The program may implement only part of the functions described above, or may implement them in combination with a program already recorded in the computer system.

An embodiment of the present invention has been described in detail above with reference to the drawings. The specific configuration is not limited to this embodiment, however, and also includes designs and the like that do not depart from the gist of the invention.

DESCRIPTION OF SYMBOLS
1 ... Content evaluation prediction system
11 ... Content data input unit
12 ... Feature extraction unit
13 ... Access number accumulation unit
14 ... Prediction model generation unit
15 ... Access number prediction unit
16 ... Database
17 ... Decomposition restriction word dictionary storage unit
18 ... Extracted feature storage unit
19 ... Evaluation result storage unit

Claims

A content evaluation prediction system comprising:
a feature extraction unit that extracts, at each predetermined period, an information feature that is a feature of the information of content whose information changes in time series; and
a prediction model generation unit that generates, by machine learning using the information feature and, as an evaluation value of the content, the sum of the numbers of accesses in the predetermined period and in the period immediately following the predetermined period, a content evaluation prediction model that receives the information feature corresponding to the information as input and predicts the number of accesses for the information feature.

The content evaluation prediction system according to claim 1, wherein the information includes at least one of a moving image and a dialogue.

The feature extraction unit
The feature extraction function up to the first stage of all connected layers in another convolutional neural network (CNN) after learning of image feature extraction is used when performing feature extraction of the moving image. Content evaluation prediction system.

The feature extraction unit
4. The feature extraction of the dialogue is performed by registering a decomposition restriction word, which is a word that restricts decomposition in morpheme analysis, in a dictionary, and performing morpheme analysis with reference to the dictionary. The content evaluation prediction system described in 1.

The content evaluation prediction system according to any one of claims 1 to 4, wherein the prediction model generation unit uses a sparse modeling technique when generating the content evaluation prediction model.

A content evaluation prediction method comprising:
a feature extraction process in which a feature extraction unit extracts, at each predetermined period, an information feature that is a feature of the information of content whose information changes in time series; and
a prediction model generation process in which a prediction model generation unit generates, by machine learning using the information feature and, as an evaluation value of the content, the sum of the numbers of accesses in the predetermined period and in the period immediately following the predetermined period, a content evaluation prediction model that receives the information feature corresponding to the information as input and predicts the number of accesses for the information feature.