JP6549643B2

JP6549643B2 - Audience rating system, method and computer program

Info

Publication number: JP6549643B2
Application number: JP2017118644A
Authority: JP
Inventors: 藤田 昭人; ピーターデイビス
Original assignee: TELECOGNIX CORPORATION; Internet Initiative Japan Inc
Current assignee: TELECOGNIX CORPORATION; Internet Initiative Japan Inc
Priority date: 2017-06-16
Filing date: 2017-06-16
Publication date: 2019-07-24
Anticipated expiration: 2037-06-16
Also published as: JP2019004370A

Description

The present invention relates to a method of estimating the audience rating of a broadcast program.

At present, television audience ratings are surveyed for each program of each television station and are used, for example, to determine the airtime and price of television advertisements. When advertisers, television stations, and advertising agencies trade advertising, rating data also serves as one indicator of television's media power and advertising effectiveness. In the United States in particular, rating data is used mainly as the currency for trading television advertising. Rating surveys are currently conducted by major companies such as Nielsen in the United States and Video Research in Japan.

Ratings are currently surveyed by questionnaire-based methods and mechanical (meter-based) methods. In a questionnaire-based survey, program viewing in each time slot is determined from completed questionnaires or telephone contact. In a mechanical survey, a device connected to a television receiver automatically records the channel viewed on that receiver in each time slot.

These conventional methods have the problem that a certain amount of time passes before all the data is aggregated and the survey organization announces the results. For example, current rating data is in principle provided on the day after the broadcast rather than on the day itself, and data for Friday, Saturday, and Sunday broadcasts is provided on the following Monday. In response, survey meters have recently been developed that deliver preliminary figures in "near real time," with ratings by time slot available one to two hours after the broadcast.

The conventional methods also require data to be collected from each viewer or viewing device, which naturally takes effort and cost.

The following gives a non-exhaustive overview of the rating estimation methods proposed in the prior art documents listed later. Non-Patent Document 3 describes surveys using a people meter system, an online meter system, and diary-style questionnaires as existing rating survey methods. Japanese Unexamined Patent Application Publication No. 2010-130585 (Patent Document 4) proposes a rating prediction apparatus with improved accuracy in predicting the ratings of television programs that have not yet been broadcast. The apparatus of Patent Document 4 includes a program selection model construction unit and a selected program determination unit. The model construction unit builds, for each viewer, a program selection model of that viewer's program selection behavior, based on broadcast program characteristic information indicating the characteristics of each of a plurality of television programs broadcast in the same time slot in the past, and on selected program information indicating which of those programs each viewer selected and watched. The determination unit uses the program selection model to determine, for each viewer, the predicted viewing program, that is, the not-yet-broadcast program the viewer is predicted to select and watch, based on program schedule information indicating the lineup of the not-yet-broadcast programs to be broadcast at the predetermined date and time and on unbroadcast program characteristic information indicating the characteristics of each of those programs. However, the information needed for prediction, such as the selected program information, is not easy to obtain and takes effort and cost.

Non-Patent Document 6 proposes a method of predicting the ratings of television dramas using only program information available before broadcast. It describes prediction using a combination of many types of features, such as the broadcasting station, time slot, cast, production staff, and the topicality of the actors. The number of Wikipedia page views and the number of Twitter posts are used as quantitative indicators of an actor's topicality. However, creating many types of features takes considerable effort, while using only a small number of features gives low prediction accuracy. The rating prediction in Non-Patent Document 6 also incorporates figures such as the number of views of Wikipedia and Twitter content about the actors appearing in a drama. However, it does not simply focus on time-series data of view counts. It also takes into account the content written on Wikipedia and Twitter, and it computes with many types of predictors, each weighted, so the amount of computation is enormous and the load on the system is heavy.

Nielsen measures an index of social media (Twitter, Facebook) usage related to television programs (Nielsen Social Content Ratings) and provides it to television companies (networks), sponsors, advertising agencies, and others as reference information for evaluating program viewership and audience interest. Several types of measures are used as the usage count, including the number of posts and the number of views. The number of posts and similar figures can be obtained automatically through the interface (API) published by the social media service. The number of views, by contrast, is not public, but it can be obtained by partnering with the company that provides the social media service.

The inventions described in Patent Document 2 and Non-Patent Document 4 measure an index of the number of social media (Twitter, Facebook) messages related to a television program and use it as one of the predictors of the program's rating. Patent Document 2 proposes a method used to predict time-shift viewing, such as viewing of recordings, during the week immediately after a program is broadcast on television (Live+7). Non-Patent Document 4 proposes a general learning and prediction method that uses an index of social media (Twitter, Facebook) usage related to a television program as one of the predictors of the program's rating. However, view counts for Twitter and Facebook are normally not publicly available, and obtaining them involves substantial effort and cost. The publicly available post counts, on the other hand, correlate only weakly with program ratings when used alone, so they must be combined with many other predictors to raise prediction accuracy, which takes effort and cost.

Non-Patent Documents 5 to 12 report studies that estimate television ratings and movie audience numbers from social media usage counts. Non-Patent Documents 5, 7, and 12 devise methods that predict the rating of a television drama from its ratings up to the previous episode and the response to them on Facebook, such as comments, and show that using the Facebook response improves the prediction of the next episode's rating.

Non-Patent Documents 3 and 10 devise methods that count the number of Twitter posts (tweets) related to a program in order to estimate its rating. However, it is difficult to analyze the content of tweets automatically and determine how they relate to the viewing of a television program, so the workload is heavy.

Non-Patent Document 8 devises a method of estimating a movie's opening-day sales. It uses the number of users of a movie information site before the movie's release as one indicator. However, the method requires several types of information, such as the number of screening locations, and it is limited to opening-day sales.

Patent Document 1: U.S. Patent No. 8,887,188
Patent Document 2: U.S. Patent Application Publication No. 2016/0148228
Patent Document 3: U.S. Patent No. 8,112,301
Patent Document 4: Japanese Unexamined Patent Application Publication No. 2010-130585

Non-Patent Document 1: Audience Ratings Handbook (視聴率ハンドブック), Video Research, 2017/4. <http://www.videor.co.jp/rating/wh/rgb201704.pdf>
Non-Patent Document 2: Social Content Ratings. <http://www.nielsensocial.com/socialcontentratings/>
Non-Patent Document 3: R. Subramanyan, "The Relationship Between Social Media Buzz and TV Ratings. Nielsen Media and Entertainment," (2011). <http://www.nielsen.com/us/en/insights/news/2011/the-relationship-between-social-media-buzz-and-tv-ratings.html>
Non-Patent Document 4: S. Sereday, J. Cui, "Using machine learning to predict future ratings," Nielsen Journal of Measurement, Vol. 1, No. 3, pp. 30-40 (2016).
Non-Patent Document 5: M.-H. Cheng, et al., "Television Meets Facebook: The Correlation between TV Ratings and Social Media," American Journal of Industrial and Business Management, Vol. 6, No. 3, pp. 282-290 (2016). DOI: 10.4236/ajibm.2016.63026
Non-Patent Document 6: Yusuke Fukushima (福島悠介), et al., "放送前の情報のみを用いたテレビドラマの視聴率予測" [Rating prediction for TV dramas using only pre-broadcast information], Journal of the Institute of Image Information and Television Engineers (映像情報メディア学会誌), Vol. 70, No. 11, pp. J255-J261 (2016).
Non-Patent Document 7: Yu-Yang Huang, et al., "A Weight-Sharing Gaussian Process Model Using Web-Based Information for Audience Rating Prediction," Lecture Notes in Computer Science, Vol. 8916, pp. 198-208 (2014). DOI: 10.1007/978-3-319-13987-6_19
Non-Patent Document 8: M. Mestyan, T. Yasseri, and J. Kertesz, "Early Prediction of Movie Box Office Success Based on Wikipedia Activity Big Data," PLoS ONE 8(8): e71226 (2013). DOI: 10.1371/journal.pone.0071226
Non-Patent Document 9: S. Wakamiya, R. Lee, and K. Sumiya, "Towards Better TV Viewing Rates: Exploiting Crowd's Media Life Logs over Twitter for TV Rating," Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication ICUIMC'11, Article No. 39 (2011). DOI: 10.1145/1968613.1968661
Non-Patent Document 10: W. T. Hsieh, S. T. Chou, Y. H. Cheng, and C. M. Wu, "Predicting TV Audience Rating with Social Media," IJCNLP 2013 Workshop on Natural Language Processing for Social Media (SocialNLP), pp. 1-5 (2013).
Non-Patent Document 11: J. Kleinberg, "Bursty and Hierarchical Structure in Streams," Data Mining and Knowledge Discovery, Vol. 7, Issue 4, pp. 373-397 (2003). DOI: 10.1023/A:1024940629314
Non-Patent Document 12: A. J. C. Moreira, M. Y. Santos, "Concave hull: A k-nearest neighbours approach for the computation of the region occupied by a set of points," In: J. Braz, P.-P. Vazquez, J. M. Pereira (eds.) GRAPP 2007 - Proceedings of the Second International Conference on Computer Graphics Theory and Applications, Volume GM-R, pp. 61-68 (2007). ISBN 978-972-8865-71-9.

As described above, conventional rating estimation methods are costly. Conducting a questionnaire among potential viewers requires manual labor for the survey itself and for tallying, and collecting data directly from the television receivers of surveyed households requires special monitoring devices, such as set-top boxes installed on the receivers. Either way, collecting the data on which the estimate is based involves considerable cost. This cost problem can be inferred from the fact that rating estimation is currently carried out only by large companies. In addition, because the underlying data is not easy to collect, it takes time to obtain the final estimate. This delay is evident from the fact that ratings are currently not announced on the day a program airs.

Methods that estimate television ratings or movie attendance from social media usage counts have also begun to be proposed, but their accuracy remains low. Television ratings and movie attendance figures can in practice serve as the basic data for calculating advertising fees. Given this, it is difficult in practice to use figures whose low accuracy makes them unreliable in settings with such large economic consequences.

An object of the present invention is to solve the above problems of the prior art and to provide highly accurate rating information inexpensively and quickly.

The present invention provides an audience rating estimation system. The system comprises: a query module to which program identification information is input; an estimation module to which the query module provides the input program identification information; an information site access count database that, in response to the provision of the program identification information, is requested by the estimation module to provide the number of accesses to information related to the program identified by the program identification information; and a program rating database that, in response to the provision of the program identification information, is requested by the estimation module to provide the ratings of past programs related to the identified program. The estimation module receives from the information site access count database the history of access counts for information related to the identified program, receives from the program rating database the ratings of past programs related to the identified program, and estimates the rating of the identified program based on the correlation between the access count history and the ratings of the past programs.
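The data flow among the modules described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the class and function names, the in-memory dictionaries standing in for the two databases, the use of the total access count as the predictor, and the proportional placeholder model are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# All names here are hypothetical; the source describes the modules only in prose.
AccessHistory = List[Tuple[int, int]]  # (unix_time, access_count) pairs


@dataclass
class EstimationModule:
    # Stand-in for the information site access count database:
    # program id -> access count history of the related page.
    access_db: Dict[str, AccessHistory]
    # Stand-in for the program rating database:
    # past program id -> announced rating (%).
    rating_db: Dict[str, float]
    # Target program id -> ids of related past programs.
    related: Dict[str, List[str]]
    # Pluggable model: (past predictors, past ratings, target predictor) -> rating.
    model: Callable[[Sequence[float], Sequence[float], float], float]

    def predictor(self, program_id: str) -> float:
        # Assumption: the predictor is simply the total access count.
        return float(sum(count for _, count in self.access_db[program_id]))

    def estimate(self, program_id: str) -> float:
        past = self.related[program_id]
        xs = [self.predictor(p) for p in past]   # access counts of past programs
        ys = [self.rating_db[p] for p in past]   # ratings of past programs
        return self.model(xs, ys, self.predictor(program_id))


def query(module: EstimationModule, program_id: str) -> float:
    """Query module: forwards the program identifier and returns the estimate."""
    return module.estimate(program_id)


def mean_ratio_model(xs: Sequence[float], ys: Sequence[float], x_new: float) -> float:
    # Placeholder model (an assumption): rating proportional to access count.
    return x_new * sum(ys) / sum(xs)
```

Keeping the model as a pluggable function means the same skeleton could host a regression-based model or any other model without changing how the two databases are queried.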

Regression analysis, including linear regression, may be used in estimating the rating. A minimum range model may also be used.

Based on the accuracy of the estimated rating, the related information on the information site and the past programs related to the identified program that are used in the estimation may be determined again.
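As an illustration of the linear regression option mentioned above, the following sketch fits a straight line to (predictor, rating) pairs of past programs by ordinary least squares and applies it to a new predictor value. The function names and the choice of a single predictor are assumptions made for the example. The minimum range model is not sketched because it has not been defined at this point in the text.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def estimate_rating(xs: Sequence[float], ys: Sequence[float], x_new: float) -> float:
    """Estimate the rating for predictor value x_new from past (x, y) pairs."""
    a, b = fit_line(xs, ys)
    return a * x_new + b
```

For instance, with hypothetical past predictors `[100, 200, 300]` and ratings `[2.0, 4.0, 6.0]`, the fitted slope is 0.02 and the intercept is 0, so a predictor value of 250 yields an estimate of about 5.0.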

The system may further include means for reducing the amount of data by quantizing the data used in estimating the rating.

The present invention also provides a method of estimating the rating of a program in a system comprising a query module, an estimation module, a program rating database, and an information site access count database that are coupled so that data can be transmitted among them. The method includes the steps of: the query module providing the estimation module with program identification information input to the query module; the estimation module, in response to the provision of the program identification information, requesting from the information site access count database the number of accesses to information related to the program identified by the program identification information, and requesting from the program rating database the ratings of past programs related to the identified program; the estimation module receiving from the information site access count database the history of access counts for information related to the identified program, and receiving from the program rating database the ratings of past programs related to the identified program; and the estimation module estimating the rating of the identified program based on the correlation between the access count history and the ratings of the past programs.
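As an illustration of the quantization-based data reduction mentioned above, the sketch below maps each access count to a coarse level and then stores runs of equal levels as (level, run length) pairs. The logarithmic level scheme and the run-length encoding are assumptions chosen for the example. The text at this point says only that quantizing the data reduces its amount.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def to_level(count: int) -> int:
    """Quantize a non-negative count: 0 -> 0, otherwise floor(log2(count)) + 1."""
    if count < 0:
        raise ValueError("access counts must be non-negative")
    return count.bit_length()


def compress(counts: Sequence[int]) -> List[Tuple[int, int]]:
    """Quantize counts to levels and run-length encode them as (level, run_length)."""
    levels = [to_level(c) for c in counts]
    return [(level, len(list(run))) for level, run in groupby(levels)]
```

With this scheme a long stretch of similar hourly counts collapses into a single pair, at the cost of losing the exact counts within each level.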

As with the system, regression analysis, including linear regression, may be used in estimating the rating. A minimum range model may also be used.

Based on the accuracy of the rating estimated in the estimating step, the related information on the information site and the past programs related to the identified program that are used in the estimating step may be determined again.

The method may further include a step of reducing the amount of data by quantizing the data used in the estimating step.

Furthermore, the present invention can be realized as a computer program containing computer-executable instructions that cause a computer to execute each step of the rating estimation method according to the present invention. The present invention can also be realized as a computer-readable storage medium storing such a computer program.

The rating estimation method according to the present invention makes it possible to provide, quickly, rating information accurate enough for uses with large economic consequences, using only an ordinary personal computer and information published free of charge on the Internet.

FIG. 1 is a block diagram showing the configuration of a rating estimation system according to an embodiment of the present invention.
A diagram showing an example of the procedure for predicting the rating of a program when program identification information is specified.
A diagram showing an example of the procedure for acquiring the access counts (view counts) and rating history data corresponding to reference programs and extracting a rating estimation model.
A diagram showing an example of the procedure for calculating the value of a predictor from the access history data of a program page.
A diagram showing the procedure when the "minimum range" model is used to estimate the rating.
A diagram showing the definitions and formulas when the "linear regression" model is used to estimate the rating.
A diagram showing an example of a query and the selected reference programs when the rating is estimated with the linear regression model.
A diagram showing an example of the data for a query and the selected reference programs when the rating is estimated with the linear regression model.
A graph of the program count data, with UNIX time on the horizontal axis, when the rating is estimated with the linear regression model.
A table of the predictor and rating for each program, and a plot with the predictor on the horizontal axis and the rating on the vertical axis.
A diagram showing the calculation of the linear regression model coefficients, a graph visualizing the linear regression model, and the estimation result.
A diagram showing an example of estimation using the minimum range model.
A diagram showing the query and model selection when the rating is predicted with the linear regression model.
A diagram showing the calculation of the linear regression model coefficients, the prediction result, and a graph visualizing the prediction result.
A diagram showing the procedure of an example of processing for improving accuracy.
A diagram showing an example of processing that changes the program classification to improve accuracy.
A diagram showing an example of removing outliers in the processing that changes the selected programs to improve accuracy.
A diagram showing another example of removing outliers in the processing that changes the selected programs to improve accuracy.
A diagram showing an example of changing the predictor calculation method to improve accuracy.
A diagram showing another example of changing the predictor calculation method to improve accuracy.
A diagram showing yet another example of changing the predictor calculation method to improve accuracy.
A diagram showing the procedure of a method of compressing time-series data using a probability model.
An example of converting the time-series data format into the BUR data format.
An example of a time series of count numbers and level numbers.
A graph of the example of FIG. 11(a), with UNIX time on the horizontal axis.
An example of estimation using BLS data: the calculated predictor values (predictor = Last, -3, 3).
An example of estimation using BLS data: the calculated linear regression model (predictor = Last, -3, 3).
An example of prediction using BLS data: the calculated predictor values (predictor = Last, -100, -40).
An example of prediction using BLS data: the calculated linear regression model (predictor = Last, -100, -40).
An example of the compression effect.
A module configuration diagram of a second embodiment of the present invention.

FIG. 1 is a block diagram showing the configuration of a rating estimation system according to one embodiment of the present invention, in which a plurality of modules are shown. The query module 101 receives, from a user who wishes to estimate the rating of a television program, input identifying that program, and provides the estimation module 102 with the television station and broadcast time slot specified in the input. When the estimation module 102 has finished estimating the rating according to the algorithm characteristic of the present invention, the query module 101 receives the estimation result from the estimation module 102 and outputs it to the user.

The program broadcast information data module 103 is a database storing broadcast information for each program, of the kind found in a newspaper's television listings or a television guide magazine. When broadcast information is updated, for example because a broadcast schedule changes or a new program is announced, the database is updated automatically. The stored broadcast information includes the program title, broadcasting station, broadcast time, and performers. The module also stores various attribute information for each program, such as whether the program is part of a series and, if so, its episode number.
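One possible shape for a broadcast information record, and a lookup by station and time of the kind the query module's input would require, is sketched below. The field names, the `BroadcastInfo` class, and the `find_program` helper are hypothetical, and a plain list stands in for the database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class BroadcastInfo:
    # Fields listed in the description: title, station, broadcast time, performers.
    title: str
    station: str
    start: datetime
    end: datetime
    performers: Tuple[str, ...] = ()
    # Attribute information: series membership and episode number, if any.
    series: Optional[str] = None
    episode: Optional[int] = None


def find_program(
    records: Iterable[BroadcastInfo], station: str, at: datetime
) -> Optional[BroadcastInfo]:
    """Return the program on `station` whose slot [start, end) contains `at`."""
    for record in records:
        if record.station == station and record.start <= at < record.end:
            return record
    return None
```

Treating the time slot as half-open, so that a program ending at 22:00 does not match a query at exactly 22:00, is a design choice of this sketch that avoids two adjacent programs matching the same instant.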

The program rating data module 104 is a database storing the audience ratings of past programs. This database is updated each time a new audience rating is announced for a program. Accordingly, all past audience rating data announced by the rating survey organization up to that point is stored in the program rating data module 104, although it may be limited to the period needed for estimation by the audience rating estimation system of the present invention.

The information site access count data module 105 is a database that provides, as a time series, the number of accesses to an information site on the Internet that provides information related to each program. In the present invention, Wikipedia is considered as an example of such an information site. Information sites other than Wikipedia may also be used.

As is widely known, Wikipedia is an Internet encyclopedia built on a wiki, a system that allows web pages to be edited in a web browser. It is operated by the Wikimedia Foundation, and under a copyleft license anyone who can access the site may take part in editing freely and at no charge. It is available in many languages around the world, although the number of languages varies from article to article. For each encyclopedia article, Wikipedia publishes data recording the number of accesses (also called "views") to the address (URL) that displays the article text (page), referred to here as the PVC, in units of one day or one hour. The published data can be downloaded and saved as a file. The information site access count data module 105 stores, for example, time-series data of these Wikipedia access counts (PVC). Here, time-series data means the PVC per unit time and is expressed, for example, as (tj, cj). With a unit time of one hour, (tj, cj) means that the PVC during the one hour from one hour before time tj up to tj was cj.

Next, FIG. 2 illustrates the procedure by which the audience rating estimation system of the present invention estimates an audience rating when the user has specified a television station and a broadcast time slot. In step 201, identification of the target program begins. For example, the user enters, via the query module 101, information identifying the program whose rating is to be estimated, such as the program name and broadcast date, or the broadcast date and broadcast time slot. In step 202, the query module 101 responds to this input by acquiring the corresponding program information from the program information data module 103, for example the broadcast time and series name of the program. In step 203, the type of the program is determined based on the program information obtained in step 202. For example, a plurality of features representing the type of a program can be defined, and the program type can be represented as a list of features, such as [genre = drama, broadcaster = FNS, broadcast day = MON, broadcast start time = JST 21:00, program slot = Sunday Theater, episode = first, lead performer = Takuya Kimura, setting = hospital]. The episode feature can also be expressed relatively, as in "final episode" or "two episodes before the final episode".

Next, in step 204, the estimation module 102 acquires from the program information data module 103 a list of reference programs of the same type as the identified program, for use in rating estimation. For example, if the program whose rating is to be estimated is the first episode of a drama series set in a hospital, a program broadcast last year that was likewise the first episode of a drama series set in a hospital can be a reference program. A program broadcast two years ago that was the first episode of a drama series set in a school, and in which the same actor appeared as in the program to be estimated, can also be a reference program even though the setting differs.

In step 205, the number of views of websites containing information related to these reference programs (for example, the number of accesses to Wikipedia pages containing the related information) and the historical rating data are acquired from the information site access count data module 105 and the program rating data module 104, respectively, and a rating estimation model is extracted. The extraction of the rating estimation model is described later with reference to FIG. 3. Finally, in step 206, the estimation module 102 calculates a predictor from the number of accesses to the website corresponding to the program, and uses the calculated predictor and the rating estimation model to calculate the estimate requested by the user.

Next, FIG. 3 shows details of the operation performed in step 205 of FIG. 2, that is, the extraction of the rating estimation model. First, in step 301, the method of calculating the predictor is set. A predictor is a factor that is highly correlated with the audience rating; in the present invention, examples include the cumulative total number of accesses to the Wikipedia article containing information related to the program, and the total number of accesses during a period that has a fixed relationship to the program's broadcast time. In step 302, the estimation module 102 acquires the access count time series of each reference program from the information site access count data module 105 (for Wikipedia or another information site) and calculates the value of the predictor. Details of the predictor calculation are described later with reference to FIG. 4. In step 303, the rating of each reference program is acquired from the program rating data module 104, and in step 304 the pair of predictor and rating is recorded for every reference program. Then, in step 305, the rating estimation model is extracted using the predictor and rating pairs of all the reference programs. Extraction of the rating estimation model is described later with reference to FIG. 5.

FIG. 4 shows an example of the procedure for calculating the predictor. As already described, the present invention uses as the predictor the number of accesses to an information site, for example a Wikipedia article containing information related to the program whose rating is to be estimated. To calculate the predictor, the measurement period is first set in step 401. For example, by specifying a period start time T1 and a period end time T2, the measurement period can be expressed as [T1, T2]. Alternatively, taking the program broadcast start time showTime as the reference, with T1 = showTime + predBegin and T2 = showTime + predEnd, the measurement period can be expressed as (predBegin, predEnd) using the relative start time predBegin and the relative end time predEnd.

Then, in step 402, the total number of accesses within the specified period is calculated from the time-series data stored in the information site access count data module 105, and this total becomes the value of the predictor. As an alternative calculation method, a weighting function of time may be specified and the weighted sum of the access counts within the measurement period calculated.
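The calculation in steps 401 and 402 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `predictor`, the closed interval [T1, T2], the exponential decay used as the weighting function, and the sample data are all assumptions made for the example.

```python
HOUR = 3600  # seconds; all times are UNIX timestamps

def predictor(series, show_time, pred_begin, pred_end, weight=None):
    """Sum the counts c_j whose timestamp t_j lies in [T1, T2].

    series: iterable of (t_j, c_j) pairs.
    weight: optional function of (t_j - show_time) returning a weight.
    """
    t1 = show_time + pred_begin   # T1 = showTime + predBegin
    t2 = show_time + pred_end     # T2 = showTime + predEnd
    total = 0.0
    for t, c in series:
        if t1 <= t <= t2:         # closed interval is an assumption
            w = 1.0 if weight is None else weight(t - show_time)
            total += w * c
    return total

# Invented hourly data: the count at hour h is simply h.
series = [(h * HOUR, h) for h in range(1, 21)]
show_time = 10 * HOUR

# Plain total over the 3 hours before and after the broadcast start.
plain = predictor(series, show_time, -3 * HOUR, 3 * HOUR)

# Weighted total: the weight halves for every hour away from the start.
decayed = predictor(series, show_time, -3 * HOUR, 3 * HOUR,
                    weight=lambda dt: 0.5 ** (abs(dt) / HOUR))
```

Passing the weighting function as an argument keeps the unweighted total as the default case, which matches the order in which the two variants are described.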

FIG. 5 shows two example methods for estimating the rating based on the predictor values and ratings of all the reference programs.

FIG. 5(a) shows the minimum range model. This method uses the envelope of the distribution range in the two-dimensional space of predictor and rating. First, in step 501, a polygonal distribution range is calculated for the (predictor, rating) pairs of all the reference programs in that two-dimensional space. Possible approaches include using the convex hull, which is the smallest convex set containing all the pairs, and calculating a concave hull using k nearest neighbors. Methods for calculating these envelopes are described in Non-Patent Document 12 and are widely known. For the convex hull, for example, a polygon representing the distribution range can be constructed by starting from the point with the smallest predictor and repeatedly selecting, from all the remaining points, the point with the largest right-turn angle relative to the previous link. For the concave hull, the polygon can be constructed by starting from the point with the smallest predictor and repeatedly selecting, from the k nearest points, the point with the largest right-turn angle relative to the previous link, while excluding any point whose link would cross an earlier link.

After a distribution range such as a convex or concave hull has been calculated in step 501, step 502 calculates, for each predictor value, a rating that spans a range from the minimum value to the maximum value and includes the midpoint.

FIG. 5(b) shows an example in which a linear regression model is used to estimate the rating. FIG. 5(b) also gives an example of the formulas for calculating the coefficients of the linear regression model from the ratings and predictors of the reference programs. FIGS. 6 and 7 show examples of extracting a rating estimation model using concrete program data and count time-series data obtained from Wikipedia.
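The formulas of FIG. 5(b) are not reproduced in this text. As a hedged illustration, the sketch below assumes ordinary least squares with a single predictor, which is the usual way to obtain the coefficients of a linear regression model; the function names and the sample pairs are invented for the example.

```python
def fit_linear(predictors, ratings):
    """Ordinary least squares fit of rating = slope * predictor + intercept."""
    n = len(predictors)
    mean_x = sum(predictors) / n
    mean_y = sum(ratings) / n
    sxx = sum((x - mean_x) ** 2 for x in predictors)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(predictors, ratings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_rating(model, predictor_value):
    """Apply the fitted line to the predictor of the query program."""
    slope, intercept = model
    return slope * predictor_value + intercept

# Invented (predictor, rating) pairs for four reference programs.
model = fit_linear([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
query_estimate = estimate_rating(model, 5.0)
```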

FIG. 6(a) shows an example of reference programs and the corresponding extraction of a rating estimation model. The query program is the drama series "XYZ" (for example, "Beautiful Life" or "Hanzawa Naoki"), and episode = "last" denotes the final episode of that series. The current time is the broadcast start time + 4 days, and (PVC, −3, 3) is set as the predictor calculation method. That is, this estimation model estimates the rating of the final episode of "XYZ" at a current time 4 days (96 hours) after the broadcast start time, using the predictor (PVC, −3, 3). (PVC, −3, 3) means that "the total PVC from 3 days (72 hours) before the broadcast start time to 3 days (72 hours) after the broadcast start time" is used as the predictor. As reference programs selected as being of the same kind as the query program, the final episodes of several past series of TBS Sunday Theater for which rating and PVC data exist are used; they are listed at the bottom of FIG. 6(a). In this specification, "estimating" a rating means calculating, after the fact, the rating of a program whose broadcast has already finished.

FIG. 6(b) shows an example of the data for the program series "02-00", the second of the TBS Sunday Theater series selected and listed as reference programs in FIG. 6(a). Here, the data consists of the broadcast time (start and end times) and rating of each episode, together with the time series of the count (PVC) of accesses to the related Wikipedia article. Times are expressed in UNIX time.

FIG. 6(c) visualizes, for example, the count time-series data for the program series "02-00", the second in the list, as a graph with UNIX time on the horizontal axis. The upper part of FIG. 6(d) is a table of the predictor and rating of each program. The Pearson correlation between predictor and rating has been calculated, and its value of 0.948 confirms that the correlation is high. The lower part of FIG. 6(d) plots the predictor on the horizontal axis against the rating on the vertical axis.
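The Pearson correlation mentioned for FIG. 6(d) can be computed as in the following sketch. The function name and the sample values are assumptions for illustration; the data of FIG. 6(d) is not reproduced here, so the sketch does not attempt to recover the value 0.948.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented predictor values and ratings for three reference programs.
rising = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
falling = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A value close to 1 indicates that the rating rises almost linearly with the predictor, which is the situation in which the linear regression model is most useful.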

FIG. 6(e) shows an example in which linear regression is performed on the above data and the coefficients of the linear regression model are calculated. The lower left shows the estimation result obtained with the linear regression model. The right side shows the linear regression model visualized as a straight line. FIG. 6(f), by contrast, shows an example using the minimum range model. The upper table lists the envelope points selected for the convex hull. The rating predicted for the query program's predictor is also shown. The minimum and maximum ratings corresponding to the query program's predictor are obtained by interpolating between the neighboring envelope points, and the midpoint of that minimum and maximum is taken as the estimate. The lower part shows the convex hull and the midline visualized.
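The minimum range estimate described for FIG. 6(f) can be sketched as follows. Note that this sketch builds the convex hull with Andrew's monotone chain algorithm rather than the angle-based selection described in the text; both yield the same convex hull. The function names and sample points are assumptions for illustration.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_chains(points):
    """Return the lower and upper chains of the convex hull, left to right."""
    pts = sorted(set(points))
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) >= 0:
            upper.pop()
        upper.append(p)
    return lower, upper

def interpolate(chain, x):
    """Linearly interpolate the chain at x between neighboring hull points."""
    for (x0, y0), (x1, y1) in zip(chain, chain[1:]):
        if x0 <= x <= x1 and x1 > x0:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("predictor value lies outside the hull")

def min_range_estimate(points, x):
    """Return (minimum, midpoint, maximum) rating at predictor value x."""
    lower, upper = hull_chains(points)
    lo = interpolate(lower, x)
    hi = interpolate(upper, x)
    return lo, (lo + hi) / 2, hi

# Invented (predictor, rating) pairs for five reference programs.
pairs = [(100, 10.0), (200, 14.0), (300, 12.0), (400, 20.0), (250, 18.0)]
lo, mid, hi = min_range_estimate(pairs, 200)
```

The function raises an error for a predictor value outside the hull, since the text does not say how such a query should be handled.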

FIG. 7 shows an example similar to that of FIG. 6, but for the case where the program whose rating is of interest has not yet been broadcast. Here, calculating the rating of a future program that has not yet been broadcast is called "prediction".

The relationship between "estimation" and "prediction" is briefly explained here. As already stated, "estimation" means calculating the rating of a program that has already been broadcast, and "prediction" means calculating the rating of a program that is scheduled to be broadcast. The calculation procedure is the same in both cases, but the accumulation range over which the predictor is summed differs. This is because, for estimation, both pre-broadcast and post-broadcast data exist, whereas for prediction, data exists at most up to the current time, which is before the broadcast. Because the calculation procedure is the same, this specification may use the two terms without distinguishing them where this causes no problem.

FIG. 7(a) shows an example of a query and model selection. The query program has the series name "PQR", and the target is its final episode. The current time is 40 days (960 hours) before the broadcast start time, and the predictor is (PVC, −100, −40), that is, the total number of accesses to the related Wikipedia article from 100 days before the broadcast start time up to the current time. The reference programs are the same as in FIG. 6.

FIG. 7(b) shows the data for the final episodes of the TBS Sunday Theater series selected and listed as reference programs in FIG. 7(a), together with an example of the rating prediction obtained from them. The right side of FIG. 7(b) plots the predictor on the horizontal axis against the rating on the vertical axis.

In rating estimation according to the present invention, additional processing can be performed to improve accuracy. The actual steps of this processing are shown in FIG. 8. Steps 801 to 806 in FIG. 8 correspond to steps 201 to 206 in FIG. 2. The difference is that, after the rating has been estimated once in step 806, additional processing is performed to improve accuracy. As shown to the right of step 806 in FIG. 8, there are, for example, three kinds of additional processing.

To improve accuracy, however, it is first necessary to evaluate how accurate an estimate the estimation method of the present invention has produced. Therefore, after the rating estimate is calculated in step 806, the accuracy of the estimation model or of the calculated estimate must be evaluated in step 807. A way of changing or correcting the processing is then selected according to that evaluation.

One known way to calculate the accuracy of the model is to compute the Pearson correlation coefficient between the predictors calculated for the reference programs and their observed ratings; the higher the Pearson correlation coefficient, the higher the accuracy is judged to be. Another way is to extract the model using only some of the reference programs (for example, leaving out one reference program at a time), compute the mean error between the predicted and observed values for the remaining programs, and judge the accuracy to be high if the mean error is low. The accuracy of a predicted value, on the other hand, can be calculated by varying the accumulation period of the predictor (for example, shifting the accumulation start time or end time by one hour at a time) and computing the variance of the resulting predicted values; the smaller the variance, the higher the accuracy is judged to be. To evaluate the calculated accuracy of the estimation model or of the estimate, a common approach is to set a threshold in advance and judge whether that threshold is exceeded. As described above, the present invention can use, for example, the three processing directions shown in FIG. 8(a) to raise the accuracy of the estimate: changing the program classification method, changing the selected programs (the programs of the same type selected for model extraction), and changing the method of calculating the predictor.
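The second accuracy measure can be illustrated with a leave-one-out check. This is a sketch under assumptions: it reads "the remaining programs" as the program left out of the fit, uses an ordinary least squares line as the model, and takes the absolute error; the names, the sample pairs, and the threshold value are invented.

```python
def fit_line(pairs):
    """Least squares line through (predictor, rating) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_one_out_error(pairs):
    """Mean absolute error when each program is predicted from the others."""
    errors = []
    for i, (x, y) in enumerate(pairs):
        rest = pairs[:i] + pairs[i + 1:]      # model without program i
        slope, intercept = fit_line(rest)
        errors.append(abs(slope * x + intercept - y))
    return sum(errors) / len(errors)

# Invented data: one set lies exactly on a line, the other has a stray point.
clean = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
noisy = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 20.0)]

THRESHOLD = 1.0                               # hypothetical accuracy threshold
clean_ok = leave_one_out_error(clean) <= THRESHOLD
noisy_ok = leave_one_out_error(noisy) <= THRESHOLD
```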

One way to change the program classification in order to improve accuracy is to change the criteria, based on program information, under which programs are classified as being of the same type. For example, sharing a broadcast time slot, broadcast year, or performer can be added as a condition for judging programs to be of the same type. When the program classification is changed, processing returns from step 807 to step 803. Changing the selected programs means changing which of the reference programs of the same type are selected for model extraction. One example is to delete outlier data that lies more than a certain limit away from the linear regression line. In this case, processing returns from step 807 to step 804. Ways to change the predictor calculation method include, for example, changing the start time of the accumulation period, or weighting the access counts before summing them, using weights that decay with the time difference from the broadcast time. Such weighting assumes that a person who accesses the information site at a time closer to the broadcast is more likely to actually watch the broadcast than one who accesses it at a time further from the broadcast. In this case, processing returns from step 807 to step 805.

Each parameter used to calculate the predictor can also be changed, for example, as follows. By configuring a parameter to be changed automatically according to a procedure specified in advance, the correlation between the predictor and the rating for the reference programs can be maximized. As an example of changing a parameter value automatically according to a specified procedure, the value of the start time T1 may be increased from a specified value T1-min (for example, T1-min = 30 days) to a specified value T1-max (for example, T1-max = current time − broadcast start time) in steps of a specified time interval (for example, one hour).
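The automatic parameter change can be sketched as a simple sweep. The following is an assumption-laden illustration: the start of the accumulation window is expressed as an offset relative to the broadcast start time, the sweep keeps the offset that gives the highest Pearson correlation over the reference programs, and all names and data are invented.

```python
import math

HOUR = 3600  # seconds

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def window_sum(series, t1, t2):
    """Total count of the (time, count) pairs with t1 <= time <= t2."""
    return sum(c for t, c in series if t1 <= t <= t2)

def best_start(programs, begin_min, begin_max, pred_end, step=HOUR):
    """Sweep the relative start offset and keep the best-correlated one.

    programs: list of (show_time, series, rating) for the reference programs.
    Returns (offset, correlation).
    """
    ratings = [rating for _, _, rating in programs]
    best = None
    offset = begin_min
    while offset <= begin_max:
        xs = [window_sum(s, show + offset, show + pred_end)
              for show, s, _ in programs]
        r = pearson(xs, ratings)
        if best is None or r > best[1]:
            best = (offset, r)
        offset += step
    return best

# Invented reference programs, each with three hourly counts before air time.
programs = [
    (0, [(-3 * HOUR, 50), (-2 * HOUR, 0), (-1 * HOUR, 2)], 10.0),
    (0, [(-3 * HOUR, 5), (-2 * HOUR, 3), (-1 * HOUR, 1)], 20.0),
    (0, [(-3 * HOUR, 20), (-2 * HOUR, 3), (-1 * HOUR, 3)], 30.0),
]
best_offset, best_r = best_start(programs, -3 * HOUR, -1 * HOUR, 0)
```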

FIGS. 8(b) to 8(g) show concrete examples of processing for improving accuracy. FIG. 8(b) is an example of changing the program classification. FIG. 8(c) is an example of deleting outliers to change the selected programs when the linear regression model is used. FIG. 8(d) is another example, in which outliers are deleted to change the selected programs when the minimum range model is used. FIGS. 8(e) to 8(g) are three examples of processing for changing the predictor calculation method.

The processing performed in the present invention to improve the accuracy of rating estimation has been described above. Next, a method of reducing the amount of historical data by quantizing the time-series data is described. Quantizing time-series data means representing it as a time series over a small number of discrete values. To increase reliability, data from many reference programs must be used, but as the number of reference programs grows, the amount of time-series data grows with it, increasing the required storage capacity and causing delays in data access. It is therefore desirable to reduce storage capacity, processing load, and processing delay by converting the time-series data into a compressed format.

However, converting the time-series data into a compressed format degrades the correlation between the predictor and the rating, so the correlation with the rating must be taken into account. A compressed time-series recording method that does not degrade the correlation between predictor and rating is therefore needed. To solve this problem, the present invention proposes a time-series data conversion method that uses a probability model.

FIG. 9 shows the procedure for converting time-series data using a probability model, from the input of the time-series data in step 901 to the output of the period representation data in step 906. The procedure is characterized in that, after a plurality of discrete levels are set in step 902, cost functions are set in step 903. For the level cost, a cost function based on a probability function for each discrete level is set, such that the cost decreases as the probability increases. For the level transition cost, a cost function is used that, for a transition in time from a lower level to a higher level, increases the cost according to the size of the level increase. The time series of discrete levels sought is the one that minimizes, over the whole time series, the total of the level costs and level transition costs. In other words, reducing the amount of time-series data and preserving the correlation between predictor and rating are in a trade-off, and a time-series cost function based on a probability model is introduced to give an optimal solution with respect to the relationship between the two.

As an example of the cost function, the function described in Non-Patent Document 6 can be used. That is, a binomial distribution B(P, N) with probability P and total N as parameters is used as the level cost function. With level index J = 0, 1, 2, 3, ..., the probability P of the binomial distribution is set to P0*2^J. Here, the base probability P0 and the total N are common to all the level cost probability functions. The cost of level J for a count X at a given time is log(X) − log(B(P0*2^J, N)).

Next, how to determine the prescribed probability P0 and the total count N used in calculating the level cost is described. A reference time series is selected for this purpose: the time series of the program with the highest audience rating can be used, or the time series of the number of accesses to the information site as a whole. N is set to the maximum value of the time series, and P0 to the ratio of the mean value to N. N may be set as a function of time with a period of 24 hours or of one week. N may also be determined according to the type of program.
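In code, this choice of parameters could look like the following Python sketch. The function name `estimate_parameters` is hypothetical, and rounding N to an integer is an assumption; the time-dependent and program-type-dependent variants of N are not covered.

```python
from typing import Sequence, Tuple


def estimate_parameters(reference: Sequence[float]) -> Tuple[float, int]:
    """Return (p0, n): n is the maximum of the reference series, p0 = mean / n."""
    if not reference:
        raise ValueError("reference time series must not be empty")
    n = max(reference)
    if n <= 0:
        raise ValueError("reference time series must contain a positive value")
    mean = sum(reference) / len(reference)
    # Assumption: N is rounded to the nearest integer for use as a trial count.
    return mean / n, int(round(n))
```

Because level J uses the probability P0 * 2^J, the number of levels has to be small enough that this product stays below 1.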

Next, after the cost function has been determined in step 903, in step 904 the value of the input time series at each time is expressed as (time, discrete level), and the time series with the minimum cost is obtained. In step 905, the discrete-level time series output by step 904 is converted into a period representation in which each period over which the same level value continues is expressed as (start time, end time, level value). Finally, in step 906, the compressed data in period representation is output.
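The conversion in step 905 amounts to run-length encoding of the level sequence. The following Python sketch illustrates it under the assumption that each period ends at the time of its last sample; the names `to_periods` and `Period` are hypothetical.

```python
from typing import List, Sequence, Tuple

Period = Tuple[int, int, int]  # (start_time, end_time, level)


def to_periods(samples: Sequence[Tuple[int, int]]) -> List[Period]:
    """Merge consecutive (time, level) samples with equal level into periods."""
    periods: List[Period] = []
    for time, level in samples:
        if periods and periods[-1][2] == level:
            # Same level as the open period: extend its end time.
            start, _, _ = periods[-1]
            periods[-1] = (start, time, level)
        else:
            # Level changed (or first sample): open a new period.
            periods.append((time, time, level))
    return periods
```

The longer the runs of equal level produced in step 904, the fewer tuples remain after this step.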

FIGS. 10 and 11 show examples of the output data. FIG. 10 shows an example of input and output in which the input time-series data, consisting of PVC data for each time (PVC data format), is converted into the discrete-level time-series output data obtained in step 904 (BLS data format). It also shows an example in which the BLS data format is converted in step 905 into the period-representation output data (BUR format). FIGS. 11(a) and 11(b) show an example of actual calculation results and a graph of them with UNIX time on the horizontal axis.

FIG. 12(a) is an example of an estimation model using BLS data, showing the calculation result for the predictor = (Last, -3, 3). FIG. 12(b) is an example of estimation using BLS data, showing the calculation result of a linear regression model using the predictor = (Last, -3, 3). Further, FIG. 13(a) is an example of prediction using BLS data, showing the calculation result for the predictor = (Last, -100, -40). FIG. 13(b) is an example of prediction using BLS data, showing the calculation result of a linear regression model using the predictor = (Last, -100, -40).

FIG. 14 illustrates the effect of time-series data compression. As described above, compressing the data, for example with a probability model, achieves the necessary compression without impairing the correlation between the predictor and the audience rating, so that the audience rating estimation according to the present invention can be run on an ordinary personal computer without a large and expensive system.
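The linear regression models mentioned for FIGS. 12(b) and 13(b) can be fitted by ordinary least squares. The sketch below assumes, purely for illustration, that each reference program contributes one scalar predictor value and one audience rating; how a predictor tuple such as (Last, -3, 3) is turned into that scalar is not shown here, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model: Tuple[float, float], x: float) -> float:
    """Estimated rating for predictor value x under a fitted (slope, intercept)."""
    slope, intercept = model
    return slope * x + intercept
```

Only the two coefficients need to be stored per reference program list, which keeps the model small.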

The above has described an embodiment focusing on the algorithm by which the audience rating is estimated (predicted) according to the present invention. Next, a second embodiment is described with reference to FIG. 15, in which a system that estimates (predicts) the audience rating according to the algorithm of the present invention once the necessary conditions are input is implemented, for example, as a website.

FIG. 15 shows the configuration of the elements constituting the second embodiment. First, the user interface 1501 receives as input a program arbitrarily designated by the user and provides the input information to the query module 1502. The user interface 1501 may be implemented as a client application, a website, or a smartphone application, for example when a user who wishes to estimate an audience rating accesses the system according to the present invention via a network. Once estimation by the algorithm of the present invention has been performed, the user interface 1501 receives the estimated audience rating from the estimation module 1503. The estimation module 1503 stores a plurality of estimation models and returns an estimation result for the type of program designated through the user interface 1501. For example, the estimation module stores as a model, for each of a plurality of reference program lists, the coefficients of a linear regression model or the list of points on the convex envelope of a minimum range model; it acquires the predictor of the designated program from the information site access count data 1504, obtains the estimated audience rating for that predictor from the estimation model, and returns the result to the user interface 1501. To calculate the predictor, the estimation module acquires time-series data from the information site access count data 1504. When no model corresponding to the designated program exists, the estimation module obtains a new model from the learning module 1505. The learning module 1505 learns an estimation model using the data of reference programs and reports the learning result to the estimation module 1503. The time-series analysis module 1508 calculates predictors from time-series data and, as necessary, converts the format of the compressed time-series data. The data acquisition module 1509 acquires information site access count (PVC) data.
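As an illustration of how a stored list of convex-envelope points could be turned into an estimate, the following Python sketch builds the convex hull of (predictor, rating) points from reference programs and reads off the lowest and highest rating on the hull at a given predictor value. This is a sketch under assumptions, not the patented implementation: it uses Andrew's monotone chain algorithm, covers only the convex case, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (predictor value, audience rating)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _half_hull(points: Sequence[Point]) -> List[Point]:
    """Build one hull chain, keeping only counter-clockwise turns."""
    chain: List[Point] = []
    for p in points:
        while len(chain) >= 2 and _cross(chain[-2], chain[-1], p) <= 0:
            chain.pop()
        chain.append(p)
    return chain


def _interpolate(chain: Sequence[Point], x: float) -> float:
    """Linearly interpolate the rating at x along a chain sorted by ascending x."""
    for (x0, y0), (x1, y1) in zip(chain, chain[1:]):
        if x1 > x0 and x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the range covered by the reference programs")


def rating_range(points: Sequence[Point], x: float) -> Tuple[float, float]:
    """Return (minimum, maximum) rating on the convex hull at predictor value x."""
    pts = sorted(set(points))
    lower = _half_hull(pts)               # lower chain, ascending x
    upper = _half_hull(pts[::-1])[::-1]   # upper chain, flipped to ascending x
    return _interpolate(lower, x), _interpolate(upper, x)
```

Only the hull vertices are needed at estimation time, so the interior points can be discarded once the model has been learned.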

As another mode of use, the query module 1502 returns, from the information site access count data 1504, search results whose time series are similar to that of the program designated through the user interface 1501.

Further, the user interface 1501 can receive a program designated by the user as input and present the results of the processes described above to the user. For example, it can search for other programs whose time series correlate highly with that of the designated program and return the results to the user.
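One way such a search might be sketched is to rank candidate programs by the Pearson correlation of their access-count time series with that of the designated program. The similarity measure and the function names are assumptions; the text does not specify how the correlation is computed. The series are assumed to have equal length.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and of equal length")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a flat series is treated as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def most_similar(
    target: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return up to top_k (program, correlation) pairs, most similar first."""
    scored = [(name, pearson(target, series)) for name, series in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

The same ranking could be computed on the compressed level sequences instead of the raw counts, provided both series are expanded to a common time grid first.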

Claims

A system comprising:
a query module into which program-specifying information is input;
an estimation module to which the input program-specifying information is provided from the query module;
an information site access count database that, in response to the provision of the program-specifying information, is requested by the estimation module to provide the number of accesses to information related to the program specified by the program-specifying information; and
a program rating database that, in response to the provision of the program-specifying information, is requested by the estimation module to provide audience ratings of past programs related to the program specified by the program-specifying information,
wherein the estimation module receives, from the information site access count database, a history of the number of accesses to the information related to the program specified by the program-specifying information, receives, from the program rating database, the audience ratings of the past programs related to the program specified by the program-specifying information, and estimates the audience rating of the program specified by the program-specifying information based on the correlation between the access count history and the audience ratings of the past programs,
the number of accesses to the information related to the program specified by the program-specifying information, stored in the information site access count database, is quantized so as to reduce its data volume, and
the quantization is performed by:
setting a plurality of discrete levels;
setting a cost function having a level cost based on a probability function for each discrete level and a level transition cost that, for a transition in time from a lower level to a higher level, raises the cost according to the amount of increase in level; and
determining, as the time series of the set discrete levels, the time series that minimizes the total of the level cost and the level transition cost over the entire history of the number of accesses to the information related to the program specified by the program-specifying information stored in the information site access count database.

A system comprising:
a query module into which program-specifying information is input;
an estimation module to which the input program-specifying information is provided from the query module;
an information site access count database that, in response to the provision of the program-specifying information, is requested by the estimation module to provide the number of accesses to information related to the program specified by the program-specifying information; and
a program rating database that, in response to the provision of the program-specifying information, is requested by the estimation module to provide audience ratings of past programs related to the program specified by the program-specifying information,
wherein the estimation module receives, from the information site access count database, a history of the number of accesses to the information related to the program specified by the program-specifying information, receives, from the program rating database, the audience ratings of the past programs related to the program specified by the program-specifying information, and estimates the audience rating of the program specified by the program-specifying information based on the correlation between the access count history and the audience ratings of the past programs,
a minimum range model is used in the estimation, performed by the estimation module, of the audience rating of the program specified by the program-specifying information,
the minimum range model is calculated by enclosing, with a convex envelope or a concave envelope, the distribution range in a two-dimensional space of the correlation between the access count history and the audience ratings of the past programs, for a plurality of past programs related to the program specified by the program-specifying information, and
the estimation module outputs the estimated audience rating of the program specified by the program-specifying information as the maximum and minimum values on the envelope of the distribution range of the correlation between the access count history and the audience ratings of the past programs.

The system according to claim 1, wherein regression analysis including linear regression is used in the estimation of the audience rating.

The system according to any one of the preceding claims, wherein, based on the accuracy of the audience rating estimated in the estimation of the audience rating, the related information on the information site and the past programs related to the program specified by the program-specifying information that are used in the estimation are determined again.

A method of estimating a program audience rating in a system comprising a query module, an estimation module, a program rating database, and an information site access count database that are coupled so as to be able to transfer data to one another, the method comprising:
the query module providing program-specifying information input to the query module to the estimation module;
the estimation module, in response to the provision of the program-specifying information, requesting from the information site access count database the number of accesses to information related to the program specified by the program-specifying information, and requesting from the program rating database audience ratings of past programs related to the program specified by the program-specifying information;
the estimation module receiving, from the information site access count database, a history of the number of accesses to the information related to the program specified by the program-specifying information, and receiving, from the program rating database, the audience ratings of the past programs related to the program specified by the program-specifying information; and
the estimation module estimating the audience rating of the program specified by the program-specifying information based on the correlation between the access count history and the audience ratings of the past programs,
wherein the number of accesses to the information related to the program specified by the program-specifying information, stored in the information site access count database, is quantized so as to reduce its data volume, and
the quantization is performed by:
setting a plurality of discrete levels;
setting a cost function having a level cost based on a probability function for each discrete level and a level transition cost that, for a transition in time from a lower level to a higher level, raises the cost according to the amount of increase in level; and
determining, as the time series of the set discrete levels, the time series that minimizes the total of the level cost and the level transition cost over the entire history of the number of accesses to the information related to the program specified by the program-specifying information stored in the information site access count database.

A method of estimating a program audience rating in a system comprising a query module, an estimation module, a program rating database, and an information site access count database that are coupled so as to be able to transfer data to one another, the method comprising:
the query module providing program-specifying information input to the query module to the estimation module;
the estimation module, in response to the provision of the program-specifying information, requesting from the information site access count database the number of accesses to information related to the program specified by the program-specifying information, and requesting from the program rating database audience ratings of past programs related to the program specified by the program-specifying information;
the estimation module receiving, from the information site access count database, a history of the number of accesses to the information related to the program specified by the program-specifying information, and receiving, from the program rating database, the audience ratings of the past programs related to the program specified by the program-specifying information; and
the estimation module estimating the audience rating of the program specified by the program-specifying information based on the correlation between the access count history and the audience ratings of the past programs,
wherein a minimum range model is used in the estimation, performed by the estimation module, of the audience rating of the program specified by the program-specifying information,
the minimum range model is calculated by enclosing, with a convex envelope or a concave envelope, the distribution range in a two-dimensional space of the correlation between the access count history and the audience ratings of the past programs, for a plurality of past programs related to the program specified by the program-specifying information, and
the estimation module outputs the estimated audience rating of the program specified by the program-specifying information as the maximum and minimum values on the envelope of the distribution range of the correlation between the access count history and the audience ratings of the past programs.

The method according to claim 5, wherein in the estimation of the audience rating, regression analysis including linear regression is used.

The method according to any one of claims 5 to 7, wherein, based on the accuracy of the audience rating estimated in the step of estimating the audience rating, the related information on the information site and the past programs related to the program specified by the program-specifying information that are used in that step are determined again.

A computer program comprising computer executable instructions that cause a computer to perform the steps included in the method according to any one of claims 5 to 8.

A computer readable storage medium having stored thereon a computer program comprising computer executable instructions for causing a computer to perform each step included in the method according to any one of claims 5 to 8.