JP7510025B1

JP7510025B1 - DATA PROCESSING APPARATUS, DATA PROCESSING METHOD, AND PROGRAM

Info

Publication number: JP7510025B1
Application number: JP2024024830A
Authority: JP
Inventors: 達也河原; 暁鈴木; 玄田村; 弘幸青島; 耕太坂田
Original assignee: Video Research Co Ltd
Current assignee: Video Research Co Ltd
Priority date: 2023-11-30
Filing date: 2024-02-21
Publication date: 2024-07-02
Anticipated expiration: 2044-02-21

Abstract

The present invention makes it possible to acquire pseudo-sample data that is useful for analyzing exposure to target content via multiple media.
[Solution] The system includes an actual data acquisition unit that acquires single-source data for multiple users, the single-source data including a first value indicating the usage status of a first medium and a second value indicating the usage status of a second medium of a single user; a pseudo data generation unit that generates a pseudo sample of the single-source data so that the correlation coefficient between the first value and the second value is the same as that of the single-source data for the multiple users; and a contact frequency allocation unit that calculates a first contact frequency to the target content via the first medium for each of the generated pseudo samples, where the contact frequency allocation unit uses data indicating the contact status to the target content in the first medium and calculates the first contact frequency based on the first value in each pseudo sample.
[Selected figure] Figure 3

Description

The present invention relates to a data processing device, a data processing method, and a program for analyzing exposure to content using data that has been amplified with pseudo samples.

In recent years, surveys have been conducted to measure how many people are exposed to advertisements for a given product across multiple media, such as television commercials and advertisements on video sites.

For example, Patent Document 1 discloses a method for calculating the number of people exposed to at least one of a television commercial and a digital advertisement. The method uses data on the number of people exposed to the television commercial, data on the number of people exposed to the digital advertisement, and data (single-source data) indicating, for each of a plurality of subjects, whether the subject viewed the television commercial and how many times the subject viewed the site on which the digital advertisement was placed.

In addition, as described in Patent Document 2, for example, it is known that a survey of content exposure can increase the amount of data by using pseudo sample data created from actual sample data.

Patent Document 1: JP 2020-160657 A
Patent Document 2: JP 2022-028370 A

Single-source data generally contains information indicating a given individual's exposure to multiple media, but it does not necessarily contain information indicating how often that individual was exposed to a target advertisement via those media. To analyze exposure to a target advertisement via multiple media, however, data is needed that contains exposure information reflecting the actual situation in each medium.

An object of the present invention is to make it possible to obtain pseudo-sample data that is useful for analyzing exposure to target content via multiple media.

The data processing device according to the present invention includes: an actual data acquisition unit that acquires single-source data for a plurality of users, the single-source data including, for a single user, a first value indicating the usage status of a first medium and a second value indicating the usage status of a second medium; a pseudo data generation unit that generates pseudo samples such that the correlation coefficient between the first value and the second value does not differ from that of the single-source data for the plurality of users; and a contact frequency allocation unit that calculates, for each of the generated pseudo samples, a first contact frequency of contact with target content via the first medium. The contact frequency allocation unit uses data indicating the contact status with the target content in the first medium and calculates the first contact frequency based on the first value in each pseudo sample.

The data processing method according to the present invention includes: a step in which a processor acquires single-source data for a plurality of users, the single-source data including, for a single user, a first value indicating the usage status of a first medium and a second value indicating the usage status of a second medium; a step in which the processor generates pseudo samples of the single-source data such that the correlation coefficient between the first value and the second value does not differ from that of the single-source data for the plurality of users; and a step in which the processor calculates, for each of the generated pseudo samples, a first contact frequency of contact with target content via the first medium. The step of calculating the first contact frequency uses data indicating the contact status with the target content in the first medium and calculates the first contact frequency based on the first value in each pseudo sample.

The program according to the present invention causes a computer to function as: an actual data acquisition unit that acquires single-source data for a plurality of users, the single-source data including, for a single user, a first value indicating the usage status of a first medium and a second value indicating the usage status of a second medium; a pseudo data generation unit that generates pseudo samples of the single-source data such that the correlation coefficient between the first value and the second value does not differ from that of the single-source data for the plurality of users; and a contact frequency allocation unit that calculates, for each of the generated pseudo samples, a first contact frequency of contact with target content via the first medium. The contact frequency allocation unit uses data indicating the contact status with the target content in the first medium and calculates the first contact frequency based on the first value in each pseudo sample.

According to the present invention, it is possible to obtain pseudo-sample data that is useful for analyzing exposure to target content via multiple media.

FIG. 1 is a block diagram showing the configuration of a data processing device 1 according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing the functional modules of a program executed by a processor 11 of the data processing device 1 according to Embodiment 1 of the present invention.
FIG. 3 is a flowchart of the operation of the data processing device 1 according to Embodiment 1 of the present invention.
FIG. 4 is a diagram showing specific examples of single-source data and pseudo samples according to Embodiment 1 of the present invention.
FIG. 5 is a diagram illustrating statistical data on the gender/age composition of the television owner population, used to determine the number of pseudo samples, according to Embodiment 1 of the present invention.
FIG. 6 is a diagram explaining a method for calculating the number of exposures to advertisement C on a video site according to Embodiment 1 of the present invention.
FIG. 7 is a diagram explaining a method for calculating the number of exposures to advertisement C on television according to Embodiment 1 of the present invention.
FIG. 8 is a diagram explaining a method for calculating the number of exposures to advertisement C on a video site according to Embodiment 2 of the present invention.
FIG. 9 is a diagram explaining a method for calculating the number of exposures to advertisement C on a video site according to Embodiment 2 of the present invention.

Next, embodiments of the present invention will be described in detail with reference to the drawings.
(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of a data processing device 1 according to Embodiment 1 of the present invention. The data processing device 1 is composed of one computer or of a plurality of computers connected by a communication line. The data processing device 1 includes a processor 11, a main memory 12, an input/output interface 13, a communication interface 14, and a storage device 15. The storage device 15 is a computer-readable recording medium such as a semiconductor memory (e.g., a volatile memory or a non-volatile memory) or a disk medium (e.g., a magnetic recording medium or a magneto-optical recording medium). The storage device 15 stores programs to be executed by the processor 11, various data, and the like. A program is read from the storage device 15 into the main memory 12 and is interpreted and executed by the processor 11, whereby various functions are performed.

FIG. 2 is a block diagram showing the functional modules of the program executed by the processor 11 of the data processing device 1. As shown in FIG. 2, the functional modules executed by the processor 11 include an actual data acquisition unit 101, a pseudo data generation unit 102, a contact frequency allocation unit 103, and an aggregation unit 104.

The storage device 15 stores measured single-source data (actual data) and pseudo sample data generated from the actual data. Single-source data is data that includes the results of measuring the exposure of a single user (the same individual) to multiple media. In this embodiment, as an example, the single-source data includes, for the same individual, the measured television usage time, the number of exposures to an advertisement on television, and the usage time of video sites (such as YouTube (registered trademark)).

Next, the flow of data processing by the data processing device 1 will be described with reference to the flowchart of FIG. 3. For a certain advertisement C (target content), the data processing device 1 generates data for analyzing exposure to advertisement C on television (second medium) and exposure to advertisement C on a web video site (first medium). Although the example here analyzes exposure to a target advertisement across multiple media, the target content is not limited to advertisements and may be, for example, a specific program or video.

First, the actual data acquisition unit 101 acquires single-source data (actual data) on the usage history of television (second medium) and the usage history of a web video site (first medium) (step S101). FIG. 4(A) shows a specific example of single-source data. As shown in FIG. 4(A), the single-source data includes, for each surveyed user (Sno 001, 002, ...) over a specified survey period (e.g., one week), the television usage time in minutes (the second value indicating usage status), the number of exposures to advertisement C on television, and the usage time of video sites (YouTube, etc.) in minutes (the first value indicating usage status). Note that the single-source data does not include the number of exposures to advertisement C on the video site.

The single-source data may also include user attribute information (gender, age, etc.). In the example of FIG. 4(A), the attribute information is a gender/age category, and, as shown in the figure, single-source data is acquired for male users aged 18 to 24 (M18-24).

Next, the pseudo data generation unit 102 generates pseudo sample data that has the same data items and the same distribution as the acquired single-source data (step S102). FIG. 4(B) illustrates pseudo sample data generated from the single-source data of FIG. 4(A). Based on the single-source data acquired in step S101, the pseudo data generation unit 102 obtains a three-dimensional normal distribution for the three items that make up the data (television usage time, number of television advertisement exposures, and video site usage time). It then generates pseudo sample data at random according to that three-dimensional normal distribution. The pseudo data generation unit 102 generates the pseudo sample data so that the mean of each item and the correlation coefficients between items are the same as the means and correlation coefficients in the original single-source data. Note that in the example of FIG. 4(B), normally distributed random numbers are assigned to each item of a pseudo sample, so even the number of television advertisement exposures, for example, is not a natural number but a value with a fractional part.
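The generation in step S102 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation described in the specification: the function name `generate_pseudo_samples`, the example rows, and the final rescaling step (which forces the sample mean and covariance of the pseudo samples to equal those of the actual data exactly) are all assumptions introduced here.

```python
import numpy as np


def generate_pseudo_samples(real, n_pseudo, seed=0):
    """Draw n_pseudo rows from a normal distribution fitted to `real`.

    The draws are then linearly rescaled so that their sample mean and
    covariance (and therefore their correlation coefficients) equal those
    of `real`. The rescaling is an assumption of this sketch.
    """
    real = np.asarray(real, dtype=float)
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)

    # Random draws from the fitted multivariate normal distribution.
    draws = rng.multivariate_normal(mean, cov, size=n_pseudo)

    # Whiten the draws with their own covariance, then re-colour them
    # with the covariance of the actual data.
    centered = draws - draws.mean(axis=0)
    l_draw = np.linalg.cholesky(np.cov(centered, rowvar=False))
    l_target = np.linalg.cholesky(cov)
    transform = l_target @ np.linalg.inv(l_draw)
    return centered @ transform.T + mean


# Hypothetical actual data: TV minutes, TV ad exposures, video site minutes.
real_rows = [
    [620, 4, 310],
    [480, 2, 95],
    [900, 7, 40],
    [300, 1, 520],
    [750, 5, 180],
    [560, 3, 260],
]
pseudo_rows = generate_pseudo_samples(real_rows, n_pseudo=3346)
```

Because the values come from a continuous distribution, the exposure column of `pseudo_rows` contains fractional values, as noted for FIG. 4(B).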

The number of pseudo samples generated by the pseudo data generation unit 102 can be set according to the purpose of the survey. In the example of FIG. 4(B), the number of pseudo samples is determined from the statistical data on the gender/age composition of the television owner population illustrated in FIG. 5. FIG. 5 shows the television owner population in each gender/age category when the total number of pseudo samples is 100,000; MF, M, and F denote both sexes, male, and female, respectively, and the numbers next to them denote the age group. FIG. 4(B) shows pseudo samples generated from the actual data for men aged 18 to 24 (M18-24). According to FIG. 5, when the total television owner population is taken as 100,000, men aged 18 to 24 account for 3346 of them, so 3346 pseudo samples are generated in the example of FIG. 4(B). Although the data used here assumes the television owner population for each gender/age category, other populations, such as the total population for each gender/age category, may be assumed instead.
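As a rough sketch of how a per-category pseudo sample count could be derived from population statistics, the function below scales integer category populations to a fixed total. The function name `pseudo_sample_sizes`, the example populations, and the largest-remainder rounding are assumptions introduced for illustration; the specification only states that the counts follow the composition shown in FIG. 5.

```python
def pseudo_sample_sizes(population_by_segment, total=100_000):
    """Scale integer segment populations so the sizes sum to `total`.

    Leftover units after integer division go to the segments with the
    largest remainders (an assumption of this sketch).
    """
    pop_total = sum(population_by_segment.values())
    sizes, remainders = {}, {}
    for segment, population in population_by_segment.items():
        sizes[segment], remainders[segment] = divmod(population * total, pop_total)

    shortfall = total - sum(sizes.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for segment in by_remainder[:shortfall]:
        sizes[segment] += 1
    return sizes


# Hypothetical populations; not the figures of FIG. 5.
example_sizes = pseudo_sample_sizes({"M18-24": 3, "F18-24": 3, "M25-34": 4}, total=10)
```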

Next, the contact frequency allocation unit 103 calculates, for each of the generated pseudo samples, the number of exposures to advertisement C via the video site (first contact frequency) (step S103).

The method by which the contact frequency allocation unit 103 calculates the number of exposures to advertisement C on the video site will be described with reference to FIG. 6. The calculation uses distribution data, provided as official data, on the number of exposures to advertisement C on the video site. The second column of the table in FIG. 6(A) illustrates the distribution (official data) of the number of exposures to advertisement C (from 0 to 10 or more) in a given population. The third column shows the number of samples obtained by allocating the pseudo samples generated in step S102 (3346 records in the example of FIG. 4(B)) to each number of exposures (from 0 to 10 or more) in accordance with the distribution in the second column. The fourth column shows the result of rounding the values in the third column to the nearest integer and adjusting the count for 10 or more exposures so that the total is 3346.
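The allocation in FIG. 6(A) can be sketched as below. The shares used in the example are hypothetical and are not the official data of FIG. 6(A); the function name `allocate_by_distribution` is likewise an assumption. The sketch rounds half up and lets the last bucket ("N or more") absorb the rounding difference, as described for the fourth column.

```python
def allocate_by_distribution(shares, n_samples):
    """Turn exposure-count shares into integer sample counts.

    shares[k] is the population share with k exposures; the last entry
    stands for the open-ended bucket (e.g. "10 or more").
    """
    exact = [share * n_samples for share in shares]   # third column
    counts = [int(value + 0.5) for value in exact]    # round half up
    # Adjust the last bucket so that the total equals n_samples.
    counts[-1] = n_samples - sum(counts[:-1])
    if counts[-1] < 0:
        raise ValueError("shares are inconsistent with n_samples")
    return counts


# Hypothetical shares for 0, 1, 2, 3 and "4 or more" exposures.
example_counts = allocate_by_distribution([0.40, 0.25, 0.15, 0.10, 0.10], 3346)
```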

FIG. 6(B) shows an example in which each pseudo sample generated in step S102 is given a rank for the number of TV commercial exposures (third column of the table) and a rank for video site usage time (sixth column of the table). The rank for TV commercial exposures (third column) is assigned in ascending order of the number of exposures to advertisement C on television shown in the fourth column. Likewise, the rank for video site usage time (sixth column) is assigned in ascending order of usage time, that is, from the shortest usage time.

The contact frequency allocation unit 103 calculates the number of exposures to advertisement C on the video site for each pseudo sample in FIG. 6(B), based on the distribution of exposures to advertisement C on the video site shown in FIG. 6(A). According to the fourth column of FIG. 6(A), 1255 of the 3346 pseudo samples have "0" exposures to advertisement C on the video site. The contact frequency allocation unit 103 therefore sets the number of exposures to advertisement C to "0" for the pseudo samples in FIG. 6(B) ranked 1st to 1255th in ascending order of video site usage time. Similarly, the number of exposures is set to "1" for the 1256th to 1690th samples, "2" for the 1691st to 2008th, "3" for the 2009th to 2319th, and "4" for the 2320th to 2677th. In the example of FIG. 6(B), samples Sno 001 and 002 fall within the first 1255, so their number of exposures to advertisement C is 0. Sample Sno 003 falls in the range from the 2320th to the 2677th, so its number of exposures is 4. In this way, the number of exposures to advertisement C on the video site can be set in the pseudo sample data.
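The rank-based assignment described above can be sketched as follows. The function name `assign_exposures_by_rank` and the tiny example are assumptions introduced here for illustration. Ties in usage time are broken by input order, which the specification does not address.

```python
import numpy as np


def assign_exposures_by_rank(usage_minutes, bucket_counts):
    """Assign exposure counts to samples ranked by ascending usage time.

    bucket_counts[k] is the number of samples that should receive k
    exposures; the shortest-usage samples receive 0 exposures first.
    """
    usage = np.asarray(usage_minutes, dtype=float)
    if sum(bucket_counts) != usage.size:
        raise ValueError("bucket_counts must sum to the number of samples")

    # Indices of the samples from shortest to longest usage time.
    order = np.argsort(usage, kind="stable")
    # Exposure value for each rank position: 0,...,0,1,...,1,2,...
    by_rank = np.repeat(np.arange(len(bucket_counts)), bucket_counts)

    exposures = np.empty(usage.size, dtype=int)
    exposures[order] = by_rank
    return exposures


# Six hypothetical samples; two get 0, two get 1, one gets 2, one gets 3.
example = assign_exposures_by_rank([30, 5, 120, 60, 0, 90], [2, 2, 1, 1])
```

With the bucket sizes 1255, 435, 318, 311, and 358 taken from the ranges above, the same function reproduces the boundaries 1255/1256 and 2319/2320 stated in the text, provided the remaining 669 samples are placed in further buckets.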

The pseudo samples already contain a value for the number of TV commercial exposures, but this value may be set again based on the rank of the number of TV commercial exposures. Specifically, as with the number of exposures to advertisement C on the video site, distribution data on the number of exposures to advertisement C on television provided as official data (second column of FIG. 7) is used: the 3346 records are allocated to each number of exposures (for example, from 0 to 10 or more) (third column of FIG. 7), the number of records allocated to each number of exposures is obtained (fourth column of FIG. 7), and the number of exposures to advertisement C on television is then assigned according to the rank in the third column of FIG. 6(B). This makes it possible to create pseudo samples whose exposure distribution for television advertisements also matches the distribution of the official data. For example, in FIG. 6(B), the number of TV commercial exposures originally shown for Sno 001 in the pseudo sample is 5.6, but because its TV commercial exposure rank is 2253rd, the number of exposures according to the distribution of FIG. 7 is 2. Likewise, the number of TV commercial exposures originally shown for Sno 002 is 3.3, but because its rank is 1521st, the number of exposures according to the distribution of FIG. 7 is 0.

Through steps S101 to S103 above, a desired number of pseudo samples that include both the number of exposures to advertisement C on television and the number of exposures to advertisement C on the video site can be obtained from a limited number of single-source data records (actual data) that include television usage time, the number of exposures to advertisement C on television, and video site usage time.

(Analysis of integrated reach and overlapping reach)
The aggregation unit 104 estimates the integrated reach and the overlapping reach using the generated pseudo samples. The integrated reach is the proportion for which at least one of a plurality of events holds; in the above example, it is the proportion of users exposed to at least one of the television advertisement and the video site advertisement. The overlapping reach is the proportion for which all of the events hold; in the above example, it is the proportion of users exposed to both the television advertisement and the video site advertisement. In the above example, the integrated reach and the overlapping reach can be calculated, for example, by formulas (1) and (2) below. Formulas (1) and (2) assume that a user who has been exposed even once counts as reached. The definition of reach is not limited to this. If, for example, a user is deemed reached only after two or more, or three or more, exposures, the condition "number of exposures ≥ 1" in the formulas can be replaced with "number of exposures ≥ 2" or "number of exposures ≥ 3".

Integrated reach = ([number of users with TV ad exposures ≥ 1] + [number of users with video site ad exposures ≥ 1] - [number of users with both TV ad exposures ≥ 1 and video site ad exposures ≥ 1]) / 3346 ... (1)
Overlapping reach = [number of users with both TV ad exposures ≥ 1 and video site ad exposures ≥ 1] / 3346 ... (2)
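Formulas (1) and (2) can be sketched in code as follows. The function name `reach_metrics` and the five example records are hypothetical; the `threshold` parameter corresponds to the reach definition (1 by default, or 2 or 3 for the stricter definitions mentioned above), and the denominator is the number of pseudo samples (3346 in the example of the text).

```python
def reach_metrics(tv_exposures, video_exposures, threshold=1):
    """Return (integrated reach, overlapping reach) for paired exposure counts."""
    pairs = list(zip(tv_exposures, video_exposures))
    n_samples = len(pairs)

    tv_reached = sum(tv >= threshold for tv, _ in pairs)
    video_reached = sum(video >= threshold for _, video in pairs)
    both_reached = sum(tv >= threshold and video >= threshold for tv, video in pairs)

    integrated = (tv_reached + video_reached - both_reached) / n_samples  # formula (1)
    overlapping = both_reached / n_samples                                 # formula (2)
    return integrated, overlapping


# Five hypothetical pseudo samples.
integrated, overlapping = reach_metrics([0, 2, 1, 0, 3], [0, 0, 4, 1, 1])
```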

By calculating the integrated reach from the generated pseudo samples, the relationship between the exposure rates for television advertisements and video site advertisements and the integrated reach can be analyzed and used to plan efficient advertising.

In the above example, single-source pseudo samples that include the number of exposures to television advertisements and to video site advertisements are obtained, but the items included in the pseudo samples can be adjusted according to the purpose of the analysis. For example, exposure to advertisement C on a video site may be distinguished according to whether it occurred on a television screen or on a smartphone. The number of exposures to advertisement C on television may also be broken down by station. The number of exposures in a specific time slot or on a specific site can be calculated by the same procedure.

As described above, according to this embodiment, single-source data that includes the usage time of multiple media is used to generate pseudo samples such that the correlation coefficients between items do not change. In addition, distribution data on the number of exposures to the target advertisement C in each medium is used to assign the number of exposures to advertisement C based on the usage time of that medium in each pseudo sample. As a result, a number of exposures that reflects the actual situation can be estimated even from single-source data that contains only media usage time, and pseudo sample data can be generated that is usable for analyzing exposure to advertisement C via multiple media. Furthermore, an analysis performed on the created pseudo samples can be expected to yield results consistent with those of an analysis performed on measured data.

In this embodiment, pseudo sample data indicating exposure to television advertisements and video site advertisements is created, but the number and types of media are not limited to these. The method can be used to create pseudo samples on exposure to multiple media including, besides television and the web, newspapers, radio, and the like. In addition to integrated reach and overlapping reach, various indicators and statistical data that can be analyzed and calculated from single-source data can be produced. Moreover, the method is not limited to the integrated reach and overlapping reach of two media; it can handle the integrated reach and overlapping reach of any number of media, as well as other analyses.

The created pseudo sample data can be used not only for analyzing integrated reach and overlapping reach but also, for example, for the following purposes.
(1) Describing the attribute profile of the people exposed to an advertisement.
(2) Serving a wider range of purposes when fused with other data sources. Specific examples include the following.
(2)-1: Fusing with data from advertisement delivery providers to achieve effective delivery that complements reach.
(2)-2: Fusing with brand evaluation data to analyze the effect of advertising on brand evaluation.
(2)-3: Fusing with purchase history data to analyze the effect of advertising on purchases.
(2)-4: Fusing with consumer attribute profile data to obtain detailed profiles of the people exposed to an advertisement.

(Embodiment 2)
The configuration of the data processing device 1 according to Embodiment 2 of the present invention and the functional modules of the program executed by the processor 11 of the data processing device 1 are the same as those of Embodiment 1 shown in FIGS. 1 and 2. The flow of data processing by the data processing device 1 is also the same as that shown in the flowchart of FIG. 3. That is, pseudo sample data as illustrated in FIG. 4(B) is generated from single-source data as illustrated in FIG. 4(A), as in Embodiment 1. The contact frequency allocation unit 103 then calculates, for each of the generated pseudo samples, the number of exposures to advertisement C via the video site (first contact frequency). In Embodiment 2, this number of exposures is calculated by a method different from that of Embodiment 1.

In Embodiment 1, distribution data of the number of exposures to advertisement C on the video site, as shown in FIG. 6(A), was provided as official data and was used to calculate the number of times each pseudo sample was exposed to advertisement C via the video site. Many video sites, however, do not provide such distribution data. Instead, they may provide data indicating the ratio of presence to absence of exposure to advertisement C on the video site. Specifically, values defined as follows are provided for a given population (for example, men aged 18 to 24 (M18-24)):
Ratio with exposure = number of people exposed to advertisement C on the video site / number of people in the population
Ratio without exposure = 1 - (ratio with exposure)

In addition, the average number of exposures within the group that was exposed to advertisement C may also be provided. Specifically, a value defined as follows is provided:
Average number of exposures = total number of times advertisement C was displayed on the video site / number of people exposed to advertisement C on the video site
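The three published values defined above can be derived from raw counts as in the following minimal sketch. The function name `exposure_summary` and the input figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def exposure_summary(population, exposed_users, total_impressions):
    """Return (ratio with exposure, ratio without exposure, average exposures).

    population:        number of people in the target population
    exposed_users:     number of people exposed to the ad on the video site
    total_impressions: total number of times the ad was displayed
    """
    ratio_exposed = exposed_users / population
    ratio_unexposed = 1 - ratio_exposed
    # The average is taken over exposed people only, so it is at least 1.
    average_exposures = total_impressions / exposed_users
    return ratio_exposed, ratio_unexposed, average_exposures


if __name__ == "__main__":
    # Hypothetical figures, not taken from any official data.
    print(exposure_summary(1000, 125, 500))
```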

Embodiment 2 uses the data indicating the ratio of presence to absence of exposure to advertisement C on the video site, together with the average number of exposures within the exposed group, to calculate the number of times each pseudo sample was exposed to advertisement C via the video site.

First, the contact frequency allocation unit 103 assigns to each pseudo sample the presence or absence of exposure to advertisement C on the video site. The second column of the table in FIG. 8(A) is obtained as official data and illustrates the ratio of presence to absence of exposure to advertisement C on the video site in a given population (for example, men aged 18 to 24 (M18-24)). The third column shows the number of pseudo samples (17,964 here) allocated to "no exposure" and "exposure" according to the ratios in the second column. The fourth column shows the result of rounding the third-column values to the nearest integer and adjusting the "no exposure" count so that the total equals 17,964.
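The split into "no exposure" and "exposure" head counts can be sketched as follows. This is a minimal illustration that assumes round-half-up rounding and uses a hypothetical sample size and ratio; the function name `allocate_counts` is not from this publication, and the actual ratios in FIG. 8(A) are not reproduced here.

```python
import math


def allocate_counts(n_samples, ratio_exposed):
    """Split n_samples into (unexposed, exposed) head counts.

    The exposed count is rounded to the nearest integer (half up), and the
    unexposed count absorbs the rounding difference so that the two counts
    always sum to n_samples.
    """
    raw_exposed = n_samples * ratio_exposed
    exposed = math.floor(raw_exposed + 0.5)   # round half up
    unexposed = n_samples - exposed           # adjust "no exposure" to fit the total
    return unexposed, exposed


if __name__ == "__main__":
    # Hypothetical example: 1,000 pseudo samples, 12.34% exposed.
    print(allocate_counts(1000, 0.1234))
```

Adjusting only the "no exposure" count keeps the number of exposed samples as close as possible to the official ratio.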

FIG. 8(B) shows an example in which each pseudo sample is given a rank by video-site usage time (eighth column of the table). Ranks are assigned in ascending order of video-site usage time (seventh column of the table). Based on the ratio of presence to absence of exposure to advertisement C on the video site shown in FIG. 8(A), the contact frequency allocation unit 103 assigns the presence or absence of exposure to advertisement C on the video site to each pseudo sample in FIG. 8(B). Referring to the fourth column of FIG. 8(A), 15,719 of the 17,964 pseudo samples have no exposure to advertisement C on the video site. The contact frequency allocation unit 103 therefore assigns "no exposure" to the pseudo samples in FIG. 8(B) ranked 1st through 15,719th in ascending order of video-site usage time. Likewise, it assigns "exposure" to the samples ranked 15,720th through 17,964th.
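The rank-based assignment described above can be sketched as follows. The function name `assign_exposure_flags` is hypothetical, and breaking ties by original order is an assumption of this sketch, since the text does not state how equal usage times are ranked.

```python
def assign_exposure_flags(usage_times, n_unexposed):
    """Return a list of booleans (True = exposed), one per pseudo sample.

    Samples are ranked by usage time in ascending order; the n_unexposed
    shortest-usage samples are flagged as not exposed, the rest as exposed.
    """
    # Indices sorted by usage time; Python's sort is stable, so ties keep
    # their original order (an assumption of this sketch).
    order = sorted(range(len(usage_times)), key=lambda i: usage_times[i])
    flags = [False] * len(usage_times)
    for rank, idx in enumerate(order, start=1):
        flags[idx] = rank > n_unexposed
    return flags


if __name__ == "__main__":
    # Hypothetical usage times in minutes for five pseudo samples.
    print(assign_exposure_flags([30, 5, 120, 0, 60], n_unexposed=3))
```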

Next, the contact frequency allocation unit 103 assigns an expected number of advertisement exposures to each sample assigned "exposure" to advertisement C. The contact frequency allocation unit 103 assigns the expected values based on a relationship that satisfies the following three conditions.
Condition 1: The expected value is proportional to the video-site usage time.
Condition 2: The mean of the expected values equals the average number of exposures within the "exposure" group in the official data.
Condition 3: Among the pseudo samples assigned "exposure", the sample with the shortest video-site usage time has an expected value of 1.

The procedure for obtaining the expected values from a relationship that satisfies conditions 1 to 3 is described concretely below. First, the contact frequency allocation unit 103 obtains the equation Y = c + bX of the straight line (condition 1) that passes through the following two points in the plane defined by (X, Y) = (usage time, expected number of exposures), as shown in FIG. 9.
Point P1 (condition 3): (minimum usage time among the "exposure" samples, 1)
Point P2 (condition 2): (mean usage time At calculated from the "exposure" samples, mean expected value Ar), where the mean expected value Ar equals the "average number of advertisement exposures" in the official data.
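The constants of the line follow directly from the two points. The derivation below is a sketch in which the symbol X_min (the minimum usage time among the "exposure" samples) is introduced here for illustration, and A_t > X_min is assumed so that the slope is defined.

```latex
\begin{align*}
b &= \frac{A_r - 1}{A_t - X_{\min}}, &
c &= 1 - b\,X_{\min}, \\
\frac{1}{n}\sum_{i=1}^{n} Y_i
  &= c + b\cdot\frac{1}{n}\sum_{i=1}^{n} X_i
   = c + b\,A_t = A_r .
\end{align*}
```

Because the relationship is linear, passing the line through P2 is enough to make the mean of the expected values equal A_r (condition 2), and passing it through P1 gives the shortest-usage sample an expected value of 1 (condition 3). Note that when c is nonzero the relationship is linear in the usage time rather than strictly proportional.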

The video-site usage time (X) of each sample is substituted into equation (1) of the obtained straight line to obtain the expected number of advertisement exposures Y for each sample.
Expected number of advertisement exposures (Y) = c + b × video-site usage time (X) ... (1)
(c and b are constants)
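A minimal sketch of this step is shown below. It fits the line through the two points described in the text and evaluates equation (1) for every "exposure" sample. The function name `expected_exposures` and the demo inputs are hypothetical, and the error raised when all usage times are identical is a choice made for this sketch, not something the text specifies.

```python
def expected_exposures(usage_times, average_exposures):
    """Fit Y = c + b*X and return (c, b, expected values per sample).

    usage_times:       usage times of the samples assigned "exposure"
    average_exposures: official average number of exposures (Ar)
    """
    x_min = min(usage_times)                    # X coordinate of point P1
    a_t = sum(usage_times) / len(usage_times)   # X coordinate of point P2 (At)
    if a_t == x_min:
        # All usage times are equal, so two distinct points do not exist.
        raise ValueError("usage times must not all be identical")
    b = (average_exposures - 1.0) / (a_t - x_min)   # slope through P1 and P2
    c = 1.0 - b * x_min                             # intercept so that Y(x_min) = 1
    return c, b, [c + b * x for x in usage_times]


if __name__ == "__main__":
    # Hypothetical usage times (minutes) and official average of 3 exposures.
    c, b, ys = expected_exposures([10, 20, 30, 60], 3.0)
    print(c, b, ys)
```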

The contact frequency allocation unit 103 then uses the expected value obtained for each sample to calculate that sample's number of advertisement exposures. For example, the contact frequency allocation unit 103 may generate one random number that follows a truncated Poisson distribution whose expected value equals the sample's expected value, and use it as the sample's number of advertisement exposures. Because the number of advertisement exposures is an integer of 1 or more, a truncated Poisson distribution whose support is 1 or more may be used. If the expected value (λ) of the Poisson distribution before truncation is needed to generate the truncated Poisson random numbers, λ may be calculated individually for each sample's expected value. The expected value E of the Poisson distribution truncated to 1 or more and the expected value λ of the Poisson distribution before truncation are related as follows:
E = λ / (1 - exp(-λ))
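One way to carry out this step is sketched below: λ is recovered from E by bisection on the relation above, and one value is then drawn from the truncated distribution by inverse-transform sampling. The function names `solve_lambda` and `sample_truncated_poisson` are hypothetical, only the Python standard library is used, and the sampler is intended for moderate expected values (for very large λ, exp(-λ) underflows).

```python
import math
import random


def solve_lambda(expected):
    """Solve expected = lam / (1 - exp(-lam)) for lam >= 0 by bisection."""
    if expected < 1.0:
        raise ValueError("expected value of a 1-or-more count must be >= 1")
    if expected == 1.0:
        return 0.0                      # limiting case: the count is always 1
    lo, hi = 0.0, expected              # the root lies in (0, expected)
    for _ in range(100):
        mid = (lo + hi) / 2.0
        # -expm1(-mid) equals 1 - exp(-mid) but is accurate for small mid.
        if mid / -math.expm1(-mid) < expected:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def sample_truncated_poisson(expected, rng=random):
    """Draw one integer >= 1 from a Poisson truncated to 1 or more with mean `expected`."""
    lam = solve_lambda(expected)
    if lam == 0.0:
        return 1
    u = rng.random()
    k = 1
    p = lam * math.exp(-lam) / -math.expm1(-lam)   # P(K = 1)
    cdf = p
    while u > cdf and p > 0.0:          # p > 0 guards against rounding stalls
        k += 1
        p *= lam / k                    # P(K = k) from P(K = k - 1)
        cdf += p
    return k


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_truncated_poisson(3.0, rng) for _ in range(10)])
```

Solving for λ per sample matches the note in the text that λ may be calculated individually for each sample's expected value.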

According to Embodiment 2, even when distribution data of the number of advertisement exposures on the video site is unavailable, the number of advertisement exposures for each pseudo sample can be estimated realistically as long as the ratio of presence to absence of advertisement exposure and the average number of advertisement exposures are available. As in Embodiment 1, this makes it possible to generate pseudo-sample data that can be used to analyze exposure to advertisement C via multiple media. Analyses performed on the created pseudo samples can also be expected to yield results consistent with those obtained from measured data.

The probability distribution used to generate the number of advertisement exposures from the expected value is not limited to the truncated Poisson distribution. For example, a binomial distribution, negative binomial distribution, geometric distribution, or beta-binomial distribution may also be used. As in Embodiment 1, the number of exposures to the television commercial may also be reassigned based on the ranking of the number of exposures to the television commercial.
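As an illustration of one of the alternative distributions, the sketch below draws the number of exposures from a geometric distribution on {1, 2, ...}. It relies on the standard fact that such a distribution with success probability p has mean 1/p, so p = 1/E reproduces the expected value E. The function name `sample_geometric_exposures` is hypothetical, and this parameterization is an assumption of the sketch rather than something the text specifies.

```python
import math
import random


def sample_geometric_exposures(expected, rng=random):
    """Draw one integer >= 1 from a geometric distribution with mean `expected`."""
    if expected < 1.0:
        raise ValueError("expected value of a 1-or-more count must be >= 1")
    if expected == 1.0:
        return 1                        # p = 1: the count is always 1
    p = 1.0 / expected                  # success probability giving mean 1/p
    u = 1.0 - rng.random()              # uniform on (0, 1], avoids log(0)
    # Inverse transform: P(K > k) = (1 - p)**k for k = 0, 1, 2, ...
    return 1 + int(math.log(u) / math.log1p(-p))


if __name__ == "__main__":
    rng = random.Random(1)
    print([sample_geometric_exposures(3.0, rng) for _ in range(10)])
```

Compared with the truncated Poisson distribution, the geometric distribution has a heavier tail for the same mean, so it assigns more weight to very high exposure counts.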

The present invention is not limited to the embodiments described above and can be implemented in various other forms without departing from the gist of the invention. The above embodiments are therefore merely illustrative in every respect and should not be interpreted restrictively. For example, the processing steps described above may be reordered or executed in parallel as long as no inconsistency arises in the processing. Other steps may be added between the processing steps. A step described as a single step may be divided into multiple steps, and steps described separately may be regarded as a single step.

1 ... data processing device
11 ... processor
12 ... main memory
13 ... input/output interface
14 ... communication interface
15 ... storage device
101 ... actual data acquisition unit
102 ... pseudo data generation unit
103 ... contact frequency allocation unit
104 ... aggregation unit

Claims

A data processing device comprising:
an actual data acquisition unit that acquires single-source data for a plurality of users, the single-source data including, for a single user, a first value indicating a usage status of a first medium and a second value indicating a usage status of a second medium;
a pseudo data generation unit that generates pseudo samples of the single-source data such that the correlation coefficient between the first value and the second value is the same as that of the single-source data for the plurality of users; and
a contact frequency allocation unit that calculates, for each of the generated pseudo samples, a first contact frequency with target content via the first medium,
wherein the contact frequency allocation unit uses data indicating a contact status with the target content in the first medium and calculates the first contact frequency based on the first value in each pseudo sample.

The data processing device according to claim 1, wherein
the data indicating the contact status with the target content is distribution data of contact frequency, and
the contact frequency allocation unit ranks each pseudo sample according to the length of time spent using the first medium and allocates the first contact frequency based on the distribution data of contact frequency with the target content.

The data processing device according to claim 1, wherein
the data indicating the contact status with the target content is data indicating a ratio of presence to absence of contact, and
the contact frequency allocation unit ranks each pseudo sample according to the length of time spent using the first medium, assigns the presence or absence of contact with the target content to each pseudo sample based on the data indicating the ratio of presence to absence of contact with the target content, and assigns the first contact frequency to each pseudo sample assigned the presence of contact with the target content, based on the length of time spent using the first medium.

The data processing device according to claim 3, wherein
the contact frequency allocation unit assigns, as the first contact frequency, to each pseudo sample assigned the presence of contact with the target content, a random number that follows a probability distribution whose expected value is proportional to the length of time spent using the first medium.

The data processing device according to claim 1 or 3, wherein
the single-source data includes a second contact frequency with the target content via the second medium, and
the contact frequency allocation unit ranks each pseudo sample according to the second contact frequency and reallocates the second contact frequency based on data indicating a status regarding the target content in the second medium.

A data processing method comprising:
a step in which a processor acquires single-source data for a plurality of users, the single-source data including, for a single user, a first value indicating a usage status of a first medium and a second value indicating a usage status of a second medium;
a step in which the processor generates pseudo samples of the single-source data such that the correlation coefficient between the first value and the second value is the same as that of the single-source data for the plurality of users; and
a step in which the processor calculates, for each of the generated pseudo samples, a first contact frequency with target content via the first medium,
wherein, in the step of calculating the first contact frequency, data indicating a contact status with the target content in the first medium is used and the first contact frequency is calculated based on the first value in each pseudo sample.

A program for causing a computer to function as:
an actual data acquisition unit that acquires single-source data for a plurality of users, the single-source data including, for a single user, a first value indicating a usage status of a first medium and a second value indicating a usage status of a second medium;
a pseudo data generation unit that generates pseudo samples of the single-source data such that the correlation coefficient between the first value and the second value is the same as that of the single-source data for the plurality of users; and
a contact frequency allocation unit that calculates, for each of the generated pseudo samples, a first contact frequency with target content via the first medium,
wherein the contact frequency allocation unit uses data indicating a contact status with the target content in the first medium and calculates the first contact frequency based on the first value in each pseudo sample.