JP2020149451A

JP2020149451A - Data processor and data processing method

Info

Publication number: JP2020149451A
Application number: JP2019047268A
Authority: JP
Inventors: 伊佐片柳; Isa Katayanagi; 達也河原; Tatsuya Kawahara
Original assignee: Video Research Co Ltd
Current assignee: Video Research Co Ltd
Priority date: 2019-03-14
Filing date: 2019-03-14
Publication date: 2020-09-17
Anticipated expiration: 2039-03-14
Also published as: JP6637628B1

Abstract

To use non-representative data as representative data. SOLUTION: A data processor comprises: a first storage part that stores first data collected from randomly selected first target persons, in an amount corresponding to the number of first target persons; a second storage part that stores second data collected from second target persons, who satisfy a predetermined collection condition and outnumber the first target persons, in an amount corresponding to the number of second target persons; and a data extraction part that extracts, from the second data stored in the second storage part, the second data to be used as aggregation data. Both the first data and the second data include data indicating the content of a common item shared by both. The data extraction part calculates the similarity of the content of the common item between each piece of first data and each piece of second data, and extracts, as the aggregation data, the second data of the second target persons identified on the basis of the calculated similarity. SELECTED DRAWING: Figure 4

Description

The present invention relates to a data processing apparatus and a data processing method, and more particularly to a data processing apparatus and a data processing method for processing non-representative data.

When collected data of various kinds are aggregated, the representativeness of the data is important. Here, "representativeness" refers to whether the survey results for a subset of subjects drawn from the entire survey population reflect the results for the entire population accurately and without bias. When they do, the data are said to be "representative".

Techniques for ensuring the representativeness of collected data have long been developed; one example is the technique described in Patent Document 1. Patent Document 1 discloses a survey support device capable of selecting representative survey subjects from the entire survey population efficiently and at low cost.

Patent Document 1: JP-A-2015-185008

On the other hand, with the recent development of communication technology, it has become possible to collect large-scale data (for example, log data showing a person's specific behavior history), as typified by so-called big data. However, because randomness is not guaranteed for the people from whom such data are collected, the collected data may lack representativeness.

Specifically, if data are collected only from people who satisfy the conditions for collecting log data, the representativeness of that log data is not guaranteed. If non-representative log data are aggregated as they are, the aggregation result may be biased. Non-representative data therefore require a measure (correction) to eliminate this bias. However, the cause of the bias can be difficult to identify from the non-representative data alone, and in such cases it is difficult to correct the non-representative data on their own.

The present invention has been made in view of the above circumstances, and its object is as follows.
Specifically, an object of the present invention is to solve the above problems of the prior art and to provide a data processing apparatus and a data processing method for using non-representative data as representative data.

To achieve the above object, the data processing apparatus of the present invention includes: a first storage unit that stores first data collected from randomly selected first target persons, in an amount corresponding to the number of first target persons; a second storage unit that stores second data acquired from second target persons, who satisfy a predetermined collection condition and outnumber the first target persons, in an amount corresponding to the number of second target persons; and a data extraction unit that extracts, from the second data stored in the second storage unit, the second data to be used as aggregation data. Both the first data and the second data include data indicating the content of a common item shared by both. The data extraction unit calculates the degree of similarity of the content of the common item between each piece of first data and each piece of second data, and extracts, as the aggregation data, the second data of the second target persons identified on the basis of the calculated degree of similarity.
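
The pairwise similarity calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the text here does not fix a similarity measure, so Jaccard similarity is used as a stand-in, and the encoding of the common item as a set of (channel, time slot) pairs, as well as the names `jaccard` and `similarity_table`, are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding of the common item: the set of (channel, time slot)
# pairs during which a person was viewing.
Viewing = Set[Tuple[str, str]]


def jaccard(a: Viewing, b: Viewing) -> float:
    """Jaccard similarity |a & b| / |a | b| (assumed measure; 0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_table(
    first: Dict[str, Viewing], second: Dict[str, Viewing]
) -> Dict[Tuple[str, str], float]:
    """Degree of similarity for every (first target person, second target person) pair."""
    return {
        (t, u): jaccard(t_view, u_view)
        for t, t_view in first.items()
        for u, u_view in second.items()
    }


first = {"T1": {("ch1", "19:00"), ("ch2", "20:00")}}
second = {
    "U1": {("ch1", "19:00"), ("ch2", "20:00")},
    "U2": {("ch1", "19:00"), ("ch3", "21:00")},
}
table = similarity_table(first, second)
```

In this toy input, U1 has exactly the same viewing content as T1, so that pair receives the highest possible score, while U2 shares only one of three distinct viewing slots with T1.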

The data processing apparatus of the present invention configured as described above extracts, from the non-representative second data, the data identified on the basis of the representative first data as aggregation data.
More specifically, the second data of the second target persons identified on the basis of the degree of similarity of the content of the common item, calculated between each piece of first data and each piece of second data, are extracted as aggregation data. By this procedure, data similar to the representative first data (specifically, data whose common-item content is similar) can be extracted from the non-representative second data. The extracted second data can then be treated as representative data.
As a result, even data that are not inherently representative can be handled as if they were representative data.

In the above data processing apparatus, the second data may be data collected when a second target person who satisfies the collection condition performs a specific action.
In this configuration, a specific action by a second target person triggers the collection of the second data. When the second data are collected in this way, they are especially likely to lack representativeness, so the effect of the present invention, namely "allowing non-representative data to be treated as representative data", becomes more pronounced.

In the above data processing apparatus, the second target person may be a person who satisfies, as the collection condition, the condition that the device the second target person uses to contact broadcast media is connected to the Internet.
With this configuration, the second data are collected from target persons (second target persons) whose device for contacting broadcast media is connected to the Internet. In this case, the second data may be non-representative, and the effect of the present invention is exhibited effectively.

In the above data processing apparatus, the second data may be log data transmitted by the device when the second target person contacts the broadcast media using the device.
With this configuration, the second data are log data transmitted by the device upon contact with the broadcast media, so they can be collected relatively easily but may be non-representative. In addition, such log data generally carry no detailed attribute information, which makes it difficult to correct the log data on their own (that is, to apply a bias elimination measure). The effect of the present invention is therefore more pronounced when such log data are collected as the second data.

In the above data processing apparatus, the common item may be the status of contact with the broadcast media. The status of contact with the broadcast media may be the television viewing status.
With this configuration, aggregation data can be extracted from the non-representative second data, on the basis of the degree of similarity of the status of contact with the broadcast media (for example, the television viewing status), so that the extracted data are representative.

In the above data processing apparatus, it is preferable that the data extraction unit calculate the degree of similarity for each combination of a first target person and a second target person while varying the combination; identify, for each first target person, the second target persons belonging to the combinations in order, starting from the combination with the highest degree of similarity; and, when the number of identified second target persons reaches a set number, extract the second data of the identified set number of second target persons as the aggregation data.
With this configuration, for each first target person, second target persons are identified in order, starting from the one whose common-item content is most similar. When the number of identified second target persons reaches the set number, the second data of the second target persons identified up to that point are extracted as aggregation data. This makes it possible to extract data that are more appropriate for ensuring representativeness when aggregation data are extracted from the second data.
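
One possible reading of this selection procedure is sketched below. It is an assumption, not a statement of the claimed method: the sketch ranks the second target persons for each first target person by similarity, then walks the ranks round-robin (best match for every first target person, then second-best, and so on) until the set number is reached. The function name `select_second_subjects` and the input layout are hypothetical.

```python
from typing import Dict, List


def select_second_subjects(
    sim: Dict[str, Dict[str, float]], set_number: int
) -> List[str]:
    """Pick second target persons until `set_number` picks have been made.

    sim[t][u] is the degree of similarity between first target person t
    and second target person u. Duplicates are allowed in the result.
    """
    # For each first target person, order second target persons by
    # descending similarity.
    ranked = {t: sorted(row, key=row.get, reverse=True) for t, row in sim.items()}
    selected: List[str] = []
    max_rank = max((len(r) for r in ranked.values()), default=0)
    for rank in range(max_rank):          # best match first, then second-best, ...
        for t in ranked:                  # one pick per first target person per rank
            if len(selected) >= set_number:
                return selected
            if rank < len(ranked[t]):
                selected.append(ranked[t][rank])
    return selected


sim = {
    "T1": {"U1": 0.9, "U2": 0.4, "U3": 0.1},
    "T2": {"U1": 0.8, "U3": 0.7, "U2": 0.2},
}
picked = select_second_subjects(sim, set_number=3)
```

Here U1 is the best match for both T1 and T2, so it is picked twice before T1's second-best match, U2, fills the third slot.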

In the above data processing apparatus, it is preferable that, when a given second target person is identified more than once while the data extraction unit identifies the set number of second target persons, the data extraction unit count the second data of that second target person as as many pieces of aggregation data as the number of times that person was identified.
In the above configuration, a given second target person may be identified more than once while the set number of second target persons is identified. In this case, if the second data of that second target person are extracted as as many pieces of aggregation data as the number of times the person was identified, the number of identifications can be used as an aggregation weight in subsequent aggregation work using the aggregation data, which enables more appropriate aggregation.
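
Using the number of identifications as an aggregation weight can be illustrated as follows. This is a minimal sketch under assumptions: the list of picks, the `watched` flags, and the `weighted_share` helper are hypothetical and stand in for whatever aggregation is actually performed.

```python
from collections import Counter
from typing import Dict

# Hypothetical result of the identification step: U1 was identified twice.
picked = ["U1", "U1", "U2"]

# Number of times each second target person was identified = aggregation weight.
weights = Counter(picked)


def weighted_share(weights: Dict[str, int], watched: Dict[str, bool]) -> float:
    """Weighted fraction of aggregation data in which the program was watched."""
    total = sum(weights.values())
    hits = sum(w for u, w in weights.items() if watched.get(u, False))
    return hits / total if total else 0.0


# Hypothetical viewing flags for one program.
share = weighted_share(weights, {"U1": True, "U2": False})
```

Because U1 counts twice and U2 once, U1's viewing accounts for two of the three pieces of aggregation data.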

To solve the above-described problem, the data processing method of the present invention is characterized in that: a first storage unit stores first data collected from randomly selected first target persons, in an amount corresponding to the number of first target persons; a second storage unit stores second data collected from second target persons, who satisfy a predetermined collection condition and outnumber the first target persons, in an amount corresponding to the number of second target persons; a computer extracts, from the second data stored in the second storage unit, the second data to be used as aggregation data; both the first data and the second data include data indicating the content of a common item shared by both; and the computer calculates the degree of similarity of the content of the common item between each piece of first data and each piece of second data, and extracts, as the aggregation data, the second data of the second target persons identified on the basis of the calculated degree of similarity.
According to the above data processing method, the second data, which are not inherently representative, can be extracted as representative aggregation data.

According to the present invention, a data processing apparatus and a data processing method for using non-representative data as representative data are realized.

An explanatory diagram of each collected data.
A diagram showing the configuration of a data processing apparatus according to an embodiment of the present invention.
A conceptual diagram showing the procedure for extracting the second data.
A diagram showing the flow of a data processing method according to an embodiment of the present invention.
A table showing the correspondence between the first data of each first target person and the second data of each second target person.
An explanatory diagram of the procedure for identifying second target persons similar to a first target person.
A diagram showing the number of extractions for the second data of second target persons identified more than once.
A diagram showing the second data of each second target person clustered according to the degree of similarity to the first data of the first target persons.

A data processing apparatus and a data processing method according to an embodiment of the present invention (the present embodiment) are described in detail below with reference to the accompanying drawings.
The embodiment described below is merely an example given to facilitate understanding of the present invention and does not limit the present invention. That is, the present invention may be modified or improved from the embodiment described below without departing from its spirit. The present invention naturally includes its equivalents.

In this specification, the term "apparatus" covers not only a single apparatus that performs a specific function on its own but also a plurality of apparatuses that exist in a distributed manner and cooperate to perform a specific function.

In the following description, "person" is a concept that includes not only an individual but also a group to which the individual belongs (for example, a household).

In the following description, "broadcast media" are information transmission media (mass media) that distribute programs and advertisements by radio-wave broadcasting or data broadcasting; specifically, they include television (including Internet television) and radio (including IP simulcast radio).
In the following, television is used as an example of broadcast media. The contents described below can, of course, also be applied to broadcast media other than television.

In the following description, a "device used to contact broadcast media" is a device that receives the video and audio signals distributed by broadcast media; specifically, it includes television receivers, radio receivers, and, when television or radio is used via the Internet, terminal devices connected to the Internet (personal computers, tablet terminals, smartphones, and mobile phones).
In the following, a television receiver is used as an example of a device used to contact broadcast media. The contents described below can, of course, also be applied to devices other than television receivers that are used to contact broadcast media.

In the following description, "viewing" includes not only viewing broadcast programs and advertisements in real time but also so-called time-shifted viewing, in which programs and advertisements are recorded and played back within a certain period after the broadcast, or are viewed through Web distribution.
In the following description, an "attribute" is a classification set according to demographics such as a person's gender and age, psychographics such as a person's interests and lifestyle, and behavioral tendencies and behavior history.

<< First data and second data >>
Before describing the data processing apparatus and data processing method of the present embodiment, the first data and the second data that they process are described with reference to FIG. 1. FIG. 1 is an explanatory diagram of each collected data and shows the delivery route of each collected data.

(First data)
The first data are survey data collected by conducting a survey (strictly speaking, a sample survey) of first target persons T. Here, the first target persons T are persons selected at random from a defined population by a statistical method. In the present embodiment, to select the first target persons T at random from the population, survey points or areas are set, and the first target persons T are selected from among the people living at those survey points or in those areas. The method of selecting the first target persons T is not limited to the above, however; any method may be adopted as long as it selects at random.

The number of first target persons T selected is preferably set to a number appropriate for the purpose of the survey; in the present embodiment, it is assumed to be on the order of, for example, several hundred to tens of thousands. Note that, for convenience of illustration, FIG. 1 shows fewer first target persons T than the actual number.

Regarding the survey of the first target persons T, in the present embodiment the first target persons T are surveyed on their status of contact with broadcast media, specifically their television viewing status. More specifically, a known measuring device (not shown) that measures television viewing time, the channel viewed, and the like is installed in the home of each first target person T. With this measuring device, the television viewing status of the first target person T is surveyed every day during the survey period.

The measuring device generates data indicating periodic (for example, once per minute) measurement results (hereinafter also referred to as measurement data) and transmits the data to a first collection center C1. The first collection center C1 receives the measurement data from the measuring device of each first target person T through a dedicated communication line. The first collection center C1 thereby acquires measurement data indicating the television viewing status in an amount corresponding to the number of first target persons T. The first collection center C1 also stores and accumulates the measurement data acquired from each first target person T in a database.

Here, the measurement data that the first collection center C1 acquires from each first target person T correspond to the first data and indicate the survey results on the television viewing status of each first target person T. More specifically, the measurement data acquired from each first target person T include data indicating the identification information of the first target person T, the television station (viewing channel) that broadcast the television program or television commercial viewed by the first target person T, the viewing date, the viewing time, and the like.
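
The fields listed above suggest a simple record layout. The sketch below is hypothetical: the class name `ViewingRecord`, the field names, and the string formats for date and time are illustrative assumptions, not a format defined in this document. It also shows how such records might be grouped into one viewing profile per person for later comparison.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class ViewingRecord:
    subject_id: str  # identification information of the target person
    channel: str     # television station (viewing channel)
    date: str        # viewing date, assumed "YYYY-MM-DD"
    time: str        # viewing time, assumed "HH:MM"


def viewing_profiles(
    records: Iterable[ViewingRecord],
) -> Dict[str, Set[Tuple[str, str, str]]]:
    """Group records into a set of (channel, date, time) entries per person."""
    profiles: Dict[str, Set[Tuple[str, str, str]]] = defaultdict(set)
    for r in records:
        profiles[r.subject_id].add((r.channel, r.date, r.time))
    return dict(profiles)


records = [
    ViewingRecord("T1", "ch1", "2019-03-14", "19:00"),
    ViewingRecord("T1", "ch1", "2019-03-14", "19:01"),
    ViewingRecord("T2", "ch2", "2019-03-14", "19:00"),
]
profiles = viewing_profiles(records)
```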

The transmission of the first data, that is, the transmission of measurement data from the measuring device, may be performed minute by minute, or the data for one hour or one day may be transmitted together. In the present embodiment, the measurement data sent from the measuring device are acquired through a communication line as the first data indicating the television viewing status, but the present invention is not limited to this. For example, each first target person T may record his or her television viewing status (specifically, the viewing time for each television station in each time slot, and the like) on a prescribed form; the first collection center C1 may collect the completed forms from the first target persons T and enter their contents, thereby acquiring data (first data) indicating the television viewing status for each first target person T.

As described above, the measurement data (first data) for the number of first target persons T are acquired from randomly selected first target persons T and can therefore be regarded as representative data. In other words, the television viewing status of each first target person T indicated by the first data accurately reflects, without bias, the television viewing status of the population (everyone living at the survey points or in the survey areas).

(Second data)
The second data are data collected from second target persons U. Here, a second target person U is a restricted target person who satisfies a preset collection condition. Specifically, a second target person U of the present embodiment is a person who satisfies, as the collection condition, the condition that the television viewing device he or she uses (that is, a television receiver) is connected to the Internet. More strictly, a second target person U is a person who has consented to log data indicating the television viewing history (hereinafter also referred to as device log data) being provided from the television receiver via the Internet.

The collection condition for the second target persons U is not limited to the above and may be some other condition, for example, the condition that the person uses the Internet on a daily basis.

The number of second target persons U is larger than the number of first target persons T selected; in the present embodiment, it is assumed to be on the order of, for example, several hundred thousand to several million. Note that, for convenience of illustration, FIG. 1 shows fewer second target persons U than the actual number.

Regarding data collection from the second target persons U, in the present embodiment, as with the first target persons T, data are collected on the status of contact with broadcast media, specifically the television viewing status. More specifically, a television receiver (not shown) capable of transmitting device logs is installed in the home of each second target person U, and the television receiver is connected (wired) to the Internet.

While a second target person U is watching television on the television receiver, the television receiver generates device log data periodically (for example, at intervals of one minute to several minutes) and accumulates them in a storage device inside the television receiver. The television receiver also transmits the device log data accumulated so far to the manufacturer M of the television receiver via the Internet at a fixed cycle (for example, a cycle of one hour to one day). The manufacturer M provides the received device log data to a second collection center C2.
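
The accumulate-then-send behavior described above can be sketched as a small buffer. This is purely illustrative and assumes nothing about any real receiver firmware: the class `LogBuffer`, its methods, and the use of plain seconds as timestamps are hypothetical.

```python
from typing import List


class LogBuffer:
    """Accumulates log entries and releases them as a batch at a fixed cycle."""

    def __init__(self, flush_interval_s: int) -> None:
        self.flush_interval_s = flush_interval_s  # e.g. 3600 for a one-hour cycle
        self._entries: List[str] = []
        self._last_flush_s = 0

    def record(self, entry: str) -> None:
        """Store one periodically generated log entry locally."""
        self._entries.append(entry)

    def maybe_flush(self, now_s: int) -> List[str]:
        """Return the accumulated batch if the cycle has elapsed, else an empty list."""
        if not self._entries or now_s - self._last_flush_s < self.flush_interval_s:
            return []
        batch, self._entries = self._entries, []
        self._last_flush_s = now_s
        return batch


buf = LogBuffer(flush_interval_s=3600)
buf.record("ch1 2019-03-14 19:00")
buf.record("ch1 2019-03-14 19:01")
early = buf.maybe_flush(now_s=120)    # cycle not yet elapsed
batch = buf.maybe_flush(now_s=3600)   # cycle elapsed: both entries are released
```

In a real deployment the returned batch would be sent over the network; here it is simply returned so the batching logic can be inspected in isolation.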

The second collection center C2 receives the device log data of each second target person U from the manufacturer M through a communication line such as the Internet. The second collection center C2 thereby acquires device log data in an amount corresponding to the number of second target persons U. The second collection center C2 also stores and accumulates the device log data of each second target person U in a database. The accumulated device log data are then aggregated at the second collection center C2 and used for a predetermined analysis (for example, calculating audience ratings).

In the case shown in FIG. 1, the first collection center C1 and the second collection center C2 exist separately, but the present invention is not limited to this, and the first collection center C1 and the second collection center C2 may be the same organization. Also, in the case shown in FIG. 1 there is only one television receiver manufacturer M, but there may of course be a plurality of manufacturers M, in which case the second collection center C2 is provided with device log data by each manufacturer M.

For the device log data, the cycle at which the television receiver of each second target person U transmits to the manufacturer M and the cycle at which the manufacturer M provides data to the second collection center C2 can be set arbitrarily; for example, they may be set in units of minutes, or set so that the data for one hour to one day are transmitted together.

In the present embodiment, device log data (second data) indicating a television viewing log is transmitted from the television receiver of each second subject U to the second collection center C2 via the manufacturer M, but this is not a limitation. For example, data indicating the viewing log may be sent from each second subject U's television receiver to a television station via the Internet and then transmitted from the television station to the second collection center C2. Alternatively, the second collection center C2 may receive the device log data directly from each second subject U's television receiver.

Here, the device log data of each second subject U acquired by the second collection center C2 corresponds to the second data and indicates the television viewing status of each second subject U. More specifically, the device log data is log data that a television receiver transmits when the second subject U watches television on it. In detail, it includes data indicating the identification information of each second subject U, the television station (viewing channel) that broadcast the television program or commercial watched by each second subject U, the viewing date, the viewing time, and the like. The identification information of the second subject U is, for example, ID information (a device ID) embedded in the device log data.
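As a rough illustration of the fields listed above, one device log record could be modeled as follows. This is only a sketch: the class name `DeviceLogRecord`, the field names, and the sample values are hypothetical, and the actual log format used by the manufacturer M is not specified in this description.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DeviceLogRecord:
    """One viewing-log entry emitted by a television receiver (hypothetical layout)."""
    device_id: str       # identification information of the second subject U (device ID)
    channel: str         # television station (viewing channel)
    viewed_at: datetime  # viewing date and viewing time


# Sample values are invented for illustration only.
record = DeviceLogRecord(
    device_id="U_0001",
    channel="CH4",
    viewed_at=datetime(2019, 3, 14, 20, 15),
)
```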

Incidentally, in the present embodiment, the device log data transmitted from each second subject U's television receiver is acquired over a communication line such as the Internet as the second data indicating the television viewing status of each second subject U, but this is not a limitation. For example, each second subject U may record his or her television viewing status (specifically, the viewing time for each television station in each time slot, and so on) on a predetermined form; the second collection center C2 may then collect the completed forms from the second subjects U and enter their contents, thereby acquiring the above data (second data) for each second subject U.

As described above, the measurement data (first data) and the device log data (second data) both include data indicating the television viewing status (specifically, viewing time and viewing channel). In other words, both the first data and the second data include data indicating the content of a common item shared by both, and in the present embodiment the content of the common item is the television viewing status.

As a further note on the second data (specifically, the device log data): because the second data is collected when a second subject who satisfies the collection condition performs a specific action (specifically, watching television), it may lack representativeness. If such non-representative second data is aggregated and analyzed (for example, to calculate audience ratings), the analysis result may be biased.

As a concrete example, the people selected as second subjects U may tend to share a particular attribute (for example, frequent Internet use). In that case, the attribute introduces a bias, so when the device log data of the second subjects U is aggregated to calculate a television audience rating, the result may not accurately reflect the result for all subjects (that is, the population); for example, it may be lower than the audience rating calculated from data other than the device log data.

Meanwhile, the device log data serving as the second data generally does not include detailed attribute information about the second subject U who provided it. It is therefore difficult to identify the cause of a bias from the device log data alone, and difficult to correct the device log data (that is, to remove the bias) on its own.

Therefore, in the present invention, device log data whose representativeness is not guaranteed is processed on the basis of representative measurement data so that it can be handled as representative data. Specifically, the data processing device according to the present embodiment extracts, as aggregation data, a part of the device log data collected for the second subjects U. In doing so, the data processing device extracts device log data on the basis of the relationship between the measurement data of each first subject T and the device log data of each second subject U, so that the representativeness of the extracted device log data is ensured. As a result, when a predetermined analysis is performed using the extracted device log data (that is, the aggregation data), the analysis result is less likely to be biased.

The functions of the data processing device according to the present embodiment are described in detail in the following sections.

<< Configuration of the data processing device according to the present embodiment >>
The configuration of the data processing device according to the present embodiment (hereinafter, data processing device 10) will be described with reference to FIG. 2. FIG. 2 is a diagram showing the configuration of the data processing device 10.

The data processing device 10 is a device that processes the measurement data collected for the first subjects T and the device log data collected for the second subjects U. In the present embodiment, the data processing device 10 is composed of a server computer managed and used by the first collection center C1 (hereinafter, processing-side server 11) and a server computer managed and used by the second collection center C2 (hereinafter, data-providing-side server 12). That is, in the present embodiment, the processing-side server 11 and the data-providing-side server 12 cooperate to provide the functions of the data processing device 10. However, this is not a limitation: the server of either the first collection center C1 or the second collection center C2 may also have the functions of the other server, so that a single server constitutes the data processing device 10. Alternatively, a third server different from those of the first collection center C1 and the second collection center C2 may function as the data processing device 10; for example, an ASP (Application Service Provider) server may provide the functions of the data processing device 10 as an ASP service.

The processing-side server 11 and the data-providing-side server 12 have the same hardware configuration as an ordinary server computer. As shown in FIG. 2, they have CPUs 11a and 12a, memories 11b and 12b consisting of ROM and RAM, communication interfaces 11c and 12c, hard disk drives 11d and 12d as auxiliary storage devices, input devices 11e and 12e such as a keyboard and a mouse, and output devices 11f and 12f such as a display and a printer. In addition, each of the two servers has installed on it a program (data processing program) for carrying out, among the functions of the data processing device 10, the functions assigned to that server.

The processing-side server 11 receives measurement data from the measuring device of each first subject T and stores and accumulates, on the hard disk drive 11d, measurement data for as many people as there are first subjects T. That is, the hard disk drive 11d of the processing-side server 11 functions as the "first storage unit" of the present invention. However, this is not a limitation: an auxiliary storage device externally attached to the processing-side server 11, or another computer communicably connected to the processing-side server 11 (including the data-providing-side server 12), may function as the first storage unit.

The data-providing-side server 12 receives the device log data of each second subject U from the manufacturer M and stores and accumulates, on the hard disk drive 12d, device log data for as many people as there are second subjects U. That is, the hard disk drive 12d of the data-providing-side server 12 functions as the "second storage unit" of the present invention. However, this is not a limitation: an auxiliary storage device externally attached to the data-providing-side server 12, or another computer communicably connected to the data-providing-side server 12 (including the processing-side server 11), may function as the second storage unit.

In the present embodiment, the processing-side server 11 can also communicate with the data-providing-side server 12 to access its hard disk drive 12d and read out the device log data (second data) of each second subject U stored there.

Furthermore, the processing-side server 11 functions as the computer that forms the main part of the data processing device 10. More specifically, the processing-side server 11 extracts the device log data to be used as aggregation data from the device log data of the second subjects U stored on the hard disk drive 12d of the data-providing-side server 12. That is, the processing-side server 11 functions as the "data extraction unit" of the present invention. Strictly speaking, the "data extraction unit" is realized by the CPU 11a of the processing-side server 11 working together with the data processing program installed on the processing-side server 11.

The extraction result (that is, which second subjects' device log data was extracted as aggregation data) is reported from the first collection center C1 to the second collection center C2 by the processing-side server 11 communicating with the data-providing-side server 12. The device log data extracted as aggregation data is used in the analysis performed at the second collection center C2.

When extracting device log data, the processing-side server 11 refers to the measurement data of each first subject T stored on its hard disk drive 11d and determines the relationship between the measurement data of each first subject T and the device log data of each second subject U.

Specifically, the processing-side server 11 associates the device log data of each second subject U with the measurement data of each first subject T. This association makes it possible to identify the device log data of second subjects U that is similar to the measurement data of each first subject T. Here, "similar" means that the content of the common item in the first data and the second data, specifically the television viewing status, is similar. The association of device log data with each piece of measurement data is described in detail later.

After that, as shown in FIG. 3, the processing-side server 11 extracts as aggregation data, from the device log data of the second subjects U stored on the data-providing-side server 12, the device log data that is similar to the measurement data of each first subject T. Because the device log data extracted in this way is similar to the representative measurement data, as shown in FIG. 3, it can be treated as pseudo "representative data".
FIG. 3 is a conceptual diagram showing how device log data is extracted.

After extracting the data, the processing-side server 11 transmits information indicating the extraction result to the data-providing-side server 12. This allows the second collection center C2 to know which second subjects' device log data was extracted by the processing-side server 11. The second collection center C2 then performs a predetermined analysis using the extracted device log data as aggregation data. Because the extracted device log data can be used as representative data, as described above, the analysis yields results with reduced bias.

<< Data processing method according to the present embodiment >>
Next, as an example of the operation of the data processing device 10 described above, the flow in which the data processing device 10 processes collected data (hereinafter, data processing flow) will be described.
The data processing flow employs the data processing method of the present invention. That is, the following description includes a description of the data processing method of the present invention, and each step in the data processing flow described below corresponds to a step constituting that method.

In the data processing flow, the data processing device 10 carries out the steps shown in FIG. 4. FIG. 4 is a diagram showing the flow of the data processing method according to the present embodiment and illustrates the data processing flow.

Before the data processing flow is executed, the first subjects T are selected at random, and second subjects U whose television receivers are connected to the Internet consent to providing device log data. Data acquisition for the first subjects T and data acquisition for the second subjects U are then carried out. In the present embodiment, both acquisitions collect data on television viewing status. The two acquisitions may be carried out in the same period or in different periods.

During the survey period for the first subjects T, the processing-side server 11 at the first collection center C1 acquires measurement data from each first subject T (S001). Specifically, for a first subject T who is watching television, the measuring device installed in that person's home periodically generates measurement data during the viewing time and transmits it. The processing-side server 11 acquires (receives) the measurement data transmitted from the measuring device over a communication line. The processing-side server 11 thereby acquires measurement data for as many people as there are first subjects T.

In step S001, the processing-side server 11 also stores and accumulates the acquired measurement data of each first subject T on the hard disk drive 11d serving as the first storage unit. In doing so, the processing-side server 11 stores the measurement data separately for each first subject T, based on the identification information of the first subject T indicated by the measurement data.

Step S001 is repeated, for example, until the survey period for the first subjects T expires.

Meanwhile, during the data acquisition period for the second subjects U, the data-providing-side server 12 at the second collection center C2 acquires the device log data transmitted from each second subject U's television receiver (S002). Specifically, when a second subject U watches television on a television receiver, the viewing log is stored in the television receiver as device log data, and the television receiver transmits the device log data to the manufacturer M at a predetermined timing. The manufacturer M provides the device log data of each second subject U either in response to a request from the data-providing-side server 12 or automatically at a predetermined timing. The data-providing-side server 12 acquires (receives) the device log data of each second subject U from the manufacturer M over a communication line such as the Internet. The data-providing-side server 12 thereby acquires device log data for as many people as there are second subjects U.

In step S002, the data-providing-side server 12 also stores and accumulates the acquired device log data of each second subject U on the hard disk drive 12d serving as the second storage unit. In doing so, the data-providing-side server 12 stores the device log data separately for each second subject U, based on the identification information of the second subject U indicated by the device log data.

Step S002 is repeated, for example, until the data acquisition period for the second subjects U expires. Although FIG. 4 shows step S002 being performed after step S001, this is not a limitation. For example, step S002 may be performed before step S001 or in the same period, or one step may be performed while the other is in progress.

The subsequent steps (specifically, S003 to S007 in FIG. 4) form the main part of the data processing flow and are performed mainly by the processing-side server 11.

In steps S003 to S007, the processing-side server 11, which is a computer, functions as the data extraction unit. First, the processing-side server 11 communicates with the data-providing-side server 12 and reads out the device log data of each second subject U stored on the data-providing-side server 12 (S003). The readout may cover all the device log data stored on the data-providing-side server 12, or only the device log data whose viewing period or time falls within a predetermined period.

Next, the processing-side server 11 calculates the degree of similarity between the measurement data of each first subject T that it stores and the device log data of each second subject U read from the data-providing-side server 12 (S004). Here, the degree of similarity is the degree of similarity in the content of the common item shared by the measurement data and the device log data, specifically the television viewing status. The processing-side server 11 calculates this degree of similarity for every combination of a first subject T and a second subject U. That is, if the number of first subjects T is X and the number of second subjects U is Y (X and Y both being natural numbers), the degree of similarity is calculated for each of the X × Y combinations.

Any known method can be used to calculate the degree of similarity between data. For example, a correlation coefficient may be obtained as an index of the degree of similarity, an absolute error may be determined, or a distance (such as the Euclidean distance, Mahalanobis distance, or cosine distance) may be calculated.
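A few of the measures named above can be sketched as follows. This is a minimal illustration, not the method fixed by the embodiment: it assumes each subject's viewing status has already been encoded as a numeric vector of equal length (for example, viewing minutes per channel), which the description does not specify, and the function names and sample vectors are hypothetical. The Mahalanobis distance is omitted because it additionally needs a covariance matrix.

```python
import math


def pearson(a, b):
    """Correlation coefficient; a larger value means more similar."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0  # constant vectors: treat as uncorrelated


def mean_absolute_error(a, b):
    """Absolute error averaged over items; a smaller value means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def euclidean_distance(a, b):
    """Euclidean distance; a smaller value means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; a smaller value means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical viewing minutes per channel for one first subject and one second subject.
t_vec = [30, 0, 60, 10]
u_vec = [60, 0, 120, 20]
```

Note that the measures disagree on this pair: the correlation and the cosine distance see the same viewing pattern (the second vector is twice the first), while the absolute error and the Euclidean distance penalize the difference in total viewing time. Which behavior is appropriate depends on the analysis.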

In step S004, as described above, the degree of similarity to the device log data of each of the Y second subjects U is calculated for the measurement data of each first subject T. This makes it possible to rank the device log data of all second subjects U by degree of similarity for the measurement data of each first subject T. That is, the device log data of the Y second subjects U can be associated with the measurement data of each of the X first subjects T in order of similarity, as shown in FIG. 5. FIG. 5 is a table showing the correspondence (association) between the measurement data of each first subject T and the device log data of each second subject U. In the figure, the notation "T_i" (i = 1 to X) denotes each first subject T, and the notation "U_j" (j = 1 to Y) denotes each second subject U.
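The ranking table of FIG. 5 could be built along the following lines. This is a sketch under assumptions: viewing status is assumed to be a numeric vector per subject, and the negative Euclidean distance is used as the degree of similarity purely for illustration. The names `similarity`, `rank_second_subjects`, and the sample data are hypothetical.

```python
import math


def similarity(a, b):
    # Assumption: negative Euclidean distance, so that a larger value means more similar.
    return -math.dist(a, b)


def rank_second_subjects(first, second):
    """For each first subject, list all second-subject IDs from most to least similar."""
    return {
        t_id: sorted(
            second,
            key=lambda u_id: similarity(t_vec, second[u_id]),
            reverse=True,
        )
        for t_id, t_vec in first.items()
    }


# Invented viewing vectors (e.g. minutes on two channels).
first = {"T_1": [60, 0], "T_2": [0, 30]}
second = {"U_1": [0, 25], "U_2": [55, 5], "U_3": [30, 30]}

rankings = rank_second_subjects(first, second)
```

With X first subjects and Y second subjects this evaluates X × Y similarities, matching the count given in step S004.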

After that, for each first subject T, the processing-side server 11 identifies the second subjects U belonging to the combinations in order, starting from the combination with the highest degree of similarity (S005). That is, in step S005, for each first subject T, the second subjects U whose television viewing status (that is, the content of the common item) is similar to that of the first subject T are identified in descending order of similarity. The specific procedure of step S005 is described in detail below with reference to FIG. 6. FIG. 6 is an explanatory diagram of the procedure for identifying second subjects U similar to a first subject T.

In the following description, to keep the explanation simple, the numbers of first subjects T and second subjects U are set smaller than the actual numbers: specifically, the number X of first subjects T is 10 and the number Y of second subjects U is 100.

As a result of the preceding step S004, the device log data of the 100 second subjects U is associated, in order of similarity, with the measurement data of each of the 10 first subjects T, as shown in FIG. 6. For example, for the measurement data of a first subject T_1, the device log data of second subject U_2 is the most similar, the device log data of second subject U_59 is the second most similar, and the device log data of the remaining 98 second subjects U is ranked after them according to the degree of similarity.

In step S005, the processing-side server 11 refers to the relationships shown in FIG. 6 and, for each first subject T, identifies the second subjects U belonging to the combinations of a first subject T and a second subject U in order, starting from the combination with the highest degree of similarity. More specifically, for each first subject T, the processing-side server 11 identifies second subjects U with similar television viewing status, in descending order of similarity, until the set number of people is reached.

Here, the set number of people is a value set in advance before the data processing flow is executed; specifically, it is the number of pieces of device log data required as aggregation data (that is, the required number of extracted data). The set number can be set to any value and can be changed after it has been set. In the following, the set number is assumed to be 20.

In step S005, the processing-side server 11 first identifies the most similar second subject U for each of the 10 first subjects T. This identifies the first 10 second subjects U. The processing-side server 11 then continues identifying second subjects U in descending order of similarity. Through this procedure, second subjects U for the set number (20 people) are identified.

Then, when the number of identified second subjects U reaches the set number, the processing-side server 11 extracts the device log data of the identified second subjects U (in the case of FIG. 6, the boxed device log data for 20 people) as aggregation data (S006).
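Steps S005 and S006 can be sketched roughly as follows, assuming the per-subject rankings of step S004 are already available as lists of second-subject IDs. The function name `select_second_subjects` and the sample rankings are hypothetical. One detail is an assumption of this sketch: when the set number is not a multiple of the number of first subjects, the description does not say which first subjects contribute to the final partial round, so the sketch simply visits them in listed order.

```python
def select_second_subjects(rankings, set_number):
    """Pick second subjects rank by rank (1st choice of every T, then 2nd, ...)
    until `set_number` picks have been made. Duplicates are kept."""
    selected = []
    max_rank = max((len(r) for r in rankings.values()), default=0)
    for rank in range(max_rank):
        for ranked in rankings.values():  # assumption: listed order within a round
            if rank < len(ranked):
                selected.append(ranked[rank])
                if len(selected) == set_number:
                    return selected
    return selected


# Invented rankings: each list runs from most similar to least similar.
rankings = {
    "T_1": ["U_2", "U_59", "U_7"],
    "T_2": ["U_7", "U_2", "U_13"],
    "T_3": ["U_7", "U_40", "U_2"],
}

selected = select_second_subjects(rankings, set_number=5)
# selected == ["U_2", "U_7", "U_7", "U_59", "U_2"]
```

In the worked example of the text (X = 10, set number 20) this amounts to taking the top two second subjects for every first subject.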

At this time, the extracted device log data may contain the device log data of the same second target person U more than once. That is, when the processing-side server 11 identifies the set number of second target persons U in step S005, a given second target person may be identified multiple times. In the case of FIG. 6, for example, the second target person U_2 is identified twice and the second target person U_7 is identified three times.

In this case, the processing-side server 11 extracts the device log data of a second target person U who was identified more than once as as many aggregation data items as the number of times that person was identified. In other words, as shown in FIG. 7, the device log data of a second target person U identified more than once is extracted the same number of times as that person was identified. FIG. 7 shows the number of extractions of the device log data of second target persons U who were identified more than once.

The number of times a device log data item has been extracted in duplicate is used as a weight in the subsequent aggregation work. That is, device log data extracted n times (n being a natural number of 2 or more) is treated at aggregation time as the device log data of n second target persons U.
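The use of extraction counts as weights can be illustrated with a short sketch. The identifiers, the viewing-minutes figures, and the choice of a weighted mean as the aggregate are assumptions made for illustration only; the text states that the counts serve as weights but does not name a particular statistic.

```python
from collections import Counter
from typing import Dict, List


def extraction_weights(picks: List[str]) -> Counter:
    """Count how many times each second target person was extracted."""
    return Counter(picks)


def weighted_mean(values: Dict[str, float], weights: Counter) -> float:
    """Aggregate per-person values, counting a person extracted n times as n people."""
    total = sum(weights.values())
    return sum(values[u] * n for u, n in weights.items()) / total


# Hypothetical extraction result: U2 extracted twice, U7 three times, U1 once.
picks = ["U2", "U7", "U2", "U7", "U7", "U1"]
weights = extraction_weights(picks)

# Hypothetical daily viewing minutes taken from each person's device log data.
viewing_minutes = {"U1": 60.0, "U2": 120.0, "U7": 30.0}
print(weighted_mean(viewing_minutes, weights))  # (60 + 2*120 + 3*30) / 6 = 65.0
```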

In the present embodiment, as described above, the device log data of a second target person U identified more than once is extracted as as many aggregation data items as the number of duplicates, but the invention is not limited to this. Specifically, if a given second target person is identified more than once when the set number of second target persons U is identified, that person's device log data may be extracted as a single device log data item without duplication. In that case, the number of extracted device log data items falls below the set number of people, so the shortfall may be filled in order of similarity ranking, starting from the device log data of the next-highest-ranked second target persons U (in the case of FIG. 6, the device log data of the third most similar second target person U).
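The non-duplicating alternative can be sketched as below. This is one possible reading, with hypothetical names (`select_unique`, `sim`) and sample values: a second target person who has already been extracted is skipped, and the selection simply continues to lower similarity ranks until the set number of distinct persons is reached.

```python
from typing import Dict, List, Set


def select_unique(sim: Dict[str, Dict[str, float]], set_number: int) -> List[str]:
    """Identify set_number distinct second target persons, filling any shortfall.

    sim[t][u] is a hypothetical precomputed similarity (larger = more similar).
    """
    ranked = {t: sorted(us, key=lambda u: -us[u]) for t, us in sim.items()}
    max_rank = max((len(r) for r in ranked.values()), default=0)

    picked: List[str] = []
    seen: Set[str] = set()
    for rank in range(max_rank):
        for t in ranked:
            if len(picked) == set_number:
                return picked
            if rank < len(ranked[t]):
                u = ranked[t][rank]
                if u not in seen:         # skip persons already extracted
                    seen.add(u)
                    picked.append(u)
    return picked


sim = {
    "T1": {"U1": 0.9, "U2": 0.5, "U3": 0.1},
    "T2": {"U1": 0.2, "U2": 0.8, "U3": 0.7},
}
# U2 ranks second for T1 but was already taken for T2, so U3 fills the gap.
print(select_unique(sim, set_number=3))
```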

After step S006 ends, the processing-side server 11 communicates with the data-providing-side server 12 and transmits information indicating the extraction result of step S006 (that is, which second target persons' device log data was extracted as aggregation data) to the second collection center C2 (S007). The data processing flow ends when the steps up to this point have been completed.

After the data processing flow ends, the second collection center C2 receives the information transmitted from the processing-side server 11 via the Internet, aggregates the device log data extracted by the processing-side server 11, and performs a predetermined analysis.

<< Effectiveness of This Embodiment >>
As described above, in the present embodiment, the degree of similarity in television viewing status (the content of the common item) is calculated between the measurement data (first data) of each first target person T and the device log data (second data) of each second target person U. The device log data of the second target persons U identified on the basis of the calculated degree of similarity is then extracted as aggregation data.

As a result, data similar to the measurement data of the first target persons T, which is representative, can be extracted as aggregation data from the device log data of the second target persons U, which is inherently non-representative. Aggregating the extracted device log data and performing a predetermined analysis then yields analysis results with reduced bias.

<< Other Embodiments >>
The data processing device and data processing method of the present invention have been described above with reference to one specific embodiment, but that embodiment is merely an example, and other embodiments are also conceivable.

For example, in the embodiment described above, the data processing flow links (strictly speaking, ranks) the device log data of all second target persons U to the measurement data of each first target person T according to the degree of similarity. Then, for each first target person T, device log data is extracted as aggregation data in order, starting from the device log data of the second target person U with the higher degree of similarity (that is, the higher rank). However, the invention is not limited to this, and a form in which device log data is extracted by another method (hereinafter, the modification) is also conceivable.

To describe the modification specifically: for example, after the degree of similarity between the measurement data of each first target person T and the device log data of each second target person U is calculated, the device log data of each second target person U is linked, according to the degree of similarity, to the measurement data of the first target person T most similar to that second target person U. As a result, the device log data of each of the Y second target persons U is clustered as shown in FIG. 8 and belongs to exactly one of the groups, whose number equals the number X of first target persons T. FIG. 8 shows the device log data of the second target persons U clustered according to the degree of similarity to the measurement data of the first target persons T. In FIG. 8, the device log data of the second target persons U is indicated by black dots and the measurement data of the first target persons T by crosses. For convenience of illustration, FIG. 8 shows fewer first target persons T and second target persons U than there actually are: 5 and 30, respectively.

After clustering, the device log data in each class (group) is extracted as aggregation data in order, starting from the device log data of the second target person U with the highest degree of similarity to the measurement data of the first target person T corresponding to that class. Extraction is repeated until device log data for the set number of people has been obtained. Because different device log data is extracted from each class each time, the device log data of the same second target person U is never extracted more than once, unlike in the embodiment described above. In other words, the modification makes it possible to extract the device log data of the second target persons U without duplication.
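The modification can be sketched as follows, under stated assumptions: the function name `cluster_and_extract`, the `sim[t][u]` layout, the sample values, and the round-robin order across classes are illustrative choices, since the text does not specify how extraction is interleaved between classes. Each second target person is first assigned to the single most similar first target person, and extraction then proceeds within each class from the most similar member downward.

```python
from typing import Dict, List


def cluster_and_extract(sim: Dict[str, Dict[str, float]],
                        set_number: int) -> List[str]:
    """Cluster second target persons by nearest first target person, then extract.

    sim[t][u] is a hypothetical precomputed similarity (larger = more similar).
    Each second target person belongs to exactly one class, so no duplicates occur.
    """
    # Assign every second target person u to the most similar first target person.
    clusters: Dict[str, List[str]] = {t: [] for t in sim}
    all_us = sorted({u for us in sim.values() for u in us})
    for u in all_us:
        best_t = max((t for t in sim if u in sim[t]), key=lambda t: sim[t][u])
        clusters[best_t].append(u)

    # Within each class, order members by similarity to the class's first target person.
    for t, members in clusters.items():
        members.sort(key=lambda u: -sim[t][u])

    # Take one member per class per round until set_number is reached.
    depth = max((len(m) for m in clusters.values()), default=0)
    picked: List[str] = []
    for rank in range(depth):
        for members in clusters.values():
            if len(picked) == set_number:
                return picked
            if rank < len(members):
                picked.append(members[rank])
    return picked


sim = {
    "T1": {"U1": 0.9, "U2": 0.5, "U3": 0.1, "U4": 0.6},
    "T2": {"U1": 0.2, "U2": 0.8, "U3": 0.7, "U4": 0.3},
}
# Classes: T1 -> [U1, U4], T2 -> [U2, U3]
print(cluster_and_extract(sim, set_number=3))
```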

In the embodiment described above, the first data is measurement data indicating the result of measuring the television viewing status of the first target person T with a dedicated measuring device, and the second data is the viewing log (device log data) transmitted from the television receiver when the second target person U watches television. However, the invention is not limited to this, and the first data may be data indicating the answers to the questions in a questionnaire survey conducted on the first target persons T. Similarly, the second data may be data indicating the answers to the questions in a questionnaire survey conducted on the second target persons U. In this case, the second target person U is a person who satisfies the condition (collection condition) of responding to the request for the questionnaire survey.
Other examples of the second data include the following.
[1] Television viewing status measurement data collected from an Internet survey panel
[2] Operation log data of network-connected home appliances and the like (for example, hard disk recorders)
[3] ID-tagged POS (Point of Sales) data generated, for example, when a product is purchased with a membership card presented at a store
In example [1], the second target person U is a person who satisfies the condition (collection condition) of responding to the request for the Internet survey, that is, a member of the survey panel. In example [2], the second target person U is a person who satisfies the condition (collection condition) of owning such a home appliance and consenting to data collection. In example [3], the second target person U is a person who satisfies the condition (collection condition) of performing the purchasing action that triggers acquisition of the POS data.

In the embodiment described above, the common item shared by the first data (specifically, the measurement data) and the second data (specifically, the device log data) is the contact status with broadcast media, more specifically the television viewing status. However, the common item is not particularly limited and may be something other than the contact status with broadcast media; for example, it may be a demographic attribute such as gender or age, or a psychographic attribute such as interests or lifestyle.

10 Data processing device
11 Processing-side server
11a CPU
11b Memory
11c Communication interface
11d Hard disk drive
11e Input device
11f Output device
12 Data-providing-side server
12a CPU
12b Memory
12c Communication interface
12d Hard disk drive
12e Input device
12f Output device
C1 First collection center
C2 Second collection center
M Manufacturer
T First target person
U Second target person

To achieve the above object, the data processing device of the present invention comprises: a first storage unit that stores first data collected from randomly selected first target persons, in an amount corresponding to the number of the first target persons; a second storage unit that stores second data acquired from second target persons who satisfy a predetermined collection condition and who outnumber the first target persons, in an amount corresponding to the number of the second target persons; and a data extraction unit that extracts, from the second data stored in the second storage unit, the second data to be used as aggregation data. Both the first data and the second data include data indicating the content of a common item that is common to both. The data extraction unit calculates the degree of similarity of the content of the common item between each item of the first data and each item of the second data, and extracts, as the aggregation data, the second data of the set number of second target persons identified on the basis of the calculated degree of similarity.

To solve the problem described above, in the data processing method of the present invention, a first storage unit stores first data collected from randomly selected first target persons, in an amount corresponding to the number of the first target persons; a second storage unit stores second data collected from second target persons who satisfy a predetermined collection condition and who outnumber the first target persons, in an amount corresponding to the number of the second target persons; and a computer extracts, from the second data stored in the second storage unit, the second data to be used as aggregation data. Both the first data and the second data include data indicating the content of a common item that is common to both. The computer calculates the degree of similarity of the content of the common item between each item of the first data and each item of the second data, and extracts, as the aggregation data, the second data of the set number of second target persons identified on the basis of the calculated degree of similarity.
According to the above data processing method, the second data, which is inherently non-representative, can be extracted as representative aggregation data.

Claims

A data processing device comprising:
a first storage unit that stores first data collected from randomly selected first target persons, in an amount corresponding to the number of the first target persons;
a second storage unit that stores second data collected from second target persons who satisfy a predetermined collection condition and who outnumber the first target persons, in an amount corresponding to the number of the second target persons; and
a data extraction unit that extracts, from the second data stored in the second storage unit, the second data to be used as aggregation data,
wherein both the first data and the second data include data indicating the content of a common item that is common to both, and
the data extraction unit calculates a degree of similarity of the content of the common item between each item of the first data and each item of the second data, and extracts, as the aggregation data, the second data of the second target persons identified on the basis of the calculated degree of similarity.

The data processing device according to claim 1, wherein the second data is data collected when the second target person who satisfies the collection condition performs a specific action.

The data processing device according to claim 1 or 2, wherein the second target person is a person who satisfies, as the collection condition, the condition that a device used by the second target person for contacting broadcast media is connected to the Internet.

The data processing device according to claim 3, wherein the second data is log data transmitted by the device when the second target person contacts the broadcast media using the device.

The data processing device according to claim 4, wherein the content of the common item is a contact situation with the broadcasting media.

The data processing device according to claim 5, wherein the contact status with the broadcast media is a television viewing status.

The data processing device according to any one of claims 1 to 6, wherein the data extraction unit calculates the degree of similarity for each combination while varying the combination of the first target person and the second target person, identifies, for each first target person, the second target persons belonging to the combinations in order starting from the combination with the highest degree of similarity, and, when the number of identified second target persons reaches a set number of people, extracts the second data of the identified set number of second target persons as the aggregation data.

The data processing device according to claim 7, wherein, if a given second target person is identified more than once when the data extraction unit identifies the set number of second target persons, the data extraction unit extracts the second data of that second target person as as many aggregation data items as the number of times that second target person was identified.

A data processing method, wherein:
a first storage unit stores first data collected from randomly selected first target persons, in an amount corresponding to the number of the first target persons;
a second storage unit stores second data collected from second target persons who satisfy a predetermined collection condition and who outnumber the first target persons, in an amount corresponding to the number of the second target persons;
a computer extracts, from the second data stored in the second storage unit, the second data to be used as aggregation data;
both the first data and the second data include data indicating the content of a common item that is common to both; and
the computer calculates a degree of similarity of the content of the common item between each item of the first data and each item of the second data, and extracts, as the aggregation data, the second data of the second target persons identified on the basis of the calculated degree of similarity.