JP2011238169A

JP2011238169A - User attribute estimating device, user attribute estimating method, and program

Info

Publication number: JP2011238169A
Application number: JP2010111166A
Authority: JP
Inventors: Karin Maebashi; 佳林前橋; Hidekatsu Kuwano; 秀豪桑野; Yukinobu Taniguchi; 行信谷口; Akihito Akutsu; 明人阿久津
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-05-13
Filing date: 2010-05-13
Publication date: 2011-11-24
Anticipated expiration: 2030-05-13
Also published as: JP5536531B2

Abstract

PROBLEM TO BE SOLVED: To provide a user attribute estimating device, a user attribute estimating method and a program capable of efficiently acquiring user attributes.

SOLUTION: The user attribute estimating device estimates user attributes and comprises: a known user attribute storage unit 505 for storing user attributes which are already known; a model generating unit 500 for learning a relation between geographic attributes and the user attributes based on known user attributes stored in the known user attribute storage unit 505 and generating a learning model; and a user attribute estimating unit 600 for estimating unknown user attributes from the geographic attributes based on the learning model generated by the model generating unit 500.

Description

The present invention relates to a user attribute estimation device, a user attribute estimation method, and a program used to efficiently describe the characteristics of places such as stations and towns in ICT (Information and Communication Technology) services for outdoor spaces, such as advertisement distribution services for digital signage.

In recent years, ICT services for outdoor spaces have become widespread. Known examples include ICT services that acquire location information and share it with acquaintances, and ICT services that not only acquire location information but also automatically select recommended information matched to that location and deliver it to a mobile phone.

In particular, the number of digital signage (electronic signboard) installations is increasing, and their networking is progressing markedly. Advertisement distribution on digital signage can be optimized for time and place. As the number of installations grows, optimizing everything by hand becomes difficult, so advertisement distribution must be optimized automatically to some extent. Information about an advertisement can be obtained from the advertiser, but information about a place is difficult to obtain. It is therefore necessary to automatically describe and accumulate the characteristics of places such as digital signage installation sites (hereinafter, "place features") so that a system can use them.

A place feature consists of geographic attributes and user attributes. Geographic attributes are information describing geographical characteristics such as buildings and the natural environment, for example information about facilities around the place or about the weather. User attributes are information describing the typical characteristics of a group, such as the basic attributes and behavioral traits of the people who pass through the place. User attributes comprise several items, such as age group, purpose of use, and companions. At present, user attributes must be obtained at a cost through surveys and the like, and automating this is desirable. Conventional techniques applicable to describing user attributes include the following.

Specifically, video analysis techniques are known that analyze footage from a camera installed at a place and estimate attributes such as the age and sex of the people passing by. For example, Non-Patent Document 1 discloses a technique for estimating the approximate number of people (degree of congestion) from an image. Position information analysis techniques are also known that analyze data acquired with GPS (Global Positioning System) terminals and extract movement flow lines (see Non-Patent Document 2).

Non-Patent Document 1: Tetsuya Kinebuchi (杵渕哲也) et al., "Advertising Effect Measurement Technology by Image Processing", NTT Technical Journal, 2009.7, pp. 16-19
Non-Patent Document 2: Masaaki Nishino (西野正彬) et al., "Examination of Behavior Pattern Extraction Method from Location Information", IPSJ SIG Technical Report, 2008-UBI-20 (10), 2008/11/13, pp. 57-64

However, conventional video analysis requires cameras to be installed, and conventional position information analysis requires users to carry GPS terminals and disclose their location. In other words, because the prior art imposes constraints on equipment and on users, it is difficult to obtain a sufficient quantity of data, and hence difficult to acquire user attributes for many places.

In view of the prior art described above, an object of the present invention is to provide a user attribute estimation device, a user attribute estimation method, and a program capable of acquiring user attributes efficiently.

To achieve the above object, the invention according to a first aspect is a user attribute estimation device that estimates user attributes, comprising: a known user attribute storage unit that stores known user attributes; a model generation unit that learns the relationship between geographic attributes and user attributes based on the known user attributes stored in the known user attribute storage unit and generates a learning model; and a user attribute estimation unit that estimates unknown user attributes from geographic attributes based on the learning model generated by the model generation unit.

The invention according to a second aspect is the invention according to the first aspect, further comprising: a geographic attribute item structure storage unit that stores geographic attribute items in a tree structure organized by their relevance to user attributes, together with a reference number for each node of the tree structure; and a user attribute-geographic item structure relationship storage unit that stores information indicating the correspondence between a user attribute item to be estimated and the reference numbers of the nodes. The model generation unit refers to the user attribute-geographic item structure relationship storage unit to obtain the reference numbers of the nodes corresponding to the user attribute item to be estimated, and selects geographic attribute items stored in the geographic attribute item structure storage unit based on the obtained reference numbers, thereby selecting the variables used in the learning model.

The invention according to a third aspect is the invention according to the first or second aspect, wherein the user attribute depends on a supply variable group, which represents what the place is generally used for, and a context variable group, which changes with time.

To achieve the above object, the invention according to a fourth aspect is a user attribute estimation method for estimating user attributes, comprising: a known user attribute storage step of storing known user attributes in a known user attribute storage unit; a model generation step of learning the relationship between geographic attributes and user attributes based on the known user attributes stored in the known user attribute storage unit and generating a learning model; and a user attribute estimation step of estimating unknown user attributes from geographic attributes based on the learning model generated in the model generation step.

The invention according to a fifth aspect is the invention according to the fourth aspect, further comprising: a geographic attribute item structure storage step of storing, in a geographic attribute item structure storage unit, geographic attribute items in a tree structure organized by their relevance to user attributes, together with a reference number for each node of the tree structure; and a user attribute-geographic item structure relationship storage step of storing, in a user attribute-geographic item structure relationship storage unit, information indicating the correspondence between a user attribute item to be estimated and the reference numbers of the nodes. In the model generation step, the user attribute-geographic item structure relationship storage unit is referred to in order to obtain the reference numbers of the nodes corresponding to the user attribute item to be estimated, and geographic attribute items stored in the geographic attribute item structure storage unit are selected based on the obtained reference numbers, thereby selecting the variables used in the learning model.

The invention according to a sixth aspect is the invention according to the fourth or fifth aspect, wherein the user attribute depends on a supply variable group, which represents what the place is generally used for, and a context variable group, which changes with time.

To achieve the above object, the invention according to a seventh aspect is a program for estimating user attributes, the program causing a computer to execute: a known user attribute storage step of storing known user attributes in a known user attribute storage unit; a model generation step of learning the relationship between geographic attributes and user attributes based on the known user attributes stored in the known user attribute storage unit and generating a learning model; and a user attribute estimation step of estimating unknown user attributes from geographic attributes based on the learning model generated in the model generation step.

According to the present invention, it is possible to provide a user attribute estimation device, a user attribute estimation method, and a program capable of acquiring user attributes efficiently.

FIG. 1 is a block diagram of the user attribute estimation device according to the embodiment of the present invention.
FIG. 2 is a diagram showing an example of place features in the embodiment of the present invention.
FIG. 3 is an explanatory diagram of user attributes in the embodiment of the present invention.
FIG. 4 is a diagram showing an example of the data stored in the geographic attribute item structure storage unit in the embodiment of the present invention.
FIG. 5 is a diagram showing an example of the data stored in the user attribute-geographic item structure relationship storage unit in the embodiment of the present invention.
FIG. 6 is a diagram showing an example of the data stored in the index selection rule storage unit in the embodiment of the present invention.
FIG. 7 is a diagram showing an example of the data stored in the known user attribute storage unit in the embodiment of the present invention.
FIG. 8 is a flowchart showing the operation of the user attribute estimation device in the embodiment of the present invention.
FIG. 9 is a flowchart showing in detail the step of generating a learning model in the embodiment of the present invention.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

First, an outline of the embodiment of the present invention will be described. In this embodiment, user attributes are estimated by exploiting the relationship between geographic attributes and user attributes. Geographic attributes can be generated at lower cost than user attributes by referring to various DBs. Therefore, known user attributes are used to build a model of the relationship between geographic attributes and user attributes by means of learning or a similar technique, and unknown user attributes are estimated based on that learning model.
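The learn-then-estimate flow outlined above can be illustrated with a minimal sketch. The description does not fix a learning algorithm, so a 1-nearest-neighbour rule is swapped in here purely for illustration; the function names `fit` and `estimate` and the two-element feature vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One training sample: (geographic attribute vector, known user attribute value).
Sample = Tuple[Sequence[float], str]

def fit(samples: List[Sample]) -> List[Sample]:
    """'Learn' a 1-nearest-neighbour model, which simply keeps the known samples."""
    return list(samples)

def estimate(model: List[Sample], geo: Sequence[float]) -> str:
    """Return the user attribute of the known place whose geographic vector is closest."""
    nearest = min(model, key=lambda s: math.dist(s[0], geo))
    return nearest[1]

# Illustrative values only: (business-like score, leisure-like score) per place.
known = [((0.9, 0.1), "business"), ((0.2, 0.8), "leisure")]
model = fit(known)
result = estimate(model, (0.8, 0.2))  # place with an unknown user attribute
```

Any classifier or regressor could take the place of the nearest-neighbour rule; the point is only that the model is trained on places with known user attributes and then applied to places where only geographic attributes are available.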

Because there are many candidate variables (geographic attribute items) for the learning model, the variables that are effective for estimation must be selected. However, collecting samples for learning (known user attribute data) on the order of thousands to tens of thousands is difficult in terms of cost, so the learning model must be built from a small number of samples. To prevent overfitting and similar problems, the fewer variables the learning model uses the better, which means a small number of variables must be chosen from a large pool of candidates. Examining every combination is impractical because the amount of computation becomes enormous.

Therefore, to select variables efficiently, the geographic attribute items are described as a tree structure organized by their relevance to user attributes. Geographic attribute items that are thought to have a similar effect on the estimation of a user attribute are placed under the same node, and during variable selection only one or a few variables from any given node are considered. This makes it possible to select variables that are effective for estimation efficiently. The correspondence between the user attribute item to be estimated and the part of the tree to be considered is determined in advance from a hypothesis about, or empirical results on, the relationship between user attributes and geographic attributes. An example of such a hypothesis is described later.
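The per-node restriction described above can be sketched as follows. This is a minimal illustration, not the method of the description: the function `select_variables`, the relevance scores, and the variable names are all hypothetical, and the scoring criterion is left open by the text.

```python
from typing import Dict, List

def select_variables(tree: Dict[int, List[str]],
                     scores: Dict[str, float],
                     target_nodes: List[int],
                     per_node: int = 1) -> List[str]:
    """Pick at most `per_node` highest-scoring variables from each target node."""
    selected: List[str] = []
    for node in target_nodes:
        candidates = tree.get(node, [])
        # Rank candidates under this node by an assumed relevance score.
        ranked = sorted(candidates, key=lambda v: scores.get(v, 0.0), reverse=True)
        selected.extend(ranked[:per_node])
    return selected

# Hypothetical candidates grouped under two nodes (values are illustrative only).
tree = {
    11: ["retail_sales", "chain_store_count", "commercial_land_price"],
    13: ["employee_count", "large_office_count"],
}
scores = {
    "retail_sales": 0.42, "chain_store_count": 0.61, "commercial_land_price": 0.30,
    "employee_count": 0.75, "large_office_count": 0.58,
}
chosen = select_variables(tree, scores, target_nodes=[11, 13])
```

With five candidates there are 31 non-empty subsets to examine exhaustively, whereas the per-node rule only ranks the candidates within each node, which is where the saving comes from as the candidate pool grows.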

FIG. 1 is a block diagram of the user attribute estimation device according to the embodiment of the present invention. This device is used to efficiently describe the place features (see FIG. 2) of stations, towns, and the like in ICT services for outdoor spaces. Functionally, it comprises an index storage unit 31, a geographic attribute item structure storage unit 401, a user attribute-geographic item structure relationship storage unit 402, an index selection unit 404, an index selection rule storage unit 405, a model generation unit 500, a place ID-position conversion unit 504, a known user attribute storage unit 505, an input unit 51, a user attribute estimation unit 600, an input unit 61, and an output unit 62. The model generation unit 500 includes a variable selection unit 501, an algorithm selection unit 502, and a model storage unit 503. The user attribute estimation unit 600 includes a model selection unit 601 and an estimation calculation unit 602.

The index storage unit 31 stores various external DBs (databases) of store information, land price information, transit guidance information, event information, weather information, time information, day-of-week information, and the like (hereinafter, "indices"). The geographic attribute item structure storage unit 401 stores geographic attribute items in a tree structure organized by their relevance to user attributes, together with a reference number for each node of the tree structure. The user attribute-geographic item structure relationship storage unit 402 stores information indicating the correspondence between a user attribute item to be estimated and node reference numbers (hereinafter, "node numbers"). The index selection unit 404 refers to the index selection table stored in the index selection rule storage unit 405 and acquires, from the respective DBs of the index storage unit 31, the indices needed to calculate each variable group of the geographic attribute item structure storage unit 401 and the user attribute-geographic item structure relationship storage unit 402. The index selection rule storage unit 405 stores a table that maps the types of indices needed to calculate the supply and context variable groups of the geographic attribute item structure storage unit 401 and the user attribute-geographic item structure relationship storage unit 402 to the addresses from which they are acquired.

The model generation unit 500 learns the relationship between geographic attributes and user attributes based on the known user attributes stored in the known user attribute storage unit 505 and generates a learning model. Specifically, it refers to the user attribute-geographic item structure relationship storage unit 402 to obtain the node numbers corresponding to the user attribute item to be estimated, and selects geographic attribute items stored in the geographic attribute item structure storage unit 401 based on the obtained node numbers, thereby selecting the variables used in the learning model. As already described, the model generation unit 500 includes the variable selection unit 501, the algorithm selection unit 502, and the model storage unit 503. The variable selection unit 501 refers to the user attribute-geographic item structure relationship storage unit 402 to determine which combination of variable groups, out of the supply and context variable groups, is used to build the learning model. It also selects the indices that make up the supply and context variable groups. The algorithm selection unit 502 selects the function or algorithm used to build the learning model. The model storage unit 503 stores the models built by the variable selection unit 501 and the algorithm selection unit 502. The place ID-position conversion unit 504 converts the place ID assigned to a user attribute into position information (address, latitude and longitude, and so on). The known user attribute storage unit 505 stores known user attributes together with their place IDs. The input unit 51 receives the user attribute item for which a learning model is to be generated.

The input unit 61 receives the place to be estimated, the user attribute item name, and the day of the week and time. The user attribute estimation unit 600 estimates unknown user attributes from geographic attributes based on the learning model generated by the model generation unit 500. As already described, the user attribute estimation unit 600 includes the model selection unit 601 and the estimation calculation unit 602. The model selection unit 601 selects, from the learning models stored in the model storage unit 503, the learning model that matches the input from the input unit 61. The estimation calculation unit 602 acquires the necessary variables from the geographic attribute item structure storage unit 401 and the user attribute-geographic item structure relationship storage unit 402 via the model selection unit 601 and the model generation unit 500, and estimates the user attribute in question. The output unit 62 outputs the estimation result of the estimation calculation unit 602.

Next, the tree structure of the geographic attribute items will be described. The structure of the tree can be determined from a hypothesis such as the following.

Since user attributes are used for advertisement distribution, the scenes in which people encounter advertisements can be assumed to include business scenes, leisure scenes, and home life scenes. Examples are "seeing an advertisement while travelling for work" and "seeing an advertisement on the way to an outing on a day off". The difference between scenes is considered to depend on what the place is generally used for and on factors other than the place, such as the time of day. As shown in FIG. 3, these are represented by two variable groups, the supply variable group and the context variable group, respectively. For example, whether a place has many offices that people are likely to visit during work is expressed by supply variables, and whether the time of day is one at which people are likely to be working is expressed by context variables.

The variables that make up the supply variable group are described first. The supply variable group represents what the place is generally used for. As described above, leisure scenes, business scenes, and home life scenes may, for example, be assumed as advertisement contact scenes, and the supply variable group may be expressed using the following four indices, which represent the abundance of "purchase opportunities", "entertainment opportunities", "business opportunities", and "residential opportunities".

supply variable group ← {purchase index, entertainment index, business index, residence index}

Examples of the indicators needed to obtain each index are given below. When calculating the indices, as many of the purchase, entertainment, business, and residence indices as can be prepared are prepared, and the variable selection unit 501 selects the several most suitable ones. The purchase index represents annual retail sales at the place, the number of outlets of retail chains, the average price of commercial land, and so on. The entertainment index represents the number of outlets of restaurant chains at the place, the presence or absence of large-scale entertainment facilities such as amusement parks, and so on. The business index represents the number of employees at the place, the number of large company offices, the number of employees per office, and so on. The residence index represents the number of households at the place, the average number of people per household, the total residential land area, and so on. These indicators can be obtained from commercially available DBs, API (Application Program Interface) services, statistical materials issued by government bodies, and officially published land prices.

Next, the variables that make up the context variable group are described. The context variable group consists of variables that change with time and cannot be expressed by supply variables. As described above, leisure scenes, business scenes, and home life scenes may, for example, be assumed as advertisement contact scenes, and the context variable group may be composed of time-varying variables such as "day of the week" and "time band", which help distinguish leisure scenes from business scenes, and "weather information" and "event information", which help distinguish what kind of leisure scene it is (for example, shopping or entertainment).

context variable group ← {day of the week, time band, weather information, event information}

Examples of the indicators needed to obtain each of these are given below. As many of day of the week, time band, weather information, and event information as can be prepared are prepared, and the variable selection unit 501 selects the several most suitable ones. Day of the week represents weekday, holiday, and so on. Time band represents morning (6:00-11:00), daytime (11:00-17:00), night (17:00-24:00), and so on. Event information represents the start time band, end time band, genre, and so on of events held nearby. Weather information represents the forecast weather and temperature in each time band. Of these indicators, day of the week and time band are calculated from the observation time of the user attribute, and the others can be obtained by referring to weather information services and to event information from event organizers and event venues.
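Deriving the day-of-week and time-band context variables from an observation time could look like the sketch below. The band boundaries follow the text; the function names, the "unassigned" label for 0:00-6:00 (which the text does not cover), and the treatment of Saturday and Sunday as the only holidays are assumptions made for illustration.

```python
from datetime import datetime

def time_band(ts: datetime) -> str:
    """Map an observation time to the time bands given in the description."""
    hour = ts.hour
    if 6 <= hour < 11:
        return "morning"
    if 11 <= hour < 17:
        return "daytime"
    if 17 <= hour < 24:
        return "night"
    return "unassigned"  # 0:00-6:00 has no band in the text (assumed label)

def day_type(ts: datetime) -> str:
    """Classify a date as weekday or holiday.

    Assumption: only Saturday and Sunday count as holidays here;
    public holidays would need a calendar lookup.
    """
    return "holiday" if ts.weekday() >= 5 else "weekday"

observed = datetime(2010, 5, 13, 9, 30)  # a Thursday morning
context = (day_type(observed), time_band(observed))
```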

FIG. 4 is a diagram showing an example of the data stored in the geographic attribute item structure storage unit 401. As shown in this figure, each node of the tree structure is given a node number. Here, the supply variable group has node number 1, the context variable group has node number 2, the purchase index has node number 11, the entertainment index has node number 12, the business index has node number 13, the residence index has node number 14, day of the week and time band have node number 21, weather information has node number 22, and event information has node number 23.
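The node numbering above can be held in a small tree structure, sketched below. The node numbers and names come from the text; the parent-child links are inferred from the grouping of the indices under the supply and context variable groups, and the dictionary layout and the helper `leaves_under` are hypothetical.

```python
from typing import Dict, List

# Node numbers and names as listed for FIG. 4.
NODE_NAMES: Dict[int, str] = {
    1: "supply variable group",
    2: "context variable group",
    11: "purchase index",
    12: "entertainment index",
    13: "business index",
    14: "residence index",
    21: "day of the week / time band",
    22: "weather information",
    23: "event information",
}

# Parent -> children links (inferred from the variable group definitions).
CHILDREN: Dict[int, List[int]] = {1: [11, 12, 13, 14], 2: [21, 22, 23]}

def leaves_under(node: int) -> List[int]:
    """Return the leaf node numbers below `node` (or `node` itself if it is a leaf)."""
    kids = CHILDREN.get(node)
    if not kids:
        return [node]
    leaves: List[int] = []
    for kid in kids:
        leaves.extend(leaves_under(kid))
    return leaves

supply_leaves = [NODE_NAMES[n] for n in leaves_under(1)]
```

Referring to node 1 therefore stands for all four supply indices at once, while referring to node 13 narrows the candidates to the business index alone.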

FIG. 5 is a diagram showing an example of the data stored in the user attribute-geographic item structure relationship storage unit 402. As shown in this figure, the user attribute-geographic item structure relationship storage unit 402 stores information indicating the correspondence between a user attribute item to be estimated and node numbers. The node numbers specify the combination of variables that the model generation unit 500 considers during variable selection. That is, the model generation unit 500 refers to the tree structure of geographic attribute items in the geographic attribute item structure storage unit 401 and treats the geographic attribute items of the nodes specified by the node numbers as candidates for variable selection.

FIG. 6 is a diagram illustrating an example of data stored in the index selection rule storage unit 405. As shown in this figure, the index selection rule storage unit 405 stores information indicating the correspondence among a user attribute item, the combination of indicators needed to calculate the supply and context variable groups, and the address of the database from which those indicators are acquired. The correspondence between an indicator and its acquisition source is determined in advance based on the indicators that the source provides. Acquisition sources may be data that is published or sold, or data offered through an API. Specifically, national census data, published land price data, a residential map API, and the like can be used. For the day of the week and the time zone among the context variables, however, the observation time zone supplied with the data acquired from the known user attribute storage unit 505 is used.

FIG. 7 is a diagram illustrating an example of data stored in the known user attribute storage unit 505. As shown in this figure, the known user attribute storage unit 505 stores information indicating the correspondence among a place ID, a place name, an observation period (day of the week and time zone), and the user attributes observed there. For each user attribute item, a single typical value for the people who visit the place during the observation period may be recorded.

FIG. 8 is a flowchart showing the operation of the user attribute estimation device according to the embodiment of the present invention. The operation of the user attribute estimation device is described below with reference to FIG. 8.

First, the necessary information is stored in the index storage unit 31, the geographic attribute item structure storage unit 401, the user attribute-geographic item structure relationship storage unit 402, the index selection rule storage unit 405, and the known user attribute storage unit 505 (S1). Next, the model generation unit 500 generates and stores learning models by the method described later (S2). When the input unit 61 then inputs an estimation target (S3), the model selection unit 601 selects from the model storage unit 503 the learning model that matches that input (S4). The estimation calculation unit 602 estimates the user attribute based on the selected learning model (S5), and the output unit 62 outputs the estimation result (S6).

FIG. 9 is a flowchart showing in detail the step of generating a learning model (S2 in FIG. 8). The case of generating a learning model for estimating the user attribute item "purpose of use" is described below with reference to FIG. 9.

First, when the model generation unit 500 receives from the input unit 51 a designation of "purpose of use" among the user attribute items (S11), it acquires from the known user attribute storage unit 505 the place IDs of the data in which a value of "purpose of use" is recorded, those values, and the observation times of the data (S12). The value of "purpose of use" may be expressed as a discrete value (label) such as "work" or "shopping", or as a continuous quantity, for example by defining the proportion of all visitors who use the place for work as the "work degree".
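The two representations of "purpose of use" can be illustrated with a short sketch. The function names and the sample list of visitor purposes are hypothetical, and both functions assume a non-empty list.

```python
from collections import Counter

def purpose_label(purposes):
    """Discrete representation: the most common purpose among visitors."""
    return Counter(purposes).most_common(1)[0][0]

def work_degree(purposes):
    """Continuous representation: fraction of visitors whose purpose is 'work'."""
    return sum(1 for p in purposes if p == "work") / len(purposes)

# Made-up observations for one place and one observation period.
observed = ["work", "shopping", "shopping", "work", "shopping"]
label = purpose_label(observed)
degree = work_degree(observed)
```

With this sample, the label is "shopping" and the work degree is 0.4; which representation is used determines whether a classification or a regression method applies later.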

Next, the model generation unit 500 passes the place IDs to the place ID-location conversion unit 504 and acquires location information from it (S13). The location information is expressed in whatever system (address, latitude and longitude, etc.) the various indicators in the index storage unit 31 use.

Next, the variable selection unit 501 refers to the user attribute-geographic item structure relationship storage unit 402 and acquires the node numbers "11-14, 21, 23" of the geographic attribute item tree structure that apply when estimating "purpose of use" (S14). The geographic attribute items that are candidates for the variables of the learning model for "purpose of use" are those in the part of the tree specified by these node numbers.
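A node number specification such as "11-14, 21, 23" can be expanded into individual node numbers as sketched below. The function name and the assumption that a hyphen denotes an inclusive range are illustrative; the description does not state how the specification is stored.

```python
def parse_node_spec(spec):
    """Expand a spec like '11-14, 21, 23' into a list of node numbers."""
    nodes = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-")
            nodes.extend(range(int(low), int(high) + 1))  # inclusive range (assumed)
        else:
            nodes.append(int(part))
    return nodes

purpose_nodes = parse_node_spec("11-14, 21, 23")
```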

Next, the variable selection unit 501 refers to the geographic attribute item structure storage unit 401 and selects, as variable candidates, geographic attribute items such as "annual retail sales" and "number of restaurant chain outlets" from the nodes corresponding to node numbers 11-14, 21, and 23 of the geographic attribute item tree structure (S15). It then notifies the index selection unit 404 of the geographic attribute item names selected in step S15 and the location information acquired in step S13 (S16).

Next, the index selection unit 404 refers to the index selection rule storage unit 405 and, using the acquisition source of each geographic attribute item and the location information, selects the necessary indicators from the various databases of the index storage unit 31 and passes them to the variable selection unit 501 (S17). For node 21, however, the day of the week and the time zone are calculated from the observation time of each data item already acquired in step S12.

Next, the variable selection unit 501 notifies the algorithm selection unit 502 of the geographic attribute item values obtained above for each place (S18).

Next, the algorithm selection unit 502 tries several algorithms using some of the geographic attribute items obtained in step S18 and the values of the user attribute item "purpose of use" acquired from the known user attribute storage unit 505, and selects the algorithm with the best estimation accuracy (S19). Alternatively, if a model has already been generated from the same kind of data, the algorithm used then may be adopted from the start. If there is no such history and the value of "purpose of use" is discrete, such as label data, a classification algorithm such as a support vector machine may be tried; if the value is continuous, an accurate model may be sought among multivariate analysis methods such as nonlinear regression models. Various known methods can be used for the multivariate analysis (Kinji Mizuno, "多変量データ解析講義" (Lectures on Multivariate Data Analysis), Asakura Shoten).
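The idea of step S19, trying several algorithms and keeping the most accurate one, can be sketched as follows. This is only an illustration under stated assumptions: the two candidate learners (a majority-label baseline and a one-nearest-neighbor classifier) are simple stand-ins chosen to keep the example dependency-free, not the support vector machine mentioned in the text, and all names and data are hypothetical.

```python
from collections import Counter

def fit_majority(X, y):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label

def fit_nearest_neighbor(X, y):
    """Predict the label of the closest training point (squared Euclidean distance)."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
        return y[dists.index(min(dists))]
    return predict

def accuracy(model, X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def select_algorithm(candidates, X_train, y_train, X_test, y_test):
    """Fit every candidate and return the best name plus all scores."""
    scores = {name: accuracy(fit(X_train, y_train), X_test, y_test)
              for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores

# Made-up one-dimensional geographic attribute values and labels.
X_train = [[0.0], [0.1], [0.2], [1.0], [1.1]]
y_train = ["work", "work", "work", "shopping", "shopping"]
X_test, y_test = [[0.05], [1.05]], ["work", "shopping"]
candidates = {"majority": fit_majority, "nearest_neighbor": fit_nearest_neighbor}
best_name, scores = select_algorithm(candidates, X_train, y_train, X_test, y_test)
```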

Next, in response to the result of step S19, the variable selection unit 501 refers to the geographic attribute item structure storage unit 401, selects one or several of the geographic attribute items belonging to node number 11, and selects variables for the remaining nodes in the same way (S20). Then, using the algorithm/model obtained in step S19, part of the known user attribute data stored in the known user attribute storage unit 505 is set aside as test data and the rest is used as training data; a learning model is created from the training data and its performance is evaluated on the test data.
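Setting aside part of the known data as test data might look like the sketch below. The description does not say how the split is made, so the random shuffle, the 25% test fraction, the fixed seed, and the function name are all assumptions.

```python
import random

def split_known_data(records, test_fraction=0.25, seed=0):
    """Return (training, test) lists drawn from the known user attribute records."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

training, test = split_known_data(list(range(8)))
```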

Next, the variable selection unit 501 repeats steps S17 to S20 while changing the combination of geographic attribute items selected from each node, and determines the combination of indicators that gives the best performance (S21). If the computational budget allows, all combinations may be tried to find the most accurate one. As a result, a combination of indicators can be determined such that, for example, "number of retail stores" is used for the purchase index of node number 11, "presence or absence of large-scale entertainment facilities" for the entertainment index of node number 12, "number of employees" for the business index of node number 13, "number of households" for the residence index of node number 14, and so on. The six variables thus chosen (number of retail stores, presence or absence of large-scale entertainment facilities, number of employees, number of households, ...) are applied to the algorithm/model obtained in step S19 to form the learning model (S22).
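The search over per-node combinations in step S21 can be sketched as follows. For simplicity the sketch picks exactly one geographic attribute item per node, whereas the text allows one or several; `evaluate` is a hypothetical callable standing in for "train on the training data and score on the test data", every node is assumed to have at least one candidate, and the candidate names and scores are made up.

```python
from itertools import product

def best_combination(candidates_by_node, evaluate):
    """Try one item per node and keep the best-scoring combination."""
    nodes = sorted(candidates_by_node)
    best_combo, best_score = None, float("-inf")
    for combo in product(*(candidates_by_node[n] for n in nodes)):
        score = evaluate(combo)
        if score > best_score:
            best_combo, best_score = combo, score
    return dict(zip(nodes, best_combo)), best_score

# Toy candidates and a toy scoring function (not real evaluation results).
candidates_by_node = {
    11: ["number of retail stores", "annual retail sales"],
    13: ["number of employees", "number of offices"],
}
toy_gain = {"number of retail stores": 0.3, "annual retail sales": 0.1,
            "number of employees": 0.4, "number of offices": 0.2}
chosen, score = best_combination(candidates_by_node,
                                 lambda combo: sum(toy_gain[c] for c in combo))
```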

The above steps for obtaining a learning model are performed for all user attribute items (S23), and the learning model created for each user attribute item is stored in the model storage unit 503 (S24).

As described above, by defining the tree structure of geographic attribute items according to their relationship with user attribute items, an optimal small combination can be found efficiently among a large number of geographic attribute items without examining every combination of them.
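The size of the saving can be illustrated with a small count. The figures below are hypothetical (six nodes with five candidate items each, one item chosen per node); the description gives no actual item counts.

```python
from math import prod

def exhaustive_subsets(n_items):
    """Number of non-empty subsets of n_items geographic attribute items."""
    return 2 ** n_items - 1

def one_per_node(counts):
    """Number of combinations when exactly one item is chosen from each node."""
    return prod(counts)

counts = [5] * 6                          # hypothetical: 6 nodes, 5 candidates each
unstructured = exhaustive_subsets(sum(counts))
structured = one_per_node(counts)
```

Under these assumed counts, an unstructured search over 30 items has 1,073,741,823 non-empty subsets, while the tree-guided search has 15,625 combinations.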

Next, an example of estimating the user attribute of a place whose user attribute is unknown, using the learning models stored in the model storage unit 503, is described. Suppose that the purpose of use at "Jiyugaoka" is to be determined. In this case, the name of the spot whose user attribute is unknown, "Jiyugaoka", and the user attribute item "purpose of use" are input to the input unit 61. The learning model that matches the user attribute item is then selected from the model storage unit 503 and used to estimate the unknown user attribute. As a result, the "purpose of use" of "Jiyugaoka" is estimated as, for example, "shopping" or "work degree: 21%".
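The selection and estimation steps (S4 and S5) can be sketched as a lookup followed by a function call. Everything here is illustrative: the model store is a plain dictionary, the "model" is a hand-written rule rather than a trained one, and the feature names and values are placeholders, not real statistics for any place.

```python
def estimate_user_attribute(model_store, geo_attributes, place, attribute_item):
    """Select the stored model for attribute_item (S4) and apply it to the place (S5)."""
    model = model_store[attribute_item]
    return model(geo_attributes[place])

def toy_purpose_model(features):
    """Stand-in for a learned model: a fixed rule, used only for illustration."""
    return "shopping" if features["retail_stores"] >= features["employees"] else "work"

model_store = {"purpose of use": toy_purpose_model}
# Placeholder geographic attribute values (not real data).
geo_attributes = {"Jiyugaoka": {"retail_stores": 3.0, "employees": 1.0}}
result = estimate_user_attribute(model_store, geo_attributes,
                                 "Jiyugaoka", "purpose of use")
```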

As described above, the user attribute estimation device according to the embodiment of the present invention can acquire user attributes efficiently. That is, once user attributes have been acquired for some places, user attributes elsewhere are estimated from geographic attributes, which can be obtained at lower cost than user attributes. As a result, it is no longer necessary to survey user attributes at every place, and a cost reduction can be expected.

More specifically, a tree structure of geographic attribute items is created based on the relationship between geographic attributes and user attributes, the part of the tree related to each user attribute item is specified, and the combination of geographic attribute items used in the learning model is narrowed down. Thus, even when the known user attribute data is scarce and the geographic attribute items are numerous, the geographic attribute items that are effective for estimating user attributes can be selected efficiently to generate a learning model. As a result, user attributes can be acquired for many places, and advertisement distribution optimized for each installation location can be realized at low cost.

The present invention can be realized not only as a user attribute estimation device, but also as a user attribute estimation method whose steps correspond to the characteristic processing units of such a device, or as a program that causes a computer to execute those steps. Needless to say, such a program can be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.

Although not specifically mentioned in the above description, known machine learning methods and feature selection methods can be employed in the present invention. For example, as known machine learning methods, the methods disclosed in "N. Cristianini and J. Shawe-Taylor, translated by 大北剛, "サポートベクターマシン入門" (Introduction to Support Vector Machines), pp. 128-149, Kyoritsu Shuppan, Tokyo, 2005" and "Keiji Yanai, "サポートベクターマシン関連ツール" (Support Vector Machine Related Tools), The Journal of the Institute of Image Information and Television Engineers, Vol. 63, No. 12" can be employed. As a known feature selection method, the method disclosed in "Guyon, I. and Elisseeff, A., "An introduction to variable and feature selection," Journal of Machine Learning Research, 3, pp. 1157-1182, 2003" can be employed.

505 ... Known user attribute storage unit
500 ... Model generation unit
600 ... User attribute estimation unit
401 ... Geographic attribute item structure storage unit
402 ... User attribute-geographic item structure relationship storage unit

Claims

A user attribute estimation device for estimating a user attribute, comprising:
a known user attribute storage unit that stores known user attributes;
a model generation unit that learns a relationship between geographic attributes and user attributes based on the known user attributes stored in the known user attribute storage unit and generates a learning model; and
a user attribute estimation unit that estimates an unknown user attribute from geographic attributes based on the learning model generated by the model generation unit.

The user attribute estimation device according to claim 1, further comprising:
a geographic attribute item structure storage unit that stores geographic attribute items in a tree structure organized by their relevance to user attributes, together with a reference number for each node constituting the tree structure; and
a user attribute-geographic item structure relationship storage unit that stores information indicating a correspondence between a user attribute item to be estimated and the reference numbers of the nodes,
wherein the model generation unit refers to the user attribute-geographic item structure relationship storage unit to acquire the reference numbers of the nodes corresponding to the user attribute item to be estimated, and selects the variables used in the learning model by selecting, based on the acquired reference numbers, geographic attribute items stored in the geographic attribute item structure storage unit.

The user attribute estimation device according to claim 1 or 2, wherein the user attribute depends on a supply variable group, which represents what the place is generally used for, and a context variable group, which changes with time.

A user attribute estimation method for estimating a user attribute, comprising:
a known user attribute storage step of storing known user attributes in a known user attribute storage unit;
a model generation step of learning a relationship between geographic attributes and user attributes based on the known user attributes stored in the known user attribute storage unit and generating a learning model; and
a user attribute estimation step of estimating an unknown user attribute from geographic attributes based on the learning model generated in the model generation step.

5. The user attribute estimation method according to claim 4, further comprising:
a geographic attribute item structure storage step of storing, in a geographic attribute item structure storage unit, geographic attribute items in a tree structure organized by their relevance to user attributes, together with a reference number for each node constituting the tree structure; and
a user attribute-geographic item structure relationship storage step of storing, in a user attribute-geographic item structure relationship storage unit, information indicating a correspondence between a user attribute item to be estimated and the reference numbers of the nodes,
wherein, in the model generation step, the reference numbers of the nodes corresponding to the user attribute item to be estimated are acquired by referring to the user attribute-geographic item structure relationship storage unit, and the variables used in the learning model are selected by selecting, based on the acquired reference numbers, geographic attribute items stored in the geographic attribute item structure storage unit.

6. The user attribute estimation method according to claim 4 or 5, wherein the user attribute depends on a supply variable group, which represents what the place is generally used for, and a context variable group, which changes with time.

A program for estimating a user attribute, the program causing a computer to execute:
a known user attribute storage step of storing known user attributes in a known user attribute storage unit;
a model generation step of learning a relationship between geographic attributes and user attributes based on the known user attributes stored in the known user attribute storage unit and generating a learning model; and
a user attribute estimation step of estimating an unknown user attribute from geographic attributes based on the learning model generated in the model generation step.