JP2009238193A

JP2009238193A - Circulation prediction system, method and program, and influence degree estimation system, method and program

Info

Publication number: JP2009238193A
Application number: JP2008101872A
Authority: JP
Inventors: Yukiko Kuroiwa; 由希子黒岩
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2008-03-07
Filing date: 2008-04-09
Publication date: 2009-10-15
Anticipated expiration: 2028-04-09
Also published as: JP5104496B2

Abstract

PROBLEM TO BE SOLVED: To provide a spread prediction system that can predict, for each customer, when that customer will start using a product or service.
SOLUTION: A customer DB 2 stores a set of individual data for each customer. Each record contains either the time at which the customer started using the product or service or an indication that the customer has not yet used it, together with items representing the customer's attributes. A test data generation part 51 uses the customer DB 2 to generate test data containing the individual data of customers who have not started using the product or service as of the current time. A learning data generation part 52 generates learning data by attaching a first label to the individual data of customers who have started using the product or service as of the current time and a second label to the individual data of customers who have not. A classifier generation part 53 then generates a classifier from the learning data. A test data label determination part 54 determines a label for each piece of individual data in the test data by applying the classifier to it.
COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a spread prediction system, a spread prediction method, and a spread prediction program that predict the spread of a product or service. It also relates to an influence degree estimation system, an influence degree estimation method, and an influence degree estimation program that estimate the degree to which, within a given period, customers of a product or service prompt other people to use it.

When a product or service is offered to customers, its spread over time must be predicted in order to estimate its cost-effectiveness and the timing and quantity of the resources to be procured. Here, "product" covers not only finished goods but also the smallest unit needed to maintain a product's function. The spread of a product or service over time can be expressed as its penetration rate (or the number of units sold) as a function of time.
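The penetration rate at a given time is the share of customers who have started using the product or service by that time. The sketch below computes it from per-customer start dates and then evaluates it over several dates to trace the spread curve. It assumes that each customer record carries either a use start date or `None` for a customer who has not started. The function names are hypothetical.

```python
from datetime import date
from typing import List, Optional, Sequence, Tuple


def penetration_rate(start_dates: Sequence[Optional[date]], at: date) -> float:
    """Share of customers whose use start date is on or before `at`.

    None means the customer has not started using the product or service.
    """
    if not start_dates:
        raise ValueError("at least one customer record is required")
    adopted = sum(1 for d in start_dates if d is not None and d <= at)
    return adopted / len(start_dates)


def penetration_curve(
    start_dates: Sequence[Optional[date]], times: Sequence[date]
) -> List[Tuple[date, float]]:
    """Penetration rate evaluated at each time point (the spread curve)."""
    return [(t, penetration_rate(start_dates, t)) for t in times]
```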

Patent Document 1 describes an example of a prediction device applicable to this kind of spread prediction. The device comprises an input unit, an event content storage unit, a cumulative count aggregation unit, a prediction condition storage unit, a regression analysis unit, a distribution estimation unit, and an output unit. The input unit accepts the past history of the phenomenon of interest, such as the dates on which the event to be predicted occurred and the number of occurrences. It also accepts prediction conditions, such as the range of values (upper and lower limits) for the event to be predicted and the prediction time point. The regression analysis unit uses a logistic curve model (or another curve model) to estimate parameters that make the model fit the observed spread. From those estimated parameters it then obtains a prediction curve representing the spread over time.
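The curve-fitting step can be illustrated with a logistic model N(t) = K / (1 + exp(-r (t - t0))). This sketch assumes the saturation level K is known, for example because it is supplied as the upper limit in the prediction conditions. Under that assumption, the logit transform ln(N / (K - N)) = r t - r t0 is linear in t, so ordinary least squares recovers r and t0. This is one common fitting technique and is not necessarily the one Patent Document 1 uses. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_logistic(
    ts: Sequence[float], ns: Sequence[float], k: float
) -> Tuple[float, float]:
    """Fit N(t) = k / (1 + exp(-r * (t - t0))) for a known ceiling k.

    Returns (r, t0). Points with N <= 0 or N >= k are skipped because
    their logit is undefined.
    """
    pts = [(t, math.log(n / (k - n))) for t, n in zip(ts, ns) if 0 < n < k]
    if len(pts) < 2:
        raise ValueError("need at least two points strictly between 0 and k")
    m = len(pts)
    mean_t = sum(t for t, _ in pts) / m
    mean_y = sum(y for _, y in pts) / m
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    if sxx == 0:
        raise ValueError("time points must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    r = sxy / sxx  # slope of the logit line = growth rate
    if r == 0:
        raise ValueError("no growth detected; t0 is undefined")
    t0 = mean_t - mean_y / r  # the logit line crosses zero at t = t0
    return r, t0


def logistic(t: float, k: float, r: float, t0: float) -> float:
    """Predicted cumulative count at time t."""
    return k / (1.0 + math.exp(-r * (t - t0)))
```

The fit yields a single population-level curve. It does not indicate which individual customers will adopt.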

Patent Document 2 describes a time-series data classification/prediction device that generates training data for decision-tree learning. This training data includes information from multiple points in time.

Patent Document 3 describes an analysis device that uses customer attribute data as master data.

Patent Document 1: JP 2004-78780 A
Patent Document 2: JP H6-96052 A
Patent Document 3: JP 2001-229150 A

The time at which a customer starts using a product or service depends on the customer's characteristics. For example, customers can be classified into categories such as innovators, early adopters, and the early majority.

As noted above, the time at which customers start using a product or service depends on their characteristics. Prediction methods that fit the parameters of a curve model such as a logistic curve do not reflect the characteristics of individual customers. Those methods therefore could not predict when each individual customer would start using a product or service.

Providers of a product or service also want to know how strongly existing users prompt others to use it. For example, a provider would like to know how much advertising and sales promotion have stimulated word of mouth. It would also like to know how far the influence of people who have already started using the product or service has motivated non-users to start. Here, the degree to which customers of a product or service prompt others to use it within a given period is called the influence degree. Equivalently, the influence degree is the receptiveness with which people who were not using the product or service during a given period come to accept it under the influence of those who have already started using it.

Accordingly, one object of the present invention is to provide a spread prediction system, a spread prediction method, and a spread prediction program that can predict, for each customer, when that customer will start using a product or service. Another object is to provide an influence degree estimation system, an influence degree estimation method, and an influence degree estimation program that can estimate the influence degree of a product or service as a function of time.

The spread prediction system of the present invention comprises the following units:
(a) A customer database that stores customer data, namely a set of individual data for each customer. Each record contains use start information and one or more items representing customer attributes other than the use start information. The use start information gives the time at which the customer started using the product or service, or indicates that the customer has not yet used it.
(b) A test data generation unit that uses the customer database to generate test data. The test data contains the individual data of customers who have not started using the product or service as of the current time, which defines the determination target time.
(c) A learning data generation unit that generates learning data. It labels the individual data of customers who have started using the product or service as of the current time with a first label, and labels the individual data of customers who have not started with a second label. It then varies the number of first-labeled records according to the influence degree, which is the degree to which customers of the product or service prompt others to use it within a given period.
(d) A classifier generation unit that generates a classifier from the learning data. The classifier is a rule that decides, from a customer's attribute items, whether that customer's individual data should receive the first label or the second label.
(e) A test data label determination unit that determines the label of each record in the test data from the classifier and that record's items.
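The processing flow of this system can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The assumptions are:
- Use start times are integer time steps, with `None` meaning the customer has not started.
- Attribute items are hashable tuples.
- The learner is a hypothetical majority-vote lookup table. It stands in for whatever classifier (for example, a decision tree) one would actually train.
- The rule that varies the number of first-label records is invented for illustration, because the text does not specify it. Here the first-label rows are resampled cyclically to round(n × (1 + influence)) rows.

All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

FIRST, SECOND = "first", "second"  # first label = started using, second = not yet
Attrs = Tuple[Hashable, ...]
Record = Tuple[Attrs, Optional[int]]  # (attribute items, use start time or None)


def has_started(start: Optional[int], now: int) -> bool:
    return start is not None and start <= now


def make_test_data(records: Sequence[Record], now: int) -> List[Attrs]:
    """Individual data of customers who have not started using the product at `now`."""
    return [attrs for attrs, start in records if not has_started(start, now)]


def make_learning_data(
    records: Sequence[Record], now: int, influence: float
) -> List[Tuple[Attrs, str]]:
    """Label every record, then resize the first-label rows by (1 + influence).

    ASSUMPTION: cyclic resampling to round(n * (1 + influence)) rows is a
    hypothetical scaling rule; the text only says the count varies with the
    influence degree. Note that Python's round() rounds halves to even.
    """
    pos = [a for a, s in records if has_started(s, now)]
    neg = [a for a, s in records if not has_started(s, now)]
    target = max(0, round(len(pos) * (1.0 + influence)))
    scaled = [pos[i % len(pos)] for i in range(target)]
    return [(a, FIRST) for a in scaled] + [(a, SECOND) for a in neg]


def train_lookup_classifier(
    data: Sequence[Tuple[Attrs, str]]
) -> Callable[[Attrs], str]:
    """Stand-in learner: majority label per attribute tuple (ties go to SECOND)."""
    votes: Dict[Attrs, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for attrs, label in data:
        votes[attrs][label] += 1
        overall[label] += 1

    def majority(counts: Counter) -> str:
        return FIRST if counts[FIRST] > counts[SECOND] else SECOND

    default = majority(overall)  # used for attribute tuples never seen in training

    def classify(attrs: Attrs) -> str:
        return majority(votes[attrs]) if attrs in votes else default

    return classify


def predict_adopters(
    records: Sequence[Record], now: int, influence: float
) -> List[Tuple[Attrs, str]]:
    """Label each not-yet-started customer with a classifier trained at `now`."""
    classify = train_lookup_classifier(make_learning_data(records, now, influence))
    return [(attrs, classify(attrs)) for attrs in make_test_data(records, now)]
```

Raising `influence` inflates the first-label rows. As a result, attribute profiles that are common among existing users tip toward the first label sooner. With `influence=0`, the classifier reflects the raw adoption counts.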

The influence degree estimation system of the present invention comprises the following units:
(a) A customer database that stores customer data as described above. Each per-customer record contains use start information and one or more attribute items.
(b) A current time data generation unit. It receives three inputs: a current time, used as the time for which the influence degree is estimated; a time interval specifying a fixed span before the current time; and a plurality of provisional influence degrees, which are candidate values for the influence degree. It computes the previous time as the current time minus the time interval. Using the customer database, it then generates current time data, in which the individual data of customers who have started using the product or service as of the current time are labeled with the first label and the individual data of customers who have not are labeled with the second label.
(c) A previous time data group generation unit that generates previous time data for each provisional influence degree. In each set, the individual data of customers who have started using the product or service as of the previous time are labeled with the first label, and those of customers who have not are labeled with the second label. The number of first-labeled records is then varied according to that provisional influence degree.
(d) A previous time classifier group generation unit that generates one classifier from each set of previous time data. Each classifier is a rule that decides, from a customer's attribute items, whether that customer's individual data should receive the first label or the second label.
(e) An error group calculation unit. For each classifier generated from previous time data, it uses the classifier and each record's items to predict the labels of the individual data in the current time data. It then calculates the error between that prediction and the current time data.
(f) An influence degree calculation unit that identifies the smallest of the errors calculated for the classifiers. If exactly one provisional influence degree yields the minimum error, that value becomes the influence degree. If several provisional influence degrees yield it, the influence degree is determined from those values.
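The estimation loop is a grid search over the provisional influence degrees. For each candidate, the sketch below builds previous time data, trains a classifier on it, and scores the classifier against the current time data. It is a minimal sketch resting on several assumptions not stated in the text:
- Times are integers.
- The learner is a hypothetical majority-vote lookup table.
- The first-label count is varied by cyclic resampling.
- The error is the number of mislabeled records.
- Ties between equally good candidates are resolved by averaging them. The text says only that the influence degree is determined from the tied values.

All function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

FIRST, SECOND = "first", "second"
Attrs = Tuple[Hashable, ...]
Record = Tuple[Attrs, Optional[int]]  # (attribute items, use start time or None)
Labeled = List[Tuple[Attrs, str]]


def label_at(records: Sequence[Record], t: int) -> Labeled:
    """FIRST if the customer had started using the product by time t, else SECOND."""
    return [(a, FIRST if s is not None and s <= t else SECOND) for a, s in records]


def scale_first(data: Labeled, influence: float) -> Labeled:
    """ASSUMPTION: resample FIRST rows cyclically to round(n * (1 + influence))."""
    pos = [a for a, label in data if label == FIRST]
    neg = [(a, label) for a, label in data if label == SECOND]
    target = max(0, round(len(pos) * (1.0 + influence)))
    return [(pos[i % len(pos)], FIRST) for i in range(target)] + neg


def train(data: Labeled) -> Callable[[Attrs], str]:
    """Stand-in learner: majority label per attribute tuple (ties go to SECOND)."""
    votes: Dict[Attrs, Counter] = defaultdict(Counter)
    for a, label in data:
        votes[a][label] += 1

    def pick(c: Counter) -> str:
        return FIRST if c[FIRST] > c[SECOND] else SECOND

    default = pick(Counter(label for _, label in data))
    return lambda a: pick(votes[a]) if a in votes else default


def estimate_influence(
    records: Sequence[Record], now: int, interval: int, candidates: Sequence[float]
) -> Tuple[float, Dict[float, int]]:
    prev = now - interval  # previous time
    current = label_at(records, now)  # current time data
    errors: Dict[float, int] = {}
    for g in candidates:
        # One set of previous time data and one classifier per provisional influence degree.
        classify = train(scale_first(label_at(records, prev), g))
        # ASSUMPTION: error = number of current-time labels predicted wrongly.
        errors[g] = sum(classify(a) != label for a, label in current)
    best = min(errors.values())
    tied = [g for g in candidates if errors[g] == best]
    # ASSUMPTION: several minimizing candidates are combined by averaging.
    return (tied[0] if len(tied) == 1 else mean(tied)), errors
```

Each candidate is judged on how well a model trained at the previous time reproduces the labels observed at the current time. The search therefore selects the influence degree that best explains the adoptions that occurred during the interval.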

In another aspect, the spread prediction system of the present invention comprises the following units:
(a) A customer database that stores customer data as described above. Each per-customer record contains use start information and one or more attribute items.
(b) A test data generation unit that uses the customer database to generate test data. The test data contains the individual data of customers who have not started using the product or service as of the current time, which defines the determination target time.
(c) A labeling data generation unit that generates labeling data. It labels the individual data of customers who have started using the product or service as of the current time with the first label, and labels the individual data of customers who have not with the second label. It then calculates the proportion of first-labeled records in the labeling data as the penetration rate of the product or service.
(d) A classifier generation unit that generates a classifier from the labeling data. The classifier is a rule that assigns each customer's individual data, based on its attribute items, a score expressing the likelihood that the data should receive the first label.
(e) A test data label determination unit. It calculates a threshold by substituting the penetration rate from the labeling data generation unit into a threshold function, which gives the score threshold as a function of the penetration rate. It uses the classifier and each test record's items to compute that record's score. Test records whose score is at or above the threshold receive the first label, and test records whose score is below the threshold receive the second label.
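This variant replaces the hard classifier with a score, together with a threshold that depends on the current penetration rate. The sketch below rests on three assumptions:
- Times are integers.
- The score is a hypothetical stand-in: the share of first-label rows among training rows with identical attributes. Attribute tuples that never appear in training fall back to the overall penetration rate.
- The threshold function is supplied by the caller, because the text does not give its form.

All names are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

FIRST, SECOND = "first", "second"
Attrs = Tuple[Hashable, ...]
Record = Tuple[Attrs, Optional[int]]  # (attribute items, use start time or None)


def labeling_data(records: Sequence[Record], now: int) -> List[Tuple[Attrs, str]]:
    return [(a, FIRST if s is not None and s <= now else SECOND) for a, s in records]


def penetration_rate(data: Sequence[Tuple[Attrs, str]]) -> float:
    """Share of first-label records in the labeling data."""
    return sum(1 for _, label in data if label == FIRST) / len(data)


def train_scorer(data: Sequence[Tuple[Attrs, str]]) -> Callable[[Attrs], float]:
    """Stand-in scorer: share of FIRST labels among rows with identical attributes."""
    first: Counter = Counter()
    total: Counter = Counter()
    for a, label in data:
        total[a] += 1
        if label == FIRST:
            first[a] += 1
    fallback = penetration_rate(data)  # score for attribute tuples never seen

    def score(a: Attrs) -> float:
        return first[a] / total[a] if total[a] else fallback

    return score


def predict_with_threshold(
    records: Sequence[Record], now: int, threshold_fn: Callable[[float], float]
) -> Tuple[List[Tuple[Attrs, str]], float, float]:
    data = labeling_data(records, now)
    rate = penetration_rate(data)
    threshold = threshold_fn(rate)  # threshold function evaluated at the penetration rate
    score = train_scorer(data)
    test = [a for a, s in records if s is None or s > now]  # not yet started at `now`
    labels = [(a, FIRST if score(a) >= threshold else SECOND) for a in test]
    return labels, rate, threshold
```

Passing the threshold function as a parameter keeps the sketch neutral about its form. A hypothetical choice such as `lambda p: 1.0 - p` lowers the bar as the penetration rate rises.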

The spread prediction method of the present invention uses a customer database that stores customer data as described above (per-customer records of use start information and one or more attribute items). The method comprises the following steps:
(a) Generate test data containing the individual data of customers who have not started using the product or service as of the current time, which defines the determination target time.
(b) Generate learning data. Label the individual data of customers who have started using the product or service as of the current time with a first label, and label the individual data of customers who have not with a second label. Then vary the number of first-labeled records according to the influence degree, which is the degree to which customers of the product or service prompt others to use it within a given period.
(c) Generate a classifier from the learning data. The classifier is a rule that decides, from a customer's attribute items, whether that customer's individual data should receive the first label or the second label.
(d) Determine the label of each record in the test data from the classifier and that record's items.

In another aspect, the spread prediction method of the present invention uses a customer database that stores customer data as described above. The method comprises the following steps:
(a) Generate test data containing the individual data of customers who have not started using the product or service as of the current time, which defines the determination target time.
(b) Generate labeling data. Label the individual data of customers who have started using the product or service as of the current time with the first label, and label the individual data of customers who have not with the second label.
(c) Calculate the proportion of first-labeled records in the labeling data as the penetration rate of the product or service.
(d) Generate a classifier from the labeling data. The classifier is a rule that assigns each customer's individual data, based on its attribute items, a score expressing the likelihood that the data should receive the first label.
(e) Calculate a threshold by substituting the calculated penetration rate into a threshold function, which gives the score threshold as a function of the penetration rate.
(f) Use the classifier and each test record's items to compute that record's score. Test records whose score is at or above the threshold receive the first label, and test records whose score is below the threshold receive the second label.

The spread prediction program of the present invention is installed on a computer having a customer database that stores customer data as described above. It causes the computer to execute the following processes:
(a) Test data generation processing, which uses the customer database to generate test data containing the individual data of customers who have not started using the product or service as of the current time, which defines the determination target time.
(b) Learning data generation processing, which generates learning data. It labels the individual data of customers who have started using the product or service as of the current time with a first label, and labels the individual data of customers who have not with a second label. It then varies the number of first-labeled records according to the influence degree, which is the degree to which customers of the product or service prompt others to use it within a given period.
(c) Classifier generation processing, which generates a classifier from the learning data. The classifier is a rule that decides, from a customer's attribute items, whether that customer's individual data should receive the first label or the second label.
(d) Test data label determination processing, which determines the label of each record in the test data from the classifier and that record's items.

In another aspect, the spread prediction program of the present invention is installed on a computer having a customer database that stores customer data as described above. It causes the computer to execute the following processes:
(a) Test data generation processing, which uses the customer database to generate test data containing the individual data of customers who have not started using the product or service as of the current time, which defines the determination target time.
(b) Labeling data generation processing, which generates labeling data. It labels the individual data of customers who have started using the product or service as of the current time with the first label, and labels the individual data of customers who have not with the second label. It then calculates the proportion of first-labeled records in the labeling data as the penetration rate of the product or service.
(c) Classifier generation processing, which generates a classifier from the labeling data. The classifier is a rule that assigns each customer's individual data, based on its attribute items, a score expressing the likelihood that the data should receive the first label.
(d) Test data label determination processing. It calculates a threshold by substituting the penetration rate from the labeling data generation processing into a threshold function, which gives the score threshold as a function of the penetration rate. It uses the classifier and each test record's items to compute that record's score. Test records whose score is at or above the threshold receive the first label, and test records whose score is below the threshold receive the second label.

The influence degree estimation method of the present invention comprises the following steps:
(a) Receive three inputs: a current time, used as the time for which the influence degree is estimated; a time interval specifying a fixed span before the current time; and a plurality of provisional influence degrees, which are candidate values for the influence degree. The influence degree is the degree to which customers of a product or service prompt others to use it within a given period.
(b) Compute the previous time as the current time minus the time interval.
(c) Using a customer database that stores customer data as described above, generate current time data. Label the individual data of customers who have started using the product or service as of the current time with the first label, and label the individual data of customers who have not with the second label.
(d) For each provisional influence degree, generate previous time data. Label the individual data of customers who have started using the product or service as of the previous time with the first label, and label those of customers who have not with the second label. Then vary the number of first-labeled records according to that provisional influence degree.
(e) For each set of previous time data, generate a classifier from that data. The classifier is a rule that decides, from a customer's attribute items, whether that customer's individual data should receive the first label or the second label.
(f) For each classifier, use the classifier and each record's items to predict the labels of the individual data in the current time data, and calculate the error between that prediction and the current time data.
(g) Identify the smallest of the errors calculated for the classifiers. If exactly one provisional influence degree yields the minimum error, adopt it as the influence degree. If several provisional influence degrees yield it, determine the influence degree from those values.

The influence degree estimation program of the present invention is installed on a computer having a customer database that stores customer data, which is a set of individual data for each customer. Each individual data includes usage start information, which represents the time the customer started using a product or service if the customer has started using it and indicates that it is unused otherwise, and one or more items representing customer attributes other than the usage start information. The program receives as input a current time used as the influence degree estimation target time, a time interval for designating a fixed time before the current time, and a plurality of temporary influence degrees that are candidates for the influence degree, calculates the previous time, which is the time one time interval before the current time, and causes the computer to execute: current time data generation processing that uses the customer database to generate current time data by labeling with a first label the individual data of customers who have started using the product or service at the current time and labeling with a second label the individual data of customers who have not; previous time data group generation processing that, for each temporary influence degree, generates previous time data by labeling with the first label the individual data of customers who have started using the product or service at the previous time, labeling with the second label the individual data of customers who have not, and varying the number of individual data labeled with the first label according to the temporary influence degree; previous time classifier group generation processing that, for each set of previous time data, generates based on the previous time a classifier, which is a rule for determining from the items representing a customer's attributes whether that customer's individual data is labeled with the first label or the second label; error group calculation processing that, for each classifier generated from the previous time data, predicts the labels of the individual data in the current time data from the classifier and the items of each individual data in the current time data, and calculates the error between the prediction result and the current time data; and influence degree calculation processing that identifies the smallest of the errors calculated for the classifiers, determines the temporary influence degree corresponding to the smallest error as the influence degree when exactly one temporary influence degree corresponds to it, and determines the influence degree based on the plurality of temporary influence degrees when more than one corresponds to the smallest error.

According to the present invention, the time at which a customer will start using a product or service can be predicted for each individual customer. The present invention also makes it possible to estimate the influence degree of a product or service according to the time period.

Embodiments of the present invention will be described below with reference to the drawings.

Customers include not only those who use products or services for a fee but also those who use them free of charge.

The influence degree estimation system and the spread prediction system described below each perform influence degree estimation or spread prediction for a single product or service.

Embodiment 1.
FIG. 1 is a block diagram showing an example of the influence degree estimation system of the present invention. The system includes an influence degree estimation device 3, which estimates the influence degree (the degree to which customers of a product or service prompt others to use that product or service within a certain period), and a customer database 2 (hereinafter, customer DB 2). The term "product" covers not only finished goods but also the minimum units required to maintain a product's functions.

The customer DB 2 is a storage means that stores customer data and can also be called customer data storage means. Customer data is a set of data defined for each individual customer; the data for one customer is hereinafter called individual data. In the customer DB 2, each customer's individual data includes usage start information and one or more items representing customer attributes other than the usage start information. If the customer has started using the product or service, the usage start information represents the time at which the customer started using it; if the customer has not used it yet (that is, has not started using it), the usage start information indicates that it is unused. Items representing customer attributes other than the usage start information include, for example, gender, age, region of residence, region of workplace, usage history of other products, and various preferences, but are not limited to these. Each individual data includes one or more of these items. Attribute items can be collected in various ways, such as entry by a store clerk, registration when the customer starts using a product, customer membership cards, or questionnaires.

The influence degree estimation device 3 includes a current time data generation unit 31, a previous time data group generation unit 32, a previous time classifier group generation unit 33, an error group calculation unit 34, and an influence degree calculation unit 35.

The current time data generation unit 31 receives the current time, a time interval for designating a fixed time before the current time, and a plurality of temporary influence degrees. As described later, these may be input to the current time data generation unit 31 via an input device such as a keyboard. In this embodiment, the current time is the time used as the target time for influence degree estimation (the influence degree estimation target time), and the time interval is the period over which the influence degree is to be estimated. A point in the past is used as the influence degree estimation target time, and the influence degree is estimated for the period from one time interval before that point up to the target time. Accordingly, a past point in time is input as the current time. The time one time interval before the current time (that is, the time reached by going back one time interval from the current time) is called the previous time. The current time data generation unit 31 calculates the previous time by subtracting the time interval from the current time.

The input temporary influence degrees are candidates for the influence degree to be obtained, and the influence degree estimation device 3 determines the influence degree from them. Each temporary influence degree input to the current time data generation unit 31 is a number of 0 or more. For example, the user of the influence degree estimation system selects several candidate values from the range 0 to about 50 and inputs them as temporary influence degrees. Inputting more candidate values increases the accuracy of the influence degree estimate.

The current time data generation unit 31 also generates one set of current time data based on the current time and the customer DB 2. In the current time data, the individual data of customers who have started using the product or service by the current time are labeled with the first label, and the individual data of customers who have not started using it by the current time are labeled with the second label. The first label indicates that use of the product or service has started, and the second label indicates that it has not. Hereinafter, the first label is written as positive (or +) and the second label as negative (or -). The current time data generation unit 31 generates the current time data using the customer DB 2 (that is, based on the customer data stored in the customer DB 2).

The previous time data group generation unit 32 generates a plurality of sets of previous time data based on the previous time, the temporary influence degrees, and the customer data stored in the customer DB 2. In previous time data, the individual data of customers who had started using the product or service by the previous time are labeled with the first label (positive), the individual data of customers who had not are labeled with the second label (negative), and the data are then weighted according to a temporary influence degree. Weighting means changing the number of individual data labeled with the first label relative to the number labeled with the second label. The previous time data group generation unit 32 generates one set of previous time data for each temporary influence degree; since there are several temporary influence degrees, several sets of previous time data are generated.

The previous time classifier group generation unit 33 generates one classifier for each set of previous time data generated by the previous time data group generation unit 32, and therefore generates a plurality of classifiers. A classifier is a rule that determines, from the items representing a customer's attributes, whether that customer's individual data is labeled positive (first label) or negative (second label). In other words, it is a rule that takes the attribute items as independent variables and determines from them a dependent variable whose possible values are positive and negative.
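The document does not fix the learning algorithm used to build the classifier. As a hypothetical stand-in, the sketch below learns a one-attribute majority rule (in the spirit of OneR): for each value of a chosen attribute, it predicts the label that occurs most often in the training data. The record layout (a dict of attribute values paired with "+" or "-") and the function name are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]    # attribute name -> value (hypothetical layout)
Labeled = Tuple[Record, str]  # (attributes, "+" or "-")


def train_one_attribute_rule(data: List[Labeled], attribute: str) -> Callable[[Record], str]:
    """Learn a rule mapping each value of `attribute` to its majority label.

    Hypothetical stand-in for the unspecified classifier. Duplicated records
    are counted once per copy, so weighting by replication shifts the majority.
    """
    if not data:
        raise ValueError("training data must not be empty")
    per_value: Dict[object, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for attrs, label in data:
        per_value[attrs.get(attribute)][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]  # fallback for attribute values never seen
    rule = {value: counts.most_common(1)[0][0] for value, counts in per_value.items()}
    return lambda attrs: rule.get(attrs.get(attribute), default)
```

Because replicated positive records are counted once per copy, increasing their number can flip the majority label for an attribute value. This is one way in which previous time data weighted with different temporary influence degrees can yield different classifiers.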

The error group calculation unit 34 uses each classifier generated by the previous time classifier group generation unit 33 to predict the labels of the current time data. That is, the error group calculation unit 34 matches the classifier against the items (the items indicating customer attributes) of each individual data in the current time data and predicts, for each individual data, the label it should be given. Each individual data in the current time data has already been given its actual label. The error group calculation unit 34 calculates the error between the labels predicted from the classifier and the items and the labels actually assigned in the current time data. It performs this process for each classifier and therefore calculates a plurality of errors.
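A minimal sketch of the error computation, assuming the error is the misclassification rate, i.e. the fraction of current time records whose predicted label differs from the actual one. The document does not define the error measure, so this choice and the function names are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]
Labeled = Tuple[Record, str]  # (attributes, actual label "+" or "-")


def classification_error(classifier: Callable[[Record], str],
                         current_data: Sequence[Labeled]) -> float:
    """Fraction of current time records whose predicted label is wrong (assumed error measure)."""
    if not current_data:
        raise ValueError("current_data must not be empty")
    mismatches = sum(1 for attrs, actual in current_data if classifier(attrs) != actual)
    return mismatches / len(current_data)


def error_group(classifiers: Sequence[Callable[[Record], str]],
                current_data: Sequence[Labeled]) -> List[float]:
    """One error per classifier, i.e. one per temporary influence degree."""
    return [classification_error(c, current_data) for c in classifiers]
```

Other error measures, for example ones that penalize missed positives more heavily, would slot into `classification_error` without changing the rest of the procedure.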

The influence degree calculation unit 35 identifies the smallest of the errors calculated for the individual classifiers and determines the influence degree from the temporary influence degree(s) corresponding to that smallest error. Specifically, when exactly one temporary influence degree corresponds to the smallest error, the influence degree calculation unit 35 adopts that temporary influence degree as the influence degree. When several temporary influence degrees correspond to the smallest error, it determines the influence degree based on them, for example by calculating their average and adopting the average as the influence degree. The following description uses this averaging rule for the case in which several temporary influence degrees correspond to the smallest error.
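The selection rule can be sketched as follows, assuming the temporary influence degrees and their errors are given as two parallel lists. Ties on the smallest error are resolved by averaging, as described in this paragraph. Ties are detected with exact equality, which is an assumption; a small tolerance may be preferable when errors come from floating-point arithmetic.

```python
from typing import Sequence


def select_influence(temp_influences: Sequence[float], errors: Sequence[float]) -> float:
    """Return the temporary influence degree with the smallest error,
    or the average of all temporary influence degrees tied at that error."""
    if not errors or len(errors) != len(temp_influences):
        raise ValueError("need one error per temporary influence degree")
    smallest = min(errors)
    tied = [f for f, e in zip(temp_influences, errors) if e == smallest]
    return sum(tied) / len(tied)
```

For example, with temporary influence degrees [0, 1, 2, 3] and errors [0.3, 0.1, 0.1, 0.2], the candidates 1 and 2 tie and the result is 1.5.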

The current time data generation unit 31, previous time data group generation unit 32, previous time classifier group generation unit 33, error group calculation unit 34, and influence degree calculation unit 35 of the influence degree estimation device 3 are realized, for example, by a CPU operating according to an influence degree estimation program. That is, a CPU that reads the influence degree estimation program from a storage device provided in the influence degree estimation system may operate as the current time data generation unit 31, the previous time data group generation unit 32, the previous time classifier group generation unit 33, the error group calculation unit 34, and the influence degree calculation unit 35.

The current time, the time interval, and the temporary influence degrees may be input to the current time data generation unit 31 via an input device such as a keyboard. An output device for outputting the influence degree obtained by the influence degree calculation unit 35 may also be provided. FIG. 2 is a block diagram showing an example of an influence degree estimation system with an input device and an output device. Components identical to those in FIG. 1 are given the same reference numerals as in FIG. 1, and their description is omitted.

The input device 1 is, for example, a keyboard. The current time, the time interval, and the temporary influence degrees may be input via the input device 1, although they may also be supplied to the current time data generation unit 31 in other ways.

The output device 4 is, for example, a display device. The influence degree calculation unit 35 may output the obtained influence degree to the output device 4; when the output device 4 is a display device, for example, the influence degree may be displayed on it. The output device 4 may also be a printer.

Next, the operation will be described.
FIG. 3 is a flowchart showing an example of the processing of the influence degree estimation system of the present invention. When the current time, the time interval, and the temporary influence degrees are input to the current time data generation unit 31, for example via the input device 1 (see FIG. 2), the influence degree estimation system operates as follows.

First, the current time data generation unit 31 calculates the previous time and generates the current time data (step A1). Also in step A1, the current time data generation unit 31 sets the initial value of a variable ErrorMin, which holds the minimum of the errors calculated by the error group calculation unit 34, and initializes the variables i and p to i = 1 and p = 0. The initial value of ErrorMin may be the maximum value that the error calculated by the error group calculation unit 34 can take, or any value sufficiently larger than the values the error can take. The variable i designates the input temporary influence degrees one by one in order; for example, i = 1 designates the first temporary influence degree. The variable p designates the temporary influence degrees that correspond to the minimum error: if p temporary influence degrees correspond to the minimum error, the first through p-th of them are denoted Imp[1], ..., Imp[p].
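This section describes only the initialization of ErrorMin, i, p, and Imp; the update applied after each error is computed is not spelled out here. The sketch below assumes the natural update (a strictly smaller error resets the list to a single entry, an equal error appends to it), uses infinity as the "sufficiently large" initial ErrorMin, and represents Imp[1..p] as a Python list. All of these are assumptions for illustration.

```python
import math
from typing import List, Sequence


def track_min_error(temp_influences: Sequence[float], errors: Sequence[float]) -> List[float]:
    """Return Imp[1..p]: the temporary influence degrees tied at the minimum error."""
    error_min = math.inf   # ErrorMin: larger than any attainable error (assumed)
    imp: List[float] = []  # Imp[1..p]; p == len(imp), so initially p = 0
    for false_i, err in zip(temp_influences, errors):  # i = 1, 2, ...
        if err < error_min:
            error_min = err
            imp = [false_i]       # new minimum found: p = 1
        elif err == error_min:
            imp.append(false_i)   # another tie at the minimum: p = p + 1
    return imp
```

Keeping only ErrorMin and Imp lets the loop discard each classifier once its error has been compared, instead of storing all errors.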

In step A1, the current time data generation unit 31 sets the previous time to the current time minus the time interval.
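With monthly usage start times, as in the FIG. 4 example, the subtraction can be done on (year, month) pairs. A minimal sketch, assuming the time interval is given in whole months; the function name is hypothetical.

```python
from typing import Tuple

YearMonth = Tuple[int, int]  # (year, month), month in 1..12


def previous_time(current: YearMonth, interval_months: int) -> YearMonth:
    """Previous time = current time - time interval, in calendar months."""
    year, month = current
    total = year * 12 + (month - 1) - interval_months  # months counted from year 0
    return total // 12, total % 12 + 1
```

For example, `previous_time((2007, 4), 2)` returns `(2007, 2)`, the current and previous times used in the worked example, and `previous_time((2007, 1), 2)` correctly wraps to `(2006, 11)`.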

Also in step A1, the current time data generation unit 31 reads the customer data stored in the customer DB 2. It then generates the current time data by labeling as positive the individual data of customers who have started using the product or service by the current time (individual data whose usage start information represents a time at or before the current time) and labeling as negative the individual data of customers who have not.

At this time, the current time data generation unit 31 may generate the current time data from the individual data in the customer DB 2 that remain after excluding the individual data of customers who had already started using the product or service at the previous time. That is, individual data whose usage start information is at or before the previous time may be excluded, and only the remaining individual data labeled positive or negative. The following description uses this exclusion as an example, but the current time data generation unit 31 may instead label every individual data in the customer data as positive or negative.

The generation of current time data from customer data is now described concretely. FIG. 4 is an explanatory diagram showing an example of customer data stored in the customer DB 2. Each row in FIG. 4 is the individual data of one customer, and the set of these individual data is the customer data. In the example of FIG. 4, each individual data is given identification information (ID). FIG. 4 illustrates the case where the usage start time is the month in which the customer started using the product or service, but the usage start time may also be expressed in days, hours, or other units. A usage start information value of "?" means "unused" (that is, use has not started), although another symbol may be used for this. FIG. 4 also illustrates the case where each individual data includes N items representing customer attributes other than the usage start information; in this example, item 1 is gender. An item whose value is unknown may be represented by a symbol indicating a missing value; in this example, "?" is used for missing values.

Assume that the current time is April 2007 and the calculated previous time is February 2007. The current time data generation unit 31 leaves the values of each item unchanged, drops the usage start information, and instead creates a class indicating whether use has started, labeling each individual data as positive or negative. If the later classifier generation step requires it, male/female may be converted into numerical values such as 0 and 1. Of the individual data in the customer data shown in FIG. 4, the current time data generation unit 31 excludes the one whose usage start information is at or before the previous time (February 2007), namely the individual data with ID = 1, and labels each remaining individual data positive if the customer has started using the product or service by the current time (April 2007) and negative otherwise. For example, the individual data with ID = 2 started use in March 2007 and is therefore positive. The individual data with ID = 3 started use in May 2007 and had not started use in April 2007, so it is negative. Individual data whose usage start information is "? (unused)" are also negative.
FIG. 5 is an explanatory diagram showing an example of the current time data generated in this way from the customer data shown in FIG. 4. The individual data of the customer who started use in January 2007, before the previous time (ID = 1), is removed from the current time data; the individual data of the customer who started use by April 2007 (ID = 2) is labeled positive (+); and the other individual data are labeled negative (-). Although FIG. 5 keeps the IDs for clarity, items unrelated to classification may be deleted in advance.
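The worked example can be reproduced with the sketch below. The records are hypothetical stand-ins for FIG. 4: only the usage start months of IDs 1 to 3 come from the text, ID 4 is an added unused customer, and the gender values are invented. Usage start times are (year, month) tuples, with None standing for "?", and excluding customers who were already users at the previous time is optional, as in the text.

```python
from typing import Dict, List, Optional, Tuple

YearMonth = Tuple[int, int]
Labeled = Tuple[Dict[str, object], str]


def make_current_time_data(customers: List[Dict[str, object]], current: YearMonth,
                           previous: YearMonth, exclude_prior: bool = True) -> List[Labeled]:
    data: List[Labeled] = []
    for customer in customers:
        start: Optional[YearMonth] = customer["start"]  # None means "?" (unused)
        if exclude_prior and start is not None and start <= previous:
            continue  # already a user at the previous time: dropped from current time data
        label = "+" if start is not None and start <= current else "-"
        attrs = {k: v for k, v in customer.items() if k != "start"}  # drop usage start info
        data.append((attrs, label))
    return data


# Hypothetical records modeled on the FIG. 4 description.
customers = [
    {"id": 1, "start": (2007, 1), "sex": "M"},
    {"id": 2, "start": (2007, 3), "sex": "F"},
    {"id": 3, "start": (2007, 5), "sex": "M"},
    {"id": 4, "start": None, "sex": "?"},
]
for attrs, label in make_current_time_data(customers, (2007, 4), (2007, 2)):
    print(attrs["id"], label)
```

Running it prints `2 +`, `3 -`, and `4 -` on separate lines, matching the FIG. 5 labeling described above: ID 1 is excluded, ID 2 is positive, and the rest are negative. Python compares the (year, month) tuples lexicographically, which gives the intended chronological order.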

After step A1, the previous time data group generation unit 32 uses the i-th temporary influence degree (hereinafter False[i]) to generate the previous time data corresponding to False[i] (step A2). The previous time data group generation unit 32 reads the customer data stored in the customer DB 2, labels as positive the individual data of customers who had started using the product or service at or before the previous time, and labels as negative the individual data of customers who had not started using it by the previous time. That is, individual data whose usage start information (see FIG. 4) is at or before the previous time are labeled positive, and all other individual data are labeled negative. At this point, no weighting according to False[i] has yet been applied. FIG. 6 is an explanatory diagram showing an example of the data before weighting in the process of generating previous time data. The individual data with ID = 1 started use in January 2007 and was therefore already in use at the previous time (February 2007), so it is labeled positive (+). For the individual data with ID = 2 onward, use started after the previous time (February 2007) or has not started, so they are all labeled negative (-) (see FIGS. 4 and 6). Note that no individual data are excluded when generating previous time data.
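The unweighted labeling of step A2 can be sketched as below, assuming each customer record is a dict whose "start" entry is a (year, month) tuple, or None for "?". The records are hypothetical: only the start months of IDs 1 to 3 come from the text, and ID 4 is an added unused customer.

```python
from typing import Dict, List, Optional, Tuple

YearMonth = Tuple[int, int]
Labeled = Tuple[Dict[str, object], str]


def label_previous_time(customers: List[Dict[str, object]], previous: YearMonth) -> List[Labeled]:
    """Label every record (none excluded): '+' if use started at or before `previous`."""
    data: List[Labeled] = []
    for customer in customers:
        start: Optional[YearMonth] = customer["start"]  # None means "?" (unused)
        label = "+" if start is not None and start <= previous else "-"
        data.append(({k: v for k, v in customer.items() if k != "start"}, label))
    return data


# Hypothetical records modeled on the FIG. 4 / FIG. 6 description.
customers = [
    {"id": 1, "start": (2007, 1)},
    {"id": 2, "start": (2007, 3)},
    {"id": 3, "start": (2007, 5)},
    {"id": 4, "start": None},
]
```

With a previous time of `(2007, 2)`, only ID 1 is labeled positive and all four records are kept, as in FIG. 6.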

The previous time data group generation unit 32 then applies weighting according to the temporary influence degree False[i]; that is, it changes the number of positive individual data relative to the number of negative individual data according to False[i]. In this example, the value of a function of False[i] is calculated, and the number of individual data labeled positive is multiplied by that function value.

As the function of the temporary influence degree False[i], one may use, for example, a function whose coefficient on the temporary influence degree is the ratio between the number of customers who had not yet started using the product or service at the previous time and the number who had started using it at the previous time, namely the following function:

False[i] × (number of non-users at the previous time) / (number of users at the previous time)

In this function, the "number of non-users at the previous time" is the number of customers who had not yet started using the product or service at the previous time, that is, the number of individual data in the customer data whose usage start information is not at or before the previous time. The "number of users at the previous time" is the number of customers who had started using the product or service at the previous time, that is, the number of individual data whose usage start information is at or before the previous time. A function whose coefficient on False[i] is the ratio obtained by dividing the number of non-users at the previous time by the number of users at the previous time may thus be used.

Alternatively, a function whose coefficient on the temporary influence degree is the ratio between the total number of customers and the number of customers who had started using the product or service at the previous time may be used, namely the following function:

False[i] × (number of customers) / (number of users at the previous time)

In this function, the "number of customers" is the total number of customers (the total number of individual data in the customer data), which equals the number of customers who had started use at the previous time plus the number who had not yet started use at the previous time. The "number of users at the previous time" is as defined for the previous function. A function whose coefficient on False[i] is the ratio obtained by dividing the number of customers by the number of users at the previous time may thus be used.
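The two example weighting functions can be written directly. The function and parameter names are hypothetical, and the sketch assumes at least one user at the previous time so that the division is defined.

```python
def weight_by_nonuser_ratio(false_i: float, users_prev: int, nonusers_prev: int) -> float:
    """False[i] * (non-users at previous time) / (users at previous time)."""
    if users_prev <= 0:
        raise ValueError("need at least one user at the previous time")
    return false_i * nonusers_prev / users_prev


def weight_by_total_ratio(false_i: float, users_prev: int, nonusers_prev: int) -> float:
    """False[i] * (number of customers) / (users at previous time),
    where number of customers = users + non-users at the previous time."""
    if users_prev <= 0:
        raise ValueError("need at least one user at the previous time")
    return false_i * (users_prev + nonusers_prev) / users_prev
```

With 10 users and 40 non-users at the previous time and False[i] = 2, the two functions give 8.0 and 10.0. Because the number of customers is users plus non-users, the second function always exceeds the first by exactly False[i]. Both values shrink as the number of users grows.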

The previous time data group generation unit 32 multiplies the number of individual data labeled positive by this function value. Suppose, for example, that the function value for the temporary influence degree False[i] is 5. In this case, copies of the positively labeled individual data are generated so that their number becomes five times the original. The resulting data is the previous time data. FIG. 7 is an explanatory diagram showing an example of previous time data obtained by weighting. In the example of FIG. 6, the individual data with ID = 1 is positive and the other individual data are negative, so the previous time data group generation unit 32 generates five records with the same contents as the individual data with ID = 1, distinguished by the IDs 1', 1'', and so on.
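A minimal sketch of weighting by replication, assuming the function value has already been rounded to a positive integer (the text's example uses 5) and that each record carries an "id" entry. Copies receive primed IDs (1', 1'', ...) in the style of FIG. 7; the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Labeled = Tuple[Dict[str, object], str]


def replicate_positives(data: List[Labeled], factor: int) -> List[Labeled]:
    """Return data in which every '+' record appears `factor` times in total."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    weighted: List[Labeled] = []
    for attrs, label in data:
        copies = factor if label == "+" else 1
        for j in range(copies):
            record = dict(attrs)  # independent copy of the attribute values
            if j > 0 and "id" in record:
                record["id"] = str(record["id"]) + "'" * j  # 1', 1'', ...
            weighted.append((record, label))
    return weighted
```

Copying each dict keeps later edits to one duplicate from silently changing the others.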

When only copies of the positively labeled individual data are generated, the positively labeled individual data may be weighted first, and the remaining individual data may then be labeled negative.

In both example functions, the coefficient of the temporary influence degree decreases as time passes and the number of users grows. The coefficient is not limited to these examples. The value of the temporary influence degree False[i] may also be used directly as the function value, which amounts to setting the coefficient to 1.

The above example multiplies the positively labeled individual data by the function value. However, weighting only requires changing the number of positively labeled individual data relative to the number of negatively labeled individual data. For example, if the weight (function value) is 5 and the number of positively labeled individual data is multiplied by 10, the number of negatively labeled individual data (the number of non-users) is doubled. Likewise, if the weight is 0.5 and the number of positively labeled individual data is kept at 1×, the number of negatively labeled individual data is doubled. When a condition such as multiplying the number of individual data by k is set, as in the 10× example here, k may be input from outside in advance.
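The relative weighting just described can be sketched as follows, assuming the externally supplied k is the multiplier for the positive data. The negative multiplier is then k / weight, so the positive-to-negative ratio grows by exactly the weight. The function name is hypothetical; `Fraction` is used only to keep the multipliers exact.

```python
from fractions import Fraction


def relative_multipliers(weight: float, pos_factor: int) -> tuple[Fraction, Fraction]:
    """Return (positive multiplier, negative multiplier) whose ratio equals `weight`.

    `pos_factor` plays the role of k; the negative multiplier is k / weight.
    """
    w = Fraction(weight).limit_denominator()
    k = Fraction(pos_factor)
    return k, k / w


a = relative_multipliers(5, 10)   # equals (10, 2): positives x10, negatives x2
b = relative_multipliers(0.5, 1)  # equals (1, 2): positives x1, negatives x2
```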

Here, the individual data are increased by actually creating copies of them. Alternatively, when the classifier is created in the next step, the function value of the temporary influence degree False[i] may be used as a weighting parameter. That is, the factor by which the positively labeled individual data would be increased may be decided in advance and passed as a parameter when the classifier is generated.
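To see why a weighting parameter can replace physical copies, the sketch below computes a base-2 node entropy in which each record contributes its weight instead of 1. Assuming the classifier's split criterion depends only on such label counts (as the entropy criterion described below does), giving a positive record weight 5 produces the same value as five literal copies. This is an illustrative assumption, not a statement about any particular classifier library.

```python
import math


def weighted_entropy(labels: list[str], weights: list[float]) -> float:
    """Base-2 entropy of a node where each record counts with its weight instead of 1."""
    total = sum(weights)
    pos = sum(w for lab, w in zip(labels, weights) if lab == "+")
    h = 0.0
    for q in (pos / total, (total - pos) / total):
        if q > 0:  # 0 * log 0 is taken as 0
            h -= q * math.log2(q)
    return h


# One positive record with weight 5 versus five literal copies of it.
by_weight = weighted_entropy(["+", "-", "-"], [5, 1, 1])
by_copies = weighted_entropy(["+"] * 5 + ["-", "-"], [1] * 7)
```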

After the previous time data group generation unit 32 generates the previous time data corresponding to False[i] in step A2, the previous time classifier group generation unit 33 uses that previous time data to generate a corresponding classifier (denoted previous time classifier i) (step A3). The ID is not needed to create the classifier, so the previous time classifier group generation unit 33 may delete it. The previous time classifier group generation unit 33 may generate the classifier by any of various data mining techniques, for example multiple regression analysis, decision trees, neural networks, support vector machines, Bayesian networks, or a combination of these.

A concrete example of a classifier is described below, using a decision tree as the classifier. In this example, "whether or not to play golf" is the dependent variable, and "weather", "temperature", "humidity", and "whether it is windy" are the independent variables. FIG. 8 is an explanatory diagram showing an example of combinations of known independent and dependent variables. FIG. 9 shows an example of a classifier, here a decision tree. FIG. 10 is an explanatory diagram showing an example of dependent variables predicted by the classifier. Given combinations of known independent and dependent variables, as in FIG. 8, a decision tree can be generated from them, and the dependent variable can then be predicted from that decision tree and known independent variables. For example, if the decision tree shown in FIG. 9 has been obtained, the dependent variables illustrated in FIG. 10 can be predicted from the independent variables illustrated in FIG. 10. Branching according to the value of an item is called splitting.

In the example shown in FIG. 8, each row corresponds to an individual data record labeled "play golf" or "do not play golf", and each row is treated as individual data below. "Play golf" corresponds to the positive (+) label and "do not play golf" to the negative (−) label. In a decision tree, each node holds the number of individual data for each label the dependent variable can take (for example, "yes (positive)" and "no (negative)"). For example, the root node for the data shown in FIG. 8 holds the information "yes: 9, no: 5".

Given the previous time data, the previous time classifier group generation unit 33 determines which item to split on first. To do so, it calculates the evaluation value of a split for each of items 1 to N and selects the item with the largest evaluation value as the item best suited to the split. The example here uses as the evaluation value the difference between the entropy of the node before the split and the entropy after the split, but the evaluation value may be obtained by another calculation method. If q is the proportion of individual data labeled positive (+) in a node and 1 − q is the proportion labeled negative (−), the entropy of the node is −q log q − (1 − q) log(1 − q), with logarithms taken to base 2. The entropy after the split is the weighted average of the entropies of the nodes produced by the split.

For example, if the previous time data contain 9 positive and 5 negative records, the root node is "positive: 9, negative: 5", and the entropy of the root node is −(9/14) × log(9/14) − (5/14) × log(5/14) = 0.940.
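The node entropy can be checked with a short sketch; the function name is illustrative. It uses base-2 logarithms, which is what makes the root value come out to 0.940, and treats 0 × log 0 as 0.

```python
import math


def node_entropy(n_pos: int, n_neg: int) -> float:
    """Entropy -q log q - (1-q) log(1-q) of a node, logarithms to base 2."""
    total = n_pos + n_neg
    h = 0.0
    for count in (n_pos, n_neg):
        q = count / total
        if q > 0:  # 0 * log 0 is taken as 0
            h -= q * math.log2(q)
    return h


root = node_entropy(9, 5)
print(f"{root:.3f}")  # 0.940
```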

The previous time classifier group generation unit 33 then obtains the nodes produced by splitting the root node on a single item; that is, for each value of that item, it generates information (a node) giving the numbers of positive and negative individual data. For example, suppose item 1 can take the value "0" or "1"; when item 1 is "0" there are 5 positive and 2 negative records, and when item 1 is "1" there are 0 positive and 7 negative records. In this case, the nodes "positive: 5, negative: 2" and "positive: 0, negative: 7" are generated as the branches for item 1 = "0" and item 1 = "1". The previous time classifier group generation unit 33 calculates the entropy of each node after the split and takes the weighted average of these entropies, weighting each node by the number of individual data (positive plus negative) it contains. In this example, both nodes contain 7 individual data in total, so each weighting coefficient is 7/14. In this example, the previous time classifier group generation unit 33 therefore calculates the entropy after the split as follows, taking 0 × log 0 as 0:

(7/14) × {−(5/7) × log(5/7) − (2/7) × log(2/7)}
+ (7/14) × {−(0/7) × log(0/7) − (7/7) × log(7/7)}
= 0.432

In this example, the previous time classifier group generation unit 33 therefore calculates the evaluation value for a split on item 1 as 0.940 − 0.432 = 0.508.
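The evaluation value (the entropy reduction achieved by a split) can be sketched as below, with each node given as a hypothetical (positive, negative) count pair. Computed without intermediate rounding, the item 1 example gives about 0.509; the 0.508 above results from subtracting the already rounded values 0.940 and 0.432.

```python
import math


def node_entropy(n_pos: int, n_neg: int) -> float:
    """Base-2 entropy of a node; 0 * log 0 is taken as 0."""
    total = n_pos + n_neg
    return -sum(c / total * math.log2(c / total) for c in (n_pos, n_neg) if c > 0)


def split_gain(parent: tuple[int, int], children: list[tuple[int, int]]) -> float:
    """Evaluation value: parent entropy minus the size-weighted average child entropy."""
    n = sum(parent)
    after = sum((p + q) / n * node_entropy(p, q) for p, q in children)
    return node_entropy(*parent) - after


gain_item1 = split_gain((9, 5), [(5, 2), (0, 7)])
print(f"{gain_item1:.3f}")  # 0.509
```

The item with the largest `split_gain` among items 1 to N would be chosen for the split.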

The previous time classifier group generation unit 33 calculates the evaluation value of a split in the same way for every other item as well, not only item 1, and decides to split on the item with the largest evaluation value. This determines the item on which the root node is split.

The item 1 example above covers the case where the item can take only the two values "0" and "1". When an item takes consecutive values, such as age with values 20, 21, 22, and so on, the value at which to split must also be decided. In this case, the previous time classifier group generation unit 33 uses the midpoint between adjacent item values as a threshold and, for each threshold, obtains the evaluation value of a split into "less than or equal to the threshold" and "greater than the threshold". By selecting the threshold with the largest evaluation value, it also decides the value at which to split. For example, when the item values are 20, 21, 22, ..., it calculates the evaluation value for the split into "20.5 or less" and "greater than 20.5", the evaluation value for the split into "21.5 or less" and "greater than 21.5", and so on, and splits where the evaluation value is highest.
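A minimal sketch of the threshold search for a numeric item, assuming values and labels are given as parallel lists (the helper names are illustrative). Candidate thresholds are the midpoints between adjacent distinct values, and the one with the largest entropy reduction is returned together with that reduction.

```python
import math


def entropy(labels: list[str]) -> float:
    """Base-2 entropy of a list of labels."""
    n = len(labels)
    h = 0.0
    for lab in set(labels):
        q = labels.count(lab) / n
        h -= q * math.log2(q)
    return h


def candidate_thresholds(values: list[float]) -> list[float]:
    """Midpoints between adjacent distinct item values, e.g. 20.5, 21.5, ..."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:])]


def best_threshold(values: list[float], labels: list[str]) -> tuple[float, float]:
    """Return (threshold, evaluation value) for the best '<= t' / '> t' split."""
    base = entropy(labels)
    best = None
    for t in candidate_thresholds(values):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        after = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - after
        if best is None or gain > best[1]:
            best = (t, gain)
    return best


ages = [20, 21, 22, 23]
labels = ["+", "+", "-", "-"]
```

With these hypothetical ages and labels, `best_threshold(ages, labels)` selects the split "21.5 or less" versus "greater than 21.5".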

The previous time classifier group generation unit 33 performs the same processing for each node produced by a split, repeatedly deciding which item to split on next. It stops splitting a node when a predetermined condition is met, for example "all individual data in the node belong to the same class" or "the number of individual data in the node (counting both positive and negative) is at or below a predetermined number (for example, 2)". Under the former condition, a node is not split further once its individual data are all positive or all negative. In this way, the previous time classifier group generation unit 33 repeats splitting from the root node downward to generate a tree-structured decision tree.
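The recursive construction with the two stopping conditions can be sketched as follows, assuming categorical items and records stored as dicts keyed by item name. The node layout (`counts`, `item`, `children`) and the `min_size` parameter are hypothetical choices for illustration; numeric items would additionally need the threshold search shown above.

```python
import math
from collections import Counter


def entropy(labels: list[str]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows: list[dict], labels: list[str], items: list[str], min_size: int = 2) -> dict:
    """Grow a decision tree by repeatedly splitting on the item with the largest entropy reduction."""
    node = {"counts": Counter(labels)}
    # Stop when the node is pure, has at most min_size records, or no items remain.
    if len(node["counts"]) == 1 or len(labels) <= min_size or not items:
        return node

    def gain(item: str) -> float:
        groups: dict = {}
        for r, lab in zip(rows, labels):
            groups.setdefault(r[item], []).append(lab)
        after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return entropy(labels) - after

    best = max(items, key=gain)
    node["item"] = best
    node["children"] = {}
    rest = [i for i in items if i != best]
    for value in sorted({r[best] for r in rows}):
        idx = [k for k, r in enumerate(rows) if r[best] == value]
        node["children"][value] = build_tree(
            [rows[k] for k in idx], [labels[k] for k in idx], rest, min_size
        )
    return node


rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
tree = build_tree(rows, ["+", "+", "-", "-"], ["a", "b"], min_size=1)
```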

After generating the tree-structured decision tree as described above, the previous time classifier group generation unit 33 prunes it. In a decision tree, a final node produced by splitting is called a leaf. Suppose N data are classified into a certain leaf (that is, the number of individual data counted as positive or negative is N), and E of these N data are errors. This can be regarded as observing the error event E times in N trials, so the situation can be modeled as a binomial distribution in which the probability of an error in a sample of size N is r. If U_CF(E, N) denotes the upper limit of r for a confidence level CF given in advance, the expected number of errors among the N data is N × U_CF(E, N). For each parent node whose children are all leaves, the previous time classifier group generation unit 33 compares the expected number of errors at the parent with the sum of the expected numbers of errors at its child leaves. If the sum for the children is larger than the expected number of errors at the parent, it collapses the leaves and makes the parent a leaf. By repeating this process, the previous time classifier group generation unit 33 prunes the leaves of the whole decision tree.
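The pruning test can be sketched as below, assuming U_CF(E, N) is taken as the value of r at which the binomial probability of observing at most E errors in N trials equals CF, found by bisection. This follows the error-based pruning idea used in C4.5, whose commonly used default is CF = 0.25; the function names and the (errors, count) pair format are hypothetical.

```python
import math


def binom_cdf(e: int, n: int, r: float) -> float:
    """P(X <= e) for X ~ Binomial(n, r)."""
    return sum(math.comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(e + 1))


def upper_error_rate(e: int, n: int, cf: float = 0.25, iters: int = 60) -> float:
    """U_CF(E, N): the r at which P(X <= E | N, r) falls to CF (bisection)."""
    if e >= n:
        return 1.0
    lo, hi = e / n, 1.0  # the CDF decreases as r grows
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binom_cdf(e, n, mid) > cf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def expected_errors(e: int, n: int, cf: float = 0.25) -> float:
    return n * upper_error_rate(e, n, cf)


def should_collapse(parent: tuple[int, int], leaves: list[tuple[int, int]], cf: float = 0.25) -> bool:
    """parent and leaves are (errors E, count N) pairs; collapse when the leaves' sum is larger."""
    children = sum(expected_errors(e, n, cf) for e, n in leaves)
    return children > expected_errors(*parent, cf)


# Humidity split of FIG. 9: leaves "yes 2, no 0" and "yes 0, no 3"; parent "yes 2, no 3".
keep_split = not should_collapse((2, 5), [(0, 2), (0, 3)])
```

Under CF = 0.25 the humidity split in this example would be kept; a larger expected error at the leaves, as in `should_collapse((0, 4), [(0, 2), (0, 2)])`, leads to collapsing.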

To collapse leaves, the previous time classifier group generation unit 33 deletes them and makes the parent node of the deleted leaves a leaf. For example, to collapse the nodes produced by splitting on the item "humidity" in the decision tree illustrated in FIG. 9, it deletes the node "yes: 2, no: 0", representing the individual data whose humidity is 70% or less, and the node "yes: 0, no: 3", representing the individual data whose humidity is above 70%, and makes their parent node "yes: 2, no: 3" a leaf.

When a decision tree is generated as the classifier, it may be generated, for example, by building the tree and then pruning it as described above.

After the classifier (previous time classifier i) is generated from the previous time data corresponding to False[i] in step A3, the error group calculation unit 34 uses previous time classifier i to predict the labels of the current time data (step A4). That is, for each individual data record in the current time data, the error group calculation unit 34 matches the items indicating the customer's attributes against previous time classifier i and predicts the label for that record.

For example, when the classifier is a decision tree, the error group calculation unit 34 looks up the individual data's value for the item at the root node and follows the child node corresponding to that value, repeating this down the tree. On reaching a leaf node, it takes the label with the larger count at that leaf as the prediction. For example, if the leaf node is "positive: 3, negative: 0", the prediction is "positive".
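A minimal sketch of this traversal, assuming a hypothetical dict layout in which internal nodes have `item` and `children` keys and leaves have a `counts` mapping. Ties at a leaf go to whichever label comes first in `counts`, and an item value with no matching branch raises `KeyError`; both behaviors are choices of this sketch.

```python
def predict(node: dict, record: dict) -> str:
    """Follow the branch matching the record's item value down to a leaf,
    then return the label with the larger count at that leaf."""
    while "children" in node:
        node = node["children"][record[node["item"]]]
    counts = node["counts"]
    return max(counts, key=counts.get)


# Hypothetical two-leaf tree resembling the humidity split of FIG. 9.
tree = {
    "item": "humidity_le_70",
    "children": {
        True: {"counts": {"+": 2, "-": 0}},
        False: {"counts": {"+": 0, "-": 3}},
    },
}
```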

After step A4, the error group calculation unit 34 calculates the error between the labels predicted for the individual data in the current time data and the current time data itself (step A5). Each individual data record in the current time data was labeled positive or negative when the current time data was generated, so the unit calculates the error between the predictions of step A4 and the labels actually attached in the current time data. This error is denoted Err[i].

In step A5, the error group calculation unit 34 may, for example, compare the label predicted for each individual data record in the current time data with the label actually attached to that record, count the records for which the two differ, and use the count as Err[i].

Alternatively, the error group calculation unit 34 may calculate the difference between the number of individual data predicted to be labeled positive and the number of individual data actually labeled positive in the current time data, and use that difference as Err[i]. That is, Err[i] may be the absolute value of the difference between the predicted number of positive labels and the number of individual data actually labeled positive in the current time data (the actual number of users).
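The two error definitions can be contrasted in a short sketch, with labels given as hypothetical `"+"`/`"-"` lists in the same record order. In the example, two customers are swapped between predicted and actual use, so the mismatch count is 2 while the count difference is 0.

```python
def err_mismatch(predicted: list[str], actual: list[str]) -> int:
    """Err[i] as the number of records whose predicted and actual labels differ."""
    return sum(p != a for p, a in zip(predicted, actual))


def err_count_diff(predicted: list[str], actual: list[str]) -> int:
    """Err[i] as |predicted number of positives - actual number of users|."""
    return abs(predicted.count("+") - actual.count("+"))


predicted = ["+", "-", "+", "-"]
actual = ["-", "+", "+", "-"]
```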

After step A5, the influence degree calculation unit 35 compares the error Err[i] with the minimum error value ErrorMin and determines whether Err[i] is less than ErrorMin (step A6). If Err[i] is less than ErrorMin (Yes in step A6), the influence degree calculation unit 35 initializes p to 0 and assigns Err[i] to ErrorMin (step A11). Err[i] being less than ErrorMin means that an error smaller than the smallest found so far has been found. In step A11, therefore, ErrorMin is updated to this new minimum, and the variable p, which indexes the individual temporary influence degrees when several correspond to the minimum error, is reset.

After step A11, the influence degree calculation unit 35 increments p by 1 and assigns False[i] to Imp[p] (step A12). If, in step A6, Err[i] is determined to be greater than or equal to ErrorMin (No in step A6), the influence degree calculation unit 35 determines whether Err[i] equals ErrorMin (step A7). If they are equal (Yes in step A7), the influence degree calculation unit 35 likewise increments p by 1 and assigns False[i] to Imp[p] (step A12).

When the process moves from step A11 to step A12, p = 1, and the temporary influence degree False[i] currently under consideration becomes the first temporary influence degree with the minimum error. When the process moves to step A12 because the Err[i] for the current False[i] equals ErrorMin in step A7, p becomes 2 or more. In this case, at least one temporary influence degree with the minimum error has already been found, and the influence degree calculation unit 35 records the current temporary influence degree False[i] as the p-th temporary influence degree with the minimum error.

After step A12, or when Err[i] is determined not to equal ErrorMin in step A7 (No in step A7), the influence degree calculation unit 35 increments i (step A8). Next, it determines whether i is less than or equal to the number M of temporary influence degrees that were input at the start (step A9). If i ≤ M (Yes in step A9), the processing from step A2 onward is performed for the temporary influence degree False[i] given by the i incremented in step A8; in other words, the temporary influence degree under consideration is changed and the processing from step A2 is repeated.

If i is greater than M (No in step A9), the process moves to step A10; at this point the error has been calculated for all M temporary influence degrees. In step A10, the influence degree calculation unit 35 calculates the average of the temporary influence degrees corresponding to the minimum error and outputs that average as the influence degree. If only one temporary influence degree corresponds to the minimum error, p = 1, and the influence degree calculation unit 35 outputs Imp[1] as the influence degree. If several temporary influence degrees correspond to the minimum error, p is 2 or more, and the influence degree calculation unit 35 calculates the average of Imp[1] through Imp[p] and outputs it as the influence degree.
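Steps A6 to A12 and A10 can be summarized in a minimal sketch. Here `err_fn` is a hypothetical stand-in for steps A2 to A5 (generate weighted data, train, predict, compute Err[i]), and initializing ErrorMin to infinity is an assumption, since the initialization in step A1 is not reproduced in this section.

```python
from typing import Callable


def estimate_influence(false_values: list[float], err_fn: Callable[[float], float]) -> tuple[float, float]:
    """Return (influence degree, minimum error) over the temporary influence degrees."""
    error_min = float("inf")  # assumed initial value of ErrorMin
    imp: list[float] = []     # Imp[1..p]
    for f in false_values:    # i = 1 .. M (steps A8, A9)
        err = err_fn(f)       # steps A2-A5 for False[i]
        if err < error_min:   # step A6 Yes -> A11: new minimum, p reset
            error_min = err
            imp = [f]         # step A12 with p = 1
        elif err == error_min:  # step A7 Yes -> A12: another tie at the minimum
            imp.append(f)
    return sum(imp) / len(imp), error_min  # step A10: average of Imp[1..p]


# Hypothetical errors for four temporary influence degrees.
errs = {0.5: 3, 1.0: 1, 1.5: 1, 2.0: 4}
result = estimate_influence(list(errs), errs.__getitem__)
```

Here False[i] = 1.0 and 1.5 tie at the minimum error 1, so their average 1.25 is output.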

The influence degree calculation unit 35 may, for example, display the influence degree on the output device 4, or output it in some other form; for example, the output device 4 may be a printer that prints the influence degree.

Next, the effects of the first embodiment will be described.
In the first embodiment, a prediction error is obtained for each of the input temporary influence degrees, and the average of the temporary influence degrees with the minimum error is taken as the influence degree. The influence degree over the period from the previous time to the current time can therefore be estimated. Furthermore, by choosing the input current time and time interval to match the period of interest, the influence degree over that period can be estimated.

When the error group calculation unit 34 compares the predicted labels with the labels actually attached to the individual data of the current time data and uses the number of individual data for which they differ as the error (Err[i]), the estimation performance for the influence degree can be improved.

When the error group calculation unit 34 instead uses as the error (Err[i]) the difference between the number of individual data predicted to be labeled positive and the number actually labeled positive in the current time data, the influence degree can be estimated while smoothing out small fluctuations, such as a customer expected to start using the product or service at the current time starting at the next time, or a customer expected to start at the next time starting early, at the current time.

The above embodiment describes estimating the influence degree at a single input current time. Alternatively, multiple current times may be input and the processing from step A1 onward performed for each of them, yielding an influence degree for each current time. This makes it possible to estimate an influence degree that changes over time.

When multiple current times are input and an influence degree is obtained for each, the system may also operate as follows. When the user of the influence degree estimation system specifies that an influence degree be obtained for each current time, the processing from step A1 onward is performed for each current time to obtain its influence degree, and the influence degree calculation unit 35 may take the average of these per-time influence degrees as the influence degree. Obtaining the influence degree this way yields an estimate with less error in cases where using many time points from a period little affected by advertising and sales promotion is expected to reduce the estimation error. When averaging the per-time influence degrees, more recent current times (that is, later current times) may also be given greater weight, and the resulting weighted average used as the influence degree.
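Combining the per-time influence degrees can be sketched as a weighted average over values ordered from the oldest to the most recent current time. The default linearly increasing recency weights (1, 2, 3, ...) are a hypothetical choice; passing equal weights gives the plain average.

```python
from collections.abc import Sequence
from typing import Optional


def combine_influences(per_time: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of per-time influence degrees, oldest first.

    Without explicit weights, later current times get larger weights (1, 2, 3, ...),
    an illustrative choice of recency weighting.
    """
    if weights is None:
        weights = range(1, len(per_time) + 1)
    weights = list(weights)
    return sum(w * x for w, x in zip(weights, per_time)) / sum(weights)


plain = combine_influences([1.0, 2.0], [1, 1])  # ordinary average
recent = combine_influences([1.0, 2.0])         # later time weighted twice as much
```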

When the user of the influence degree estimation system specifies multiple current times, the minimum and maximum current times and the interval at which the current time changes may be input to the current time data generation unit 31. Given these, the current time data generation unit 31 may divide the period from the minimum to the maximum current time at that interval and use the boundary times, in addition to the input minimum and maximum, as current times. An influence degree may then be obtained for each of these current times, and their moving average or weighted moving average computed.
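A minimal sketch of generating the current times from a minimum, maximum, and interval, plus a simple moving average over the resulting influence degrees. It assumes times are expressed as integers (for example, month indices) and that the window size is a hypothetical parameter; a weighted moving average would replace the plain mean with a weighted one.

```python
def current_times(t_min: int, t_max: int, step: int) -> list[int]:
    """Boundary times from t_min at the given interval, always including t_max."""
    times = []
    t = t_min
    while t < t_max:
        times.append(t)
        t += step
    times.append(t_max)
    return times


def moving_average(values: list[float], window: int) -> list[float]:
    """Simple moving average over consecutive per-time influence degrees."""
    return [sum(values[k:k + window]) / window for k in range(len(values) - window + 1)]


times = current_times(0, 10, 4)  # 0, 4, 8 and the maximum 10
```

The same boundary-generation idea could also serve for the range of temporary influence degrees described in the following paragraph.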

Alternatively, multiple current times may be input, the error for each temporary influence degree calculated at each current time, and the influence degree determined from these errors. For example, suppose current times T1 to Tx are input. In this example, the influence degree estimation system performs steps A1 to A5 for current time T1 to obtain Err[1]. After step A5, the influence degree calculation unit 35 performs steps A8 and A9 and then obtains the errors Err[2] to Err[M] for i = 2 onward in turn. Err[1] to Err[M] are obtained for the other current times in the same way. FIG. 11 shows the errors for each temporary influence degree obtained in this way at each current time. After obtaining Err[1] to Err[M] at every current time, the influence degree calculation unit 35 may calculate, for each i from 1 to M, the sum of Err[i] over all current times, and use as the influence degree the temporary influence degree given by the i with the smallest sum. In the example shown in FIG. 11, the sum of Err[i] may be calculated for each row, and the False[i] given by the i with the smallest sum used as the influence degree.
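The selection over several current times can be sketched as follows, assuming an error table laid out like the description of FIG. 11, with one row per temporary influence degree and one column per current time T1 to Tx. The table values are hypothetical; ties go to the smallest i.

```python
def influence_from_error_table(false_values: list[float], err_table: list[list[float]]) -> tuple[float, list[float]]:
    """err_table[i][t] = Err[i] at current time T_t.

    Sums each row over all current times and returns the False[i] with the
    smallest sum, together with the row sums.
    """
    totals = [sum(row) for row in err_table]
    best = min(range(len(false_values)), key=totals.__getitem__)
    return false_values[best], totals


false_values = [0.5, 1.0, 1.5]
err_table = [[3, 2, 4], [1, 2, 0], [2, 1, 3]]
influence, totals = influence_from_error_table(false_values, err_table)
```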

Likewise, when the user of the influence degree estimation system specifies a plurality of temporary influence degrees, the user may input the maximum and minimum temporary influence degrees and the interval at which the temporary influence degree is varied to the current time data generation unit 31. On receiving these values, the current time data generation unit 31 may divide the range from the minimum to the maximum temporary influence degree at that interval and set the boundary values, in addition to the input minimum and maximum, as temporary influence degrees.

Although the first embodiment has been described as outputting the influence degree, the calculated errors and similar values may also be output so that the user can judge how reliable the influence degree is. For example, the influence degree calculation means 35 may cause the output device 4 to output Err[i] in addition to the influence degree.

Also, although the first embodiment describes the case where the influence degree estimation system includes the customer DB 2, the customer DB 2 may instead be provided outside the influence degree estimation system, with the system referring to it during processing.

Embodiment 2.
FIG. 12 is a block diagram showing an example of the spread prediction system of the present invention. Components similar to those of the first embodiment are given the same reference numerals as in FIG. 1. The spread prediction system of the present invention includes a spread prediction device 5, which determines for each individual customer whether that customer will start using a product or service at the prediction target time, and a customer database (hereinafter, customer DB) 2.

The customer DB 2 of this embodiment is a storage unit that stores customer data and is the same as the customer DB 2 of the first embodiment (see FIG. 1). The customer data it stores is also the same as in the first embodiment: as shown in FIG. 4, it is a set of individual data defined for each customer. In the customer DB 2, each piece of individual data includes use start information and one or more items representing attributes of the customer other than the use start information. If the customer has started using the product or service, the use start information is the time at which the customer started using it; if the customer has not yet used the product or service, the use start information indicates that it is unused.

The current time and the influence degree are input to the spread prediction device 5. In the second embodiment, the current time is the time at which it is determined whether each customer has started using the product or service. The influence degree is preferably the one obtained by the influence degree estimation system of the first embodiment, but it may also be a value chosen by the user of the spread prediction system. The current time in this embodiment may also be a future time.

The spread prediction device 5 includes a test data generation unit 51, a learning data generation unit 52, a classifier generation unit 53, and a test data label determination unit 54.

Based on the current time and the customer DB 2, the test data generation unit 51 generates test data containing the individual data of customers who have not started use at the current time.

Test data is data containing the individual data of customers who have not started using the product or service at the current time. The test data preferably contains only such individual data, but it may also contain individual data of customers who have already started using the product or service at the current time. The following description takes as an example the case where the test data contains only individual data of customers who have not started using the product or service at the current time.

The learning data generation unit 52 generates learning data based on the input current time and influence degree and on the customer data stored in the customer DB 2. The learning data is obtained by labeling the individual data of customers who have started using the product or service at the current time with a first label, labeling the individual data of customers who have not started using it at the current time with a second label, and weighting the result according to the input influence degree. As described in the first embodiment, weighting means varying the number of individual data with the first label relative to the number of individual data with the second label. As in the first embodiment, the first label is written as positive (or +) and the second label as negative (or −).

The classifier generation unit 53 generates a classifier from the learning data generated by the learning data generation unit 52.

The test data label determination unit 54 uses the classifier to predict the label of each piece of individual data in the test data. That is, it collates the items of each piece of individual data in the test data with the classifier and determines whether that data is labeled positive or negative. A customer whose individual data is judged positive is predicted to start using the product or service at the time following the current time; a customer whose individual data is judged negative is predicted not yet to have started using it at that time. The time following the current time is the current time plus a predetermined fixed period.

The test data label determination unit 54 may cause an output device (not shown in FIG. 12) to output the label prediction result (determination result) for each piece of individual data in the test data. For example, the prediction results may be shown on a display device or printed by a printing device.

The current time may be input to the test data generation unit 51 and the learning data generation unit 52 from outside, and the influence degree may likewise be input to the learning data generation unit 52 from outside, for example via an input device such as a keyboard (not shown in FIG. 12). Alternatively, when the influence degree estimation system of the first embodiment has obtained the influence degree at a certain current time, its influence degree calculation means 35 may, for example, pass that current time to the test data generation unit 51 and the learning data generation unit 52, and pass the influence degree to the learning data generation unit 52.

The test data generation unit 51, learning data generation unit 52, classifier generation unit 53, and test data label determination unit 54 of the spread prediction device 5 are realized, for example, by a CPU operating according to a spread prediction program. That is, a CPU that reads the spread prediction program from a storage device provided in the spread prediction device 5 may operate as the test data generation unit 51, the learning data generation unit 52, the classifier generation unit 53, and the test data label determination unit 54.

Next, the operation of the spread prediction system will be described.
FIG. 13 is a flowchart showing an example of the processing of the spread prediction system of the present invention. When the current time and the influence degree are input to the spread prediction device 5, for example via an input device such as a keyboard (not shown in FIG. 13), the spread prediction system operates as follows.

First, the test data generation unit 51 generates test data (step B1). It may extract, from the customer data stored in the customer DB 2, only the individual data of customers who have not started using the product or service at the current time, and use that set of individual data as the test data. Specifically, it extracts only the individual data whose use start information (see FIG. 4) is not a time at or before the current time. Although this example assumes that the test data contains only individual data of customers who have not started use at the current time, individual data of customers who have already started use at the current time may also be included.
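Step B1 amounts to a filter on the use start information. In the minimal sketch below, each piece of individual data is assumed to be a dict with an `"items"` field for the attribute items and a `"start"` field holding a numeric use start time, or `None` when the product or service is still unused; this record layout and the function name are hypothetical.

```python
def make_test_data(customers, now):
    """Step B1 sketch: keep customers whose use start is not at or before `now`."""
    return [c for c in customers
            if c["start"] is None or c["start"] > now]


customers = [
    {"id": 1, "items": {"age": "young"}, "start": 2},
    {"id": 2, "items": {"age": "old"}, "start": None},
    {"id": 3, "items": {"age": "old"}, "start": 7},
]
test = make_test_data(customers, now=5)
print([c["id"] for c in test])
```

Customer 3 is kept because, at current time 5, its recorded start time of 7 still lies in the future.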

After step B1, the learning data generation unit 52 generates learning data (step B2). It reads the customer data stored in the customer DB 2, labels as positive the individual data of customers who have started using the product or service at or before the current time, and labels as negative the individual data of customers who have not started using it at the current time. In other words, individual data whose use start information is a time at or before the current time is labeled positive, and all other individual data is labeled negative. The learning data generation unit 52 then varies the number of positive individual data relative to the number of negative individual data according to the influence degree. This example computes the value of a function of the influence degree and increases the positive individual data by that factor.
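The labeling part of step B2 can be sketched as below, using the same hypothetical record layout as before (`"items"` plus a `"start"` time or `None`). The +1/−1 encoding of the first and second labels is an assumption, and weighting by the influence degree is left to a separate step.

```python
def label_customers(customers, now):
    """Step B2 labeling sketch: +1 if use started at or before `now`, else -1.

    Returns (items, label) pairs; weighting by the influence degree is applied separately.
    """
    return [(c["items"], +1 if c["start"] is not None and c["start"] <= now else -1)
            for c in customers]


customers = [
    {"items": {"age": "young"}, "start": 2},
    {"items": {"age": "old"}, "start": None},
    {"items": {"age": "old"}, "start": 7},
]
print([label for _, label in label_customers(customers, now=5)])
```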

As the function of the influence degree, for example, a function whose coefficient on the influence degree is the ratio between the number of customers who have not yet started using the product or service at the current time and the number of customers who have started using it may be used. That is, the following function may be used:

Influence degree × (number of non-users at the current time) / (number of users at the current time)

In this function, the "number of non-users at the current time" is the number of customers (more precisely, the number of individual data) who have not yet started using the product or service at the current time, that is, the number of individual data in the customer data whose use start information is not at or before the current time. The "number of users at the current time" is the number of customers who have started using the product or service at the current time, that is, the number of individual data whose use start information is at or before the current time. A function may thus be used whose coefficient on the influence degree is the ratio obtained by dividing the number of non-users at the current time by the number of users at the current time.

As another function of the influence degree, a function whose coefficient on the influence degree is the ratio between the total number of customers and the number of customers who have started using the product or service at the current time may be used. That is, the following function may be used:

Influence degree × (number of customers) / (number of users at the current time)

In this function, the "number of customers" is the total number of customers (the total number of individual data in the customer data), and the "number of users at the current time" is the same as in the preceding function. A function may thus be used whose coefficient on the influence degree is the ratio obtained by dividing the number of customers by the number of users at the current time.
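The two example weighting functions can be written directly from the formulas above. In this minimal sketch the counts are taken from records with a hypothetical `"start"` field (`None` when unused); raising an error when there are no users yet is an assumption, since both ratios are undefined in that case and the text does not say how it is handled.

```python
def count_users(customers, now):
    """Return (users, non-users) at `now`; 'start' is None when still unused."""
    users = sum(1 for c in customers if c["start"] is not None and c["start"] <= now)
    return users, len(customers) - users


def weight_by_nonuser_ratio(influence, users, non_users):
    """Influence degree x (non-users at now) / (users at now)."""
    if users == 0:
        raise ValueError("no users yet: the ratio is undefined")
    return influence * non_users / users


def weight_by_total_ratio(influence, users, total):
    """Influence degree x (number of customers) / (users at now)."""
    if users == 0:
        raise ValueError("no users yet: the ratio is undefined")
    return influence * total / users


customers = [{"start": s} for s in (1, 3, None, None, 9, None, None, None)]
users, non_users = count_users(customers, now=5)   # 2 users, 6 non-users
print(weight_by_nonuser_ratio(2.0, users, non_users),
      weight_by_total_ratio(2.0, users, len(customers)))
```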

The learning data generation unit 52 increases the positive individual data by a factor equal to the value of such a function. For example, if substituting the input influence degree into the function gives the value j, copies of the positive individual data are generated so that the number of positive individual data becomes j times the original.
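Replicating the positive individual data by the factor j might look like the sketch below. The text does not say how a non-integer j is handled; rounding the target count and reusing positives cyclically is an assumption made here, and the function name is hypothetical.

```python
def replicate_positives(labeled, j):
    """Return learning data whose positive examples number about j times the original.

    labeled: list of (items, label) pairs with label +1 or -1.
    The target count is rounded with Python's round(); positives are reused
    cyclically, so an integer j yields exactly j copies of each positive.
    """
    positives = [r for r in labeled if r[1] > 0]
    negatives = [r for r in labeled if r[1] <= 0]
    if not positives:
        return list(labeled)
    target = round(len(positives) * j)
    weighted = list(negatives)
    weighted.extend(positives[k % len(positives)] for k in range(target))
    return weighted


data = [({"a": 1}, 1), ({"a": 2}, 1), ({"a": 3}, -1)]
out = replicate_positives(data, 2)
print(sum(1 for _, lab in out if lab > 0), sum(1 for _, lab in out if lab < 0))
```

Because only the relative count matters, a learner that accepts per-example weights could achieve the same effect by giving each positive a weight of j instead of copying it.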

In both of the example functions, the coefficient on the influence degree decreases as time passes and the number of users grows. The coefficient is not limited to these; the influence degree itself may also be used directly as the function value, which corresponds to a coefficient of 1.

The above example increases the positive individual data by a factor equal to the function value, but weighting only requires that the number of positive individual data be varied relative to the number of negative individual data. This is the same as the weighting performed by the previous time data group generation unit 32 in the first embodiment.

After step B2, the classifier generation unit 53 generates a classifier from the learning data (step B3). It may generate the classifier using any one, or a combination, of data mining techniques such as multiple regression analysis, decision trees, neural networks, support vector machines, and Bayesian networks.

As in the classifier generation performed by the previous time data group generation unit 32 in the first embodiment, a decision tree may also be generated as the classifier. Given the learning data, the classifier generation unit 53 first decides which item to split on. It may calculate a split evaluation value for each of items 1 to N and select the item with the largest evaluation value as the most suitable for splitting. As the evaluation value, for example, the difference between the entropy of the node before the split and the entropy after the split may be used; both entropies are calculated as already described in the first embodiment, so that description is omitted here. The classifier generation unit 53 repeats the same process for each node produced by the split, successively deciding which item to split on next, and stops splitting when a predetermined condition is satisfied. It then prunes the resulting tree to produce the decision tree used as the classifier. The predetermined condition and the pruning are the same as in the first embodiment, and their description is omitted. A decision tree (classifier) generated in this way has the following property: the higher the proportion of first-label individual data in the learning data, the more often the tree assigns the first label to those individual data in the test data that currently carry the second label but whose item values resemble those of the first-label individual data. The classifier is not limited to a decision tree, and the classifier generation unit 53 may generate a classifier by other operations.
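The split selection described above can be sketched with information gain. The entropy formula of the first embodiment is not reproduced in this part of the text, so this sketch assumes standard base-2 Shannon entropy, a size-weighted entropy after the split, and categorical item values; the stopping condition and pruning are omitted, and all names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_gain(rows, item):
    """Entropy before the split minus the size-weighted entropy after splitting on `item`.

    rows: list of (items, label) pairs; items maps item name -> categorical value.
    """
    groups = {}
    for items, label in rows:
        groups.setdefault(items[item], []).append(label)
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([label for _, label in rows]) - after


def best_item(rows, item_names):
    """Choose the item with the largest evaluation value (information gain)."""
    return max(item_names, key=lambda name: split_gain(rows, name))


rows = [
    ({"age": "young", "region": "A"}, 1),
    ({"age": "young", "region": "B"}, 1),
    ({"age": "old", "region": "A"}, -1),
    ({"age": "old", "region": "B"}, -1),
]
print(best_item(rows, ["age", "region"]))
```

Here `age` separates the labels perfectly (gain 1 bit) while `region` leaves both children mixed (gain 0), so `age` is chosen for the first split.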

After step B3, the test data label determination unit 54 predicts the labels of the test data using the classifier generated in step B3 (step B4). That is, for each piece of individual data in the test data, it collates the items indicating the customer's attributes with the classifier generated in step B3 and determines whether the label for that data is positive or negative. For example, when the classifier is a decision tree, the test data label determination unit 54 looks up, in the individual data, the value of the item tested at the root node, follows the child node corresponding to that value, and repeats this down the tree. When it reaches a leaf node, it may take as the determination result the label with the larger count at that leaf.
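The root-to-leaf traversal of step B4 can be sketched as follows. The nested-dict tree representation is a hypothetical choice for illustration, and this sketch simply raises a `KeyError` for an item value not seen during training, a case the text does not address.

```python
def predict_label(tree, items):
    """Follow the tree from the root using the record's item values; return the leaf's majority label.

    Internal node: {"item": name, "children": {value: subtree}}
    Leaf node:     {"counts": {label: count}}
    """
    node = tree
    while "item" in node:
        node = node["children"][items[node["item"]]]  # KeyError for unseen values
    counts = node["counts"]
    return max(counts, key=counts.get)


tree = {
    "item": "age",
    "children": {
        "young": {"counts": {1: 5, -1: 1}},
        "old": {
            "item": "region",
            "children": {
                "A": {"counts": {1: 1, -1: 4}},
                "B": {"counts": {1: 3, -1: 2}},
            },
        },
    },
}
print(predict_label(tree, {"age": "old", "region": "B"}))
```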

The test data label determination unit 54 may display the positive or negative label determined for each piece of individual data in the test data on, for example, a display device, or have it printed by a printing device.

According to this embodiment, a set of individual data containing items indicating customer attributes is labeled and then weighted to obtain learning data, from which a classifier is generated. The test data label determination unit 54 then collates the items of each piece of individual data in the test data with the classifier and judges it positive or negative. In other words, a classifier is built from the items indicating customer attributes, and whether a customer who has not yet started use will start at the time following the current time is predicted from that customer's items and the classifier. Customer attributes can therefore be used to determine whether a customer will start use at the time following the current time.

Because labels are predicted for each piece of individual data, it is possible to predict, for each individual customer, whether that customer will start using the product or service at the time following the current time. By changing the specified current time, the same per-customer prediction can be made for any desired time. That is, it is possible to predict whether an individual customer will start using a product or service at a given time, and these predictions can be used to promote sales of products and services.

Moreover, predicting a diffusion curve by fitting data to a logistic curve or the like requires data from launch onward (that is, data on the number of units provided since the product or service was first offered). With the present invention, it is possible to predict for each customer whether they will start using the product or service at the time following the current time even when the customer data covers only part of that period.

Next, a modification of the second embodiment is described. In this modification, the prediction results for the test data are reflected in the customer data stored in the customer DB 2, and the prediction for the test data is repeated while updating the current time, thereby predicting the degree of spread of the product or service.

FIG. 14 is a block diagram showing a modification of the second embodiment. Components similar to those already described are given the same reference numerals as in FIG. 12, and their description is omitted. FIG. 14 shows an example in which the spread prediction system includes an input device 1 for inputting the current time, the influence degree, and other values to the spread prediction device 5, and an output device 4 for outputting the prediction results. The input device 1 is, for example, a keyboard, and the output device 4 is, for example, a display device.

The spread prediction device 5 includes the test data generation unit 51, the learning data generation unit 52, the classifier generation unit 53, the test data label determination unit 54, and a current time update unit 55.

In this modification, the test data label determination unit 54 overwrites the use start information of any individual data for which the determination result in step B4 differs from the actual use start information. This is the case when the use start information indicates that the product or service is unused but the determination result is positive. The test data label determination unit 54 then updates the use start information of that individual data in the customer data stored in the customer DB 2 to the current time plus a fixed period (denoted T), so that the individual data indicates that the customer started using the product or service at that time.

After the test data label determination unit 54 has made its determinations for the test data, the current time update unit 55 advances the current time by the fixed period T and sets the result as the new current time. The current time update unit 55 is also realized, for example, by a CPU operating according to the spread prediction program.

Provided that the updated current time is within a predetermined period, the test data generation unit 51, learning data generation unit 52, classifier generation unit 53, and test data label determination unit 54 repeat the same processing as in the second embodiment using the updated customer data and current time.

Next, the operation of this modification is described. FIG. 15 is a flowchart showing an example of the processing in the modification of the second embodiment. In this modification, in addition to the current time and the influence degree, the period over which spread is to be predicted (hereinafter, the prediction period) is input to the spread prediction device 5. As the prediction period, for example, the last time of that period may be input.

The processing of steps B1 to B4 is then performed. These steps are the same as steps B1 to B4 already described in the second embodiment (see FIG. 13).

After step B4, the test data label determination unit 54 updates the use start information of each piece of individual data whose use start information indicates that it is unused and which was labeled positive in step B4 (step B5). The updated use start information in the customer data is the current time plus the fixed period T.

Next, the current time update unit 55 updates the current time (step B6), setting the current time plus the fixed period T as the new current time.

The current time update unit 55 then determines whether the updated current time is within the prediction period (step B7). If it is (Yes in step B7), the spread prediction device 5 repeats the processing from step B1 onward.
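The loop of steps B1 to B7 can be sketched as follows, assuming the same hypothetical record layout (`"items"` plus a `"start"` time or `None`). `fit_predict` is a hypothetical callback standing in for classifier generation and prediction (steps B3 and B4); weighting by the influence degree is omitted for brevity, and treating a current time equal to the end of the prediction period as still inside it is an assumption.

```python
def simulate(customers, now, end, T, fit_predict):
    """Repeat steps B1-B7 until the current time leaves the prediction period.

    customers:   list of dicts with "items" and "start" (None = not yet used);
                 updated in place, as the text overwrites the customer DB.
    fit_predict: hypothetical callback for steps B3-B4; it receives labelled
                 learning data and the test items and returns +1/-1 per test item.
    """
    while True:
        # B1: customers whose use start is not at or before the current time
        test = [c for c in customers if c["start"] is None or c["start"] > now]
        # B2: label (weighting by the influence degree omitted in this sketch)
        train = [(c["items"], 1 if c["start"] is not None and c["start"] <= now else -1)
                 for c in customers]
        # B3-B4: build a classifier and predict the test labels
        labels = fit_predict(train, [c["items"] for c in test])
        # B5: unused customers predicted positive start at now + T
        for c, label in zip(test, labels):
            if label > 0 and c["start"] is None:
                c["start"] = now + T
        # B6: advance the current time
        now += T
        # B7: stop once the updated time is beyond the prediction period
        if now > end:
            return customers


def threshold_rule(train, test_items):
    # Toy stand-in for a trained classifier; it ignores `train` entirely.
    return [1 if x["score"] >= 3 else -1 for x in test_items]


customers = [
    {"items": {"score": 5}, "start": None},
    {"items": {"score": 4}, "start": 0},
    {"items": {"score": 1}, "start": None},
]
simulate(customers, now=0, end=2, T=1, fit_predict=threshold_rule)
print([c["start"] for c in customers])
```

In practice `fit_predict` would wrap a real learner, such as the decision tree discussed above, trained on the weighted learning data.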

If the updated current time is determined to be beyond the prediction period (No in step B7), the current time update unit 55 causes the output device 4 to output the prediction results (step B8). For example, the use start information in each customer's individual data may be output to the output device 4 as that customer's predicted use start time, or the diffusion curve of the product or service may be output. The spread prediction system may include a display device as the output device 4, on which the current time update unit 55 displays the prediction results, or a printing device, with which the current time update unit 55 prints them.

To output a diffusion curve, for example, each time the processing moves to step B1, the current time update unit 55 (or another component) may count, in the customer DB, the number of individual data of customers who have started using the product or service at the current time, and store that count in association with the current time. This count corresponds to the number of customers who have started use. Then, in step B8, a graph showing how the count changes with the current time, with the current time on the horizontal axis and the count on the vertical axis, may be created and output to the output device 4.
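The points of the diffusion curve can be computed from the use start information as sketched below (hypothetical names; records carry a `"start"` time or `None`). In the loop one would call `adopters_at(customers, now)` each time processing returns to step B1 and store the pair; in this sketch, counting afterwards over the final data gives the same values, because step B5 only writes start times later than the current time. Plotting is left to whatever charting tool is at hand.

```python
def adopters_at(customers, now):
    """Number of individual data whose use start is at or before `now`."""
    return sum(1 for c in customers if c["start"] is not None and c["start"] <= now)


def diffusion_curve(customers, times):
    """(current time, adopter count) pairs, i.e. the points of the diffusion curve."""
    return [(t, adopters_at(customers, t)) for t in times]


customers = [{"start": s} for s in (0, 1, None, 3)]
print(diffusion_curve(customers, [0, 1, 2, 3]))
```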

In the modification of the second embodiment, a set of individual data containing items that indicate customer attributes is labeled and then weighted to obtain learning data, a classifier is generated from the learning data, and the classifier is used to make predictions on the test data; this is repeated while the current time is updated. The spread of the product or service can therefore be predicted with high accuracy, taking differences in customer attributes (features) into account.

In addition, the more customer data is available, the more the accuracy of the spread prediction can be improved.

As the function used when generating the learning data, a function may be used in which the coefficient applied to the influence degree is either the ratio between the number of customers who have started using the product or service as of the current time and the number who have not yet started, or the ratio of the number who have started to the total number of customers. The function value, and hence the weighting, then depends not only on the influence degree but also on the penetration rate of the product or service, which can further improve prediction accuracy.

The function used when generating the learning data may also include, as part of the coefficient applied to the influence degree, a coefficient that depends on customer attributes. For example, weighting may be performed with the following function when generating the learning data:

Influence degree × Customer coefficient × (Number of non-users at the current time) / (Number of users at the current time)

The learning data generation unit 52 may select a customer coefficient according to the content of the attribute items in each individual data item and use it to compute the function value, for example with the function above. In this way, each customer can be weighted individually.
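A minimal sketch of the example weighting function, assuming the influence degree has already been estimated and that the customer coefficient is looked up from a single attribute value. The attribute values and coefficient table below are hypothetical illustrations, not values given in the source.

```python
from typing import Mapping

# Hypothetical customer coefficients keyed by an attribute value (illustrative only).
CUSTOMER_COEFFICIENTS: Mapping[str, float] = {"early_adopter": 1.5, "mainstream": 1.0}


def customer_coefficient(attribute_value: str, default: float = 1.0) -> float:
    """Select a customer coefficient according to the content of an attribute item."""
    return CUSTOMER_COEFFICIENTS.get(attribute_value, default)


def learning_weight(influence: float, coefficient: float,
                    n_non_users: int, n_users: int) -> float:
    """Influence degree x customer coefficient x (non-users / users) at the current time."""
    if n_users == 0:
        raise ValueError("ratio undefined: no customer has started using yet")
    return influence * coefficient * n_non_users / n_users
```

Because the non-user/user ratio shrinks as the product spreads, the same influence degree and coefficient yield a smaller weight later in the diffusion.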

The modification above was described for the case where the usage start information of the individual data stored in the customer DB 2 is updated by overwriting. Alternatively, the usage start information originally contained in each individual data item may be copied as usage-start-time prediction information, and only that copy updated. In this case, both the actual results and the predicted values are retained.
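The copy-and-update variant can be illustrated with a small record type that leaves the recorded usage start information untouched and updates only a predicted copy. This is a sketch under assumptions: the class and field names are hypothetical, times are numeric, and `None` means the customer has not started.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CustomerRecord:
    attributes: Dict[str, str]
    actual_start: Optional[float]            # usage start information as recorded (actual result)
    predicted_start: Optional[float] = None  # hypothetical copy updated by the prediction loop

    def __post_init__(self) -> None:
        # Initialize the prediction copy from the recorded value.
        if self.predicted_start is None:
            self.predicted_start = self.actual_start

    def record_predicted_start(self, time: float) -> None:
        """Set the predicted start time; customers who already started keep their value."""
        if self.predicted_start is None:
            self.predicted_start = time
```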

Embodiment 3.
The spread prediction system of this embodiment calculates the influence degree and uses it to determine whether each customer will start using the product or service at the prediction target time. FIG. 16 is a block diagram showing an example of the spread prediction system of this embodiment. The system includes a customer DB 2, an influence degree estimation device 3, and a spread prediction device 5, and may also include the input device 1 and the output device 4. The influence degree estimation device 3 operates in the same way as the influence degree estimation device 3 described in the first embodiment, and the spread prediction device 5 operates in the same way as the spread prediction device 5 described in the second embodiment. The customer DB 2, the input device 1, and the output device 4 are likewise the same as those described in the first and second embodiments.

The customer DB 2 is a storage means that stores customer data, and this customer data is the same as in the first and second embodiments. That is, the customer data is a set of individual data, one item per customer, each containing usage start information and one or more items representing customer attributes other than the usage start information.

The influence degree estimation device 3 receives the current time, a time interval, and a plurality of provisional influence degrees, for example via the input device 1. It estimates the influence degree from these inputs and passes it to the spread prediction device 5.

The spread prediction device 5 receives the current time, for example via the input device 1, and uses it together with the influence degree determined by the influence degree estimation device 3 to make predictions on the test data.

In this embodiment, the input current time also serves as the target time for influence degree estimation in the influence degree estimation device 3.

FIG. 17 is a block diagram showing the spread prediction system of this embodiment in detail. The input device 1 and the output device 4 are omitted.

The influence degree estimation device 3 includes a current time data generation unit 31, a previous time data group generation unit 32, a previous time classifier group generation unit 33, an error group calculation unit 34, and an influence degree calculation unit 35. The spread prediction device 5 includes a test data generation unit 51, a learning data generation unit 52, a classifier generation unit 53, and a test data label determination unit 54.

The current time data generation unit 31, the previous time data group generation unit 32, the previous time classifier group generation unit 33, the error group calculation unit 34, and the influence degree calculation unit 35 operate as in the first embodiment: they execute steps A1 to A10 (see FIG. 3) to determine the influence degree. The influence degree calculation unit 35 passes the influence degree to the learning data generation unit 52.

The test data generation unit 51, the learning data generation unit 52, the classifier generation unit 53, and the test data label determination unit 54 operate as in the second embodiment: they execute steps B1 to B4 (see FIG. 13) and determine, for each customer, whether that customer will start using the product or service.

Note that the influence degree calculation unit 35 determines a single influence degree, and the spread prediction device 5 performs the processing from step B1 onward using that influence degree.

The various modifications of the influence degree estimation device 3 described in the first embodiment may be applied to the influence degree estimation device 3 of this embodiment, and the various modifications of the spread prediction device 5 described in the second embodiment may be applied to the spread prediction device 5 of this embodiment. As in the modification of the second embodiment (see FIG. 14), the spread prediction device 5 may include the current time update unit 55 and execute steps B1 to B8; for example, it may cause the output device 4 to output the diffusion curve.

The current time data generation unit 31, the previous time data group generation unit 32, the previous time classifier group generation unit 33, the error group calculation unit 34, the influence degree calculation unit 35, the test data generation unit 51, the learning data generation unit 52, the classifier generation unit 53, and the test data label determination unit 54 are realized, for example, by a CPU operating according to a spread prediction program. When the spread prediction system includes the current time update unit 55, that unit may also be realized by the same CPU.

This embodiment provides the same effects as the second embodiment. In particular, because processing is performed using the influence degree estimated by the influence degree estimation device 3, prediction accuracy can be further improved.

In the second and third embodiments, the spread prediction system has been described as including the customer DB 2. Alternatively, the customer DB 2 may be provided outside the spread prediction system, and the system may refer to it when performing processing.

Embodiment 4.
FIG. 18 is a block diagram showing an example of the spread prediction system according to the fourth embodiment of the present invention. The spread prediction system of the fourth embodiment includes a customer DB 2, a threshold function estimation device 6, and a spread prediction device 7. It may also include an input device and an output device similar to the input device 1 and the output device 4 described in the first to third embodiments. The customer DB 2 is the same as the customer DB 2 described in the first to third embodiments.

The customer DB 2 is a storage means that stores customer data, and this customer data is the same as in the first to third embodiments. That is, the customer data is a set of individual data, one item per customer, each containing usage start information and one or more items representing customer attributes other than the usage start information.

In this embodiment, the threshold function estimation device 6 estimates a threshold function that expresses the relationship between the penetration rate of the product or service and a threshold. It obtains a penetration rate and a threshold at each of a plurality of times and estimates the threshold function from these per-time pairs. The threshold is used to decide whether an individual data item is assigned the first label. The spread prediction device 7 generates test data from the customer data stored in the customer DB 2 and calculates, for each individual data item in the test data, a score representing how likely it is that the first label applies. The spread prediction device 7 also calculates the penetration rate of the product or service and obtains a threshold by substituting that rate into the threshold function estimated by the threshold function estimation device 6. It then determines that individual data whose score is at or above the threshold receive the first label, and that individual data whose score is below the threshold receive the second label. As in the preceding embodiments, the first label is written as positive (or +) and the second label as negative (or −).

The threshold function estimation device 6 includes a reference time data group generation unit 61, a pre-threshold time data group generation unit 62, a pre-threshold time classifier group generation unit 63, an error group calculation unit 64, and a function estimation unit 65.

The reference time data group generation unit 61 receives a plurality of reference times used to determine the times for which thresholds are estimated, a time interval specifying a fixed period before each reference time, and a plurality of provisional thresholds. These may be input to the reference time data group generation unit 61 via an input device such as a keyboard (not shown in FIG. 18).

In this embodiment, each reference time serves to determine a pre-threshold time (described later), which is the time for which a threshold is estimated; the threshold for each pre-threshold time is determined using the reference time data described later. A penetration rate is also calculated for each pre-threshold time, and the threshold function is estimated from the relationship between the penetration rate and the threshold at each pre-threshold time.

The time interval specifies how far before each input reference time to look. The input provisional thresholds are the candidate values for the threshold that the threshold function estimation device 6 seeks; each is, for example, a number between 0 and 1 inclusive.

Let U be the number of reference times input to the reference time data group generation unit 61, and V the number of provisional thresholds.

The reference time data group generation unit 61 generates one set of reference time data per reference time, so U sets of reference time data are generated. In reference time data, the individual data of customers who have started using the product or service as of the reference time are labeled positive (first label), and the individual data of customers who have not started using it as of the reference time are labeled negative (second label). The unit generates the reference time data using the customer DB 2 (that is, based on the customer data stored in the customer DB 2); the generation of each set is the same as the generation of current time data in the first embodiment. The reference time data group generation unit 61 also obtains, for each input reference time, the time that lies one time interval before it, by subtracting the time interval from the reference time. This time is called the pre-threshold time. U pre-threshold times are therefore obtained.
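The labeling and the pre-threshold time computation can be sketched as follows. This is a minimal illustration, assuming numeric times, `None` for customers who have not started, and +1/-1 for the positive (first) and negative (second) labels; the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

POSITIVE, NEGATIVE = 1, -1  # assumed encoding of the first and second labels


def labels_at(start_times: Sequence[Optional[float]], t: float) -> List[int]:
    """Label each customer +1 if usage started at or before time t, otherwise -1."""
    return [POSITIVE if s is not None and s <= t else NEGATIVE for s in start_times]


def reference_time_data_group(
    start_times: Sequence[Optional[float]],
    reference_times: Sequence[float],
    interval: float,
) -> List[Tuple[float, float, List[int]]]:
    """For each reference time, return (reference time, pre-threshold time, labels)."""
    # Pre-threshold time = reference time minus the time interval.
    return [(rt, rt - interval, labels_at(start_times, rt)) for rt in reference_times]
```

Applying the same `labels_at` at each pre-threshold time gives the unweighted pre-threshold time data described next.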

The pre-threshold time data group generation unit 62 generates one set of pre-threshold time data per pre-threshold time, so U sets of pre-threshold time data are generated. For each pre-threshold time, the unit generates the pre-threshold time data based on that time and the customer data stored in the customer DB 2. In pre-threshold time data, the individual data of customers who have started using the product or service as of the pre-threshold time are labeled positive, and the individual data of customers who have not started are labeled negative. Pre-threshold time data differ from the previous time data described in the first embodiment in that they are not weighted.

The pre-threshold time data group generation unit 62 also calculates a penetration rate from the pre-threshold time data. The penetration rate is the ratio of the number of customers who had started using the product or service by the pre-threshold time to the total number of customers. Since a penetration rate is calculated for each set of pre-threshold time data, U penetration rates are obtained. The unit may store these U penetration rates for later use by the function estimation unit 65.
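Under the assumption that each set of pre-threshold time data is represented as a list of +1/-1 labels (one per customer), the penetration rate is simply the share of positive labels. The function name is hypothetical.

```python
from typing import Sequence


def penetration_rate(labels: Sequence[int]) -> float:
    """Share of customers labeled +1, i.e. who had started by the labeled time."""
    if not labels:
        raise ValueError("no customers in the data")
    return sum(1 for y in labels if y == 1) / len(labels)
```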

The pre-threshold time classifier group generation unit 63 generates one classifier from each set of pre-threshold time data, so U classifiers are generated. In this embodiment, a classifier is a rule that assigns to a customer's individual data a score representing how likely it is that the first label applies.
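This section does not fix the learning algorithm behind the classifier. As a hypothetical stand-in, the sketch below builds a score function from labeled data using a single categorical attribute: a record's score is the share of positive examples among training records with the same attribute value, falling back to the overall positive share for unseen values. Any classifier that outputs a likelihood-style score could take its place.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Sequence


def fit_segment_scorer(
    segments: Sequence[Hashable], labels: Sequence[int]
) -> Callable[[Hashable], float]:
    """Hypothetical classifier: score = positive share among records with the same segment."""
    if not labels:
        raise ValueError("empty training data")
    positives: Dict[Hashable, int] = defaultdict(int)
    totals: Dict[Hashable, int] = defaultdict(int)
    for seg, y in zip(segments, labels):
        totals[seg] += 1
        if y == 1:
            positives[seg] += 1
    overall = sum(1 for y in labels if y == 1) / len(labels)

    def score(segment: Hashable) -> float:
        n = totals.get(segment, 0)
        # Unseen segments fall back to the overall positive share.
        return positives.get(segment, 0) / n if n else overall

    return score
```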

The error group calculation unit 64 uses the classifiers generated by the pre-threshold time classifier group generation unit 63 to calculate a score for each individual data item in the reference time data, and predicts the label of each item by comparing its score with a provisional threshold. The score represents how likely it is that the first label applies to the item. The unit performs this label prediction for each classifier and, for each classifier, once for each of the V input provisional thresholds. When predicting labels with a given classifier and provisional threshold, the unit uses that classifier to compute the score of each individual data item in the reference time data, and predicts positive for items whose score is at or above the provisional threshold and negative for items whose score is below it. Each individual data item in the reference time data already carries its actual label, so the unit calculates the error between the labels predicted with the classifier and provisional threshold and the labels actually assigned in the reference time data. This error is calculated in the same way as the error between predicted and actual labels calculated by the error group calculation unit 34 in the first embodiment. Because U classifiers are generated and an error is calculated for each of the V provisional thresholds per classifier, U × V errors are calculated in total.
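The U × V error computation can be sketched as below. The error measure itself is defined in the first embodiment, outside this section; the sketch assumes it is the misclassification rate (the fraction of records whose predicted label differs from the actual one). It also assumes that classifier t has already scored the records of the t-th reference time data. Names are hypothetical.

```python
from typing import List, Sequence


def misclassification_rate(scores: Sequence[float], labels: Sequence[int],
                           threshold: float) -> float:
    """Predict +1 when score >= threshold, else -1; return the fraction predicted wrongly."""
    wrong = sum(1 for s, y in zip(scores, labels)
                if (1 if s >= threshold else -1) != y)
    return wrong / len(labels)


def error_grid(scores_per_time: Sequence[Sequence[float]],
               labels_per_time: Sequence[Sequence[int]],
               thresholds: Sequence[float]) -> List[List[float]]:
    """errors[t][i]: error of classifier t on its reference time data at provisional threshold i."""
    return [[misclassification_rate(sc, lb, thr) for thr in thresholds]
            for sc, lb in zip(scores_per_time, labels_per_time)]
```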

For each of the U classifiers, the function estimation unit 65 identifies the smallest of the errors calculated for the V provisional thresholds and sets the provisional threshold giving that smallest error as the threshold for that classifier (that is, the threshold for the corresponding pre-threshold time). Specifically, if only one provisional threshold gives the smallest error, the function estimation unit 65 uses that provisional threshold as the threshold. If several provisional thresholds give the smallest error, the threshold for the pre-threshold time is determined from them, for example as their average. The following description takes the case where the average of the tied provisional thresholds is used as the threshold.
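The selection rule, including the averaging of tied provisional thresholds, can be sketched for one classifier as follows. Exact equality is used to detect ties on the assumption that the errors are ratios of counts over the same number of records; the function name is hypothetical.

```python
from typing import Sequence


def select_threshold(errors: Sequence[float], thresholds: Sequence[float]) -> float:
    """Return the provisional threshold with the smallest error, averaging ties."""
    best = min(errors)
    tied = [thr for err, thr in zip(errors, thresholds) if err == best]
    return sum(tied) / len(tied)
```

Applying it to each row of a U × V error table gives one threshold per pre-threshold time.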

Because a threshold is determined for each classifier, a threshold is obtained for each pre-threshold time. The pre-threshold time data group generation unit 62 has also calculated the penetration rate for each pre-threshold time. This yields U pairs of threshold and penetration rate, one per pre-threshold time, from which the function estimation unit 65 estimates the threshold function.
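This section does not state the functional form of the threshold function. Purely as an assumption for illustration, the sketch fits a straight line, threshold ≈ a × penetration rate + b, to the U pairs by ordinary least squares; the invention may intend a different form, and the function name is hypothetical.

```python
from typing import Callable, Sequence


def fit_linear_threshold_function(
    rates: Sequence[float], thresholds: Sequence[float]
) -> Callable[[float], float]:
    """Least-squares fit of threshold = a * rate + b (an assumed, illustrative form)."""
    n = len(rates)
    if n < 2 or n != len(thresholds):
        raise ValueError("need at least two (rate, threshold) pairs of equal length")
    mean_r = sum(rates) / n
    mean_t = sum(thresholds) / n
    var_r = sum((r - mean_r) ** 2 for r in rates)
    if var_r == 0:
        raise ValueError("penetration rates must not all be equal")
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(rates, thresholds))
    a = cov / var_r
    b = mean_t - a * mean_r
    return lambda rate: a * rate + b
```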

The reference time data group generation unit 61, the pre-threshold time data group generation unit 62, the pre-threshold time classifier group generation unit 63, the error group calculation unit 64, and the function estimation unit 65 of the threshold function estimation device 6 are realized, for example, by a CPU operating according to a program. That is, a CPU that reads the program from a storage device provided in the threshold function estimation device 6 may operate as these units.

The spread prediction device 7 includes a test data generation unit 71, a labeling data generation unit 72, a classifier generation unit 73, and a test data label determination unit 74.

The current time is input to the spread prediction device 7. As in the second embodiment, the current time defines the time as of which it is determined whether each customer has started using the product or service. The current time may be input to the spread prediction device 7 via an input device (not shown in FIG. 18).

Based on the current time and the customer DB 2, the test data generation unit 71 generates test data containing the individual data of customers who have not started using the product or service as of the current time. The following description takes the case where the test data contain only such individual data.

The labeling data generation unit 72 generates labeling data based on the current time and the customer data stored in the customer DB 2. In the labeling data, the individual data of customers who have started using the product or service as of the current time are labeled positive (first label), and the individual data of customers who have not started are labeled negative (second label). The labeling data differ from the learning data described in the second embodiment only in that they are not weighted; apart from weighting, the labeling data generation in this embodiment is the same as the learning data generation in the second embodiment.

The labeling data generation unit 72 also calculates the penetration rate from the generated labeling data.

The classifier generation unit 73 generates a classifier from the labeling data generated by the labeling data generation unit 72. Like the classifiers generated by the pre-threshold time classifier group generation unit 63, this classifier is a rule that assigns to a customer's individual data a score representing how likely it is that the first label applies.

The test data label determination unit 74 uses the classifier generated by the classifier generation unit 73 to calculate a score for each individual data item in the test data. It also calculates a threshold from the threshold function estimated by the function estimation unit 65 and the penetration rate calculated from the labeling data, by substituting the penetration rate into the threshold function. It then compares each score with this threshold to determine the label of each individual data item in the test data: items whose score is at or above the threshold are determined to be positive (first label), and items whose score is below the threshold are determined to be negative (second label). Customers whose individual data are determined to be positive are predicted to start using the product or service at the time following the current time; customers whose individual data are determined to be negative are predicted not yet to start at that time. The time following the current time is the current time plus a predetermined fixed period.
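The determination step can be sketched as below, assuming the test data have already been scored by the classifier and that the threshold function is any callable mapping a penetration rate to a threshold. The +1/-1 encoding and the function name are hypothetical.

```python
from typing import Callable, List, Sequence


def predict_test_labels(test_scores: Sequence[float],
                        threshold_fn: Callable[[float], float],
                        penetration: float) -> List[int]:
    """Substitute the current penetration rate into the threshold function, then
    label +1 (predicted to start at the next time) if score >= threshold, else -1."""
    threshold = threshold_fn(penetration)
    return [1 if s >= threshold else -1 for s in test_scores]
```

For example, with a hypothetical threshold function `lambda p: p + 0.2` and a penetration rate of 0.3, scores `[0.9, 0.3, 0.6]` are labeled `[1, -1, 1]`.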

The test data label determination unit 74 may cause an output device (not shown in FIG. 18) to output the predicted (determined) label for each individual data item in the test data.

The test data generation unit 71, the labeling data generation unit 72, the classifier generation unit 73, and the test data label determination unit 74 of the spread prediction device 7 are realized, for example, by a CPU operating according to a spread prediction program. That is, a CPU that reads the spread prediction program from a storage device provided in the spread prediction device 7 may operate as these units.

The threshold function estimation device 6 and the spread prediction device 7 may also be realized by the same information processing device. In that case, the reference time data group generation unit 61, the pre-estimation time data group generation unit 62, the pre-estimation time classifier group generation unit 63, the error group calculation unit 64, the function estimation unit 65, the test data generation unit 71, the labeling data generation unit 72, the classifier generation unit 73, and the test data label determination unit 74 may be realized by a CPU operating according to the spread prediction program.

Next, the operation will be described.

FIG. 19 and FIG. 20 are flowcharts showing an example of the processing of the threshold function estimation device 6 in this embodiment. When, for example, a plurality of reference times, a time interval, and a plurality of temporary thresholds are input to the reference time data group generation unit 61 via an input device (not shown in FIG. 18), the spread prediction system operates as follows. The reference times are denoted CTime[t], where t = 1, ..., U is a variable that designates the input reference times in order; for example, t = 1 designates the first reference time, CTime[1]. The time interval is denoted Δt. The temporary thresholds are denoted FalseThr[i], where i = 1, ..., V is a variable that designates the input temporary thresholds in order; for example, FalseThr[1] is the first temporary threshold. Thus U reference times and V temporary thresholds are input.

The reference time data group generation unit 61 initializes the variable t, which designates the reference time, to t = 1, thereby selecting the first reference time CTime[1]. It subtracts the time interval Δt from CTime[1] to obtain the time before threshold estimation (hereafter the pre-estimation time); that is, the pre-estimation time is set to CTime[1] - Δt (step C1). Next, the reference time data group generation unit 61 determines whether t ≤ U (step C2). If t ≤ U (Yes in step C2), the process proceeds to step C3; if t > U (No in step C2), the process proceeds to step C18.

If it is determined in step C2 that t ≤ U, the reference time data group generation unit 61 generates reference time data from the reference time CTime[t] designated by t and the customer data stored in the customer DB 2 (step C3). The unit reads the customer data from the customer DB 2 and generates the reference time data by labeling as positive the individual data of customers who have started using the product or service by the reference time CTime[t] (individual data whose use start information indicates a time at or before CTime[t]) and labeling as negative the individual data of customers who have not started using it at CTime[t]. At this time, the reference time data group generation unit 61 may, for example, exclude the individual data of customers who had already started using the product or service at the pre-estimation time corresponding to CTime[t], namely CTime[t] - Δt. That is, individual data whose use start information is at or before the pre-estimation time is removed from the customer data, and the remaining individual data is labeled positive or negative to generate the reference time data. The following description takes this exclusion case as an example. The reference time data generation process of step C3 is the same as the current time data generation process of the first embodiment.
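A minimal sketch of the step C3 labeling follows, under assumptions not stated in the text: each customer record is a hypothetical `Record` holding a tuple of attribute items and a numeric use-start time (None if the customer has not yet started), and the optional exclusion of customers who started by CTime[t] - Δt is always applied.

```python
from typing import List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    attrs: Tuple                 # customer attribute items 1..N
    start: Optional[float]       # use-start time, or None if not yet used


def reference_time_data(records: List[Record], ctime: float,
                        dt: float) -> List[Tuple[Tuple, bool]]:
    """Label records at reference time ctime, excluding customers who had
    already started using the product at the pre-estimation time ctime - dt."""
    pre_time = ctime - dt
    labeled = []
    for r in records:
        if r.start is not None and r.start <= pre_time:
            continue  # adopted by the pre-estimation time: excluded
        positive = r.start is not None and r.start <= ctime
        labeled.append((r.attrs, positive))
    return labeled


customers = [Record((1,), 1.0), Record((2,), 5.0), Record((3,), None), Record((4,), 8.0)]
print(reference_time_data(customers, ctime=6.0, dt=2.0))
# → [((2,), True), ((3,), False), ((4,), False)]
```

Excluding early adopters means the reference time data only contains customers whose status could still change between the pre-estimation time and the reference time, which is exactly the population the threshold is tuned on.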

After step C3, the pre-estimation time data group generation unit 62 generates pre-estimation time data from the pre-estimation time (CTime[t] - Δt) determined by the variable t and the customer data stored in the customer DB 2 (step C4). The unit reads the customer data from the customer DB 2 and generates the pre-estimation time data by labeling as positive the individual data of customers who had started using the product or service at or before the pre-estimation time, and labeling as negative the individual data of customers who had not started using it at the pre-estimation time. In other words, individual data whose use start information is at or before the pre-estimation time is labeled positive, and all other individual data is labeled negative.

Next, the pre-estimation time data group generation unit 62 calculates the penetration rate from the pre-estimation time data (step C5). The penetration rate is the proportion of all customers who had started using the product or service by the pre-estimation time. Because each piece of individual data in the pre-estimation time data corresponds to one customer, the penetration rate equals the number of individual data labeled positive divided by the total number of individual data in the pre-estimation time data, and the unit can compute it by this division. This penetration rate is denoted D[t], and the pre-estimation time data group generation unit 62 stores the calculated value.
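The labeling of step C4 and the penetration rate of step C5 amount to a time comparison and a count. The sketch below assumes a hypothetical record layout (a tuple of attribute items plus a numeric use-start time, or None if not yet started); `label_at` and `penetration_rate` are illustrative names.

```python
from typing import List, Optional, Sequence, Tuple

Customer = Tuple[Tuple, Optional[float]]  # (attribute items, use-start time or None)


def label_at(customers: Sequence[Customer], time: float) -> List[Tuple[Tuple, bool]]:
    """Positive if the customer started using the product at or before `time`."""
    return [(attrs, start is not None and start <= time) for attrs, start in customers]


def penetration_rate(labeled: Sequence[Tuple[Tuple, bool]]) -> float:
    """Share of records labeled positive (one record per customer)."""
    if not labeled:
        raise ValueError("no individual data")
    return sum(1 for _, positive in labeled if positive) / len(labeled)


customers = [((1,), 1.0), ((2,), 5.0), ((3,), None), ((4,), 8.0)]
pre_time = 4.0  # e.g. CTime[t] - Δt
print(penetration_rate(label_at(customers, pre_time)))  # → 0.25
```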

After step C5, the pre-estimation time classifier group generation unit 63 generates a classifier from the pre-estimation time data (step C6). For example, it may generate the classifier using any one, or a combination, of data mining techniques whose output is a continuous value, such as multiple regression analysis, regression trees, neural networks, support vector machines, Bayesian networks, and ensemble learning of decision trees.

For example, the pre-estimation time classifier group generation unit 63 may apply bagging, a type of ensemble learning, to decision trees, generating a plurality of decision trees and using the set of trees as the classifier. An example of classifier generation in this case is as follows. The unit draws some of the individual data from the pre-estimation time data, with duplicates allowed, and generates a decision tree from that set of individual data. Repeating this process yields a plurality of decision trees, whose combination forms a single classifier. For example, if K pieces of individual data are drawn each time, the unit repeatedly draws K pieces of individual data from the pre-estimation time data with duplicates allowed and generates a decision tree from them. Repeating this 500 times, for example, produces 500 decision trees.
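Bagging as described above can be sketched as follows. To keep the example self-contained, the decision-tree learner is passed in as a parameter, and a trivial majority-class learner stands in for it in the usage; a real tree learner (for example one using the entropy-based splitting of the first embodiment) would be substituted. `bagging`, `majority_learner`, and the default of 500 models are illustrative assumptions.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[Tuple, bool]            # (attribute items, label)
Predictor = Callable[[Tuple], bool]     # maps attribute items to a label


def bagging(data: Sequence[Example],
            learner: Callable[[List[Example]], Predictor],
            n_models: int = 500,
            k: Optional[int] = None,
            seed: int = 0) -> List[Predictor]:
    """Train n_models predictors, each on K records drawn with replacement."""
    rng = random.Random(seed)
    k = len(data) if k is None else k
    models = []
    for _ in range(n_models):
        sample = rng.choices(data, k=k)   # bootstrap sample: duplicates allowed
        models.append(learner(sample))
    return models


def majority_learner(sample: List[Example]) -> Predictor:
    """Stand-in for a decision-tree learner: always predicts the majority label."""
    positives = sum(1 for _, y in sample if y)
    majority = positives * 2 >= len(sample)
    return lambda attrs: majority


models = bagging([((0,), True), ((1,), True)], majority_learner, n_models=5)
print(len(models))  # → 5
```

Because each model sees a different bootstrap sample, the ensemble members disagree on borderline records, which is what later makes the vote fraction usable as a continuous score.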

The following description takes as an example the case where a plurality of decision trees are generated by bagging as described above and used as the classifier.

The process of generating a decision tree from the drawn individual data is the same as the process exemplified as the classifier generation operation of the previous time data group generation unit 32 in the first embodiment. That is, the pre-estimation time classifier group generation unit 63 first determines on which item (an item representing a customer attribute) to split. To do so, it calculates an evaluation value for splitting on each of items 1 to N and selects the item with the largest evaluation value as the most suitable for the split. The evaluation value may be, for example, the difference between the entropy of the node before the split and the entropy after the split. The unit applies the same procedure to each node produced by the split, successively deciding the item on which to split next, and stops splitting when a predetermined condition is satisfied. It then prunes the resulting tree to obtain the decision tree. The predetermined condition and the pruning process are the same as those described in the first embodiment.
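The split-item selection can be illustrated with an information-gain computation. This sketch assumes categorical attribute items with one child node per distinct value, and takes "entropy after the split" to mean the size-weighted average of the child-node entropies, a common convention that the text does not spell out. Stopping conditions and pruning are omitted, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Tuple, bool]


def entropy(labels: Sequence[bool]) -> float:
    """Shannon entropy (bits) of a list of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_gain(data: Sequence[Example], item: int) -> float:
    """Entropy before the split minus size-weighted entropy after splitting on `item`."""
    groups: Dict[object, List[bool]] = defaultdict(list)
    for attrs, label in data:
        groups[attrs[item]].append(label)
    n = len(data)
    after = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([label for _, label in data]) - after


def best_item(data: Sequence[Example], n_items: int) -> int:
    """Index of the attribute item with the largest evaluation value."""
    return max(range(n_items), key=lambda j: split_gain(data, j))


data = [((0, "a"), True), ((0, "b"), True), ((1, "a"), False), ((1, "b"), False)]
print(best_item(data, 2))  # → 0 (item 0 separates the labels perfectly)
```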

Because the combinations of K pieces of individual data drawn from the pre-estimation time data differ, the generated decision trees also differ. To apply the plurality of decision trees obtained as above to a piece of individual data and obtain its score (the likelihood that its label is positive), the label of that individual data is determined with each decision tree, and the number of positive (will use) determinations is divided by the number of determinations, that is, the number of trees. For example, when 500 decision trees are generated as the classifier, each tree is used to determine the label of the individual data; if the label is determined to be positive X times, the score is X/500.
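The scoring rule, the fraction of trees voting positive, can be sketched as follows, with each tree modeled as a hypothetical callable that maps an attribute tuple to True (positive) or False (negative). The name `bagged_score` is illustrative.

```python
from typing import Callable, Sequence, Tuple

Predictor = Callable[[Tuple], bool]


def bagged_score(trees: Sequence[Predictor], attrs: Tuple) -> float:
    """Fraction of trees that label the record positive (X / number of trees)."""
    if not trees:
        raise ValueError("empty ensemble")
    votes = sum(1 for tree in trees if tree(attrs))
    return votes / len(trees)


# 500 stand-in "trees": 200 vote positive, 300 vote negative.
trees = [lambda a: True] * 200 + [lambda a: False] * 300
print(bagged_score(trees, (1, 2)))  # → 0.4
```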

When the classifier has been generated in step C6, the error group calculation unit 64 sets an initial value for the variable ErrorMin, which holds the minimum error, and initializes the variables i and p to i = 1 and p = 0 (step C7). The initial value of ErrorMin may be the maximum value that the error calculated by the error group calculation unit 64 can take, or any value sufficiently larger than the values the error can take. The variable p designates the temporary thresholds corresponding to the minimum error: if p temporary thresholds give the minimum error, the first through p-th of them are denoted Thres[1], ..., Thres[p].

After step C7, the error group calculation unit 64 determines whether the temporary threshold index i is less than or equal to the number of temporary thresholds V (step C8). If i ≤ V (Yes in step C8), the process proceeds to step C9; if i > V (No in step C8), the process proceeds to step C16.

In step C9, the error group calculation unit 64 predicts the label of each piece of individual data in the reference time data generated in step C3, using the pre-estimation time classifier generated in step C6 and the temporary threshold FalseThr[i]. In this example, for each piece of individual data in the reference time data, the error group calculation unit 64 matches the items indicating the customer's attributes against each decision tree generated as the classifier and determines a label. Because there are multiple decision trees, this determination is made for every tree, and the ratio of positive determinations to the number of determinations (the number of trees) is taken as the score of the individual data; that is, the score is "number of positive determinations / number of determinations". The error group calculation unit 64 then compares the score with the temporary threshold FalseThr[i] designated by i: if the score is equal to or greater than FalseThr[i], the label is predicted to be positive; if it is less than FalseThr[i], the label is predicted to be negative.

Next, the error group calculation unit 64 calculates the error between the labels predicted for the individual data in the reference time data and the labels in the reference time data itself (step C10). Each piece of individual data in the reference time data was labeled positive or negative when the reference time data was generated, so the unit calculates the error between the prediction of step C9 and these actual labels. This error is denoted Error.

In step C10, the error group calculation unit 64 may, for example, compare the predicted label of each piece of individual data in the reference time data with the label actually assigned to it, and set Error to the number of individual data for which the two differ.

Alternatively, the error group calculation unit 64 may set Error to the absolute value of the difference between the number of individual data predicted in step C9 to be labeled positive and the number of individual data actually labeled positive in the reference time data generated in step C3.
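The two error measures of step C10 can be written as follows (a sketch; labels are True/False and the function names are hypothetical). Note the design difference: the count-based measure compares only the number of positives, so it can be zero even when individual labels disagree, as the usage line shows.

```python
from typing import Sequence


def mismatch_error(predicted: Sequence[bool], actual: Sequence[bool]) -> int:
    """Number of records whose predicted label differs from the actual label."""
    return sum(1 for p, a in zip(predicted, actual) if p != a)


def count_error(predicted: Sequence[bool], actual: Sequence[bool]) -> int:
    """Absolute difference between predicted and actual positive counts."""
    return abs(sum(predicted) - sum(actual))


pred = [True, True, False, False]
act = [True, False, True, False]
print(mismatch_error(pred, act), count_error(pred, act))  # → 2 0
```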

After step C10, the function estimation unit 65 compares the error Error calculated in step C10 with the minimum error ErrorMin and determines whether Error is less than ErrorMin (step C11). If so (Yes in step C11), the function estimation unit 65 resets p to 0, assigns Error to ErrorMin, and resets the variable Sum to 0 (step C13). Sum stores the sum of the temporary thresholds that give the minimum error and is used to compute their average. Error being less than ErrorMin means that a new minimum, smaller than any error seen so far, has been found. Step C13 therefore updates ErrorMin with this new minimum and reinitializes p, the variable that designates each temporary threshold individually when several of them give the same error.

After step C13, the function estimation unit 65 increments p by 1 and updates Sum by adding the temporary threshold FalseThr[i] to it, so that Sum + FalseThr[i] becomes the new value of Sum (step C14). If Error is determined to be equal to or greater than ErrorMin (No in step C11), the function estimation unit 65 determines whether Error equals ErrorMin (step C12). If they are equal (Yes in step C12), the function estimation unit 65 likewise increments p by 1 and adds FalseThr[i] to Sum (step C14).

When the process moves from step C13 to step C14, p becomes 1, and the temporary threshold FalseThr[i] currently under consideration becomes the first temporary threshold giving the minimum error. When Error = ErrorMin in step C12 and the process moves to step C14, p becomes 2 or more. In this case, one or more temporary thresholds giving the minimum error have already been found, and the function estimation unit 65 takes the current temporary threshold FalseThr[i] as the p-th temporary threshold giving the minimum error.

After step C14, or when Error is determined in step C12 not to equal ErrorMin, the function estimation unit 65 increments the variable i that designates the temporary threshold (step C15). The processing from step C8 onward is then repeated with the incremented i.

If i > V is determined in step C8, the function estimation unit 65 sets the threshold corresponding to t, denoted Thres[t], to Sum / p (step C16). That is, Sum, the total of the temporary thresholds giving the minimum error, is divided by their number p, and the resulting average becomes the threshold. If only one temporary threshold gives the minimum error, Sum equals that temporary threshold, which therefore becomes Thres[t].
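Steps C7 to C16 for a single reference time can be condensed into one function. This sketch assumes precomputed classifier scores for the reference time data, uses the mismatch count as the default error, and uses `math.inf` as the initial ErrorMin; the function names are hypothetical. As in the flowchart, temporary thresholds tied at the minimum error are averaged.

```python
import math
from typing import Callable, Sequence

ErrorFn = Callable[[Sequence[bool], Sequence[bool]], float]


def mismatch_error(predicted: Sequence[bool], actual: Sequence[bool]) -> int:
    return sum(1 for p, a in zip(predicted, actual) if p != a)


def select_threshold(scores: Sequence[float], actual: Sequence[bool],
                     candidates: Sequence[float],
                     error_fn: ErrorFn = mismatch_error) -> float:
    """Steps C7-C16: average of the temporary thresholds with minimum error."""
    if not candidates:
        raise ValueError("no temporary thresholds")
    error_min = math.inf   # ErrorMin, larger than any achievable error
    total = 0.0            # Sum
    p = 0                  # number of thresholds tied at the minimum
    for thr in candidates:                      # FalseThr[i], i = 1..V
        predicted = [s >= thr for s in scores]  # step C9
        err = error_fn(predicted, actual)       # step C10
        if err < error_min:                     # step C11 -> C13
            error_min, total, p = err, 0.0, 0
        if err == error_min:                    # step C12 (or after C13) -> C14
            total += thr
            p += 1
    return total / p                            # step C16: Thres[t]


scores = [0.9, 0.7, 0.4, 0.2]
actual = [True, True, False, False]
print(select_threshold(scores, actual, [0.1, 0.3, 0.5, 0.8]))  # → 0.5
```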

After step C16, the function estimation unit 65 increments the variable t that designates the reference times in order (step C17). The processing from step C2 onward is then repeated with the incremented t.

If t > U is determined in step C2, the function estimation unit 65 obtains the threshold function, which expresses the threshold as a function of the penetration rate, from the penetration rates D[t] and thresholds Thres[t] obtained for each pre-estimation time (step C18). By the time the process reaches step C18, the pre-estimation time has been calculated for every t, and the corresponding penetration rate D[t] and threshold Thres[t] have been obtained; that is, U pairs of D[t] and Thres[t] are available. The function estimation unit 65 may obtain the threshold function by fitting a curve to the relationship between D[t] and Thres[t]. For example, Thres[t] may be assumed to be expressed in terms of D[t] by Equation (1) below, and the parameters a and b of Equation (1) estimated.

$$\mathrm{Thres}[t] = a \cdot D[t] + b \qquad (1)$$

In Equation (1), a and b are the parameters to be estimated. Since U pairs of penetration rate D[t] and threshold Thres[t] have been obtained, the function estimation unit 65 may, for example, calculate a and b from these U pairs (combinations of threshold and penetration rate) by the least squares method. The value of parameter a is low if the penetration rate of the product or service rises rapidly over time and high if it rises gradually; a takes a value such as 0.5. The parameter b is low if the penetration rate rises rapidly in the early stage of adoption and high if it hardly rises; it lies between 0 and 1 and is typically small, for example 0.05.
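Fitting the parameters a and b of Equation (1) by ordinary least squares reduces to the usual closed-form slope and intercept. The sketch below is illustrative (the name `fit_threshold_function` and the sample values are hypothetical); a threshold function of a different form would need its own fitting procedure.

```python
from typing import Sequence, Tuple


def fit_threshold_function(rates: Sequence[float],
                           thresholds: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of Thres = a * D + b over U (D[t], Thres[t]) pairs."""
    n = len(rates)
    if n != len(thresholds) or n < 2:
        raise ValueError("need at least two (D, Thres) pairs")
    mean_d = sum(rates) / n
    mean_t = sum(thresholds) / n
    sxx = sum((d - mean_d) ** 2 for d in rates)
    if sxx == 0:
        raise ValueError("penetration rates must not all be equal")
    sxy = sum((d - mean_d) * (t - mean_t) for d, t in zip(rates, thresholds))
    a = sxy / sxx
    b = mean_t - a * mean_d
    return a, b


a, b = fit_threshold_function([0.1, 0.2, 0.3], [0.10, 0.15, 0.20])
print(round(a, 6), round(b, 6))  # → 0.5 0.05
```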

Equation (1) is one example of a threshold function whose parameters are not yet determined; the threshold function containing parameters to be estimated is not limited to Equation (1). Such a threshold function may be input by a user of the spread prediction system, or it may be predetermined.

The function estimation unit 65 determines the threshold function by calculating the parameters a and b, for example by the least squares method. The function estimation unit 65 then inputs the threshold function to the spread prediction device 7 (test data label determination unit 74).

FIG. 21 is a flowchart showing an example of the processing of the spread prediction device 7 in this embodiment. When, for example, the threshold function is input from the function estimation unit 65 and the current time is input via an input device such as a keyboard, the spread prediction device 7 operates as follows.

The test data generation unit 71 generates test data (step D1). This operation is the same as step B1 in the second embodiment. Here, the case of generating test data that contains only the individual data of customers who have not started using the product or service at the current time is described as an example. The test data generation unit 71 reads, from the customer data stored in the customer DB 2, only the individual data of customers who have not started using the product or service at the current time (that is, only individual data whose use start information is not at or before the current time) and uses that set of individual data as the test data.

Next, the labeling data generation unit 72 generates labeling data (step D2). It reads the customer data stored in the customer DB 2 and generates the labeling data by labeling as positive the individual data of customers who have started using the product or service at or before the current time, and labeling as negative the individual data of customers who have not started using it at the current time.

Next, the labeling data generation unit 72 calculates the penetration rate from the labeling data (step D3). Because each piece of individual data in the labeling data corresponds to one customer, the penetration rate is the ratio of the number of individual data labeled positive to the total number of individual data in the labeling data. The labeling data generation unit 72 can therefore obtain the penetration rate by dividing the number of positively labeled individual data by the total number of individual data in the labeling data.
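Steps D1 to D3 can be sketched together, assuming each customer record is a hypothetical pair of an attribute tuple and a numeric use-start time (None if not yet started). The function name `prepare_current_time` is illustrative, and only the attribute tuples are kept as test data.

```python
from typing import List, Optional, Sequence, Tuple

Customer = Tuple[Tuple, Optional[float]]  # (attribute items, use-start time or None)


def prepare_current_time(customers: Sequence[Customer], now: float
                         ) -> Tuple[List[Tuple], List[Tuple[Tuple, bool]], float]:
    """Steps D1-D3: test data, labeling data, and penetration rate at `now`."""
    if not customers:
        raise ValueError("no customer data")
    # D1: customers who have not started using the product at the current time.
    test_data = [attrs for attrs, start in customers if start is None or start > now]
    # D2: positive if use started at or before the current time.
    labeled = [(attrs, start is not None and start <= now) for attrs, start in customers]
    # D3: share of positively labeled records.
    rate = sum(1 for _, positive in labeled if positive) / len(labeled)
    return test_data, labeled, rate


customers = [((1,), 1.0), ((2,), None), ((3,), 7.0), ((4,), 3.0)]
test_data, labeled, rate = prepare_current_time(customers, now=5.0)
print(test_data, rate)  # → [(2,), (3,)] 0.5
```

The test data and the negatively labeled records are the same customers; the test data simply drops the label, since its label at the next time step is what the system predicts.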

Next, the classifier generation unit 73 generates a classifier from the labeling data (step D4). This classifier is of the same kind as the one generated by the pre-estimation time classifier group generation unit 63 of the threshold function estimation device 6. In this example, a plurality of decision trees are generated by bagging and used as the classifier. That is, the classifier generation unit 73 repeatedly draws some of the individual data from the labeling data, with duplicates allowed, and generates a decision tree from each drawn set, thereby generating a plurality of decision trees. If K pieces of individual data are drawn each time, the classifier generation unit 73 repeatedly draws K pieces of individual data from the labeling data with duplicates allowed and generates a decision tree from them.

The process of generating a decision tree from the drawn individual data is the same as the process exemplified as the classifier generation operation of the previous time data group generation unit 32 in the first embodiment. The classifier generation unit 73 first determines on which item (an item representing a customer attribute) to split. To do so, it calculates an evaluation value for splitting on each of items 1 to N and selects the item with the largest evaluation value as the most suitable for the split. The evaluation value may be, for example, the difference between the entropy of the node before the split and the entropy after the split. The classifier generation unit 73 applies the same procedure to each node produced by the split, successively deciding the item on which to split next, and stops splitting when a predetermined condition is satisfied. It then prunes the resulting tree to obtain the decision tree. The predetermined condition and the pruning process are the same as those described in the first embodiment.

The test data label determination unit 74 calculates the threshold by substituting the penetration rate calculated in step D3 into the threshold function estimated by the function estimation unit 65 (step D5).

Next, the test data label determination unit 74 predicts the labels of the test data using the classifier generated in step D4 and the threshold calculated in step D5 (step D6). In step D6, the test data label determination unit 74 processes each piece of individual data in the test data as follows. It checks the items indicating the customer's attributes in the individual data against each decision tree in the classifier and determines a label for that individual data. Because there are multiple decision trees, this determination is made once per tree. The ratio of the number of positive determinations to the total number of determinations (equal to the number of decision trees) becomes the score of the individual data. In other words, for each piece of individual data the unit computes "number of positive determinations / number of determinations" and uses that value as the score. The unit then compares the score with the threshold calculated in step D5. If the score is greater than or equal to the threshold, the label for that individual data is determined to be positive. If the score is less than the threshold, the label is determined to be negative.
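Steps D5 and D6 can be sketched as a vote over the trees followed by a threshold test. This is a minimal sketch, assuming each tree is a callable that returns `True` for a positive determination. The form of the threshold function is not given in this section, so the `threshold_fn` used in the test (`0.4 + p`) is purely hypothetical. The names `vote_score` and `label_test_data` are also illustrative.

```python
from typing import Callable, Dict, List, Sequence

Items = Dict[str, str]
Tree = Callable[[Items], bool]  # True means the tree determines "positive"


def vote_score(trees: Sequence[Tree], items: Items) -> float:
    """Number of positive determinations / number of determinations (= number of trees)."""
    return sum(1 for tree in trees if tree(items)) / len(trees)


def label_test_data(
    trees: Sequence[Tree],
    test_data: Sequence[Items],
    threshold_fn: Callable[[float], float],
    penetration_rate: float,
) -> List[str]:
    # Step D5: threshold = threshold function evaluated at the penetration rate.
    threshold = threshold_fn(penetration_rate)
    # Step D6: positive if score >= threshold, negative otherwise.
    return [
        "positive" if vote_score(trees, items) >= threshold else "negative"
        for items in test_data
    ]
```

The scores depend only on the trees, so they can be computed once and compared against any threshold. The paragraph that follows relies on this property.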

The test data label determination unit 74 may display the positive or negative label determined for each piece of individual data in the test data on a display device or the like, or have it printed by a printing device.

According to this embodiment, as in the second embodiment, the time at which a customer will start using a product or service can be predicted customer by customer. To obtain the threshold function, this embodiment computes the prediction error (Error) for each of a plurality of input reference times and each of a plurality of provisional thresholds, finds the threshold that minimizes the error, and estimates the threshold function from those thresholds. Customer receptiveness, which changes from moment to moment, can therefore be estimated as a threshold function. Moreover, when weighting is used, a new classifier must be generated every time the weighting changes. This embodiment does not use weighting, so calculating the error for a provisional threshold does not require a classifier of its own: a single classifier suffices to obtain the errors at all provisional thresholds. This shortens the computation time spent on classifier generation.
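The saving described above can be sketched in Python: the ensemble scores are computed once, and only the cheap comparison is repeated for each provisional threshold. This is a minimal sketch. The error measure used here is an assumption: the count of pieces of individual data whose predicted label differs from the actual one, in the spirit of the error defined in claim 12. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def errors_per_threshold(
    scores: Sequence[float],
    actual_positive: Sequence[bool],
    candidates: Sequence[float],
) -> Dict[float, int]:
    """Error for every provisional threshold, reusing one set of ensemble scores."""
    # No classifier is rebuilt inside this loop; only the threshold comparison changes.
    return {
        t: sum((s >= t) != a for s, a in zip(scores, actual_positive))
        for t in candidates
    }


def best_thresholds(errors: Dict[float, int]) -> List[float]:
    """All provisional thresholds that attain the minimum error."""
    smallest = min(errors.values())
    return [t for t, e in errors.items() if e == smallest]
```

For scores `[0.9, 0.7, 0.4, 0.2]` with actual labels `[True, True, False, False]`, the candidates 0.3, 0.5, and 0.8 give errors 1, 0, and 1, so 0.5 is selected.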

Next, a modification of the fourth embodiment is described. In this modification, as in the modification of the second embodiment (see FIGS. 14 and 15), the current time is updated and the prediction for the test data is repeated.

FIG. 22 is a block diagram showing a modification of the fourth embodiment. Components identical to those already described carry the same reference signs as in FIG. 18, and their detailed description is omitted. The spread prediction device 7 includes a test data generation unit 71, a labeling data generation unit 72, a classifier generation unit 73, a test data label determination unit 74, and a current time update unit 75. The current time update unit 75 is the same as the current time update unit 55 described in the modification of the second embodiment.

In this modification, the test data label determination unit 74 overwrites the use start information of each piece of individual data whose determination in step D6 differs from its actual use start information. Specifically, suppose the use start information of a piece of individual data indicates non-use, but step D6 determined it to be positive. The test data label determination unit 74 then updates that individual data's use start information, in the customer data stored in the customer DB 2, to the current time plus a fixed period (denoted T).

After the test data label determination unit 74 has made its determinations on the test data, the current time update unit 75 advances the current time by the fixed period T and sets the result as the new current time. Like the other units, the current time update unit 75 is implemented by, for example, a CPU operating according to the spread prediction program.

Once the current time has been updated, the test data generation unit 71, the labeling data generation unit 72, the classifier generation unit 73, and the test data label determination unit 74 repeat the processing from step D1 onward, using the updated current time and the updated customer data.

The repetition may end, for example, when the updated current time exceeds a predetermined period (the prediction period), as in the modification of the second embodiment. That is, the current time update unit 75 updates the current time only if the updated current time remains within the prediction period. The prediction period may, for example, be input in advance by a user of the spread prediction system.

If the current time update unit 75 determines that the updated current time would exceed the prediction period, it may have an output device (not shown in FIG. 22) output the prediction result. The output format is the same as in the modification of the second embodiment illustrated in FIG. 14; for example, a diffusion curve may be output. In that case, each time processing returns to step D1, the current time update unit 75 (or another component) counts the pieces of individual data in the customer DB belonging to customers who have started using the product or service as of the current time. It stores that count together with the current time. Finally, the current time update unit 75 creates and outputs a graph with the current time on the horizontal axis and the count on the vertical axis, which shows how the count changes as the current time advances.
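The repeat-and-advance loop of this modification can be sketched in Python. This is a minimal sketch, not the patent's implementation. Times are assumed to be plain numbers (for example, years), and `None` stands for the "not used" marker. The callable `predict_adopters` is hypothetical and stands in for steps D1 to D6: it returns the customers judged positive at the current time. The sketch updates a copy of the start times, which corresponds to the copy-based variant described below, instead of overwriting the stored data.

```python
from typing import Callable, Dict, List, Optional, Tuple

StartTimes = Dict[str, Optional[float]]  # customer id -> use start time; None = not used ("?")


def simulate_diffusion(
    start_times: StartTimes,
    now: float,
    step: float,
    horizon: float,
    predict_adopters: Callable[[StartTimes, float], List[str]],
) -> List[Tuple[float, int]]:
    """Repeat prediction, advancing the current time by `step` (T) within `horizon`."""
    times = dict(start_times)  # update a copy instead of the stored customer data
    curve: List[Tuple[float, int]] = []
    while True:
        # Count customers who have started using the product by the current time.
        adopted = sum(1 for t in times.values() if t is not None and t <= now)
        curve.append((now, adopted))
        # Steps D1-D6 (hypothetical callable): predicted positives start at now + T.
        for cid in predict_adopters(times, now):
            if times[cid] is None:
                times[cid] = now + step
        # Advance the current time only while it stays within the prediction period.
        if now + step > horizon:
            break
        now += step
    return curve
```

The returned `(current time, count)` pairs are the points of the diffusion curve. With a toy predictor that marks one more non-user as positive each round, the count grows by one per period T.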

In this modification, the use start information of the individual data stored in the customer DB 2 need not be overwritten. Instead, the use start information originally contained in each piece of individual data may be copied as use start time prediction information, and that copy updated.

FIG. 23 is a block diagram showing another modification of the fourth embodiment. FIGS. 18 and 22 illustrate the case in which the spread prediction system includes the threshold function estimation device 6. As shown in FIG. 23, however, the system may instead be configured without the threshold function estimation device 6. The test data generation unit 71, labeling data generation unit 72, classifier generation unit 73, and test data label determination unit 74 shown in FIG. 23 operate as already described. In a configuration without the threshold function estimation device 6, the threshold function may be input to the test data label determination unit 74 from outside the spread prediction system; for example, a user of the spread prediction system may input it. In this modification as well, the spread prediction device 7 may include the current time update unit 75.

In this working example, the items representing customer characteristics were gender, age, purchase history of existing products, innovator characteristics, and contact media. Customer data containing these items together with the use start time were stored in the customer DB 2 in advance. For customers who started using a specific product within four years of its release, the individual data recorded the time the customer started using the product as the use start information. For customers who had not started using the product within those four years, the use start information was "?", meaning not used. Customers whose use start information recorded a purchase time (use start time) made up 8.4% of all customers stored in the customer DB 2. Customers whose use start information was "?" (not used) made up the remaining 91.6%. The items representing customer characteristics were also recorded for each customer, and an item whose value was unknown was given the value "?", meaning unknown.

The customer data described above were stored in the customer DB 2 in advance and applied to the influence degree estimation system of the first embodiment. The influence degree estimated by the system was 0.191.

This influence degree of 0.191 was then applied to the spread prediction system of the second embodiment (FIG. 14). FIG. 24 shows an example of the diffusion curve output by the spread prediction system. The penetration rate in the fourth year after launch was 8.4%, and the function value at that point was 0.191 × (100 - 8.4) / 8.4 ≈ 2. In the prediction that takes the fourth year as the current time, customers who had started using the product were therefore weighted twice as heavily as customers who had not. For later current times the influence degree was held constant at 0.191. The function value, however, decreased as the penetration rate rose, so the weighting decreased as well.
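The weighting arithmetic of this working example can be checked with a short Python sketch. The function value is influence × (100 - p) / p for a penetration rate of p percent. It gives about 2.08 at p = 8.4 and falls to about 0.76 at p = 20, which illustrates why the weighting shrinks as the penetration rate grows. How a non-integer function value becomes a number of copies is not stated here. Rounding to the nearest whole number, with at least one copy, is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[dict, int]  # (attribute items, label): 1 = started using, 0 = not yet


def weight_multiplier(influence: float, penetration_pct: float) -> float:
    """Function value from the working example: influence * (100 - p) / p."""
    return influence * (100.0 - penetration_pct) / penetration_pct


def weighted_learning_data(records: List[Record], multiplier: float) -> List[Record]:
    """Repeat each first-label record round(multiplier) times (rounding is an assumption)."""
    copies = max(1, round(multiplier))
    out: List[Record] = []
    for rec in records:
        out.extend([rec] * (copies if rec[1] == 1 else 1))
    return out
```

At the fourth-year penetration rate of 8.4%, each customer who has started using the product therefore appears twice in the learning data.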

To verify the diffusion curve obtained by the spread prediction system, the actual diffusion over the two years from the fourth to the sixth year after product launch was also investigated.

The solid line in FIG. 24 is the diffusion curve output by the spread prediction system. The curve plotted with "+" shows the actual diffusion up to the fourth year after product launch. The curve plotted with squares shows the actual diffusion over the two years from the fourth to the sixth year after launch. As FIG. 24 shows, the diffusion curve output by the spread prediction system (solid line) was very close to the actual diffusion (square plot).

The present invention can suitably be applied, for example, to a spread prediction system that predicts, for each individual customer, when that customer will start using a product or service. It can also suitably be applied to an influence degree estimation system that estimates the influence degree at a specified point in time.

FIG. 1 is a block diagram showing an example of the influence degree estimation system of the present invention.
FIG. 2 is a block diagram showing an example of an influence degree estimation system provided with an input device and an output device.
FIG. 3 is a flowchart showing an example of the processing flow of the influence degree estimation system of the present invention.
FIG. 4 is an explanatory diagram showing an example of customer data.
FIG. 5 is an explanatory diagram showing an example of current time data.
FIG. 6 is an explanatory diagram showing an example of data before weighting in the process of generating previous time data.
FIG. 7 is an explanatory diagram showing an example of previous time data.
FIG. 8 is an explanatory diagram showing an example of combinations of known independent and dependent variables.
FIG. 9 is an explanatory diagram showing an example of a classifier.
FIG. 10 is an explanatory diagram showing an example of dependent variables predicted by a classifier.
FIG. 11 is an explanatory diagram showing the error at each provisional influence degree obtained for each current time.
FIG. 12 is a block diagram showing an example of the spread prediction system of the present invention.
FIG. 13 is a flowchart showing an example of the processing flow of the spread prediction system of the present invention.
FIG. 14 is a block diagram showing a modification of the second embodiment.
FIG. 15 is a flowchart showing an example of the processing flow in the modification of the second embodiment.
FIG. 16 is a block diagram showing an example of the spread prediction system of the third embodiment.
FIG. 17 is a block diagram showing the spread prediction system of the third embodiment in detail.
FIG. 18 is a block diagram showing an example of the spread prediction system of the fourth embodiment.
FIG. 19 is a flowchart showing an example of the processing flow of the threshold function estimation device in the fourth embodiment.
FIG. 20 is a flowchart showing an example of the processing flow of the threshold function estimation device in the fourth embodiment.
FIG. 21 is a flowchart showing an example of the processing flow of the spread prediction device in the fourth embodiment.
FIG. 22 is a block diagram showing a modification of the fourth embodiment.
FIG. 23 is a block diagram showing another modification of the fourth embodiment.
FIG. 24 is a graph showing an example of the diffusion curve output by the spread prediction system.

Explanation of symbols

2 Customer database
31 Current time data generation unit
32 Previous time data group generation unit
33 Previous time classifier group generation unit
34 Error group calculation unit
35 Influence degree calculation unit
51 Test data generation unit
52 Learning data generation unit
53 Classifier generation unit
54 Test data label determination unit
55 Current time update unit
61 Reference time data group generation unit
62 Previous time data group generation unit for threshold estimation
63 Previous time classifier group generation unit for threshold estimation
64 Error group calculation unit
65 Function estimation unit
71 Test data generation unit
72 Labeling data generation unit
73 Classifier generation unit
74 Test data label determination unit

Claims

A spread prediction system comprising:
a customer database that stores customer data, which is a set of individual data for each customer, each piece of individual data including use start information and one or more items representing attributes of the customer other than the use start information, the use start information indicating the time at which the customer started using a product or service if the customer has started using it, and otherwise indicating that the customer has not used it;
a test data generation unit that uses the customer database to generate test data, which is data including individual data of customers who have not started using the product or service at a current time, the current time defining the time subject to determination;
a learning data generation unit that labels individual data of customers who have started using the product or service at the current time with a first label, labels individual data of customers who have not started using it with a second label, and generates learning data in which the number of pieces of individual data labeled with the first label is changed according to an influence degree, the influence degree being the degree to which customers of the product or service prompt others to use the product or service within a certain period;
a classifier generation unit that generates, based on the learning data, a classifier, which is a rule for determining from the items representing customer attributes whether a customer's individual data is labeled with the first label or the second label; and
a test data label determination unit that determines a label for each piece of individual data in the test data from the classifier and the items of that individual data.

The spread prediction system according to claim 1, further comprising a current time update unit that sets, as a new current time, the current time advanced by a certain amount of time, wherein:
the test data label determination unit assigns information indicating the use start time to individual data determined to be labeled with the first label;
when the current time is updated, the test data generation unit generates test data at the updated current time;
when the current time is updated, the learning data generation unit generates learning data at the updated current time;
when the current time is updated and new learning data is generated, the classifier generation unit generates a new classifier based on that learning data; and
when the current time is updated and new test data is generated, the test data label determination unit determines a label for each piece of individual data in the test data from the classifier and the items of that individual data.

The spread prediction system as described above, wherein the test data generation unit generates the test data from the customer data stored in the customer database, excluding individual data of customers who have started using the product or service at the current time.

The spread prediction system according to any one of claims 1 to 3, wherein the learning data generation unit generates the learning data by multiplying the number of pieces of individual data labeled with the first label by the value of a function whose variable is the ratio between the number of customers who have started using the product or service and the number of customers who have not yet started, and whose coefficient is the influence degree.

The spread prediction system according to any one of claims 1 to 3, wherein the learning data generation unit generates the learning data by multiplying the number of pieces of individual data labeled with the first label by the value of a function whose variable is the ratio between the number of customers who have started using the product or service and the total number of customers, and whose coefficient is the influence degree.

The spread prediction system according to claim 1, wherein the classifier generation unit generates a classifier such that, the higher the proportion of individual data labeled with the first label in the learning data, the more frequently the classifier determines that the first label applies to those individual data in the test data that are labeled with the second label but whose item values are similar to those of individual data labeled with the first label.

The spread prediction system according to any one of claims 1 to 6, wherein the learning data generation unit generates the learning data using an influence degree input from outside the spread prediction system.

The spread prediction system according to claim 1, further comprising:
a current time data generation unit that receives, as inputs, the current time, which defines the time subject to determination and serves as the target time for influence degree estimation, a time interval specifying a certain time before the current time, and a plurality of provisional influence degrees that are candidates for the influence degree, calculates a previous time that precedes the current time by the time interval, and, using the customer database, generates current time data by labeling individual data of customers who have started using the product or service at the current time with the first label and individual data of customers who have not started using it with the second label;
a previous time data group generation unit that, for each provisional influence degree, labels individual data of customers who had started using the product or service at the previous time with the first label and individual data of customers who had not with the second label, and generates previous time data in which the number of pieces of individual data labeled with the first label is changed according to that provisional influence degree;
a previous time classifier group generation unit that, for each piece of previous time data, generates based on that data a classifier, which is a rule for determining from the items representing customer attributes whether a customer's individual data is labeled with the first label or the second label;
an error group calculation unit that, for each classifier generated for each piece of previous time data, predicts from the classifier and the items of each piece of individual data in the current time data the label to be assigned to that individual data, and calculates the error between the prediction result and the current time data; and
an influence degree calculation unit that identifies the smallest of the errors calculated for the classifiers and, if exactly one provisional influence degree corresponds to the smallest error, adopts that provisional influence degree as the influence degree, or, if a plurality of provisional influence degrees correspond to the smallest error, determines the influence degree based on those provisional influence degrees.

The spread prediction system according to claim 8, wherein the current time data generation unit generates the current time data from the set of individual data in the customer data stored in the customer database, excluding individual data of customers who had started using the product or service at the previous time.

The spread prediction system according to claim 8 or 9, wherein the previous time data group generation unit generates the previous time data by multiplying the number of pieces of individual data labeled with the first label by the value of a function whose variable is the ratio between the number of customers who had started using the product or service at the previous time and the number of customers who had not started at the previous time, and whose coefficient is the provisional influence degree.

The spread prediction system according to claim 8 or 9, wherein the previous time data group generation unit generates the previous time data by multiplying the number of pieces of individual data labeled with the first label by the value of a function whose variable is the ratio between the number of customers who had started using the product or service at the previous time and the total number of customers, and whose coefficient is the provisional influence degree.

The spread prediction system according to claim 1, wherein, for each classifier generated for each piece of previous time data, the error group calculation unit predicts from the classifier and the items of each piece of individual data in the current time data the label to be assigned to that individual data, and calculates, as the error, the number of pieces of individual data whose predicted label differs from the label actually assigned in the current time data.

The spread prediction system according to any one of claims 11 to 11, wherein, for each classifier generated for each piece of previous time data, the error group calculation unit predicts from the classifier and the items of each piece of individual data in the current time data the label to be assigned to that individual data, and calculates, as the error, the absolute value of the difference between the number of pieces of individual data predicted to be labeled with the first label and the number of pieces of individual data labeled with the first label in the current time data.

An influence degree estimation system comprising:
a customer database that stores customer data, which is a set of individual data for each customer, each piece of individual data including use start information and one or more items representing attributes of the customer other than the use start information, the use start information indicating the time at which the customer started using a product or service if the customer has started using it, and otherwise indicating that the customer has not used it;
a current time data generation unit that receives, as inputs, a current time used as the estimation target time of an influence degree (the degree to which customers of the product or service prompt others to use it within a certain period), a time interval specifying a certain time before the current time, and a plurality of provisional influence degrees that are candidates for the influence degree, calculates a previous time that precedes the current time by the time interval, and, using the customer database, generates current time data by labeling individual data of customers who have started using the product or service at the current time with a first label and individual data of customers who have not started using it with a second label;
a previous time data group generation unit that, for each provisional influence degree, labels individual data of customers who had started using the product or service at the previous time with the first label and individual data of customers who had not with the second label, and generates previous time data in which the number of pieces of individual data labeled with the first label is changed according to that provisional influence degree;
a previous time classifier group generation unit that, for each piece of previous time data, generates based on that data a classifier, which is a rule for determining from the items representing customer attributes whether a customer's individual data is labeled with the first label or the second label;
an error group calculation unit that, for each classifier generated for each piece of previous time data, predicts from the classifier and the items of each piece of individual data in the current time data the label to be assigned to that individual data, and calculates the error between the prediction result and the current time data; and
an influence degree calculation unit that identifies the smallest of the errors calculated for the classifiers and, if exactly one provisional influence degree corresponds to the smallest error, adopts that provisional influence degree as the influence degree, or, if a plurality of provisional influence degrees correspond to the smallest error, determines the influence degree based on those provisional influence degrees.

The influence degree estimation system according to claim 14, wherein the current time data generation unit generates the current time data from the set of individual data in the customer data stored in the customer database, excluding individual data of customers who had started using the product or service at the previous time.

The influence degree estimation system according to claim 14 or 15, wherein the previous time data group generation unit generates the previous time data by multiplying the number of pieces of individual data labeled with the first label by the value of a function whose variable is the ratio between the number of customers who had started using the product or service at the previous time and the number of customers who had not started at the previous time, and whose coefficient is the provisional influence degree.

The influence degree estimation system according to claim 14 or 15, wherein the previous time data group generation unit generates the previous time data by multiplying the number of pieces of individual data labeled with the first label by the value of a function whose variable is the ratio between the number of customers who had started using the product or service at the previous time and the total number of customers, and whose coefficient is the provisional influence degree.

The influence degree estimation system described above, wherein, for each classifier generated for each set of previous time data, the error group calculation unit predicts the label of each piece of individual data in the current time data from the classifier and the items of that individual data, and calculates as the error the number of individual data whose predicted label differs from the label actually assigned in the current time data.

The influence degree estimation system according to claim 1, wherein, for each classifier generated for each set of previous time data, the error group calculation unit predicts the label of each piece of individual data in the current time data from the classifier and the items of that individual data, and calculates as the error the absolute value of the difference between the number of individual data predicted to be labeled with the first label and the number of individual data actually labeled with the first label in the current time data.
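The two error definitions differ in what they compare: the first counts per-record disagreements, the second compares only the totals of first labels. A short Python sketch of both, assuming labels are encoded as 1 (first label) and 0 (second label); the function names are hypothetical.

```python
from typing import Sequence


def mismatch_error(predicted: Sequence[int], actual: Sequence[int]) -> int:
    """Number of records whose predicted label differs from the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("label sequences must have equal length")
    return sum(p != a for p, a in zip(predicted, actual))


def count_error(predicted: Sequence[int], actual: Sequence[int], first_label: int = 1) -> int:
    """|number of predicted first labels - number of actual first labels|."""
    n_pred = sum(p == first_label for p in predicted)
    n_true = sum(a == first_label for a in actual)
    return abs(n_pred - n_true)
```

For predicted labels [1, 0, 1, 0] against actual labels [1, 1, 0, 0], the mismatch error is 2 while the count error is 0, because the two wrong predictions cancel out in the totals.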

A spread prediction system comprising:
A customer database that stores customer data, which is a set of individual data for each customer, each piece of individual data including use start information (indicating the time the customer started using the product or service if the customer has started, or indicating that the customer has not yet used it otherwise) and one or more items representing customer attributes other than the use start information;
A test data generation unit that uses the customer database to generate test data including the individual data of customers who have not started using the product or service at the current time, which is used to determine the determination target time;
A labeling data generation unit that generates labeling data by labeling the individual data of customers who have started using the product or service at the current time with the first label and the individual data of customers who have not started with the second label, and calculates the proportion of individual data labeled with the first label in the labeling data as the penetration rate of the product or service;
A classifier generation unit that generates, based on the labeling data, a classifier, which is a rule for determining, from the items representing a customer's attributes, a score for the probability that the customer's individual data is labeled with the first label; and
A test data label determination unit that calculates a threshold by substituting the penetration rate calculated by the labeling data generation unit into a threshold function, which gives the score threshold as a function of the penetration rate; determines, from the classifier and the items of each piece of individual data in the test data, the score of the probability that the piece is labeled with the first label; and determines that the label of individual data in the test data whose score is equal to or higher than the threshold is the first label and that the label of individual data whose score is below the threshold is the second label.
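The thresholding step can be sketched in Python as follows, assuming classifier scores are already available for the test data and labels are encoded as 1 and 0. The threshold function is not specified in the text, so the example uses a hypothetical linear one, threshold = 0.9 - 0.5 * rate; the function names are also illustrative.

```python
from typing import Callable, List, Sequence


def penetration_rate(labels: Sequence[int]) -> float:
    """Share of labeling-data records carrying the first label (encoded as 1)."""
    return sum(1 for label in labels if label == 1) / len(labels)


def label_test_data(scores: Sequence[float], rate: float,
                    threshold_fn: Callable[[float], float]) -> List[int]:
    """First label (1) where score >= threshold_fn(rate), otherwise second label (0)."""
    threshold = threshold_fn(rate)
    return [1 if s >= threshold else 0 for s in scores]
```

With one adopter among four labeling records, the penetration rate is 0.25 and the hypothetical threshold is 0.775, so scores of 0.8 and 0.78 receive the first label and 0.7 and 0.2 the second.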

The spread prediction system according to claim 20, further comprising a current time update unit that sets, as a new current time, the time obtained by advancing the current time by a fixed amount, wherein:
The test data label determination unit assigns information indicating the use start time to the individual data determined to be labeled with the first label;
When the current time is updated, the test data generation unit generates test data at the updated current time;
When the current time is updated, the labeling data generation unit generates labeling data at the updated current time and calculates the penetration rate in that labeling data;
When the current time is updated and new labeling data is generated, the classifier generation unit generates a new classifier based on that labeling data; and
When the current time is updated and new test data is generated, the test data label determination unit determines, from the classifier and the items of each piece of individual data in the test data, the score of the probability that each piece is labeled with the first label, calculates a new threshold by substituting the penetration rate calculated by the labeling data generation unit into the threshold function, and determines that the label of individual data in the test data whose score is equal to or higher than the threshold is the first label and that the label of individual data whose score is below the threshold is the second label.
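The update cycle can be simulated with a toy model. This sketch does not reproduce the patent's classifier: it swaps in a k-nearest-neighbour score on a single numeric attribute, and it assumes that a customer selected at current time t receives the predicted use start time t + step. All names (`knn_score`, `simulate_spread`) are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def knn_score(x: float, labeled: List[Tuple[float, int]], k: int = 3) -> float:
    """Share of first labels (1) among the k labeled records nearest to x."""
    nearest = sorted(labeled, key=lambda rec: abs(rec[0] - x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def simulate_spread(attrs: Dict[str, float],
                    start: Dict[str, Optional[int]],
                    now: int, end: int, step: int,
                    threshold_fn: Callable[[float], float]) -> Dict[str, Optional[int]]:
    """Advance the current time in fixed steps, predicting new adopters at each step.

    Assumes every recorded start time is <= the initial `now`; None = not started.
    """
    start = dict(start)  # work on a copy of the caller's data
    while now < end:
        # Labeling data: first label (1) if started by `now`, else second label (0).
        labeled = [(attrs[c], 1 if s is not None and s <= now else 0)
                   for c, s in start.items()]
        rate = sum(label for _, label in labeled) / len(labeled)  # penetration rate
        threshold = threshold_fn(rate)
        # Test data: customers with no recorded or predicted start time.
        for c in [c for c, s in start.items() if s is None]:
            if knn_score(attrs[c], labeled) >= threshold:
                start[c] = now + step  # assumed predicted use start time
        now += step
    return start
```

Because predictions made at one step become first-label records at the next, adoption can propagate outward from the initial adopters over successive updates.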

The spread prediction system described above, wherein the test data generation unit generates the test data from the customer data stored in the customer database, excluding the individual data of customers who have started using the product or service at the current time.

The spread prediction system according to any one of claims 20 to 22, further comprising:
A reference time data group generation unit that receives as input a plurality of reference times used to determine the threshold estimation target time, a time interval designating a fixed period before each reference time, and a plurality of temporary thresholds that are threshold candidates; calculates, for each reference time, the pre-threshold-estimation time, which is the time one time interval before that reference time; and uses the customer database to generate, for each reference time, reference time data in which the individual data of customers who have started using the product or service at the reference time are labeled with the first label and the individual data of customers who have not started are labeled with the second label;
A pre-threshold-estimation time data group generation unit that, for each pre-threshold-estimation time, generates pre-threshold-estimation time data in which the individual data of customers who have started using the product or service at that time are labeled with the first label and the individual data of customers who have not started are labeled with the second label, and calculates the proportion of individual data labeled with the first label in that data as the penetration rate;
A pre-threshold-estimation time classifier group generation unit that, for each set of pre-threshold-estimation time data, generates based on that data a classifier, which is a rule for determining, from the items representing a customer's attributes, a score for the probability that the customer's individual data is labeled with the first label;
An error group calculation unit that, for each classifier generated for each set of pre-threshold-estimation time data, determines from the classifier and the items of each piece of individual data in the reference time data the score of the probability that each piece is labeled with the first label; predicts, for each temporary threshold, that the label of individual data whose score is equal to or higher than the temporary threshold is the first label and that the label of individual data whose score is below it is the second label; and calculates the error between the prediction result and the reference time data; and
A function estimation unit that identifies, for each classifier, the smallest of the errors calculated for the temporary thresholds; when exactly one temporary threshold corresponds to the smallest error, takes that temporary threshold as the threshold at the pre-threshold-estimation time corresponding to the classifier, or, when a plurality of temporary thresholds correspond to the smallest error, determines that threshold based on those temporary thresholds; and estimates the threshold function from the thresholds and penetration rates corresponding to the pre-threshold-estimation times.
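The threshold function estimation can be sketched in two steps: pick, for each pre-threshold-estimation time, the temporary threshold with the fewest mismatches against the reference time data, then fit a function to the resulting (penetration rate, threshold) pairs. The Python below assumes classifier scores are precomputed, resolves ties by the mean, and fits a straight line by least squares; the linear form, the mismatch-count error and all names are assumptions, not taken from the text.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def best_threshold(scores: Sequence[float], actual: Sequence[int],
                   candidates: Sequence[float]) -> float:
    """Temporary threshold with the fewest label mismatches; ties are averaged (assumption)."""
    errors: Dict[float, int] = {
        t: sum((1 if s >= t else 0) != a for s, a in zip(scores, actual))
        for t in candidates
    }
    smallest = min(errors.values())
    tied = [t for t, e in errors.items() if e == smallest]
    return tied[0] if len(tied) == 1 else mean(tied)


def fit_linear_threshold_fn(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of threshold = a + b * rate to (penetration rate, threshold) pairs."""
    rates = [r for r, _ in points]
    thresholds = [t for _, t in points]
    mr, mt = mean(rates), mean(thresholds)
    sxx = sum((r - mr) ** 2 for r in rates)
    if sxx == 0:
        raise ValueError("need at least two distinct penetration rates")
    b = sum((r - mr) * (t - mt) for r, t in points) / sxx
    return mt - b * mr, b
```

For example, the points (0.1, 0.8), (0.2, 0.7) and (0.3, 0.6) yield approximately a = 0.9 and b = -1, i.e. a fitted threshold that decreases as the penetration rate rises.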

The spread prediction system according to any one of claims 20 to 22, wherein the test data label determination unit calculates a threshold using a threshold function input from outside the spread prediction system.

A spread prediction method comprising:
Using a customer database that stores customer data, which is a set of individual data for each customer, each piece including use start information (indicating the time the customer started using the product or service if the customer has started, or indicating that the customer has not yet used it otherwise) and one or more items representing customer attributes other than the use start information, to generate test data including the individual data of customers who have not started using the product or service at the current time, which is used to determine the determination target time;
Generating learning data by labeling the individual data of customers who have started using the product or service at the current time with the first label and the individual data of customers who have not started with the second label, the number of individual data labeled with the first label being changed according to the influence degree, which is the degree to which a customer of the product or service prompts others to use the product or service within a certain period;
Generating, based on the learning data, a classifier, which is a rule for determining, from the items representing a customer's attributes, whether to label the customer's individual data with the first label or the second label; and
Determining the label of each piece of individual data in the test data from the classifier and the items of that individual data.
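As a toy illustration of how changing the number of first-label records can change the classifier's output, the Python sketch below uses a single numeric attribute, a k-nearest-neighbour majority vote as a stand-in classifier (the text does not name one), and an integer `replicate` count as a hypothetical proxy for the influence degree. `predict_adopters` is likewise an illustrative name.

```python
from typing import Dict, List, Optional, Tuple


def predict_adopters(attrs: Dict[str, float], start: Dict[str, Optional[int]],
                     now: int, replicate: int, k: int = 5) -> Dict[str, int]:
    """Label not-yet-started customers 1 (first label) or 0 (second label).

    replicate: hypothetical integer stand-in for the influence degree; each
    first-label record appears `replicate` times in the learning data.
    """
    # Learning data: (attribute, label) for every customer at the current time.
    # The test customer's own row (label 0) is part of the learning data.
    base: List[Tuple[float, int]] = [
        (attrs[c], 1 if s is not None and s <= now else 0) for c, s in start.items()
    ]
    # Change the number of first-label records according to the influence degree.
    learning = [row for row in base for _ in range(replicate if row[1] == 1 else 1)]
    # Test data: customers who have not started using the product at `now`.
    test = [c for c, s in start.items() if s is None or s > now]
    labels: Dict[str, int] = {}
    for c in test:
        # Stand-in classifier: majority vote of the k nearest learning rows.
        nearest = sorted(learning, key=lambda row: abs(row[0] - attrs[c]))[:k]
        votes = sum(label for _, label in nearest)
        labels[c] = 1 if 2 * votes > len(nearest) else 0
    return labels
```

With adopters at 1.0 and 1.4 and non-adopters at 1.2, 1.25 and 5.0, `replicate=1` gives every non-adopter the second label, while `replicate=2` flips the two customers near the adopters to the first label.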

The spread prediction method described above, further comprising:
Assigning information indicating the use start time to each piece of individual data determined to be labeled with the first label;
Updating the current time to a new current time advanced by a fixed amount;
When the current time is updated, generating test data at the updated current time;
When the current time is updated, generating learning data at the updated current time;
When the current time is updated and new learning data is generated, generating a new classifier based on that learning data; and
When the current time is updated and new test data is generated, determining the label of each piece of individual data in the test data from the classifier and the items of that individual data.

A spread prediction method comprising:
Using a customer database that stores customer data, which is a set of individual data for each customer, each piece including use start information (indicating the time the customer started using the product or service if the customer has started, or indicating that the customer has not yet used it otherwise) and one or more items representing customer attributes other than the use start information, to generate test data including the individual data of customers who have not started using the product or service at the current time, which is used to determine the determination target time;
Generating labeling data by labeling the individual data of customers who have started using the product or service at the current time with the first label and the individual data of customers who have not started with the second label, and calculating the proportion of individual data labeled with the first label in the labeling data as the penetration rate of the product or service;
Generating, based on the labeling data, a classifier, which is a rule for determining, from the items representing a customer's attributes, a score for the probability that the customer's individual data is labeled with the first label; and
Calculating a threshold by substituting the calculated penetration rate into a threshold function, which gives the score threshold as a function of the penetration rate; determining, from the classifier and the items of each piece of individual data in the test data, the score of the probability that the piece is labeled with the first label; and determining that the label of individual data in the test data whose score is equal to or higher than the threshold is the first label and that the label of individual data whose score is below the threshold is the second label.

The spread prediction method according to claim 27, further comprising:
Assigning information indicating the use start time to each piece of individual data determined to be labeled with the first label;
Updating the current time to a new current time advanced by a fixed amount;
When the current time is updated, generating test data at the updated current time;
When the current time is updated, generating labeling data at the updated current time and calculating the penetration rate in that labeling data;
When new labeling data is generated by updating the current time, generating a new classifier based on that labeling data; and
When the current time is updated and new test data is generated, determining, from the classifier and the items of each piece of individual data in the test data, the score of the probability that each piece is labeled with the first label, calculating a new threshold by substituting the calculated penetration rate into the threshold function, and determining that the label of individual data in the test data whose score is equal to or higher than the threshold is the first label and that the label of individual data whose score is below the threshold is the second label.

A spread prediction program installed in a computer that includes a customer database storing customer data, which is a set of individual data for each customer, each piece including use start information (indicating the time the customer started using the product or service if the customer has started, or indicating that the customer has not yet used it otherwise) and one or more items representing customer attributes other than the use start information, the program causing the computer to execute:
A test data generation process that uses the customer database to generate test data including the individual data of customers who have not started using the product or service at the current time, which is used to determine the determination target time;
A learning data generation process that generates learning data by labeling the individual data of customers who have started using the product or service at the current time with the first label and the individual data of customers who have not started with the second label, the number of individual data labeled with the first label being changed according to the influence degree, which is the degree to which a customer of the product or service prompts others to use the product or service within a certain period;
A classifier generation process that generates, based on the learning data, a classifier, which is a rule for determining, from the items representing a customer's attributes, whether to label the customer's individual data with the first label or the second label; and
A test data label determination process that determines the label of each piece of individual data in the test data from the classifier and the items of that individual data.

The spread prediction program according to claim 29, causing the computer to:
Assign, in the test data label determination process, information indicating the use start time to each piece of individual data determined to be labeled with the first label;
Execute a current time update process that sets, as the new current time, the current time advanced by a fixed amount;
When the current time is updated, generate test data at the updated current time in the test data generation process;
When the current time is updated, generate learning data at the updated current time in the learning data generation process;
When new learning data is generated by updating the current time, generate a new classifier based on that learning data in the classifier generation process; and
When new test data is generated by updating the current time, determine, in the test data label determination process, the label of each piece of individual data in the test data from the classifier and the items of that individual data.

A spread prediction program installed in a computer that includes a customer database storing customer data, which is a set of individual data for each customer, each piece including use start information (indicating the time the customer started using the product or service if the customer has started, or indicating that the customer has not yet used it otherwise) and one or more items representing customer attributes other than the use start information, the program causing the computer to execute:
A test data generation process that uses the customer database to generate test data including the individual data of customers who have not started using the product or service at the current time, which is used to determine the determination target time;
A labeling data generation process that generates labeling data by labeling the individual data of customers who have started using the product or service at the current time with the first label and the individual data of customers who have not started with the second label, and calculates the proportion of individual data labeled with the first label in the labeling data as the penetration rate of the product or service;
A classifier generation process that generates, based on the labeling data, a classifier, which is a rule for determining, from the items representing a customer's attributes, a score for the probability that the customer's individual data is labeled with the first label; and
A test data label determination process that calculates a threshold by substituting the penetration rate calculated in the labeling data generation process into a threshold function, which gives the score threshold as a function of the penetration rate; determines, from the classifier and the items of each piece of individual data in the test data, the score of the probability that the piece is labeled with the first label; and determines that the label of individual data in the test data whose score is equal to or higher than the threshold is the first label and that the label of individual data whose score is below the threshold is the second label.

The spread prediction program according to claim 31, causing the computer to:
Assign, in the test data label determination process, information indicating the use start time to each piece of individual data determined to be labeled with the first label;
Execute a current time update process that sets, as the new current time, the current time advanced by a fixed amount;
When the current time is updated, generate test data at the updated current time in the test data generation process;
When the current time is updated, generate labeling data at the updated current time and calculate the penetration rate in that labeling data in the labeling data generation process;
When new labeling data is generated by updating the current time, generate a new classifier based on that labeling data in the classifier generation process; and
When the current time is updated and new test data is generated, determine, in the test data label determination process, from the classifier and the items of each piece of individual data in the test data, the score of the probability that each piece is labeled with the first label; calculate a new threshold by substituting the penetration rate calculated in the labeling data generation process into the threshold function; and determine that the label of individual data in the test data whose score is equal to or higher than the threshold is the first label and that the label of individual data whose score is below the threshold is the second label.

An influence degree estimation method comprising:
Receiving as input a current time used as the estimation target time of the influence degree, which is the degree to which a customer of the product or service prompts others to use the product or service within a certain period, a time interval designating a fixed period before the current time, and a plurality of temporary influence degrees that are candidates for the influence degree, and calculating the previous time, which is the time one time interval before the current time;
Using a customer database that stores customer data, which is a set of individual data for each customer, each piece including use start information (indicating the time the customer started using the product or service if the customer has started, or indicating that the customer has not yet used it otherwise) and one or more items representing customer attributes other than the use start information, to generate current time data in which the individual data of customers who have started using the product or service at the current time are labeled with the first label and the individual data of customers who have not started are labeled with the second label;
Generating, for each temporary influence degree, previous time data in which the individual data of customers who have started using the product or service at the previous time are labeled with the first label, the individual data of customers who have not started are labeled with the second label, and the number of individual data labeled with the first label is changed according to that temporary influence degree;
Generating, for each set of previous time data and based on that data, a classifier, which is a rule for determining, from the items representing a customer's attributes, which label to assign to the customer's individual data;
Predicting, for each classifier generated for each set of previous time data, the label of each piece of individual data in the current time data from the classifier and the items of that individual data, and calculating the error between the prediction result and the current time data; and
Identifying the smallest of the errors calculated for the respective classifiers and, when exactly one temporary influence degree corresponds to the smallest error, determining that temporary influence degree to be the influence degree, or, when a plurality of temporary influence degrees correspond to the smallest error, determining the influence degree based on those temporary influence degrees.

The influence degree estimation method described above, wherein the current time data is generated from the customer data stored in the customer database, excluding the individual data of customers who had started using the product or service at the previous time.

An influence degree estimation program installed in a computer that includes a customer database storing customer data, which is a set of individual data for each customer, each piece including use start information (indicating the time the customer started using the product or service if the customer has started, or indicating that the customer has not yet used it otherwise) and one or more items representing customer attributes other than the use start information, the program causing the computer to execute:
A current time data generation process that receives as input a current time used as the estimation target time of the influence degree, a time interval designating a fixed period before the current time, and a plurality of temporary influence degrees that are candidates for the influence degree; calculates the previous time, which is the time one time interval before the current time; and uses the customer database to generate current time data in which the individual data of customers who have started using the product or service at the current time are labeled with the first label and the individual data of customers who have not started are labeled with the second label;
A previous time data group generation process that generates, for each temporary influence degree, previous time data in which the individual data of customers who have started using the product or service at the previous time are labeled with the first label, the individual data of customers who have not started are labeled with the second label, and the number of individual data labeled with the first label is changed according to that temporary influence degree;
A previous time classifier group generation process that generates, for each set of previous time data and based on that data, a classifier, which is a rule for determining, from the items representing a customer's attributes, which label to assign to the customer's individual data;
An error group calculation process that, for each classifier generated for each set of previous time data, predicts the label of each piece of individual data in the current time data from the classifier and the items of that individual data, and calculates the error between the prediction result and the current time data; and
An influence degree calculation process that identifies the smallest of the errors calculated for the respective classifiers and, when exactly one temporary influence degree corresponds to the smallest error, determines that temporary influence degree to be the influence degree, or, when a plurality of temporary influence degrees correspond to the smallest error, determines the influence degree based on those temporary influence degrees.

The influence degree estimation program according to claim 35, causing the computer to generate, in the current time data generation process, the current time data from the customer data stored in the customer database, excluding the individual data of customers who had started using the product or service at the previous time.