JP6567484B2

JP6567484B2 - Estimated model construction system, estimated model construction method and program

Info

Publication number: JP6567484B2
Application number: JP2016191041A
Authority: JP
Inventors: 信二垂水; 泰隆長谷川
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2016-09-29
Filing date: 2016-09-29
Publication date: 2019-08-28
Anticipated expiration: 2036-09-29
Also published as: JP2018055424A

Description

The present invention relates to a technique for constructing estimation models.

Insurers such as local governments and health insurance societies must implement effective and efficient healthcare measures in order to both promote the health of their insured residents and workers and contain medical costs. One means of achieving this that has attracted attention is the use of electronic information accumulated in the past. For example, accumulated information such as basic resident records and medical fee statements can be used to quantify the characteristics of each group, such as a region or a workplace (for example, its age distribution, insurance usage, and number of people with a given disease), so that healthcare measures suited to those characteristics can be selected and applied.

Furthermore, estimating future conditions through data analysis, in addition to quantifying the current characteristics, is expected to allow more appropriate healthcare measures to be selected. For example, Patent Document 1 proposes a patient share analysis technique for medical institutions that predicts the future population and the number of patients by age and by injury or disease, based on mesh population data and patient incidence rates by International Classification of Diseases category.

Japanese Patent Laid-Open No. 2006-119865

When the future is estimated from accumulated information, finer-grained healthcare measures could be expected if the estimation target were not limited to one fixed group but could be chosen freely at several granularities (for regions: prefecture, city, school district, and so on), each estimated with high accuracy. However, an insurer's population mixes people with widely varying backgrounds (attributes and medical histories), who differ greatly in health status, history of hospital visits, and so on, so accurate estimation has been difficult when the target group is small. Patent Document 1 estimates only a fixed group and cannot vary the granularity (for example, the attributes that define the group).

The present invention has been made in view of the above problem, and its object is to estimate the future with high accuracy even for a group in which people with different backgrounds are mixed.

The present invention is an estimation model construction system that has a processor that executes a program and a memory that stores the program, and that constructs an estimation model by executing the program. The system has: a storage unit that stores accumulated information on the medical care and health of each individual, and related category information that records the correspondence between a data item included in the accumulated information and another data item that is related to that data item and contains category information; an estimation model construction unit that determines, based on received information or the accumulated information, a first objective variable that is the estimation target of the estimation model and explanatory variables that are the input information of the estimation model, determines a categorical variable corresponding to the first objective variable based on the correspondence in the related category information, divides the accumulated information into portions in which the value of the categorical variable satisfies a predetermined condition, and constructs a first estimation model for each portion of the divided accumulated information; and a future estimation unit that takes the first objective variable, the explanatory variables, and the categorical variable as the information to be estimated, and selects the first estimation model to apply according to the value of the categorical variable included in the information to be estimated.

According to a representative embodiment of the present invention, an estimation model capable of accurate future estimation can be constructed even for a group in which people with different backgrounds are mixed.

Problems, configurations, and effects other than those described above will become apparent from the following description of the embodiments.

The drawings all illustrate Embodiment 1 of the present invention and show, in order:
A block diagram of the configuration of the estimation model construction system.
A diagram of an example of shaped data.
A flowchart of an example of the estimation model construction process.
A flowchart of an example of the related category reading process.
A diagram of an example of a related category matrix.
A tree showing an example of the relationships between related categories.
A tree showing another example of the relationships between related categories.
A diagram of an example of shaped data for estimation model construction.
A diagram of an example of a labeling table.
A diagram of an example of a labeling table.
A diagram of another example of shaped data for estimation model construction.
A diagram of another example of a labeling table.
A diagram of another example of a labeling table.
A diagram of another example of a labeling table.
A diagram of another example of a labeling table.
A flowchart of an example of the estimation process.
A diagram of an example of an operation screen.
A diagram of an example of an estimation result display screen.
A flowchart of an example of the estimation information aggregation process.

Embodiments of the present invention are described below with reference to the accompanying drawings.

Embodiment 1 describes an example of an estimation model construction system that, when constructing an estimation model from accumulated information collected in advance, divides the data into groups of records that share the same value of a categorical variable strongly related to the objective variable to be estimated, and constructs an estimation model for each value.

The configuration of the estimation model construction system of Embodiment 1 is described below. FIG. 1 is a block diagram showing this configuration. The system has a data analysis unit 101 and a database 120. The data analysis unit 101 has an input unit 102, an output unit 103, an arithmetic device 104, a memory 105, and a storage medium 106.

The input unit 102 is a human interface device such as a mouse or keyboard and accepts input to the data analysis unit 101. The output unit 103 is a display or printer that outputs the results computed by the estimation model construction system. The storage medium 106 is a storage device that stores the programs implementing the data analysis processing of the data analysis unit 101, the results of that processing, and so on; it is, for example, a non-volatile storage medium (a magnetic disk drive, non-volatile memory, or the like).

The programs stored in the storage medium 106 are loaded into the memory 105. The arithmetic device 104 executes the programs loaded into the memory 105 and is, for example, a CPU or a GPU. The processing and operations described below are executed by the arithmetic device 104.

The estimation model construction system of Embodiment 1 is a computer system configured on one computer or on multiple logically or physically configured computers. It may run in separate threads on the same computer, or on virtual machines built on multiple physical computer resources.

The programs executed by the arithmetic device 104 are provided to each server via removable media (CD-ROM, flash memory, and the like) or a network, and are stored in a non-volatile storage device, which is a non-transitory storage medium. The computer system should therefore include an interface for reading removable media.

The data analysis unit 101 runs the data shaping unit 107, the estimation model construction unit 108, the future estimation unit 111, the estimation information aggregation unit 112, and the related category association unit 113, all of which are stored in the storage medium 106.

The arithmetic device 104 operates as a functional unit providing a given function by performing processing in accordance with the program of that functional unit. For example, it functions as the data shaping unit 107 by processing in accordance with the data shaping program. The same applies to the other programs. The arithmetic device 104 also operates as functional units providing the functions of each of the multiple processes that each program executes. A computer and a computer system are a device and a system that include these functional units.

The types of data handled in Embodiment 1 and the estimation models to be constructed are described below. Embodiment 1 describes an example that uses data containing, in time series, each individual's medical claim (receipt) information and specific health checkup information.

As estimation targets, the example estimates, for a future disease, each individual's risk of visiting a medical institution and expected medical cost, as well as the number of people visiting medical institutions and the medical cost of a group, obtained by statistically aggregating the individual values.

First, the data shaping unit 107 is described. The data shaping unit 107 takes the per-individual medical and health information stored in the data storage unit 121, such as the individual medical fee statements in the claim (receipt) information and the specific health checkup information, aggregates it per specific period and per individual into shaped data 201 for estimation, and stores the result in the shaped data storage unit 122. In Embodiment 1, the shaped data 201 is treated as the accumulated information on each individual's medical care and health.

The information stored in the storage units 121 to 128 need not reside in the database 120; it may be stored in any storage unit or storage device accessible from the data analysis unit 101.

FIG. 2 illustrates an example of the shaped data for estimation produced by the data shaping unit 107. Embodiment 1 describes, as an example, the case where medical information such as each individual's claim information is aggregated per fiscal year.

Reference numeral 201 denotes the shaped data as a whole. The information contained in the shaped data 201 is described below. Basic information 210 contains basic profile information for each individual.

The example shown here includes a personal ID 211, sex 212, and age 213. The personal ID 211 is an ID uniquely assigned to each individual in the shaped data 201. Sex 212 indicates each individual's sex; it can be represented as a categorical variable with a specific value assigned to each category, for example 0 for male and 1 for female. Age 213 indicates each individual's age and can be represented as an integer, for example 35, 21, or 50.

In the figure, 220 denotes year information containing per-individual information aggregated for one fiscal year; the example shows the aggregated information for fiscal 2014. Specific health checkup information 221 contains information on the specific health checkup taken in that fiscal year, such as BMI (Body Mass Index), blood glucose, and blood pressure. The figure shows BMI 223 as an example.

BMI 223 contains the BMI measured in fiscal 2014. The specific health checkup information may also contain information from the checkup interview, for example questionnaire responses on exercise habits, alcohol consumption, or smoking. In that case the data is stored as a categorical variable, for example as symbols or numeric codes assigned to each level of weekly alcohol consumption.

Claim (receipt) information 222 contains information on the claims generated in that fiscal year. For example, it includes, for each injury or disease name defined from the names appearing on the claims, whether a claim exists, the number of claims, and the associated medical costs. It also includes per-disease information obtained by grouping the injury and disease names into diseases based on publicly known grouping information. The figure shows, as examples, information on diabetes, diabetes medical cost, and hypertension.

Diabetes 224 indicates whether diabetes-related claims occurred. In the illustrated example it is 1 if at least one diabetes-related claim occurred in the fiscal year and 0 if none occurred. Diabetes medical cost 225 indicates the medical cost included in diabetes-related claims; in the illustrated example it holds the medical cost incurred in connection with diabetes in the fiscal year.

Hypertension 226 indicates whether hypertension-related claims occurred. In the illustrated example it is 1 if at least one hypertension-related claim occurred in the fiscal year and 0 if none occurred.
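The flag and cost fields described above can be sketched as a small aggregation over raw claim records. This is a minimal illustration only: the record layout, the field names (`person_id`, `fiscal_year`, `disease`, `cost`), and the function name are hypothetical and do not come from the patent.

```python
def yearly_disease_fields(claims, person_ids, fiscal_year, disease):
    """Aggregate raw claims into one flag and one cost per individual.

    flag is 1 if at least one claim for the disease occurred in the
    fiscal year and 0 otherwise; cost is the summed cost of those claims.
    All field names are hypothetical.
    """
    count = {pid: 0 for pid in person_ids}
    cost = {pid: 0 for pid in person_ids}
    for claim in claims:
        if (claim["fiscal_year"] == fiscal_year
                and claim["disease"] == disease
                and claim["person_id"] in count):
            count[claim["person_id"]] += 1
            cost[claim["person_id"]] += claim["cost"]
    return {
        pid: {"flag": 1 if count[pid] >= 1 else 0, "cost": cost[pid]}
        for pid in person_ids
    }


# Made-up claim records for two individuals.
claims = [
    {"person_id": "K0001", "fiscal_year": 2014, "disease": "diabetes", "cost": 12000},
    {"person_id": "K0001", "fiscal_year": 2014, "disease": "diabetes", "cost": 8000},
    {"person_id": "K0001", "fiscal_year": 2014, "disease": "hypertension", "cost": 5000},
    {"person_id": "K0002", "fiscal_year": 2015, "disease": "diabetes", "cost": 3000},
]
diabetes_2014 = yearly_disease_fields(claims, ["K0001", "K0002"], 2014, "diabetes")
```

Keeping every individual in the result, including those with no claims, is what lets the flag take the value 0 rather than being missing.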

In the figure, 230 denotes year information containing per-individual information aggregated for one fiscal year. The illustrated example aggregates the fiscal 2015 information, using the same aggregation method as the fiscal 2014 year information 220. BMI 231 contains the BMI obtained at the fiscal 2015 specific health checkup and is stored in the table in the same way as the fiscal 2014 information. By referring to the shaped data 201, each individual's basic information and yearly health and medical information can be obtained at once.

Next, the estimation model construction unit 108 is described. The estimation model construction unit 108 executes the estimation model construction process and constructs an estimation model based on the shaped data 201 stored in the shaped data storage unit 122.

An example of the estimation model construction process performed by the estimation model construction unit 108 is described below. Before the detailed description, an overview of the estimation model constructed by this process is given first. Next, it is explained that two different forms of estimation model can be constructed, depending on the characteristics of the information used for model construction. Finally, the flowchart in FIG. 3 is used to explain, step by step, how each of the two forms is constructed.

First, an overview of the estimation model. The estimation model constructed in Embodiment 1 is built from the shaped data 201 and contains, for example, one objective variable, one or more explanatory variables, and the relationships between the variables that are used to estimate the objective variable from the explanatory variables.

For example, an estimation model can be constructed whose objective variable is whether the individual visited a medical institution for diabetes in fiscal 2015, based on claim information (taking the value 0 or 1), and whose explanatory variables are the fiscal 2014 BMI 223 (a continuous value) and the fiscal 2014 fasting blood glucose (a continuous value), based on the specific health checkup.

Using the relationships between variables learned in advance, this estimation model can estimate, from the fiscal 2014 BMI 223 and fasting blood glucose values, the probability of a medical institution visit for diabetes in fiscal 2015.

Constructing an estimation model whose explanatory and objective variables are information acquired at different times in this way makes it possible to estimate future information. With this model, inputting the BMI and fasting blood glucose of one fiscal year yields an estimate of the probability of a medical institution visit for diabetes in the following fiscal year.

For example, inputting an individual's fiscal 2015 BMI and fasting blood glucose into this model yields an estimate of whether that individual will visit a medical institution in fiscal 2016. If the visit indicator takes the values 0 and 1, the probability of a visit can be estimated as a value from 0 to 1 inclusive.
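One way to picture a model that maps BMI and fasting blood glucose to a visit probability between 0 and 1 is a logistic function. The patent text here does not name a model family, so the logistic form, the function name, and the weights below are all assumptions made for illustration; in practice the weights would be learned from the shaped data.

```python
import math


def visit_probability(bmi, fasting_glucose, weights, bias):
    """Map year-N explanatory variables to a year-N+1 visit probability.

    Assumes a logistic model; the source does not specify the model family.
    """
    z = bias + weights[0] * bmi + weights[1] * fasting_glucose
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, not taken from the patent or from real data.
HYPOTHETICAL_WEIGHTS = (0.08, 0.04)
HYPOTHETICAL_BIAS = -7.0

# Fiscal 2015 inputs for one individual give a fiscal 2016 probability.
p_2016 = visit_probability(27.5, 118.0, HYPOTHETICAL_WEIGHTS, HYPOTHETICAL_BIAS)
```

Because the output is a probability rather than a hard 0 or 1, summing it over the members of a group gives an expected number of visitors, which matches the group-level aggregation mentioned earlier.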

Next, the different forms of estimation model that the estimation model construction unit 108 can construct in Embodiment 1 are described. In Embodiment 1, the estimation model construction unit 108 takes any one of the categorical variables among the explanatory variables of the estimation model, divides the data by the value of that categorical variable, and constructs a sub-model from each portion. Furthermore, it constructs one of two different forms of estimation model depending on the type of that categorical variable.

Specifically, two different forms of estimation model can be constructed depending on whether, when the estimation model is applied (that is, when the explanatory variable information that the model needs is input in order to estimate the information of the target fiscal year), the value of the categorical variable is (1) already determined or (2) not yet determined. The two forms are described below.

The first form is the estimation model for the case where the information in the categorical variable C among the explanatory variables is already determined when the model is applied. One example is a model whose explanatory variables are the fiscal 2014 BMI 223, the fiscal 2014 fasting blood glucose, and the fiscal 2014 diabetes flag 224, and whose objective variable is whether the individual visited a medical institution for diabetes in fiscal 2015.

Suppose the estimation model construction unit 108 selects the fiscal 2014 diabetes flag 224, one of the explanatory variables, as the categorical variable by which the data is divided to build sub-models. Because this model is built to estimate fiscal 2015 information from fiscal 2014 information, it is a model that takes the information of some fiscal year N as input and estimates the information of fiscal year N+1.

In this case, the fiscal 2014 diabetes flag designated as the categorical variable is fiscal 2014 information, like the other explanatory variables (the fiscal 2014 BMI and fasting blood glucose), and has therefore already been acquired. For example, when this model is used to estimate fiscal 2016 information from fiscal 2015 information, the already-determined fiscal 2015 diabetes flag can be input in place of the fiscal 2014 diabetes flag that served as the categorical variable. When the information in the categorical variable C is thus already determined at the time the model is applied, the first form of estimation model is built: the shaped data 201 is divided into groups of records with the same categorical variable value (or values satisfying a predetermined condition), and an estimation model corresponding to each categorical variable value is constructed from each group. The values need not be identical; depending on the type of shaped data 201, the data may instead be divided into groups satisfying a predetermined condition, such as values that are approximately equal (differing by less than a threshold).
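The first form can be sketched as "split by the categorical value, fit one sub-model per split, then pick the sub-model by the record's own value at estimation time". In the sketch below each sub-model is simply the observed visit rate within its group; this is a deliberately crude stand-in chosen for brevity, since a real sub-model would also use BMI and fasting blood glucose. The field names and data are hypothetical.

```python
from collections import defaultdict


def build_submodels(records, cat_key, target_key):
    """Split records by the categorical variable and fit one sub-model each.

    Each sub-model here is just the observed rate of target == 1 in its
    group (an assumed stand-in for a real learned model).
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[cat_key]].append(record[target_key])
    return {value: sum(targets) / len(targets) for value, targets in groups.items()}


def estimate(submodels, cat_value):
    """Select the sub-model that matches the record's categorical value."""
    return submodels[cat_value]


# Made-up training records: year-N diabetes flag and year-N+1 visit flag.
records = [
    {"diabetes_prev": 1, "visit_next": 1},
    {"diabetes_prev": 1, "visit_next": 1},
    {"diabetes_prev": 1, "visit_next": 0},
    {"diabetes_prev": 1, "visit_next": 1},
    {"diabetes_prev": 0, "visit_next": 0},
    {"diabetes_prev": 0, "visit_next": 0},
    {"diabetes_prev": 0, "visit_next": 1},
    {"diabetes_prev": 0, "visit_next": 0},
]
submodels = build_submodels(records, "diabetes_prev", "visit_next")

# At estimation time the year-N flag is already known, so it selects the model.
p_with_history = estimate(submodels, 1)
p_without_history = estimate(submodels, 0)
```

Splitting on an exact value is the simplest case; the "predetermined condition" variant would replace the dictionary key with a bucketing rule, such as a threshold on the difference between values.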

The second form is the estimation model for the case where the information in the categorical variable C among the explanatory variables is not yet determined when the model is applied. One example is a model whose explanatory variables are whether the individual visited for disease A in fiscal 2014 and whether the individual visited for disease A in fiscal 2015, and whose objective variable is whether the individual visited for disease B in fiscal 2015.

Suppose the estimation model construction unit 108 selects the fiscal 2015 visit flag for disease A, one of the explanatory variables, as the categorical variable by which the data is divided to build sub-models. Because this model is built to estimate other fiscal 2015 information from fiscal 2014 and fiscal 2015 information, it is a model that takes fiscal year N and fiscal year N+1 information as input and estimates other fiscal year N+1 information.

Here, the fiscal 2015 visit flag for disease A designated as the categorical variable is acquired in the same fiscal year as the objective variable, the fiscal 2015 visit flag for disease B. When this model is applied, the categorical variable is therefore considered not yet determined.
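The distinction between the two forms can be reduced to a comparison of fiscal years: a categorical variable observed before the objective variable's year is known at estimation time, while one from the same year is not. The helper below is a hypothetical sketch of that rule; the patent says only that the form is chosen from the shaped data and the database, so the function and its year-based test are an assumed simplification.

```python
def choose_model_form(categorical_year, objective_year):
    """Return 1 if the categorical variable is already determined, else 2.

    Assumed rule: a categorical variable from an earlier fiscal year than
    the objective variable is determined when the model is applied; one
    from the same fiscal year is not.
    """
    if categorical_year < objective_year:
        return 1
    return 2


# Fiscal 2014 diabetes flag vs. fiscal 2015 objective: first form.
form_for_diabetes_example = choose_model_form(2014, 2015)
# Fiscal 2015 disease A flag vs. fiscal 2015 disease B objective: second form.
form_for_disease_a_example = choose_model_form(2015, 2015)
```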

For example, when this model is used to estimate fiscal 2016 information from fiscal 2015 information, the categorical variable (the fiscal 2016 disease A flag) is, unlike the other explanatory variables, not yet determined at estimation time. In this case, a first estimation model is constructed that estimates the value of the categorical variable from the explanatory variables other than the categorical variable. Next, the shaped data is divided into groups of records with the same categorical variable value, and a second estimation model corresponding to each categorical variable value is constructed from each group.

例えば前述の例では、第１の推計モデルとして、２０１４年度の疾病Ａの受診有無の情報を説明変数とし、２０１５年度の疾病Ａの受診有無の情報を目的変数とする第１の推計モデルを構築する。次に、第２の推計モデルとして、２０１４年度の疾病Ａの受診有無の情報を説明変数とし、２０１５年度の疾病Ｂの受診有無の情報を目的変数とする推計モデルを、２０１５年度の疾病Ａの受診有無の値が同一のデータセットごとにそれぞれ構築する。 For example, in the example above, the first estimation model uses the FY2014 disease A consultation status as the explanatory variable and the FY2015 disease A consultation status as the objective variable. Next, as the second estimation model, a model that uses the FY2014 disease A consultation status as the explanatory variable and the FY2015 disease B consultation status as the objective variable is constructed separately for each data set that shares the same FY2015 disease A consultation status.

この推計モデルを用いて、２０１５年度の情報から２０１６年度の情報を推計する場合は、まず、第１の推計モデルを用いて、２０１５年度の疾病Ａの受診有無の情報から、２０１６年度の疾病Ａの受診有無の確率分布を推計する。次に、推計した２０１６年度の疾病Ａの受診有無の確率分布の下で、第２の推計モデルをそれぞれ適用し、それら結果を統合して、最終的な推定結果とする。 When this estimation model is used to estimate FY2016 information from FY2015 information, the first estimation model is first used to estimate the probability distribution of the FY2016 disease A consultation status from the FY2015 disease A consultation status. Next, each second estimation model is applied under the estimated probability distribution, and the results are integrated to give the final estimate.
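The two-stage application described above can be sketched as follows. The text does not state how the results are "integrated"; this sketch assumes a probability-weighted sum over the values of the categorical variable, which is one natural reading. The function name and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict


def integrate_two_stage(p_a: Dict[str, float],
                        p_b_given_a: Dict[str, float]) -> float:
    """Combine stage-2 estimates, weighted by the stage-1 distribution.

    p_a         : estimated distribution of the undetermined categorical
                  variable (e.g. FY2016 disease A status), from model 1.
    p_b_given_a : estimate of the objective variable (e.g. probability of a
                  FY2016 disease B consultation) from the second model built
                  for each value of the categorical variable.
    Assumption: integration is a weighted sum (marginalization).
    """
    return sum(p_a[value] * p_b_given_a[value] for value in p_a)


# Hypothetical stage-1 output: P(A_2016 = y) = 0.3, P(A_2016 = n) = 0.7
p_a_2016 = {"y": 0.3, "n": 0.7}
# Hypothetical stage-2 outputs, one per submodel
p_b_given_a_2016 = {"y": 0.5, "n": 0.1}

p_b_2016 = integrate_two_stage(p_a_2016, p_b_given_a_2016)
print(round(p_b_2016, 2))
```

With these illustrative inputs the combined estimate is 0.3 × 0.5 + 0.7 × 0.1.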

これらの２つの異なる形態の推計モデルは、整形データ２０１に含まれる情報およびデータベース１２０に記憶された情報に基づき、いずれか適切なものが選択されて構築される。 These two different types of estimation models are constructed by selecting any appropriate one based on the information included in the shaping data 201 and the information stored in the database 120.

以下、図３を用いて、推計モデル構築部１０８で行われる推計モデル構築処理のフローチャートを説明する。 Hereinafter, a flowchart of the estimation model construction process performed by the estimation model construction unit 108 will be described with reference to FIG.

図３は、推計モデル構築部１０８で実行される推計モデル構築処理の流れを説明するフローチャートである。以下、本フローチャートを用いて、各ステップの処理について説明する。 FIG. 3 is a flowchart for explaining the flow of the estimation model construction process executed by the estimation model construction unit 108. Hereinafter, the processing of each step will be described using this flowchart.

目的変数・説明変数入力のステップ３０１では、推計モデル構築部１０８は、入力部１０２から入力された情報に基づき、推計対象とする変数を目的変数に決定し、推計に用いる情報を説明変数に決定する。なお、説明変数と目的変数の対応は、事前に説明変数と目的変数の情報がペアとなった情報を推計モデル構築用変数群記憶部１２７に記憶しておき、予め記憶された情報を読み出すことで決定しても良い。 In step 301 (objective variable / explanatory variable input), the estimation model construction unit 108 determines, based on information input from the input unit 102, the variable to be estimated as the objective variable and the information used for estimation as the explanatory variables. Alternatively, the correspondence between explanatory variables and objective variables may be determined by storing pairs of explanatory and objective variables in advance in the estimation model construction variable group storage unit 127 and reading out the stored information.

整形データ読出処理のステップ３０２では、推計モデル構築部１０８は、予めデータ整形部１０７で処理されて、整形データ記憶部１２２に記憶された推計モデル構築用の整形データ２０１の情報を、整形データ記憶部１２２から読み出す。 In step 302 of the shaping data reading process, the estimation model construction unit 108 stores the information of the shaping data 201 for construction of the estimation model previously processed by the data shaping unit 107 and stored in the shaping data storage unit 122. Read from the unit 122.

関連カテゴリ変数読出処理のステップ３０３では、推計モデル構築部１０８は、目的変数の情報に基づき、説明変数に含まれる変数の中に、目的変数を推計するにあたり、カテゴリ変数として扱うべき変数が存在するか否かを判定する。この判定は例えば、推計モデル構築部１０８が、関連カテゴリ記憶部１２４に記憶された関連カテゴリマトリクス（関連カテゴリ情報）５０１の情報を用いることで実施される。 In step 303 (related categorical variable reading), the estimation model construction unit 108 determines, based on the objective variable, whether the explanatory variables include a variable that should be treated as a categorical variable when estimating the objective variable. This determination is made, for example, by the estimation model construction unit 108 using the related category matrix (related category information) 501 stored in the related category storage unit 124.

図５は、関連カテゴリ記憶部１２４に記憶された関連カテゴリを選択するためのマトリクスの情報である。関連カテゴリマトリクス５０１は、推計対象である対象変数の情報を行方向に設定し、関連する変数に関する情報を列方向に設定する行列であって、各行および列には整形データ２０１に含まれる項目の情報が含まれる。 FIG. 5 shows the matrix, stored in the related category storage unit 124, that is used to select related categories. The related category matrix 501 places the target variables to be estimated along the rows and the related variables along the columns; each row and column corresponds to an item included in the shaped data 201.

行方向の対象変数情報５０２は、関連を探索する元となる対象変数に関する情報が含まれる。列方向の関連変数情報５０３は、関連を探索する先となる項目に関する情報が含まれる。 The target variable information 502 in the row direction includes information related to the target variable from which the association is searched. The column-related related variable information 503 includes information on items to be searched for relationships.

図示の例では、基本情報Ｂ、Ｎ年度の情報として、Ａ、ｙ１、ｙ２、Ｎ＋１年度の情報として、Ａ、ｙ１、ｙ２の計７つの情報が含まれる例を示す。行列は０、１、または空の情報を含む。空のセルは、当該セルが含まれる行の対象変数が、当該セルが含まれる列の関連先の変数と、関連を持っていないことを示す。 The illustrated example contains seven items in total: basic information B; A, y1, and y2 as year N information; and A, y1, and y2 as year N+1 information. Each cell of the matrix contains 0 or 1, or is empty. An empty cell indicates that the target variable of its row has no relationship with the related variable of its column.

０が入っているセルは、当該セルが含まれる行の対象変数が、当該セルが含まれる列の関連先の変数と、関連を持っており、また、関連変数が、推定時には既に観測済みの情報、すなわち、推定時には情報が整形データに含まれることを示す。 A cell containing 0 indicates that the target variable of its row is related to the related variable of its column, and that the related variable has already been observed at the time of estimation, that is, its value is included in the shaped data at that time.

１が入っているセルは、当該セルが含まれる行の対象変数が、当該セルが含まれる列の変数と関連を持っており、また、推定時には未観測の情報、すなわち、推定時には情報が整形データに含まれていないことを示す。例えば、関連変数が、対象変数よりも過去の年度の情報に関する変数であれば、其の変数は観測済みの情報である。一方、関連変数が、対象変数と同じ年度に取得された情報であれば当該変数は推計時には、未観測の情報である。 A cell containing 1 indicates that the target variable of its row is related to the variable of its column, and that the related variable has not yet been observed at the time of estimation, that is, its value is not included in the shaped data at that time. For example, if the related variable concerns a fiscal year earlier than that of the target variable, it has already been observed. If, on the other hand, the related variable is acquired in the same fiscal year as the target variable, it is unobserved at the time of estimation.
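One possible in-memory representation of the related category matrix is a nested mapping in which a missing key plays the role of an empty cell. This is a minimal sketch, not the stored format of the related category storage unit 124. Only the two cells that the text mentions explicitly are filled in; the remaining cells of FIG. 5 are not reproduced here, and the constant and function names are hypothetical.

```python
from typing import Dict, Optional

OBSERVED = 0    # related, and already observed at estimation time
UNOBSERVED = 1  # related, but not yet observed at estimation time

# Rows are target variables, columns are related variables.
# A missing entry corresponds to an empty cell (no relationship).
related_category_matrix: Dict[str, Dict[str, int]] = {
    "A_N+1": {"A_N": OBSERVED},
    "y1_N+1": {"A_N+1": UNOBSERVED},
}


def cell(matrix: Dict[str, Dict[str, int]],
         target: str, related: str) -> Optional[int]:
    """Return 0, 1, or None (empty cell) for a (row, column) pair."""
    return matrix.get(target, {}).get(related)


print(cell(related_category_matrix, "A_N+1", "A_N"))
print(cell(related_category_matrix, "y1_N+1", "A_N+1"))
print(cell(related_category_matrix, "y1_N+1", "y2_N"))
```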

関連カテゴリマトリクス５０１を用いて関連カテゴリを読み出す処理を、図４のフローチャートを用いて説明する。図４は、推計モデル構築処理の関連カテゴリ変数読出処理のステップ３０３で実行される処理のフローチャートである。 A process of reading a related category using the related category matrix 501 will be described with reference to the flowchart of FIG. FIG. 4 is a flowchart of the process executed in step 303 of the related category variable reading process of the estimation model construction process.

対象変数決定のステップ４０１では、推計モデル構築部１０８が、関連を探索する変数を決定する。具体的には、推計モデル構築部１０８が、推計モデル構築処理の目的変数・説明変数入力のステップ３０１で決定した推計対象の目的変数を、探索する元の変数として決定する。 In step 401 for determining the target variable, the estimation model construction unit 108 determines a variable for searching for a relationship. Specifically, the estimation model construction unit 108 determines the objective variable to be estimated, which is determined in step 301 for inputting the objective variable and explanatory variable of the estimation model construction processing, as the original variable to be searched.

関連カテゴリ変数探索のステップ４０２では、推計モデル構築部１０８が、関連カテゴリマトリクス５０１を用いて、対象変数と関連するカテゴリ変数を探索する。具体的には、推計モデル構築部１０８が、関連カテゴリマトリクス５０１で対象変数とする変数が含まれる行を探索し、０または１が含まれるセルが存在するか否かを探索する。推計モデル構築部１０８は、０または１が含まれるセルが存在する場合には、該当する列の関連変数を、対象変数に対応した関連カテゴリ変数として、セル内の値と対応付けて、記憶する。 In step 402 (related categorical variable search), the estimation model construction unit 108 uses the related category matrix 501 to search for categorical variables related to the target variable. Specifically, the estimation model construction unit 108 finds the row of the related category matrix 501 that corresponds to the target variable and checks whether any cell in that row contains 0 or 1. If such a cell exists, the estimation model construction unit 108 stores the related variable of the corresponding column, together with the cell value, as a related categorical variable for the target variable.

図５で示す関連カテゴリマトリクスを例に説明する。例えば、対象変数が、Ｎ＋１年度のＡの情報である場合は、推計モデル構築部１０８が、０が存在するセルを含む列であるＮ年度のＡを関連変数として選択する。例えば、推計モデル構築部１０８は、対象変数が、Ｎ＋１年度のｙ１の情報である場合は、１が存在するセルを含む列であるＮ＋１年度のＡの情報を関連カテゴリ変数として選択する。 The related category matrix shown in FIG. 5 will be described as an example. For example, when the target variable is information on A in N + 1 year, the estimation model construction unit 108 selects A in N year, which is a column including a cell in which 0 exists, as a related variable. For example, when the target variable is information on y1 in N + 1 year, the estimation model construction unit 108 selects information on A in N + 1 year that is a column including a cell in which 1 exists as a related categorical variable.

ステップ４０３では、推計モデル構築部１０８が、変数探索のステップ４０２で記憶された関連カテゴリ変数について判定を実施する。推計モデル構築部１０８は、選択された関連カテゴリ変数に、推定時に未確定のカテゴリ変数が含まれるか否かを判定し、含まれる場合は、未確定カテゴリ項目読出のステップ４０４に進む。一方、推計モデル構築部１０８は、推定時に未確定のカテゴリ変数が、含まれない、または、関連カテゴリがひとつも記憶されていない場合は、関連変数情報集約処理のステップ４０５に進む。 In step 403, the estimation model construction unit 108 performs a determination on the related category variable stored in step 402 of variable search. The estimation model construction unit 108 determines whether or not the selected related category variable includes an undetermined categorical variable at the time of estimation. If included, the process proceeds to step 404 for reading the undetermined category item. On the other hand, the estimation model construction unit 108 proceeds to step 405 of the related variable information aggregation process when the categorical variable that is not determined at the time of estimation is not included or no related category is stored.

なお、推計モデル構築部１０８は、推定時に未確定のカテゴリ変数であるか否かの判定は、関連カテゴリマトリクスから選択した関連カテゴリ変数に対応するセルの値が０であるか１であるかに基づいて判定することができる。 The estimation model construction unit 108 determines whether or not the categorical variable is undetermined at the time of estimation based on whether the value of the cell corresponding to the related category variable selected from the related category matrix is 0 or 1. It can be determined based on.

未確定カテゴリ項目対象変数化のステップ４０４では、推計モデル構築部１０８が、ステップ４０２で記憶した関連変数の中で、推定時に未確定のカテゴリ変数を新たな対象変数として設定した後、関連カテゴリ変数探索のステップ４０２に戻る。以後、全ての未確定の関連カテゴリ変数の処理を完了するまで上記ステップ４０２、４０３、４０４を反復して繰り返す。 In step 404 (setting an undetermined category item as the target variable), the estimation model construction unit 108 takes, from the related variables stored in step 402, a categorical variable that is undetermined at the time of estimation, sets it as the new target variable, and returns to step 402 (related categorical variable search). Steps 402, 403, and 404 are then repeated until all undetermined related categorical variables have been processed.

ステップ４０３の判定と、ステップ４０２、４０３、４０４の繰り返し処理について、図５で示す関連カテゴリマトリクス５０１を例に説明する。 The determination in step 403 and the repetition processing in steps 402, 403, and 404 will be described using the related category matrix 501 shown in FIG. 5 as an example.

例えば、対象変数が、Ｎ＋１年度のＡである場合は、推計モデル構築部１０８は、０が存在するセルを含む列であるＮ年度のＡが関連カテゴリ変数として記憶する。関連カテゴリマトリクス５０１では、０は、対象変数との関連関係において、関連変数が確定済の変数であることを示す。そのため、推計モデル構築部１０８は、ステップ４０３の判定でＮｏと判定し、ステップ４０５に進む。 For example, when the target variable is A in N + 1 year, the estimation model construction unit 108 stores A in N year, which is a column including a cell in which 0 exists, as a related category variable. In the related category matrix 501, 0 indicates that the related variable is a confirmed variable in the related relationship with the target variable. Therefore, the estimation model construction unit 108 determines No in the determination in step 403 and proceeds to step 405.

一方、対象変数が、Ｎ＋１年度のｙ１である場合は、推計モデル構築部１０８は、１が存在するセルを含む列であるＮ＋１年度のＡを関連カテゴリ変数として選択する。関連カテゴリマトリクス５０１では、１は、対象変数との関連関係において、関連変数が未確定の変数であることを示す。そのため、推計モデル構築部１０８は、ステップ４０３の判定においてＹｅｓと判定し、ステップ４０４に進む。 On the other hand, when the target variable is y1 in the year N + 1, the estimation model construction unit 108 selects A in the year N + 1, which is a column including the cell in which 1 exists, as the related category variable. In the related category matrix 501, 1 indicates that the related variable is an undetermined variable in the related relationship with the target variable. Therefore, the estimation model construction unit 108 determines Yes in the determination in step 403, and proceeds to step 404.

ステップ４０４では、推計モデル構築部１０８がＮ＋１年度のＡを新たな対象変数に設定する。その後、再度ステップ４０２において、推計モデル構築部１０８は、Ｎ＋１年度のＡに関連するカテゴリ変数として、Ｎ年度のＡを記憶する。この変数は、Ｎ＋１年度のＡとの関連において、セルに０の値を含み、推定時に確定済の変数であるため、推計モデル構築部１０８は、２回目のステップ４０３においてＮｏと判定し、繰り返し処理を終了し、ステップ４０５に進む。 In step 404, the estimation model construction unit 108 sets A of year N+1 as the new target variable. Then, in step 402 again, the estimation model construction unit 108 stores A of year N as a categorical variable related to A of year N+1. Because the cell for this variable in relation to A of year N+1 contains 0, meaning that it is already determined at the time of estimation, the estimation model construction unit 108 determines No at the second pass through step 403, ends the iteration, and proceeds to step 405.
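The loop over steps 402 to 404 can be sketched as a simple worklist traversal of the matrix. This is an illustrative sketch under assumptions: the matrix is a nested mapping containing only the two cells discussed in the text, and the function name and the `visited` guard against revisiting a variable are additions not described in the source.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical excerpt of the related category matrix (rows: target
# variables, columns: related variables, 0 = observed, 1 = unobserved).
matrix: Dict[str, Dict[str, int]] = {
    "A_N+1": {"A_N": 0},
    "y1_N+1": {"A_N+1": 1},
}


def read_related_categories(
    matrix: Dict[str, Dict[str, int]], objective: str
) -> List[Tuple[str, str, int]]:
    """Collect (target, related variable, cell value) triples."""
    found: List[Tuple[str, str, int]] = []
    pending = deque([objective])      # step 401: start from the objective
    visited = {objective}
    while pending:
        target = pending.popleft()
        # Step 402: every non-empty cell in the target's row is recorded.
        for related, value in matrix.get(target, {}).items():
            found.append((target, related, value))
            # Step 403: a cell value of 1 means "undetermined at estimation
            # time"; step 404 then makes that variable the next target.
            if value == 1 and related not in visited:
                visited.add(related)
                pending.append(related)
    return found


print(read_related_categories(matrix, "A_N+1"))
print(read_related_categories(matrix, "y1_N+1"))
```

Starting from A of year N+1, the search stops after one pass; starting from y1 of year N+1, it runs a second pass for A of year N+1, as in the walkthrough above.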

関連変数情報集約のステップ４０５では、推計モデル構築部１０８がこれまでの処理で記憶された関連カテゴリ変数の情報を集約し、関連カテゴリ変数の関連性を示すツリー構造を構築する。 In step 405 of related variable information aggregation, the estimation model construction unit 108 aggregates information on the related categorical variables stored in the processes so far, and constructs a tree structure indicating the relevance of the related categorical variables.

以下、本処理で構築するツリー構造について図６を用いて説明する。図６Ａ、図６Ｂは、図５の関連カテゴリマトリクス５０１を用いて、関連カテゴリ変数の読出処理で読み出される変数の関連性の例を表すツリー構造である。 Hereinafter, the tree structure constructed in this process will be described with reference to FIG. 6A and 6B are tree structures representing examples of the relevance of variables read out in the related category variable reading process using the related category matrix 501 in FIG.

図６Ａは、対象変数をＮ＋１年度のＡに設定して、関連カテゴリ変数の読出処理を実施した場合のツリー構造である。この例では、対象変数に関連するカテゴリ変数として、Ｎ年度のＡがひとつ読み出されている。図６Ａは、関連性を表すツリー構造であって、図中の四角は変数、矢印は関連性を表す。 FIG. 6A shows a tree structure when the target variable is set to A in the year N + 1 and the related category variable reading process is performed. In this example, one A in N year is read as a categorical variable related to the target variable. FIG. 6A shows a tree structure representing a relationship, in which a square represents a variable and an arrow represents a relationship.

図６ＡのＡ＿Ｎ＋１は、Ｎ＋１年度のＡの情報を表す変数、Ａ＿ＮはＮ年度のＡの情報を表す変数を表す。矢印は根元にある変数が関連変数であり、矢印の先にある変数が、根元の関連変数から影響を受ける対象変数である。 In FIG. 6A, A_N + 1 represents a variable representing A information in year N + 1, and A_N represents a variable representing A information in year N. In the arrow, the variable at the root is the related variable, and the variable at the tip of the arrow is the target variable affected by the related variable at the root.

このように、対象変数と関連する関連カテゴリ変数が目的変数の推定時に確定済みの変数である場合、対象変数と関連カテゴリ変数のツリー構造上の距離は１となる。このツリー構造は、前記第１の形態の推計モデルに対応する。 In this way, when the related categorical variable related to the target variable is a variable that has been determined when the objective variable is estimated, the distance on the tree structure between the target variable and the related categorical variable is 1. This tree structure corresponds to the estimation model of the first form.

図６Ｂは、対象変数をＮ＋１年度のｙ１の情報に設定して、関連カテゴリ変数の読出処理を実施した場合のツリー構造である。この例では、対象変数に関連するカテゴリ変数として、Ｎ＋１年度のＡがひとつ目の関連変数として読み出され、次に、Ｎ＋１年度のＡの情報が、推定時に未確定であることから、当該変数を新たな対象変数として読み出されたＮ年度のＡの情報がふたつ目の関連変数として読み出されている。このとき、Ｎ＋１年度のｙ１の情報が、Ｎ＋１年度のＡの情報と関連し、さらに、Ｎ＋１年度のＡの情報が、Ｎ年度のＡの情報と関連している関連性が存在する。 FIG. 6B shows the tree structure obtained when the target variable is set to y1 of year N+1 and the related categorical variable reading process is performed. In this example, A of year N+1 is read as the first categorical variable related to the target variable. Because A of year N+1 is undetermined at the time of estimation, it is then set as the new target variable, and A of year N is read as the second related variable. Thus y1 of year N+1 is related to A of year N+1, which in turn is related to A of year N.

図６Ｂは、上記関連性の情報を表すツリー構造であって、ｙ１＿Ｎ＋１は、Ｎ＋１年度のｙ１の情報を意味する変数、Ａ＿Ｎ＋１は、Ｎ＋１年度のＡの情報を意味する変数、Ａ＿ＮはＮ年度のＡの情報を意味する変数を表す。図６Ｂより、Ａ＿ＮがＡ＿Ｎ＋１に影響を与え、さらにＡ＿Ｎ＋１は、ｙ１＿Ｎ＋１に影響を与えるという関係性が示されている。 FIG. 6B is a tree structure representing these relationships, where y1_N+1 denotes the variable for y1 of year N+1, A_N+1 the variable for A of year N+1, and A_N the variable for A of year N. FIG. 6B shows that A_N affects A_N+1, and that A_N+1 in turn affects y1_N+1.

このように、対象変数と関連する関連カテゴリ変数に、推定時に未確定の変数が含まれる場合は、対象変数と関連カテゴリ変数のツリー構造の深さが２以上となる。このツリー構造は、前記第２の形態の推計モデルに対応する。図６Ｂの例では、ｙ１＿Ｎ＋１が根ノードを示し、Ａ＿Ｎ＋１が関連カテゴリ変数の内部ノードを示す。 As described above, when the related categorical variable related to the target variable includes a variable that is not determined at the time of estimation, the depth of the tree structure of the target variable and the related categorical variable is 2 or more. This tree structure corresponds to the estimation model of the second form. In the example of FIG. 6B, y1_N + 1 indicates the root node, and A_N + 1 indicates the internal node of the related categorical variable.
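The link between tree depth and the form of the estimation model can be illustrated with a short sketch. The mapping `related_of` (from each variable to the related variables it depends on) and both function names are hypothetical; the sketch also assumes the structure has no cycles. The text only describes the depth 1 and depth 2 or more cases, so a variable with no related category returns `None` here.

```python
from typing import Dict, List, Optional


def tree_depth(related_of: Dict[str, List[str]], node: str) -> int:
    """Length of the longest chain of related variables below `node`."""
    related = related_of.get(node, [])
    if not related:
        return 0
    return 1 + max(tree_depth(related_of, r) for r in related)


def model_form(depth: int) -> Optional[str]:
    """Map tree depth to the model form named in the text."""
    if depth == 1:
        return "first form"    # related category already determined
    if depth >= 2:
        return "second form"   # an undetermined category must be estimated
    return None                # no related category: not covered by the text


# Relationships of FIG. 6A and FIG. 6B as described in the text.
fig_6a = {"A_N+1": ["A_N"]}
fig_6b = {"y1_N+1": ["A_N+1"], "A_N+1": ["A_N"]}

print(model_form(tree_depth(fig_6a, "A_N+1")))
print(model_form(tree_depth(fig_6b, "y1_N+1")))
```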

推計モデル構築部１０８は、本処理で構築したツリー構造を、関連カテゴリ記憶部１２４に記憶する。 The estimation model construction unit 108 stores the tree structure constructed by this processing in the related category storage unit 124.

以上、推計モデル構築処理内の、関連カテゴリ変数読出処理のステップ３０３について説明を終了する。以下、図３を用いた推計モデル構築処理の説明に戻る。 This is the end of the description of step 303 of the related category variable reading process in the estimation model construction process. Hereinafter, the description returns to the estimation model construction process using FIG.

データ分割ラベル付け処理のステップ３０４では、推計モデル構築部１０８が、推計モデル構築時に実施するデータの分割を行うための、個々のデータの分類用のラベル付け処理を行う。 In step 304 of the data division labeling process, the estimation model construction unit 108 performs a labeling process for classifying individual data to perform data division performed at the time of construction of the estimation model.

具体的には、まず、推計モデル構築部１０８は、整形データ２０１に含まれる説明変数のうち、関連カテゴリ変数読出処理のステップ３０３で読み出された関連カテゴリ変数の情報を選択する。次に、推計モデル構築部１０８は、関連カテゴリ変数読出処理のステップ３０３で構築された関連カテゴリ変数間のツリー構造に基づき、対象の変数毎に、当該変数を推計する推計モデルを構築するためのデータ分割用のラベル付けを実行する。以下では、図６Ａと図６Ｂの２つのツリー構造を用いて、それぞれの例を説明する。 Specifically, first, the estimation model construction unit 108 selects related category variable information read in step 303 of the related category variable reading process from the explanatory variables included in the shaping data 201. Next, the estimation model construction unit 108 constructs an estimation model for estimating the relevant variable for each target variable based on the tree structure between the relevant category variables constructed in Step 303 of the relevant category variable reading process. Perform labeling for data partitioning. Below, each example is demonstrated using two tree structure of FIG. 6A and FIG. 6B.

まず、図６Ａで示すツリー構造を処理する例を説明する。 First, an example of processing the tree structure shown in FIG. 6A will be described.

図６Ａのツリー構造では、目的変数は、Ａ＿Ｎ＋１、カテゴリ変数はＡ＿Ｎである。図７は、推計モデル構築用整形データ２２００の一例を示す図である。推計モデル構築用整形データ２２００は、整形データ２０１から、関連カテゴリ変数の項目の情報を選択した結果を示す。推計モデル構築用整形データ２２００は、Ｎｏ２２０１と、確定カテゴリ２２０２と、目的変数２２０３と、説明変数２２０４〜２２０５とをひとつのエントリに含む。 In the tree structure of FIG. 6A, the objective variable is A_N + 1 and the categorical variable is A_N. FIG. 7 is a diagram illustrating an example of the shaping data 2200 for estimating model construction. Estimated model construction shaping data 2200 indicates the result of selecting information on the item of the related category variable from the shaping data 201. The estimated model construction shaping data 2200 includes No 2201, a confirmed category 2202, an objective variable 2203, and explanatory variables 2204 to 2205 in one entry.

本実施例１では、カテゴリ変数Ａ＿Ｎと目的変数Ａ＿Ｎ＋１は、いずれもｙとｎの２種類の値を取りうる変数であるとする。なお、ｙは例えば、カテゴリに該当する場合に設定され、ｎは例えば、カテゴリに含まれない場合に設定される。データ分割ラベル付け処理のステップ３０４では、推計モデル構築用整形データ２２００の情報を用いて、推計が必要な変数毎に、データ分割用のラベル付けを実施する。 In the first embodiment, it is assumed that the categorical variable A_N and the objective variable A_N + 1 are variables that can take two types of values, y and n. For example, y is set when corresponding to a category, and n is set when not included in a category, for example. In step 304 of the data division labeling process, labeling for data division is performed for each variable that needs to be estimated using the information of the shaping data 2200 for constructing the estimation model.

図６Ａのツリー構造において、推計が必要な変数は目的変数であるＡ＿Ｎ＋１のひとつだけであるので、推計モデル構築部１０８は、Ａ＿Ｎ＋１を推計するモデルを構築するためのデータ分割処理用のラベル付けを行う。このラベルは、ツリー構造の中で、推計対象である変数の先祖となる変数群の全ての状態の組合せが一意になるように設定する。 In the tree structure of FIG. 6A, the only variable that needs to be estimated is the objective variable A_N+1, so the estimation model construction unit 108 assigns the labels used to divide the data when building the model that estimates A_N+1. The labels are set so that every combination of states of the variables that are ancestors of the estimated variable in the tree structure receives a unique label.

例えば、先祖となる変数の直積値を取る変数でラベル付けすることで実施できる。図６Ａのツリーでは、Ａ＿Ｎ＋１の先祖ノードは、Ａ＿Ｎのひとつだけであるので、Ａ＿Ｎの値を直接用いることで、ラベル付けできる。 For example, it can be implemented by labeling with a variable that takes the direct product value of the ancestor variable. In the tree of FIG. 6A, since there is only one A_N ancestor node of A_N + 1, it can be labeled by directly using the value of A_N.
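Labeling by the direct product of ancestor values can be sketched as building a tuple from the ancestor columns of each record. The rows below are hypothetical sample records, not the contents of FIG. 7, and the function name is an assumption. With a single ancestor, as in FIG. 6A, the one-element tuple simply carries the value of A_N.

```python
from typing import Dict, List, Tuple


def division_label(row: Dict[str, str], ancestors: List[str]) -> Tuple[str, ...]:
    """Label a record by the combination of its ancestor variable values."""
    return tuple(row[name] for name in ancestors)


# Hypothetical records; values are 'y' (applies) or 'n' (does not apply).
rows = [
    {"No": "1", "A_N": "y", "A_N+1": "y"},
    {"No": "2", "A_N": "n", "A_N+1": "y"},
    {"No": "3", "A_N": "n", "A_N+1": "n"},
]

# FIG. 6A case: the only ancestor of A_N+1 is A_N.
labels_6a = [division_label(r, ["A_N"]) for r in rows]
print(labels_6a)
```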

図８Ａは、図６Ａのツリーに関し、データ分割用のラベルを付与した結果を示すラベル付けテーブル２３００である。関連カテゴリ２３０１は、ラベル付けに用いた変数である。データ分割ラベル２３０２は、付与されたラベルである。図示の例では、Ａ＿Ｎの値をそのままラベルに用いた例を示している。 FIG. 8A is a labeling table 2300 showing the result of assigning labels for data division with respect to the tree of FIG. 6A. The related category 2301 is a variable used for labeling. The data division label 2302 is a given label. In the illustrated example, the value of A_N is used as it is for the label.

図８Ｂは、ラベル付けテーブル２３１０の一例を示す図である。ラベル付けテーブル２３１０は、Ｎｏ２３１１と、確定カテゴリ２３１２と、推計対象２３１３と、データ分割ラベル２３１４とをひとつのエントリに含む。 FIG. 8B shows an example of the labeling table 2310. The labeling table 2310 includes No 2311, a confirmed category 2312, an estimation target 2313, and a data division label 2314 in one entry.

図８Ｂは、図８Ａで示したラベルを用いて、図７に示した推計モデル構築用整形データ２２００にラベル付けした例である。データ分割ラベル２３１４は、カテゴリ変数の情報に基づき生成されたラベルの列である。 FIG. 8B is an example in which the estimation model construction shaping data 2200 shown in FIG. 7 is labeled using the label shown in FIG. 8A. The data division label 2314 is a column of labels generated based on categorical variable information.

次に、図６Ｂで示すツリーを処理する例を説明する。 Next, an example of processing the tree shown in FIG. 6B will be described.

図９は、推計モデル構築用整形データ２２１０の一例を示す図である。推計モデル構築用整形データ２２１０は、整形データ２０１から、図６Ｂの情報に基づき、関連カテゴリ変数項目の情報を選択した結果を示す。推計モデル構築用整形データ２２１０は、Ｎｏ２２１１と、確定カテゴリ２２１２と、未確定カテゴリ２２１３と、目的変数２２１４と、説明変数２２１５〜２２１６とをひとつのエントリに含む。 FIG. 9 is a diagram illustrating an example of the shaping data 2210 for estimating model construction. Estimated model construction shaping data 2210 indicates a result of selecting related category variable item information from shaping data 201 based on the information of FIG. 6B. The estimated model construction shaping data 2210 includes No 2211, a confirmed category 2212, an unconfirmed category 2213, an objective variable 2214, and explanatory variables 2215 to 2216 in one entry.

図６Ｂの例では、目的変数は、ｙ１＿Ｎ＋１であり、カテゴリ変数は、Ａ＿Ｎ＋１とＡ＿Ｎのふたつが存在する。このうち、Ａ＿Ｎは確定カテゴリであり、Ａ＿Ｎ＋１は未確定カテゴリである。 In the example of FIG. 6B, the objective variable is y1_N + 1, and there are two categorical variables, A_N + 1 and A_N. Of these, A_N is a confirmed category, and A_N + 1 is an unconfirmed category.

ここでは、カテゴリ変数Ａ＿ＮとＡ＿Ｎ＋１は、いずれもｙとｎの２種類の値を取りうる変数であるとする。データ分割ラベル付け処理のステップ３０４では、この情報に基づき、推計が必要な変数毎に、当該変数を推計する推計モデル構築時にデータを分割する際のラベル付けを実行する。 Here, it is assumed that the categorical variables A_N and A_N + 1 are variables that can take two types of values, y and n. In step 304 of the data division labeling process, based on this information, for each variable that requires estimation, labeling is performed when data is divided when an estimation model for estimating the variable is constructed.

図６Ｂのツリーにおいて推計が必要な変数は、目的変数であるｙ１＿Ｎ＋１と、カテゴリ変数Ａ＿Ｎ＋１のふたつである。そこで、ｙ１＿Ｎ＋１を推計する推計モデルを構築するためのデータ分割処理におけるラベル付けと、Ａ＿Ｎ＋１の推計を実行するモデルを構築するためのデータ分割処理におけるラベル付けの、ふたつのラベル付け処理を実行する。 The variables that need to be estimated in the tree of FIG. 6B are the target variable y1_N + 1 and the categorical variable A_N + 1. Therefore, two labeling processes are executed: labeling in the data division process for constructing the estimation model for estimating y1_N + 1 and labeling in the data division process for constructing the model for executing the estimation of A_N + 1.

これらのラベル付けも、図６Ａの例と同様に、推計対象の変数の先祖となる変数の値の組合せが一意となるようにラベル付けをすることで実施できる。 Similar to the example of FIG. 6A, these labeling can also be performed by labeling so that combinations of values of variables that are ancestors of variables to be estimated are unique.

図１０Ａは、ラベル付けテーブル２３２０の一例を示す図である。図１０Ａは、図６Ｂのツリーに関し、カテゴリ変数Ａ＿Ｎ＋１を推計するモデル構築のためのデータ分割ラベルを付与された結果を示すラベル付けテーブル２３２０である。 FIG. 10A is a diagram illustrating an example of the labeling table 2320. FIG. 10A is a labeling table 2320 showing a result of giving a data division label for model construction for estimating the categorical variable A_N + 1 with respect to the tree of FIG. 6B.

関連カテゴリ２３２１は、ラベル付けに用いた変数である。データ分割ラベル２３２２は、付与されたラベルである。この例では、推計対象のＡ＿Ｎ＋１の先祖となる変数は確定カテゴリのＡ＿Ｎのみであるため、当該値をそのままラベルに用いた例を示している。 The related category 2321 is a variable used for labeling. The data division label 2322 is a given label. In this example, since the variable serving as the ancestor of A_N + 1 to be estimated is only the confirmed category A_N, the value is used as it is for the label.

図１０Ｂは、ラベル付けテーブル２３３０の一例を示す図である。図１０Ｂは、図１０Ａで示したラベルを用いて、図７に示した推計モデル構築用整形データ２２００のデータにラベル付けした例である。データ分割ラベル２３３４が、カテゴリ変数の情報に基づき付与されたラベルの列を表す。 FIG. 10B is a diagram illustrating an example of the labeling table 2330. FIG. 10B is an example in which the data of the estimation model construction shaped data 2200 shown in FIG. 7 is labeled using the label shown in FIG. 10A. A data division label 2334 represents a column of labels assigned based on the information of the categorical variable.

図１０Ｃは、ラベル付けテーブル２３４０の一例を示す図である。図１０Ｃは、図６Ｂのツリーに関し、目的変数ｙ１＿Ｎ＋１を推計するモデル構築のためのデータ分割ラベルを付与した結果を示すラベル付けテーブル２３４０である。 FIG. 10C is a diagram illustrating an example of the labeling table 2340. FIG. 10C is a labeling table 2340 showing the result of assigning data division labels for model construction for estimating the objective variable y1_N + 1 with respect to the tree of FIG. 6B.

関連カテゴリ２３４１、２３４２は、当該ラベル付けに用いた変数である。この例では、推計対象のｙ１＿Ｎ＋１の先祖となる変数は、確定カテゴリのＡ＿Ｎと、未確定カテゴリのＡ＿Ｎ＋１のふたつが存在するため、それら２つの値の組合せを用いた値でラベル付けした例を示している。 The related categories 2341 and 2342 are the variables used for this labeling. In this example, the estimated variable y1_N+1 has two ancestor variables, the determined category A_N and the undetermined category A_N+1, so each label is the combination of those two values.

図１０Ｄは、ラベル付けテーブル２３５０の一例を示す図である。ラベル付けテーブル２３５０は、Ｎｏ２３５１と、確定カテゴリ２３５２と、未確定カテゴリ２３５３と、推計対象２３５４とデータ分割ラベル２３５５とを一つのエントリに含む。 FIG. 10D is a diagram illustrating an example of the labeling table 2350. The labeling table 2350 includes No 2351, a confirmed category 2352, an unconfirmed category 2353, an estimation target 2354, and a data division label 2355 in one entry.

図１０Ｄは、図１０Ｃで示したデータ分割ラベル２３４３を用いて、図９に示したデータにラベル付けした例である。データ分割ラベル２３５５が、カテゴリ変数の情報に基づき付与されたラベルの列を表す。これらのラベル付け処理は、ツリー構造の中で、根である目的変数との距離が遠いノードに対応する変数から順に、実行する。 FIG. 10D is an example in which the data shown in FIG. 9 is labeled using the data division label 2343 shown in FIG. 10C. The data division label 2355 is the column of labels assigned based on the categorical variables. These labeling processes are executed in order, starting from the variable whose node is farthest from the root of the tree structure, which is the objective variable.

図６Ｂの例ではまず、推計モデル構築部１０８が、Ａ＿Ｎ＋１についてラベル付け処理を実行する。図６Ｂのツリーでは、Ａ＿Ｎ＋１の先祖ノードは、Ａ＿Ｎのひとつだけであるので、Ａ＿Ｎの値を用いて、ラベル付けする。 In the example of FIG. 6B, first, the estimation model construction unit 108 executes a labeling process for A_N + 1. In the tree of FIG. 6B, since there is only one A_N ancestor node of A_N + 1, labeling is performed using the value of A_N.

次に、推計モデル構築部１０８は、ｙ１＿Ｎ＋１についてラベル付け処理を実行する。図６Ｂのツリーでは、ｙ１＿Ｎ＋１の先祖ノードは、Ａ＿Ｎ＋１とＡ＿Ｎのふたつが存在するので、推計モデル構築部１０８は、Ａ＿Ｎ＋１の値と、Ａ＿Ｎの値の直積を取った値でラベル付けする。推計モデル構築部１０８は、ラベル付けした情報を、データ分割情報記憶部１２５に記憶する。 Next, the estimation model construction unit 108 executes a labeling process for y1_N + 1. In the tree of FIG. 6B, since there are two ancestor nodes of y1_N + 1, A_N + 1 and A_N, the estimation model construction unit 108 labels the value by taking the direct product of the value of A_N + 1 and the value of A_N. The estimation model construction unit 108 stores the labeled information in the data division information storage unit 125.
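The processing order and the resulting label sets for the FIG. 6B tree can be sketched as follows. The `ancestors` and `distance_from_root` mappings are written out by hand from the description above rather than derived from stored data, and all names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Ancestors of each variable that needs estimating in FIG. 6B.
ancestors: Dict[str, List[str]] = {
    "A_N+1": ["A_N"],
    "y1_N+1": ["A_N", "A_N+1"],
}
# Distance of each estimated variable from the root (the objective variable).
distance_from_root: Dict[str, int] = {"y1_N+1": 0, "A_N+1": 1}

# Label the variable farthest from the root first.
order = sorted(distance_from_root, key=distance_from_root.get, reverse=True)

# Each ancestor takes 'y' or 'n', so the label set is their direct product.
possible_labels: Dict[str, List[Tuple[str, ...]]] = {
    var: list(product("yn", repeat=len(ancestors[var]))) for var in order
}

print(order)
print(possible_labels["A_N+1"])
print(possible_labels["y1_N+1"])
```

A_N+1 is therefore handled before y1_N+1, with two possible labels for the former and four for the latter.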

ステップ３０５では、推計モデル構築部１０８が、関連カテゴリ変数読出処理のステップ３０３で読出した未確定の関連カテゴリ変数に関して、推計モデルを未構築の変数が含まれるか否かを判定する。推計モデル構築部１０８は、未確定の関連カテゴリ変数が読み出されていない、もしくは、全ての未確定の関連カテゴリについて推計モデルを構築している場合には、カテゴリ別目的変数推計モデル構築処理のステップ３０７に進む。一方、推計モデル構築部１０８が、推計モデルを構築していない未確定の関連カテゴリが存在する場合には、未確定カテゴリ推計モデル構築処理のステップ３０６に進む。 In step 305, the estimation model construction unit 108 determines whether the undetermined related categorical variables read in step 303 (related categorical variable reading) include any variable for which an estimation model has not yet been built. If no undetermined related categorical variable was read, or if estimation models have been built for all undetermined related categories, the estimation model construction unit 108 proceeds to step 307 (category-specific objective variable estimation model construction). If there is an undetermined related category for which no estimation model has been built, it proceeds to step 306 (undetermined category estimation model construction).

未確定カテゴリ推計モデル構築処理のステップ３０６は、推計モデル構築部１０８の未確定カテゴリ推計モデル構築部１１０で実施される処理である。未確定カテゴリ推計モデル構築部１１０は、関連カテゴリ変数読出処理のステップ３０３で読み出した関連カテゴリ変数及びツリーの情報と、データ分割ラベル付け処理のステップ３０４で処理したラベル情報を用いて、未確定カテゴリ変数を推計する推計モデルを構築する。 Step 306 of the uncertain category estimation model construction process is a process executed by the uncertain category estimation model construction unit 110 of the estimation model construction unit 108. The unconfirmed category estimation model construction unit 110 uses the related category variable and tree information read in step 303 of the related category variable reading process and the label information processed in step 304 of the data division labeling process, to determine the unconfirmed category. Build an estimation model for estimating variables.

このとき、未確定カテゴリ推計モデル構築部１１０は、推計モデルの構築対象の変数に関し、データ分割ラベル付け処理でラベル付けした情報に基づき、データを分割し、それぞれ推計モデルを構築する。未確定カテゴリ推計モデル構築部１１０は、例えば、図１０Ａに示すラベル付けテーブルを用いて、未確定カテゴリＡ＿Ｎ＋１に関する推計モデルを構築する場合、ラベルの値が同一であるデータに、整形データ２０１を分割し、それぞれのデータについて、目的変数と説明変数の関係性を学習する。 At this time, the unconfirmed category estimation model construction unit 110 divides data based on the information labeled in the data division labeling process with respect to the estimation target variable, and constructs an estimation model. For example, when the estimation model for the uncertain category A_N + 1 is constructed using the labeling table shown in FIG. 10A, the uncertain category estimation model construction unit 110 divides the shaped data 201 into data having the same label value. The relationship between the objective variable and the explanatory variable is learned for each data.

In the first embodiment, as described above, a label takes one of two values, y or n, so the undetermined category estimation model construction unit 110 divides the shaped data 201 into two and constructs an estimation model from each part. The estimation models can be constructed with a publicly known or well-known technique such as multinomial logistic regression or BoostingTree.

In step 307, the objective variable estimation model construction process, the objective variable estimation model construction unit 109 of the estimation model construction unit 108 uses the related category variables and tree information read in step 303 (related category variable reading process) and the label information produced in step 304 (data division labeling process) to construct, for each value of the related category variables, an estimation model that estimates the objective variable.

For example, when constructing an estimation model for the objective variable A_N+1 using the labeling table 2300 shown in FIG. 8A, the objective variable estimation model construction unit 109 divides the shaped data 201 into groups of records whose target label values are identical and, for each group, learns the relationship between the objective variable and the explanatory variables.

In the first embodiment, a label takes one of two values, y or n, so the estimation model construction unit 108 divides the shaped data 201 into two and constructs an estimation model from each part. When, for example, it constructs an estimation model for the objective variable y1_N+1 (2355) using the labeling table 2330 shown in FIG. 10D, it divides the shaped data 201 into four based on the four label combinations (y, y), (y, n), (n, y), and (n, n), and constructs an estimation model from each part. The estimation models can be constructed with a widely known technique such as multiple regression, multinomial logistic regression, or BoostingTree.
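The split-and-fit step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout, the field names (`c1`, `c2`, `x`, `cost`), and the use of a one-variable least-squares line as the per-partition model are all assumptions made for the example. Any of the methods the text names (multiple regression, multinomial logistic regression, BoostingTree) could take the place of `fit_line`.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for any regression method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: fall back to a constant model.
        return mean_y, 0.0
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - b * mean_x, b


def build_models_per_label(records, label_keys, x_key, y_key):
    """Split records by the tuple of label values and fit one model per partition."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[tuple(rec[k] for k in label_keys)].append(rec)
    return {
        label: fit_line([r[x_key] for r in part], [r[y_key] for r in part])
        for label, part in partitions.items()
    }


# Hypothetical shaped data: two y/n labels, one explanatory variable, one objective variable.
records = [
    {"c1": "y", "c2": "y", "x": 1.0, "cost": 12.0},
    {"c1": "y", "c2": "y", "x": 2.0, "cost": 14.0},
    {"c1": "y", "c2": "n", "x": 1.0, "cost": 5.0},
    {"c1": "y", "c2": "n", "x": 3.0, "cost": 9.0},
    {"c1": "n", "c2": "y", "x": 2.0, "cost": 3.0},
    {"c1": "n", "c2": "y", "x": 4.0, "cost": 4.0},
    {"c1": "n", "c2": "n", "x": 1.0, "cost": 0.5},
    {"c1": "n", "c2": "n", "x": 5.0, "cost": 0.5},
]

# Two label columns give four partitions, hence four (intercept, slope) pairs.
models = build_models_per_label(records, ["c1", "c2"], "x", "cost")
```

Passing a single label key instead of two yields the two-way split described for a single y/n label.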

The undetermined category estimation models and objective variable estimation models constructed by the above processing are stored in the estimation model storage unit 126.

Next, the integrated estimation model construction process performed in step 308 of FIG. 3 is described. In this process, the future estimation unit 111 performs its processing first, followed by the estimation information aggregation unit 112.

First, the processing of the future estimation unit 111 is described. Based on the estimation models stored in the estimation model storage unit 126, the future estimation unit 111 performs a future estimation of the information for each individual (shaped data 201).

FIG. 11 is a flowchart of the processing performed by the future estimation unit 111. Each step is described below.

In step 1101, the estimation target information input process, the future estimation unit 111 receives the necessary information from the input unit 102, including the data to be estimated, the objective variable, the explanatory variables, and the models to be used for estimation. Besides information entered through the input unit 102, data stored in advance in the shaped data storage unit 122 may also be used as the data to be estimated.

In step 1102, the model reading process, the future estimation unit 111 reads from the estimation model storage unit 126 the objective variable estimation model and the undetermined category estimation models that correspond to the target objective variable and the explanatory variables to be used.

In step 1103, the data division process, the label information for data division stored in the data division information storage unit 125 is read.

In step 1104, the future estimation unit 111 determines whether any undetermined category remains unestimated. It can make this determination by checking whether the estimation models read in step 1102 (model reading process) include an estimation model for estimating an undetermined category variable (hereinafter, an undetermined category variable estimation model).

If an undetermined category variable exists and its value is still undetermined, the future estimation unit 111 proceeds to step 1105 (undetermined category estimation process). If no unestimated undetermined category variable remains, it proceeds to step 1106 (objective variable estimation process).

In step 1105, the undetermined category estimation process, the future estimation unit 111 selects one applicable model from among the undetermined category variable estimation models and applies it. Application falls into the following two cases.

The first case is as follows. When only one undetermined category estimation model exists, the future estimation unit 111 uses it to estimate the undetermined category variable. In this case the undetermined category variable satisfies one of two conditions: it has no related category, or all of its related categories are determined categories.

If the undetermined category variable has no related category, the future estimation unit 111 estimates the undetermined category from the explanatory variables using all of the shaped data 201. If it has related categories, the future estimation unit 111 divides the shaped data 201 by the values of those related categories and applies to each partition the undetermined category estimation model corresponding to that category value, thereby estimating the undetermined category variable.

The second case is as follows. When two or more undetermined category estimation models exist, some of them may have a related category that is itself undetermined.

Consider, for example, an undetermined category estimation model X and an undetermined category estimation model Y, where model X estimates the undetermined category variable C, model Y estimates the undetermined category variable D, and the related category included in model Y is C.

In this case the related category C is undetermined, so model Y cannot be used for estimation until the value of C is settled. The future estimation unit 111 therefore first applies model X to estimate the undetermined category variable C, and then applies model Y based on the estimated information on C.
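One way to derive such an application order for an arbitrary set of models is a topological sort over their dependencies. The sketch below is an illustrative assumption, not something the patent specifies: it uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later), and the model names and the added model `Z` are hypothetical.

```python
from graphlib import TopologicalSorter


def application_order(dependencies):
    """Return model names ordered so that each model follows the models it depends on.

    `dependencies` maps a model name to the set of models whose estimated
    (undetermined) category variables it needs as related categories.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Model X estimates C and needs no undetermined category.
# Model Y estimates D and needs C, so it depends on X.
# Model Z (hypothetical) needs D, so it depends on Y.
deps = {"X": set(), "Y": {"X"}, "Z": {"Y"}}
order = application_order(deps)
```

A circular dependency would make a valid order impossible. `TopologicalSorter` raises `graphlib.CycleError` in that case, which is a convenient place to reject an ill-formed set of related categories.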

In this way, the future estimation unit 111 estimates hierarchically, starting with the models whose related categories include no undetermined category. Specifically, if the category variable C can take the two values y and n, the future estimation unit 111 first uses model X to estimate probability values, for example a probability of 0.6 for y and 0.4 for n.

Next, using model Y, the future estimation unit 111 estimates two pieces of information: the value Dy of the undetermined category variable D when C is y, and the value Dn of D when C is n.

Finally, the future estimation unit 111 computes the expected value from these pieces of information and takes 0.6 × Dy + 0.4 × Dn as the estimate of the undetermined category variable D. Even when estimating with the probability values of an undetermined category variable, the future estimation unit 111 still divides the data by the other, determined category variables, based on the label information stored in the data division information storage unit 125, and applies the models to each partition.
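The expected-value step can be written as a small function. This is a sketch under stated assumptions: the probabilities 0.6 and 0.4 come from the text, while the conditional values chosen for Dy and Dn are made-up numbers, and the function and variable names are hypothetical.

```python
def expected_estimate(probabilities, conditional_estimates):
    """Weight each conditional estimate by the probability of its category value."""
    if abs(sum(probabilities.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * conditional_estimates[value] for value, p in probabilities.items())


# Probabilities of C estimated by model X (values from the text).
p_c = {"y": 0.6, "n": 0.4}
# Model Y's estimates of D given C = y and C = n (illustrative numbers only).
d_given_c = {"y": 0.9, "n": 0.2}

# 0.6 * Dy + 0.4 * Dn
d_estimate = expected_estimate(p_c, d_given_c)
```

Because the function iterates over whatever values appear in `probabilities`, the same code covers a category variable with more than two values.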

Next, in step 1106, the objective variable estimation process, the future estimation unit 111 applies the objective variable estimation model. Application falls into the following three cases.

The first case is as follows. When the objective variable estimation model has no related category at all, the future estimation unit 111 estimates the objective value from the explanatory variables using all of the shaped data 201.

The second case is as follows. When all of the related categories included in the objective variable estimation model are determined category variables, the future estimation unit 111 divides the shaped data 201 by the value of each category and applies to each partition the objective variable estimation model corresponding to that category value, thereby estimating the objective variable.

The third case is as follows. When the related categories included in the objective variable estimation model include an undetermined category variable, the estimation uses the probability information estimated by the undetermined category variable estimation model. Consider, for example, an undetermined category estimation model Y and an objective variable estimation model Z, where model Y estimates the undetermined category variable D, model Z estimates the variable z, and the related category included in model Z is the undetermined category variable D.

In this case D is undetermined, so the objective variable estimation model Z cannot be used for estimation until its value is settled. The future estimation unit 111 therefore estimates the objective variable using the probability information on D estimated with model Y. Suppose, for example, that D can take the two values y and n and that model Y estimates a probability of 0.3 for y and 0.7 for n.

The future estimation unit 111 first uses the objective variable estimation model Z to estimate two pieces of information: the value zy of the objective variable z when D is y, and the value zn of z when D is n. It then computes the expected value from these and takes 0.3 × zy + 0.7 × zn as the estimate of the objective variable z.
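Written as a formula, the estimate of z is a probability-weighted sum over the values of D. The notation below (the hat, P(·), and the index sets) is introduced here for illustration and is not the patent's own. The second expression additionally assumes that the values estimated for D given C are read as conditional probabilities, which is one possible interpretation of the hierarchical example.

```latex
\hat{z} \;=\; \sum_{d \in \{y,\,n\}} P(D = d)\, z_d \;=\; 0.3\, z_y + 0.7\, z_n,
\qquad
P(D = d) \;=\; \sum_{c \in \{y,\,n\}} P(C = c)\, P(D = d \mid C = c)
```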

The per-individual estimation information produced by this estimation process is stored in the estimation information storage unit 128 together with the shaped data 201.

Next, the processing of the estimation information aggregation unit 112 is described. The estimation information aggregation unit 112 aggregates the per-individual estimation results stored in the estimation information storage unit 128 by a specific category variable and generates estimation results for each group.

FIG. 14 is a flowchart showing an example of the processing performed by the estimation information aggregation unit 112. Each step is described below.

In step 1401, estimation information reading, the estimation information aggregation unit 112 reads the estimation results stored in the estimation information storage unit 128.

In step 1402, aggregation index determination, the estimation information aggregation unit 112 determines the index by which the estimation results read from the estimation information storage unit 128 are aggregated. It selects as the index at least one category variable included in the per-individual shaped data 201 stored together with the estimation results. Examples of such indices include the sex included in each individual's basic information, age grouped into 10-year bands, the long-term care insurance class, and location information derived from the address.

In step 1403, aggregation method determination, the estimation information aggregation unit 112 determines how the estimation information is to be aggregated for the category variable determined in step 1402. For example, it chooses an aggregation method such as the sum, mean, maximum, minimum, median, mode, or standard deviation of the information grouped by category variable value.

In step 1404, the aggregation process, the estimation information aggregation unit 112 aggregates the estimation information by the aggregation index determined in step 1402, using the aggregation method determined in step 1403. The aggregated results are stored in the estimation information storage unit 128.
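Steps 1402 to 1404 amount to a group-by followed by a chosen summary statistic. The following sketch shows one possible shape for this using only the Python standard library. The record fields, the `AGGREGATORS` table, and the helper names are assumptions for illustration, and the population standard deviation (`statistics.pstdev`) is an arbitrary choice, since the text does not say which variant is meant.

```python
import statistics
from collections import defaultdict

# Aggregation methods named in the text, mapped to standard-library functions.
AGGREGATORS = {
    "sum": sum,
    "mean": statistics.mean,
    "max": max,
    "min": min,
    "median": statistics.median,
    "mode": statistics.mode,
    "stdev": statistics.pstdev,
}


def age_band(age, width=10):
    """Map an age to the lower bound of its band, e.g. 47 -> 40."""
    return (age // width) * width


def aggregate(records, index_fn, value_key, method):
    """Group per-individual estimates by index_fn and summarize each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[index_fn(rec)].append(rec[value_key])
    return {key: AGGREGATORS[method](values) for key, values in groups.items()}


# Hypothetical per-individual estimation results.
estimates = [
    {"sex": "F", "age": 47, "cost": 100.0},
    {"sex": "F", "age": 42, "cost": 300.0},
    {"sex": "M", "age": 63, "cost": 250.0},
    {"sex": "M", "age": 68, "cost": 150.0},
    {"sex": "M", "age": 35, "cost": 50.0},
]

by_band = aggregate(estimates, lambda r: age_band(r["age"]), "cost", "sum")
by_sex_mean = aggregate(estimates, lambda r: r["sex"], "cost", "mean")
```

Passing the index as a function rather than a field name lets derived indices, such as 10-year age bands or a combination of several category variables, be handled in the same way as stored ones.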

An example of the graphical user interface that the data analysis unit 101 displays on the output unit 103 is shown below.

FIG. 12 shows a graphical user interface with an example of an operation screen for realizing the first embodiment. The illustrated operation screen 1200 is used to select the shaped data 201, construct an estimation model from the selected shaped data 201, and display information on the parameters and architecture of the constructed estimation model.

In the figure, 1201 is a button for selecting the shaped data 201, 1202 is a button for executing the estimation model construction process, and 1203 is a button for displaying the parameters and architecture of the estimation model on the screen.

In the figure, 1204 is a table that displays the objective variable, explanatory variables, and category variables used in the estimation model.

In the figure, 1205 is a table that displays, for each class defined by the category variable values, the parameters for estimating the objective variable in the objective variable estimation model included in the estimation model. 1206 is a table that displays, likewise per class, the parameters for estimating the undetermined category variables in the undetermined category variable estimation model included in the estimation model. 1207 is a display area that visualizes, as a graph, the architecture of the estimation model, that is, the relationships among the explanatory variables, objective variable, and category variables used.

The display area 1207 shows an example in which category variables are drawn as squares and explanatory variables as circles, with arrows representing the relationships between the explanatory variables and the objective variable.

By using the operation screen 1200 of the first embodiment, a user of the data analysis unit 101 can select the shaped data 201 needed to construct an estimation model. The data analysis unit 101 can then execute the estimation model construction process based on the selected shaped data. Through the operation screen 1200, the user can also see the parameters of the constructed estimation model, such as its explanatory variables, objective variable, and category variables, and the model architecture that describes how they relate.

FIG. 13 shows a graphical user interface with an example of an estimation result display screen 1300 for realizing the first embodiment. The estimation result display screen 1300 is an example of a screen for displaying results estimated with the estimation model.

In the figure, 1301 is a button for selecting an estimation result. 1302 is a table that displays the estimated results aggregated by period (or fiscal year). The illustrated example displays the medical costs for type 2 diabetes in fiscal years 2016 and 2017, estimated from fiscal year 2015 information.

In the figure, 1303 is a pull-down menu for specifying the aggregation target. The estimation information aggregation process is executed according to the aggregation target specified in pull-down 1303, and the information on the screen is updated. 1304 is a pull-down menu for specifying the objective variable of the estimation results to be read. The information on the target disease specified in pull-down 1304 is displayed on the screen.

In the figure, 1305 is a graph display area that shows the estimation results for the disease specified in pull-down 1304, aggregated by the aggregation target specified in pull-down 1303. 1306 is a map display area that shows, as a map, the aggregated results for the disease specified in 1304, aggregated by the target specified in 1303.

The effects of the estimation model construction system according to the first embodiment are described below.

When constructing a model that estimates an objective variable, the estimation model construction system of the first embodiment can take any category variable among the explanatory variables, divide the data into groups of records sharing the same value of that category variable, and construct an estimation model from each group. This improves estimation accuracy when estimating for a population made up of individuals with different backgrounds, such as different attributes or medical histories.

When dividing data by category variable values to construct models, the estimation model construction system of the first embodiment determines whether the value of each category variable will be determined or undetermined at estimation time.

If the value of a category variable is undetermined at estimation time, the system constructs two models: one that estimates the category variable and one that estimates the objective variable. This allows accurate estimation even when the category variable's value is undetermined at estimation time. For example, when estimating the medical costs for a disease in a given fiscal year, the system treats the presence or absence of the disease in that year as a category variable, constructs one model that estimates the presence or absence of the disease and another that estimates the medical costs, and applies them hierarchically, so the medical costs can be estimated with high accuracy. Medical costs and the presence or absence of the disease can also be estimated at the same time.

Because the estimation model construction system of the first embodiment constructs a separate model for each value of a category variable, the parameters of the model constructed for each value can be inspected. This makes it possible to identify, from each model, the variables that influence the objective variable of interest for a specific group.

The estimation model construction system of the first embodiment can aggregate the information compiled per individual by any category variable and thereby produce estimates for groups. This lets the user grasp the future health and medical status of a population at the desired granularity.

Modifications of the estimation model construction system according to the first embodiment are described below.

The first embodiment was described using, as target data, data containing each individual's medical insurance claim (receipt) information and specific health checkup information in time series, aggregated by fiscal year to construct a model that estimates on a fiscal-year basis. The data that the first embodiment can handle are not limited to this, however.

For example, it can handle a broad range of information on health, medical care, or nursing care, such as sensing information on health and insurance-related information. The time interval over which information is aggregated can also vary widely, for example quarterly or monthly, and models can be constructed for such intervals. The estimation model construction system of the first embodiment can also be applied to education, based on, for example, the results of periodic tests.

Various estimation models can be used in the first embodiment. As models for estimating category variable information, for example, the following can be used: models that describe the relationship between explanatory variables and the objective variable as an equation, such as logistic regression and neural networks; models that parameterize and retain a subset of the model construction data, such as support vector machines and KNN (k-nearest neighbor algorithm); models that retain the probabilistic dependencies between explanatory variables and the objective variable, such as Bayesian networks; and models that aggregate several classification rules, such as random forests.

The first embodiment described a method that uses the related category matrix 501 to specify the category variables related to the objective variable, but the related category variables may be read by other means. For example, the tree structure that the related category variable reading process builds to represent the relationships between the objective variable and the group of related category variables may be constructed and stored in advance, and the related category variables may be read directly from that stored information.

The first embodiment showed an example in which an estimation model is constructed per individual and the estimated results are combined into group estimates based on a given category, but estimation may be performed at a coarser granularity than the individual. For example, an estimation model may be constructed for information aggregated at the city level.

As described above, the estimation model construction system according to the first embodiment constructs estimation models for each category variable value and constructs estimation models for category variables that are undetermined at estimation time. It can thereby construct estimation models capable of accurate future estimation even for populations in which people with different backgrounds are mixed.

The first embodiment described an example of an estimation model construction system in which the category variables related to the objective variable to be estimated are stored in advance in the related category storage unit 124. The second embodiment describes an example of an estimation model construction system that executes processing to determine the category variables related to the objective variable based on the shaped data 201. Apart from the related category associating unit 113, the configuration and processing are the same as in the first embodiment, so duplicate description is omitted.

The related category associating unit 113 determines the category variable items related to the objective variable, using the shaped data 201 stored in the shaped data storage unit 122 and information on the objective variable.

Methods for determining the categorical variable items are described below.

The first example is as follows. Information on the same item as the objective variable, acquired at earlier points in the time series, is used as a categorical variable related to the objective variable. For example, when the objective variable is whether a person received medical care for a disease in a given fiscal year, whether that person received care for the same disease in a past fiscal year is designated as a categorical variable related to the objective variable.

The rationale is that whether a person will receive care for a disease in the future depends strongly on that person's current care status for the disease. Separating the estimation models according to the current care status therefore improves estimation accuracy.
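The first example can be sketched as follows. This is an illustrative sketch only: the record layout (a mapping from a person identifier and fiscal year to a 0/1 care flag) and the names `visited` and `visited_prev` are assumptions, not taken from the specification. The prior-year value of the same item is attached to each record as the categorical variable, and the data is then divided by that value so that a separate model can be built from each subset.

```python
def add_prior_year_category(records):
    """records: dict mapping (person_id, fiscal_year) -> 0/1 care flag.

    Returns rows that carry the prior-year flag of the same item as the
    categorical variable 'visited_prev' (a hypothetical name).
    """
    rows = []
    for (person, year), visited in sorted(records.items()):
        prev = records.get((person, year - 1))
        if prev is None:
            continue  # no prior-year observation for this person
        rows.append({"person": person, "year": year,
                     "visited": visited, "visited_prev": prev})
    return rows


def split_by_category(rows, key):
    """Divide rows into subsets that share the same value of `key`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    return groups
```

Each subset returned by `split_by_category` would then be used to fit its own estimation model.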

The second example is as follows. A variable from the same period as the objective variable that has high similarity to the objective variable is selected as a categorical variable. For example, an item whose correlation or mutual information with the objective variable is at or above a fixed threshold is selected as a categorical variable. For instance, as a categorical variable related to the medical expenses for a disease in a given fiscal year, information on whether care was received for the same disease in that year is designated. This improves the estimation accuracy of variables that depend on other variables.
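A minimal sketch of the threshold-based selection in the second example follows. It assumes that both the candidate variables and the objective variable are discrete; a continuous objective such as medical expenses would first have to be binned, which is not shown. The function names and the threshold value are hypothetical. Mutual information is computed in nats from empirical frequencies.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_category_variables(candidates, target, threshold):
    """Return names of candidate variables whose mutual information with
    the objective variable is at or above the threshold."""
    return [name for name, values in candidates.items()
            if mutual_information(values, target) >= threshold]
```

A correlation coefficient could be substituted for `mutual_information` without changing the selection logic.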

The third example is as follows. A variable is set as a provisional related categorical variable. The shaped data 201 is divided into subsets that share the same value of this variable, and an estimation model that estimates the objective variable is constructed from each subset. The result of estimating the objective variable with these models is compared with either (a) the result of estimating with a single estimation model built without dividing the shaped data 201, or (b) the result of estimating with estimation models built by dividing the shaped data 201 by a different categorical variable. The related categorical variable is determined based on this comparison of estimation accuracy. The comparison can be carried out, for example, by cross-validation using past shaped data 201.
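The comparison in the third example can be sketched as below. Several assumptions are made for illustration: the estimation model is reduced to the mean of the training subset, accuracy is measured by mean squared error, folds are assigned by row index, and all function and field names are hypothetical. A candidate is adopted as the related categorical variable only if dividing the data by it gives a lower cross-validated error than the single undivided model and than the other candidates.

```python
def mean(values):
    return sum(values) / len(values)


def cv_error(rows, target, split_key=None, k=3):
    """Mean squared error under k-fold cross-validation.

    With split_key=None a single model is fitted to the undivided data;
    otherwise one model is fitted per value of split_key.
    The 'model' is a subset mean, used here only as a placeholder.
    """
    errors = []
    for fold in range(k):
        train = [r for i, r in enumerate(rows) if i % k != fold]
        test = [r for i, r in enumerate(rows) if i % k == fold]
        overall = mean([r[target] for r in train])
        by_cat = {}
        if split_key is not None:
            for r in train:
                by_cat.setdefault(r[split_key], []).append(r[target])
        for r in test:
            vals = by_cat.get(r[split_key]) if split_key is not None else None
            # Fall back to the undivided model for category values unseen in training.
            pred = mean(vals) if vals else overall
            errors.append((r[target] - pred) ** 2)
    return mean(errors)


def choose_related_category(rows, target, candidates, k=3):
    """Return the candidate whose division gives the lowest CV error,
    or None if no division beats the single undivided model."""
    best_key, best_err = None, cv_error(rows, target, None, k)
    for key in candidates:
        err = cv_error(rows, target, key, k)
        if err < best_err:
            best_key, best_err = key, err
    return best_key
```

Returning `None` corresponds to keeping the single estimation model built from the undivided shaped data.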

The effects of the estimation model construction system according to the second embodiment are described below.

The estimation model construction system of the second embodiment can determine, based on the shaped data 201, how to divide the shaped data 201 used for estimation model construction. As a result, even when, for example, the quality of the information contained in the shaped data 201 changes, an estimation model appropriate to that shaped data 201 can be constructed automatically.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above are described in detail to make the present invention easy to understand, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted, either alone or in combination.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be realized in hardware, for example by designing them as an integrated circuit. The configurations, functions, and the like described above may also be realized in software, with a processor interpreting and executing a program that realizes each function. Information such as the programs, tables, and files that realize each function can be placed in a memory, in a recording device such as a hard disk or an SSD (Solid State Drive), or on a recording medium such as an IC card, an SD card, or a DVD.

The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines in a product are necessarily shown. In practice, almost all of the components may be considered to be connected to one another.

DESCRIPTION OF SYMBOLS
101 Data analysis unit
102 Input unit
103 Output unit
104 Arithmetic unit
105 Memory
106 Storage medium
107 Data shaping unit
108 Estimation model construction unit
109 Objective variable estimation model construction unit
110 Undetermined category estimation model construction unit
111 Future estimation unit
112 Estimation information aggregation unit
113 Related category associating unit
120 Database
121 Data storage unit
122 Shaped data storage unit
124 Related category storage unit
125 Data division information storage unit
126 Estimation model storage unit
127 Estimation model construction variable group storage unit
128 Estimation information storage unit

Claims

An estimation model construction system comprising a processor that executes a program and a memory that stores the program, the system constructing an estimation model by executing the program, the system comprising:
a storage unit that stores accumulated information on medical care and health for each individual, and related category information that stores a correspondence between a data item included in the accumulated information and another data item that is related to the data item and includes category information;
an estimation model construction unit that determines, based on received information or the accumulated information, a first objective variable that is the estimation target of the estimation model and an explanatory variable that is input information of the estimation model, determines a categorical variable corresponding to the first objective variable based on the correspondence in the related category information, divides the accumulated information into pieces in each of which the value of the categorical variable satisfies a predetermined condition, and constructs a first estimation model corresponding to each piece of the divided accumulated information; and
a future estimation unit that takes the first objective variable, the explanatory variable, and the categorical variable as estimation target information, and selects the first estimation model to be applied according to the value of the categorical variable included in the estimation target information.

The estimation model construction system according to claim 1, wherein
the estimation model construction unit further
determines whether information on the categorical variable is included in the estimation target information and, if it is not included, sets the categorical variable as a second objective variable and constructs a second estimation model that estimates a probability distribution of the value of the categorical variable based on the explanatory variable, and
the future estimation unit
estimates, using the second estimation model, information on the probability distribution of the value of the categorical variable that is the second objective variable based on the explanatory variable, and estimates the first objective variable by applying, according to the estimated probability distribution information, the first estimation models constructed for each piece of accumulated information in which the value of the categorical variable satisfies the predetermined condition.

The estimation model construction system according to claim 1, wherein
the estimation model construction unit
extracts a tree structure having the first objective variable as a root node and categorical variables related to the first objective variable as internal nodes, by repeating a process of extracting a categorical variable related to the first objective variable that is the estimation target of the estimation model and a process of treating the extracted categorical variable as a new first objective variable,
constructs, based on the tree structure, estimation models each having the root node or an internal node as the first objective variable, and
when constructing the estimation model having each node as the first objective variable, divides the accumulated information into pieces in each of which the value of a categorical variable that is a child node of the internal node satisfies a predetermined condition, and constructs, for each piece of the divided accumulated information, an estimation model having the node as the first objective variable, and
the future estimation unit
estimates the categorical variable corresponding to the internal node using the estimation model having the internal node as the first objective variable and, if the internal node has a child node, applies the estimation model of the categorical variable corresponding to the internal node according to information on the probability distribution of the value of the categorical variable corresponding to the child node, as estimated by the estimation model having that categorical variable as the first objective variable, thereby estimating information on the probability distribution of the internal node.

The estimation model construction system according to claim 1,
further comprising a related category associating unit that determines an item of a categorical variable related to the first objective variable using the accumulated information and information on the first objective variable.

The estimation model construction system according to claim 2, wherein
the estimation model has a parameter for estimating the first objective variable, and
the estimation model construction system further comprises an output unit that presents at least one of the parameter of the estimation model and the relationship among the first objective variable, the explanatory variable, and the categorical variable related to the first objective variable that are included in the estimation model.

An estimation model construction method for constructing an estimation model with a computer having a processor and a memory, the method comprising:
a first step in which the computer stores, in a storage unit of the computer, accumulated information on medical care and health for each individual, and related category information that stores a correspondence between a data item included in the accumulated information and another data item that is related to the data item and includes category information;
a second step in which the computer determines, based on received information or the accumulated information, a first objective variable that is the estimation target of the estimation model and an explanatory variable that is input information of the estimation model;
a third step in which the computer determines a categorical variable corresponding to the first objective variable based on the correspondence in the related category information;
a fourth step in which the computer divides the accumulated information into pieces in each of which the value of the categorical variable satisfies a predetermined condition, and constructs a first estimation model corresponding to each piece of the divided accumulated information; and
a fifth step in which the computer takes the first objective variable, the explanatory variable, and the categorical variable as estimation target information, and selects the first estimation model to be applied according to the value of the categorical variable included in the estimation target information.

The estimation model construction method according to claim 6, wherein
the fourth step further includes
determining whether information on the categorical variable is included in the estimation target information and, if it is not included, setting the categorical variable as a second objective variable and constructing a second estimation model that estimates a probability distribution of the value of the categorical variable based on the explanatory variable, and
the fifth step includes
estimating, using the second estimation model, information on the probability distribution of the value of the categorical variable that is the second objective variable based on the explanatory variable, and estimating the first objective variable by applying, according to the estimated probability distribution information, the first estimation models constructed for each piece of accumulated information in which the value of the categorical variable satisfies the predetermined condition.

The estimation model construction method according to claim 6, wherein
the fourth step includes
extracting a tree structure having the first objective variable as a root node and categorical variables related to the first objective variable as internal nodes, by repeating a process of extracting a categorical variable related to the first objective variable that is the estimation target of the estimation model and a process of treating the extracted categorical variable as a new first objective variable; constructing, based on the tree structure, estimation models each having the root node or an internal node as the first objective variable; and, when constructing the estimation model having each node as the first objective variable, dividing the accumulated information into pieces in each of which the value of a categorical variable that is a child node of the internal node satisfies a predetermined condition, and constructing, for each piece of the divided accumulated information, an estimation model having the node as the first objective variable, and
the fifth step includes
estimating the categorical variable corresponding to the internal node using the estimation model having the internal node as the first objective variable and, if the internal node has a child node, applying the estimation model of the categorical variable corresponding to the internal node according to information on the probability distribution of the value of the categorical variable corresponding to the child node, as estimated by the estimation model having that categorical variable as the first objective variable, thereby estimating information on the probability distribution of the internal node.

The estimation model construction method according to claim 6,
further comprising a sixth step of determining an item of a categorical variable related to the first objective variable using the accumulated information and information on the first objective variable.

The estimation model construction method according to claim 7, wherein
the estimation model has a parameter for estimating the first objective variable, and
the method further comprises a seventh step of presenting at least one of the parameter of the estimation model and the relationship among the first objective variable, the explanatory variable, and the categorical variable related to the first objective variable that are included in the estimation model.

A program for controlling a computer having a processor and a memory, the program causing the computer to execute:
a first step of storing, in a storage unit of the computer, accumulated information on medical care and health for each individual, and related category information that stores a correspondence between a data item included in the accumulated information and another data item that is related to the data item and includes category information;
a second step of determining, based on received information or the accumulated information, a first objective variable that is the estimation target of the estimation model and an explanatory variable that is input information of the estimation model;
a third step of determining a categorical variable corresponding to the first objective variable based on the correspondence in the related category information;
a fourth step of dividing the accumulated information into pieces in each of which the value of the categorical variable satisfies a predetermined condition, and constructing a first estimation model corresponding to each piece of the divided accumulated information; and
a fifth step of taking the first objective variable, the explanatory variable, and the categorical variable as estimation target information, and selecting the first estimation model to be applied according to the value of the categorical variable included in the estimation target information.

The program according to claim 11, wherein
the fourth step further includes
determining whether information on the categorical variable is included in the estimation target information and, if it is not included, setting the categorical variable as a second objective variable and constructing a second estimation model that estimates a probability distribution of the value of the categorical variable based on the explanatory variable, and
the fifth step includes
estimating, using the second estimation model, information on the probability distribution of the value of the categorical variable that is the second objective variable based on the explanatory variable, and estimating the first objective variable by applying, according to the estimated probability distribution information, the first estimation models constructed for each piece of accumulated information in which the value of the categorical variable satisfies the predetermined condition.

The program according to claim 11, wherein
the fourth step includes
extracting a tree structure having the first objective variable as a root node and categorical variables related to the first objective variable as internal nodes, by repeating a process of extracting a categorical variable related to the first objective variable that is the estimation target of the estimation model and a process of treating the extracted categorical variable as a new first objective variable; constructing, based on the tree structure, estimation models each having the root node or an internal node as the first objective variable; and, when constructing the estimation model having each node as the first objective variable, dividing the accumulated information into pieces in each of which the value of a categorical variable that is a child node of the internal node satisfies a predetermined condition, and constructing, for each piece of the divided accumulated information, an estimation model having the node as the first objective variable, and
the fifth step includes
estimating the categorical variable corresponding to the internal node using the estimation model having the internal node as the first objective variable and, if the internal node has a child node, applying the estimation model of the categorical variable corresponding to the internal node according to information on the probability distribution of the value of the categorical variable corresponding to the child node, as estimated by the estimation model having that categorical variable as the first objective variable, thereby estimating information on the probability distribution of the internal node.

The program according to claim 11,
further comprising a sixth step of determining an item of a categorical variable related to the first objective variable using the accumulated information and information on the first objective variable.

The program according to claim 12, wherein
the estimation model has a parameter for estimating the first objective variable, and
the program further comprises a seventh step of presenting at least one of the parameter of the estimation model and the relationship among the first objective variable, the explanatory variable, and the categorical variable related to the first objective variable that are included in the estimation model.