JP2021005191A

JP2021005191A - Generation device, generation method, and generation program

Info

Publication number: JP2021005191A
Application number: JP2019118136A
Authority: JP
Inventors: Nobuki Kobayashi; Yoji Kondo; Yasutaka Hasegawa; Yuji Kamata; Shuntaro Yui; Hideyuki Ban; Takahide Araya
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2019-06-26
Filing date: 2019-06-26
Publication date: 2021-01-14
Anticipated expiration: 2039-06-26
Also published as: JP7245125B2; WO2020261869A1

Abstract

PROBLEM TO BE SOLVED: To achieve efficient and highly accurate data analysis.

SOLUTION: A generation device has a processor that executes a program and a storage device that stores the program. The device can access temporal feature information, which indicates temporal features obtainable from temporal data, and grouping information, which defines a plurality of groups to which the temporal data should belong. The processor executes: generation processing that generates, based on the temporal feature information, a plurality of pieces of temporal feature data indicating the temporal features from the temporal data to be analyzed; division processing that divides the plurality of pieces of temporal feature data generated by the generation processing into the plurality of groups based on the grouping information; and dimensional compression processing that dimensionally compresses each of the plurality of groups produced by the division processing.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a generation device, a generation method, and a generation program for generating data.

In life insurance underwriting, an applicant's future risk of disease onset and hospitalization is predicted from the applicant's health condition. The health condition is represented by multivariate data such as health checkup results and notification information. When changes in health condition are also taken into account, risk prediction draws on several years of health data, so the dimensionality of the multivariate data grows even larger.

Methods for time-series data analysis include vector autoregressive models and LSTM (long short-term memory). Another approach treats the values at each time point as independent variables and analyzes them with a regression model or a neural network.

Patent Document 1 discloses a data analysis device that analyzes health conditions over multiple years. This data analysis device stores quantitative data and qualitative data, each having an ID and time information; generates time-series quantitative event data from the quantitative data and time-series qualitative event data from the qualitative data; extracts a characteristic portion showing a change from one of the time-series quantitative and qualitative event data; generates a time-series event pattern from the set of event data corresponding to the characteristic portion; associates the IDs included in the time-series event pattern with the IDs included in the other of the time-series quantitative and qualitative event data; and displays the associated time-series event pattern together with the associated other event data.

Japanese Unexamined Patent Publication No. 2006-285672

However, when the features needed for analysis are extracted by exhaustive search, features that are unnecessary for the analysis are also generated, which increases the computational cost. Such unnecessary features may also adversely affect the intended analysis.

An object of the present invention is to realize efficient and highly accurate data analysis.

A generation device according to one aspect of the invention disclosed in the present application has a processor that executes a program and a storage device that stores the program. The generation device can access temporal feature information, which indicates temporal features obtainable from temporal data, and grouping information, which defines a plurality of groups to which the temporal data should belong. The processor executes: generation processing that generates, based on the temporal feature information, a plurality of pieces of temporal feature data indicating the temporal features from the temporal data to be analyzed; division processing that divides the plurality of pieces of temporal feature data generated by the generation processing into the plurality of groups based on the grouping information; and dimensional compression processing that dimensionally compresses each of the plurality of groups produced by the division processing.

According to a representative embodiment of the present invention, efficient and highly accurate data analysis can be realized. Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

FIG. 1 is a block diagram showing a hardware configuration example of the generation device.
FIG. 2 is a block diagram showing a functional configuration example of the generation device according to the first embodiment.
FIG. 3 is an explanatory diagram showing an example of the stored contents of the notification information.
FIG. 4 is an explanatory diagram showing an example of the stored contents of the domain knowledge.
FIG. 5 is an explanatory diagram showing an example of temporal features.
FIG. 6 is an explanatory diagram showing an example of divided temporal features.
FIG. 7 is an explanatory diagram showing an example of higher-order feature data.
FIG. 8 is an explanatory diagram showing input/output screen example 1.
FIG. 9 is an explanatory diagram showing an example of a notification information input screen.
FIG. 10 is an explanatory diagram showing input/output screen example 2.
FIG. 11 is a flowchart showing an example of a generation processing procedure performed by the generation device.
FIG. 12 is a block diagram showing a functional configuration example of the generation device according to the second embodiment.
FIG. 13 is an explanatory diagram showing an example of a multimodal neural network.
FIG. 14 is an explanatory diagram showing an example of an input/output screen that displays an analysis result from the multimodal neural network.

Hereinafter, the generation device according to the present invention will be described with reference to the accompanying drawings. This specification uses, as an example, the prediction of insurance payout risk in life insurance underwriting assessment. In underwriting assessment, the future insurance payout risk is assessed based on the information disclosed by the contract applicant (hereinafter, notification information), and the application is either accepted or declined. The notification information includes health checkup results, medical interview results, medical history, and the like.

<Hardware configuration example of the generation device>
FIG. 1 is a block diagram showing a hardware configuration example of the generation device. The generation device 100 includes a processor 101, a storage device 102, an input device 103, an output device 104, and a communication interface (communication IF) 105. The processor 101, the storage device 102, the input device 103, the output device 104, and the communication IF 105 are connected by a bus 106. The processor 101 controls the generation device 100. The storage device 102 serves as a work area for the processor 101. The storage device 102 is also a non-transitory or transitory recording medium that stores various programs and data. Examples of the storage device 102 include a ROM (Read Only Memory), a RAM (Random Access Memory), an HDD (Hard Disk Drive), and a flash memory. The input device 103 is used to input data. Examples of the input device 103 include a keyboard, a mouse, a touch panel, a numeric keypad, and a scanner. The output device 104 outputs data. Examples of the output device 104 include a display and a printer. The communication IF 105 connects to a network and transmits and receives data.

<Functional configuration example of the generation device 100>
FIG. 2 is a block diagram showing a functional configuration example of the generation device 100 according to the first embodiment. The generation device 100 includes a determination unit 201, a generation unit 202, a division unit 203, a dimensional compression unit 204, a combining unit 205, and an analysis unit 206. Specifically, the determination unit 201, the generation unit 202, the division unit 203, the dimensional compression unit 204, the combining unit 205, and the analysis unit 206 are realized, for example, by causing the processor 101 to execute a program stored in the storage device 102 shown in FIG. 1.

The generation device 100 only needs to include at least the generation unit 202, the division unit 203, and the dimensional compression unit 204. The determination unit 201, the combining unit 205, and the analysis unit 206 may be realized by another computer capable of communicating with the generation device 100.

The generation device 100 also stores notification information 300 and domain knowledge 400 in the storage device 102. The notification information 300 and the domain knowledge 400 may be stored in the generation device 100 in advance, or may be acquired from another computer capable of communicating with the generation device 100. First, the notification information 300 will be described in detail.

FIG. 3 is an explanatory diagram showing an example of the stored contents of the notification information 300. The notification information 300 is the information that the contract applicant has disclosed as required for the insurance contract, and it is the data to be analyzed. The notification information 300 includes notification basic information 310, health checkup results 320, medical interview results 330, and medical history 340. The notification basic information 310 is basic information about the contract applicant's disclosure. The notification basic information 310 includes a name ID 311, a date of birth 312, and an age 313.

The name ID 311 is identification information that uniquely identifies the contract applicant. The date of birth 312 is the date on which the contract applicant was born. In FIG. 3, the three entries for the contract applicant whose name ID 311 is "0001" represent that applicant's data to be analyzed for the past three years. The age 313 is the number of whole years elapsed since the contract applicant's date of birth 312. In the examples described later, for the three entries of the contract applicant whose name ID 311 is "0001", the entry with age 313 of "47" is the first year of the time series, "48" is the second year, and "49" is the third year.

The health checkup results 320 are the results of the health checkups that the contract applicant has undergone. The health checkup results 320 include body weight 321, BMI (Body Mass Index) 322, systolic blood pressure 323, diastolic blood pressure 324, and fasting blood glucose 325. The body weight 321 is the weight of the contract applicant's body. The BMI 322 is a physique index representing the degree of obesity in humans, and is calculated as weight / (height^2). A smaller BMI 322 indicates a thinner build, and a larger value indicates a heavier build.
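The BMI formula above can be sketched as a small function. This is a minimal illustration; the source does not state units, so kilograms and metres are assumed here, and the function name `bmi` is hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by the square of height.

    Units (kg, m) are an assumption; the source only gives weight / height^2.
    """
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Illustrative values only, not taken from FIG. 3.
print(round(bmi(70.0, 1.75), 1))
```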

The systolic blood pressure 323 is the pressure exerted on the aortic wall by the blood pushed out when the heart contracts to pump blood into the aorta. The diastolic blood pressure 324 is the pressure on the aortic wall while blood is returning to the heart; this pressure is lower because, as the heart dilates, blood flows from the aorta into the heart and the blood volume in the aorta decreases. The fasting blood glucose 325 is the blood glucose level measured in a fasting state.

The medical interview results 330 are the results of the medical interview that the contract applicant has undergone. The medical interview results 330 include a smoking habit 331, a drinking habit 332, and an exercise habit 333. The smoking habit 331 indicates whether the contract applicant smokes, how often, and how much. The drinking habit 332 indicates whether the contract applicant drinks alcohol, how often, and how much. The exercise habit 333 indicates whether the contract applicant exercises, how often, and how much.

The medical history 340 is the record of medical consultations and hospitalizations that the contract applicant has already had. The medical history 340 includes a hypertension consultation history 341, a hypertension hospitalization history 342, and a diabetes consultation history 343. The hypertension consultation history 341 is the record of the contract applicant's medical consultations for hypertension. The hypertension hospitalization history 342 is the record of the contract applicant's hospitalizations for hypertension. The diabetes consultation history 343 is the record of the contract applicant's medical consultations for diabetes.

FIG. 4 is an explanatory diagram showing an example of the stored contents of the domain knowledge 400. The domain knowledge 400 is qualitative information about the various kinds of information included in the notification information 300, such as the health checkup results 320, the medical interview results 330, and the medical history 340, and corresponds to medical knowledge. Specifically, for example, the domain knowledge 400 includes temporal data determination knowledge 410, temporal feature knowledge 420, and temporal feature division knowledge 430.

The temporal data determination knowledge 410 is determination information for determining whether the notification information 300 of the contract applicant, which is the data to be analyzed, corresponds to non-temporal data 231 or temporal data 232. Specifically, for example, the temporal data determination knowledge 410 defines which items of the notification information 300 correspond to the non-temporal data 231 and which correspond to the temporal data 232.

The non-temporal data item 411 contains the items that correspond to the non-temporal data 231. The non-temporal data 231 is data that does not change over time, or whose changes over time carry no meaning. Examples of the non-temporal data 231 include the smoking habit 331, the drinking habit 332, the various consultation histories, and the various hospitalization histories. The temporal data item 412 contains the items that correspond to the temporal data 232. The temporal data 232 is time-series data whose changes over time are meaningful. Examples of the temporal data 232 include the body weight 321, the BMI 322, the systolic blood pressure 323, the diastolic blood pressure 324, and the fasting blood glucose 325.

The temporal feature knowledge 420 is temporal feature information indicating the features over time (temporal features) that can be obtained from the temporal data 232. For example, the temporal feature knowledge 420 defines a basic statistic item 421, a change amount item 422, a change rate item 423, and so on. The basic statistic item 421 contains the items that correspond to basic statistics obtained from the temporal data 232. The basic statistics are, for example, the maximum, minimum, and mean of the temporal data 232.

The change amount item 422 contains the items that correspond to a change amount, which indicates the change between two consecutive values in the temporal data 232. For example, when the temporal data 232 is the yearly body weight 321 for years 1 to 3, the change amount (years 1 and 2) is the difference between the body weight 321 in year 1 and the body weight 321 in year 2, and the change amount (years 2 and 3) is the difference between the body weight 321 in year 2 and the body weight 321 in year 3.

The change rate item 423 contains the items that correspond to a change rate, which indicates the relative change between two consecutive values in the temporal data 232. For example, when the temporal data 232 is the yearly body weight 321 for years 1 to 3, the change rate (years 1 and 2) is the difference between the body weight 321 in year 1 and the body weight 321 in year 2, divided by the body weight 321 in year 1. Likewise, the change rate (years 2 and 3) is the difference between the body weight 321 in year 2 and the body weight 321 in year 3, divided by the body weight 321 in year 2.
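Written as formulas, and only as an illustrative restatement of the definitions above, the change amount and change rate between consecutive years can be expressed as follows. The symbols x_t (the value in year t), \Delta, and r are notation introduced here and do not appear in the source. The sign convention (later value minus earlier value) follows the worked example for the applicant with name ID "0001".

```latex
\Delta_{t,\,t+1} = x_{t+1} - x_{t},
\qquad
r_{t,\,t+1} = \frac{x_{t+1} - x_{t}}{x_{t}}
```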

The temporal feature division knowledge 430 is grouping information that defines the groups to which the temporal data 232 should belong. Specifically, the temporal feature division knowledge 430 is defined based on, for example, statistical or medical knowledge. For example, the temporal feature division knowledge 430 includes, each as a group, a body shape basic information item 431, a blood pressure test value item 432, a blood glucose test value item 433, a liver function test value item 434, and so on.

The body shape basic information item 431 contains the items that correspond to body shape basic information. The body shape basic information is basic information about the contract applicant's body shape. The body shape basic information item 431 includes, for example, the age 313, the body weight 321, and the BMI 322 as items.

The blood pressure test value item 432 contains the items that correspond to blood pressure test values. The blood pressure test values are test values relating to the contract applicant's blood pressure. The blood pressure test value item 432 includes, for example, the systolic blood pressure 323 and the diastolic blood pressure 324 as items.

The blood glucose test value item 433 contains the items that correspond to blood glucose test values. The blood glucose test values are test values relating to the contract applicant's blood glucose. The blood glucose test value item 433 includes, for example, the fasting blood glucose 325 and HbA1c as items.

The liver function test value item 434 contains the items that correspond to liver function test values. The liver function test values are test values relating to the contract applicant's liver function. The liver function test value item 434 includes, for example, GOT (glutamate oxaloacetate transaminase), GPT (glutamate pyruvate transaminase), and γ-GTP (γ-glutamyl transpeptidase) as items.

Returning to FIG. 2, the determination unit 201 determines, based on the temporal data determination knowledge 410, whether the notification information 300 corresponds to the temporal data 232 or the non-temporal data 231. Take as an example the entries in the notification information 300 of FIG. 3 whose name ID 311 is "0001". The age 313, the body weight 321, the BMI 322, the systolic blood pressure 323, the diastolic blood pressure 324, and the fasting blood glucose 325 are included in the temporal data item 412. The determination unit 201 therefore determines that the following values are the temporal data 232 of the contract applicant whose name ID 311 is "0001": "47", "48", "49" for the age 313; "83.4", "86.6", "92.0" for the body weight 321; "22.8", "24.3", "26.0" for the BMI 322; "124.9", "128.5", "133.8" for the systolic blood pressure 323; "80.7", "86.1", "90.0" for the diastolic blood pressure 324; and "104.5", "107.2", "110.0" for the fasting blood glucose 325. The determination unit 201 outputs the determined temporal data 232 as a determination result.

The smoking habit 331, the drinking habit 332, the exercise habit 333, the hypertension consultation history 341, the hypertension hospitalization history 342, and the diabetes consultation history 343 are included in the non-temporal data item 411. The determination unit 201 therefore determines that the following values are the non-temporal data 231 of the contract applicant whose name ID 311 is "0001": "none", "none", "none" for the smoking habit 331; "once a week", "once a week", "twice a week" for the drinking habit 332; "once a week", "once a week", "none" for the exercise habit 333; "none", "none", "none" for the hypertension consultation history 341; "none", "none", "none" for the hypertension hospitalization history 342; and "none", "none", "none" for the diabetes consultation history 343. The determination unit 201 outputs the determined non-temporal data 231 as a determination result.
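The determination step can be sketched as a simple lookup against two item sets. This is only an illustration under stated assumptions: the English item names, the constant names, and the function `split_by_determination_knowledge` are hypothetical, and the source does not describe how the determination unit 201 is actually implemented. The values are those of the applicant with name ID "0001" (medical history items are omitted for brevity).

```python
# Hypothetical item names mirroring temporal data item 412 and non-temporal data item 411.
TEMPORAL_ITEMS = {"age", "weight", "bmi", "systolic_bp", "diastolic_bp", "fasting_glucose"}
NON_TEMPORAL_ITEMS = {"smoking", "drinking", "exercise"}


def split_by_determination_knowledge(records):
    """Split yearly records (ordered by year) into temporal and non-temporal series."""
    temporal, non_temporal = {}, {}
    for record in records:
        for item, value in record.items():
            if item in TEMPORAL_ITEMS:
                temporal.setdefault(item, []).append(value)
            elif item in NON_TEMPORAL_ITEMS:
                non_temporal.setdefault(item, []).append(value)
            # Items in neither set are ignored here (an assumption of this sketch).
    return temporal, non_temporal


records_0001 = [
    {"age": 47, "weight": 83.4, "bmi": 22.8, "systolic_bp": 124.9, "diastolic_bp": 80.7,
     "fasting_glucose": 104.5, "smoking": "none", "drinking": "once a week", "exercise": "once a week"},
    {"age": 48, "weight": 86.6, "bmi": 24.3, "systolic_bp": 128.5, "diastolic_bp": 86.1,
     "fasting_glucose": 107.2, "smoking": "none", "drinking": "once a week", "exercise": "once a week"},
    {"age": 49, "weight": 92.0, "bmi": 26.0, "systolic_bp": 133.8, "diastolic_bp": 90.0,
     "fasting_glucose": 110.0, "smoking": "none", "drinking": "twice a week", "exercise": "none"},
]

temporal, non_temporal = split_by_determination_knowledge(records_0001)
print(temporal["weight"])
print(non_temporal["drinking"])
```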

The generation unit 202 generates, based on the temporal feature knowledge 420, temporal feature data indicating temporal features from the temporal data 232. For the temporal data 232, the generation unit 202 calculates the maximum value 511, the minimum value 512, the mean value, and so on, which are included in the basic statistic item. The generation unit 202 also calculates the change amount (years 1 and 2) 521, the change amount (years 2 and 3) 522, and so on, which are included in the change amount item, and the change rate (years 1 and 2) 531, the change rate (years 2 and 3) 532, and so on, which are included in the change rate item.

Take as an example the entries in the notification information 300 of FIG. 3 whose name ID 311 is "0001". For the body weight 321 values "83.4", "86.6", and "92.0" included in the temporal data 232, the maximum value 511 is "92.0", the body weight 321 in year 3; the minimum value 512 is "83.4", the body weight 321 in year 1; and the mean value is "87.3", the average of "83.4", "86.6", and "92.0".

The change amount (years 1 and 2) 521 is "3.2", obtained by subtracting "83.4" from "86.6", and the change amount (years 2 and 3) 522 is "5.4", obtained by subtracting "86.6" from "92.0". The change rate (years 1 and 2) 531 is "0.04", obtained by dividing the change amount (years 1 and 2) 521 of "3.2" by the year-1 body weight 321 of "83.4". The change rate (years 2 and 3) 532 is "0.06", obtained by dividing the change amount (years 2 and 3) 522 of "5.4" by the year-2 body weight 321 of "86.6". The same applies to the items other than the body weight 321, such as the BMI 322, the systolic blood pressure 323, the diastolic blood pressure 324, and the fasting blood glucose 325. The generation unit 202 outputs the generation result as temporal feature data 500.
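The body weight example above can be reproduced with a short sketch. The function name `temporal_features` and the dictionary keys are hypothetical, and rounding to one or two decimal places is an assumption made to match the figures quoted in the text; the source does not state a rounding rule.

```python
def temporal_features(values):
    """Compute basic statistics, change amounts, and change rates for one yearly series.

    Change amount = later value minus earlier value; change rate = change amount
    divided by the earlier value. Assumes no earlier value is zero.
    """
    pairs = list(zip(values, values[1:]))
    return {
        "max": max(values),
        "min": min(values),
        "mean": sum(values) / len(values),
        "changes": [later - earlier for earlier, later in pairs],
        "rates": [(later - earlier) / earlier for earlier, later in pairs],
    }


weight = [83.4, 86.6, 92.0]  # body weight 321 for years 1 to 3, name ID "0001"
features = temporal_features(weight)

print(features["max"], features["min"], round(features["mean"], 1))
print([round(c, 1) for c in features["changes"]])
print([round(r, 2) for r in features["rates"]])
```

Rounded as above, the results agree with the values in the text: 92.0, 83.4, 87.3 for the basic statistics, 3.2 and 5.4 for the change amounts, and 0.04 and 0.06 for the change rates.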

FIG. 5 is an explanatory diagram showing an example of the temporal feature data 500. The temporal feature data 500 includes basic statistics 510, change amounts 520, and change rates 530. The basic statistics 510 include the maximum value 511, the minimum value 512, the mean value (not shown), and so on, calculated according to the basic statistic item 421. The change amounts 520 include the change amount (years 1 and 2) 521, the change amount (years 2 and 3) 522, and so on, calculated according to the change amount item 422. The change rates 530 include the change rate (years 1 and 2) 531, the change rate (years 2 and 3) 532, and so on, calculated according to the change rate item 423.

The basic statistics 510, the change amounts 520, and the change rates 530 are each generated by the generation unit 202 for each type of temporal data 232. The entries 501-1, 501-2, ..., 502-1, 502-2, ..., 503-1, 503-2, ..., 504-1, 504-2, ... in the temporal feature data shown in FIG. 5 are the temporal feature data for one contract applicant (for example, the contract applicant whose name ID 311 is "0001"). The generation unit 202 generates the entries 501-1, 501-2, ..., 502-1, 502-2, ..., 503-1, 503-2, ..., 504-1, 504-2, ... for each contract applicant.

The entries 501 are temporal feature data corresponding to the body shape basic information item 431. Specifically, for example, the entry 501-1 is temporal feature data indicating the basic statistics, change amounts, and change rates for the body weight 321. The entry 501-2 is temporal feature data indicating the basic statistics, change amounts, and change rates for the BMI 322.

エントリ５０２は、血圧系検査値項目４３２に従った経時特徴データである。具体的には、たとえば、エントリ５０２−１は、収縮期血圧３２３についての基本統計量、変化量および変化割合を示す経時特徴データである。エントリ５０２−２は、拡張期血圧３２４についての基本統計量、変化量および変化割合を示す経時特徴データである。 Entry 502 is temporal feature data according to blood pressure system test value item 432. Specifically, for example, entry 502-1 is temporal feature data showing basic statistics, amount of change and rate of change for systolic blood pressure 323. Entry 502-2 is temporal feature data showing basic statistics, amount of change and rate of change for diastolic blood pressure 324.

エントリ５０３は、血糖系検査値項目４３３に従った経時特徴データである。具体的には、たとえば、エントリ５０３−１は、空腹時血糖３２５についての基本統計量、変化量および変化割合を示す経時特徴データである。エントリ５０３−２は、ＨｂＡ１ｃについての基本統計量、変化量および変化割合を示す経時特徴データである。 Entry 503 is temporal feature data according to blood glucose test value item 433. Specifically, for example, entry 503-1 is temporal feature data showing basic statistics, changes, and rates of change for fasting blood glucose 325. Entry 503-2 is temporal feature data showing the basic statistics, amount of change and rate of change for HbA1c.

エントリ５０４は、肝機能系検査値項目４３４に従った経時特徴データである。具体的には、たとえば、エントリ５０４−１は、ＧＯＴについての基本統計量、変化量および変化割合を示す経時特徴データである。エントリ５０４−２は、ＧＰＴについての基本統計量、変化量および変化割合を示す経時特徴データである。 Entry 504 is temporal feature data according to liver function test value item 434. Specifically, for example, entry 504-1 is temporal feature data showing basic statistics, amount of change, and rate of change for GOT. Entry 504-2 is temporal feature data showing the basic statistics, amount of change and rate of change for GPT.
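The computation of one entry of the temporal feature data 500 can be sketched as follows. This is a minimal illustration, not the implementation of the generation unit 202: the function name `temporal_features`, the dictionary keys, and the sample readings are hypothetical, and the change rate is assumed here to be the change amount divided by the earlier year's value, which the description does not specify.

```python
from statistics import mean


def temporal_features(values):
    """Compute basic statistics, year-over-year change amounts, and change rates.

    `values` holds one test item's readings ordered by year (year 1, year 2, ...).
    """
    if len(values) < 2:
        raise ValueError("need at least two yearly values")
    pairs = list(zip(values, values[1:]))
    # Change amount between consecutive years (year 1 to 2, year 2 to 3, ...).
    changes = [later - earlier for earlier, later in pairs]
    # Assumed definition: change relative to the earlier year; None if that value is zero.
    rates = [
        (later - earlier) / earlier if earlier != 0 else None
        for earlier, later in pairs
    ]
    return {
        "max": max(values),
        "min": min(values),
        "mean": mean(values),
        "changes": changes,
        "rates": rates,
    }


# Hypothetical body weight readings for three consecutive years.
weight_by_year = [60.0, 63.0, 66.0]
features = temporal_features(weight_by_year)
```

Running the same function over each item (body weight, BMI, systolic blood pressure, and so on) would yield one entry per item, analogous to entries 501-1, 501-2, 502-1, and so on.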

Returning to FIG. 2, the division unit 203 divides the plurality of temporal feature data generated by the generation unit 202 into a plurality of groups based on the temporal feature division knowledge 430, and outputs divided temporal feature data 600. The plurality of temporal feature data are, for example, the individual values that make up the entries 501-1, 501-2, ..., 502-1, 502-2, ..., 503-1, 503-2, ..., 504-1, 504-2, ... shown in FIG. 5. The plurality of groups are the divided temporal feature data 600-1, 600-2, ..., 600-n (n is an integer of 1 or more). Each of the divided temporal feature data 600-1, 600-2, ..., 600-n is defined by one of the items of the temporal feature division knowledge 430 shown in FIG. 4: the body shape basic information item 431, the blood pressure system test value item 432, the blood glucose system test value item 433, the liver function system test value item 434, and so on.

The temporal feature division knowledge 430 is defined based on statistical or medical findings. (Example 1) BMI 322 is an index calculated from height and body weight 321. For people in the age range of insurance contract applicants, height changes little. A change in BMI 322 is therefore very strongly correlated with a change in body weight 321. When the notification information 300 contains multiple strongly correlated items, they carry redundant information and make data analysis inefficient.

In contrast, applying the temporal feature division knowledge 430 gathers the temporal feature data of multiple items that contain redundancy (age 313, body weight 321, BMI 322, ...) into the group of divided temporal feature data 600-1 corresponding to the body shape basic information item 431 (hereinafter, body shape group 600-1). The generation device 100 can then dimensionally compress the temporal feature data generated from the values of those items together as the body shape group and extract higher-dimensional features (hereinafter, higher-order features), which makes data analysis more efficient.

(Example 2) Systolic blood pressure 323 and diastolic blood pressure 324 are both known to worsen gradually with age. When blood pressure falls or rises because lifestyle habits improve or deteriorate, systolic blood pressure 323 and diastolic blood pressure 324 also fall or rise while keeping their balance. A change in that balance, however, is said to be a sign of hypertensive diseases such as arteriosclerosis.

Applying the temporal feature division knowledge 430 therefore gathers the temporal feature data of systolic blood pressure 323 and diastolic blood pressure 324 into the group of divided temporal feature data 600-2 corresponding to the blood pressure system test value item 432 (hereinafter, blood pressure system group 600-2). The generation device 100 dimensionally compresses the temporal feature data generated from the values of systolic blood pressure 323 and diastolic blood pressure 324 together as the blood pressure system group and extracts higher-order features, thereby obtaining the composite features needed for data analysis.

For the same reason, the temporal feature data of fasting blood glucose 325 and HbA1c are gathered into the group of divided temporal feature data 600-3 corresponding to the blood glucose system test value item 433 (hereinafter, blood glucose system group 600-3), and the temporal feature data of GOT, GPT, and γ-GTP are gathered into the group of divided temporal feature data 600-4 corresponding to the liver function system test value item 434 (hereinafter, liver function system group 600-4).

FIG. 6 is an explanatory diagram showing an example of the divided temporal feature data 600. The divided temporal feature data 600 includes the body shape group 600-1, the blood pressure system group 600-2, the blood glucose system group 600-3, and the liver function system group 600-4. The body shape group 600-1 is a data set containing the temporal feature data that corresponds to the body shape basic information item 431. The blood pressure system group 600-2 is a data set containing the temporal feature data that corresponds to the blood pressure system test value item 432. The blood glucose system group 600-3 is a data set containing the temporal feature data that corresponds to the blood glucose system test value item 433. The liver function system group 600-4 is a data set containing the temporal feature data that corresponds to the liver function system test value item 434.

The body shape group 600-1 includes temporal feature data 601-1 of body weight 321 and temporal feature data 601-2 of BMI 322. The data sequence of the temporal feature data 601-1 of body weight 321 is denoted as vector Ub1-1. The data sequence of the temporal feature data 601-2 of BMI 322 is denoted as vector Ub1-2.

The blood pressure system group 600-2 includes temporal feature data 602-1 of systolic blood pressure 323 and temporal feature data 602-2 of diastolic blood pressure 324. The data sequence of the temporal feature data 602-1 of systolic blood pressure 323 is denoted as vector Ub2-1. The data sequence of the temporal feature data 602-2 of diastolic blood pressure 324 is denoted as vector Ub2-2.

The blood glucose system group 600-3 includes temporal feature data 603-1 of fasting blood glucose 325 and temporal feature data 603-2 of HbA1c. The data sequence of the temporal feature data 603-1 of fasting blood glucose 325 is denoted as vector Ub3-1. The data sequence of the temporal feature data 603-2 of HbA1c is denoted as vector Ub3-2.

The liver function system group 600-4 includes temporal feature data 604-1 of GOT and temporal feature data 604-2 of GPT. The data sequence of the temporal feature data 604-1 of GOT is denoted as vector Ub4-1. The data sequence of the temporal feature data 604-2 of GPT is denoted as vector Ub4-2.
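The grouping performed by the division unit 203 can be illustrated with a small sketch. The dictionary `DIVISION_KNOWLEDGE`, the item keys, the function `split_into_groups`, and the sample vectors are all hypothetical stand-ins chosen for illustration; the description does not prescribe how the temporal feature division knowledge 430 is encoded.

```python
# Hypothetical encoding of the temporal feature division knowledge 430:
# each group name maps to the items whose temporal features belong to it.
DIVISION_KNOWLEDGE = {
    "body_shape": ["weight", "bmi"],
    "blood_pressure": ["systolic_bp", "diastolic_bp"],
    "blood_glucose": ["fasting_glucose", "hba1c"],
    "liver_function": ["got", "gpt"],
}


def split_into_groups(feature_vectors, knowledge):
    """Group per-item temporal feature vectors according to the division knowledge."""
    groups = {}
    for group_name, items in knowledge.items():
        # Keep only the items for which temporal features were actually generated.
        present = {item: feature_vectors[item] for item in items if item in feature_vectors}
        if present:
            groups[group_name] = present
    return groups


# Illustrative per-item vectors (max, min, change amount), playing the role of Ub1-1, Ub1-2, ...
feature_vectors = {
    "weight": [66.0, 60.0, 3.0],
    "bmi": [23.4, 21.3, 1.1],
    "systolic_bp": [132.0, 124.0, 4.0],
    "diastolic_bp": [84.0, 80.0, 2.0],
    "fasting_glucose": [101.0, 95.0, 3.0],
}
groups = split_into_groups(feature_vectors, DIVISION_KNOWLEDGE)
```

In this sketch a group is emitted only when at least one of its items is present, so the liver function group is absent from `groups` for the sample input.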

Returning to FIG. 2, the dimensional compression unit 204 dimensionally compresses its input data to generate higher-order feature data 700. Specifically, for example, the dimensional compression unit 204 dimensionally compresses each of the plurality of groups produced by the division unit 203. That is, the dimensional compression unit 204 performs dimensional compression on each of the divided temporal feature data 600-1 to 600-n and generates higher-order feature data 702-1 to 702-n for the temporal data 232.

The dimensional compression unit 204 also performs dimensional compression on the non-temporal data 231 and generates higher-order feature data 701 for the non-temporal data 231. Extraction of higher-order feature data by dimensional compression is realized by a known dimensional compression method such as principal component analysis (PCA) or a stacked autoencoder. The dimensional compression unit 204 may also use a neural network or the like to first expand the number of dimensions of the data to generate higher-order feature data, and then perform dimensional compression.
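As a rough illustration of compressing a single group, the sketch below projects a group's columns onto their first principal component using power iteration. It is a simplified stand-in for the PCA mentioned above, written with the standard library only; the function name, the fixed iteration count, and the sample rows (one row per applicant, columns for body weight change and BMI change) are assumptions made for this example.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return the first principal direction of `rows` and each row's score along it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d); requires at least two rows.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration from a uniform start vector. It would not converge to the
    # dominant direction if that direction were orthogonal to the start vector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical body shape group: (body weight change, BMI change) for four applicants.
body_shape_rows = [[1.0, 0.4], [2.0, 0.7], [3.0, 1.1], [4.0, 1.4]]
direction, scores = first_principal_component(body_shape_rows)
```

Because body weight and BMI changes are strongly correlated, a single score per applicant retains almost all of the variance of the two columns, which is the redundancy reduction that Example 1 describes. Applying the same function to each group separately mirrors the per-group compression of the divided temporal feature data 600-1 to 600-n.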

FIG. 7 is an explanatory diagram showing an example of higher-order feature data. The higher-order feature data includes higher-order feature data 701 for the non-temporal data 231, body shape system higher-order feature data 702-1 for the temporal data 232, blood pressure system higher-order feature data 702-2 for the temporal data 232, blood glucose system higher-order feature data 702-3 for the temporal data 232, and liver function system higher-order feature data 702-4 for the temporal data 232.

Each of the higher-order feature data 701, 702-1, 702-2, 702-3, and 702-4 includes feature 1, feature 2, and so on, obtained by dimensional compression.

In the higher-order feature data 701 for the non-temporal data 231, the set of values in the feature 1 column is vector Va1, and the set of values in the feature 2 column is vector Va2. In the body shape system higher-order feature data 702-1 for the temporal data 232, the set of values in the feature 1 column is vector Vb1-1, and the set of values in the feature 2 column is vector Vb1-2.

In the blood pressure system higher-order feature data 702-2 for the temporal data 232, the set of values in the feature 1 column is vector Vb2-1, and the set of values in the feature 2 column is vector Vb2-2. In the blood glucose system higher-order feature data 702-3 for the temporal data 232, the set of values in the feature 1 column is vector Vb3-1, and the set of values in the feature 2 column is vector Vb3-2. In the liver function system higher-order feature data 702-4 for the temporal data 232, the set of values in the feature 1 column is vector Vb4-1, and the set of values in the feature 2 column is vector Vb4-2.

Returning to FIG. 2, the combining unit 205 combines the plurality of groups after dimensional compression by the dimensional compression unit 204 and generates combined higher-order feature data 710. When only the divided temporal feature data 600-1 to 600-n have been dimensionally compressed, the groups after dimensional compression are the higher-order feature data 702-1 to 702-n for the temporal data 232. When both the non-temporal data 231 and the divided temporal feature data 600-1 to 600-n have been dimensionally compressed, they are the higher-order feature data 701 for the non-temporal data 231 and the higher-order feature data 702-1 to 702-n for the temporal data 232.

An example of combination by the combining unit 205 is described with reference to FIG. 7. The combining unit 205 combines Va1, Va2, ..., Vb1-1, Vb1-2, ..., Vb3-1, Vb3-2, ..., Vb4-1, Vb4-2, ... to generate a higher-order feature vector Vall as the combined higher-order feature data 710. The number of dimensions of the higher-order feature vector Vall is the sum of the numbers of elements of Va1, Va2, ..., Vb1-1, Vb1-2, ..., Vb3-1, Vb3-2, ..., Vb4-1, Vb4-2, ....
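The combination step amounts to concatenation, as the following sketch shows for a single applicant. The function `combine`, the group keys, and the feature values are hypothetical and serve only to show that the length of the combined vector equals the sum of the lengths of its parts.

```python
def combine(higher_order_features):
    """Concatenate per-group higher-order features into one vector (the role of Vall)."""
    v_all = []
    for features in higher_order_features.values():
        v_all.extend(features)
    return v_all


# Illustrative compressed features for one applicant, keyed by hypothetical group names.
higher_order = {
    "non_temporal": [0.3, -1.2],
    "body_shape": [0.8, 0.1],
    "blood_pressure": [-0.4, 0.5],
    "blood_glucose": [1.1],
    "liver_function": [0.0, 0.7],
}
v_all = combine(higher_order)
```

Groups may contribute different numbers of features, so the dimensionality of the combined vector is determined by how strongly each group is compressed.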

Returning to FIG. 2, the analysis unit 206 takes the combination result from the combining unit 205 (the combined higher-order feature data 710) as explanatory variables and outputs the corresponding objective variable. For example, the analysis unit 206 performs an insurance payment risk analysis and outputs the future probability of events such as death, hospitalization, surgery, or outpatient treatment as the objective variable. The analysis unit 206 executes the data analysis using a known technique, such as a statistical method like multiple regression analysis or a machine learning method like a neural network. For example, the analysis unit 206 generates a learning model using, as training data, combinations of the higher-order feature vector Vall obtained from existing notification information 300 and the corresponding analysis result, and then inputs the higher-order feature vector Vall obtained from new notification information 300 into the learning model to obtain a new analysis result for that notification information.
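The train-then-predict flow of the analysis unit 206 can be sketched with a small model. The example below substitutes a plain logistic regression trained by stochastic gradient descent for the multiple regression analysis or neural network named above, purely to keep the sketch short; the function names, learning rate, epoch count, and the tiny training set are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit weights and bias by stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return the estimated probability of the event for feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical training data: combined feature vectors and whether a payout occurred (1) or not (0).
train_x = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
train_y = [0, 0, 1, 1]
w, b = train_logistic(train_x, train_y)

# Feature vectors derived from new notification information (illustrative values).
high_risk = predict(w, b, [0.85, 0.85])
low_risk = predict(w, b, [0.15, 0.15])
```

The output is a probability between 0 and 1, which corresponds to the future occurrence probability that the analysis unit 206 outputs as the objective variable.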

<Screen example>
FIG. 8 is an explanatory diagram showing input/output screen example 1. The input/output screen 800 is displayed on a display, which is an example of the output device 104 of the generation device 100, or on the display of another computer that can communicate with the generation device 100.

The input/output screen 800 includes a notification information read button 801, a domain knowledge read button 802, a feature extraction method selection pull-down 803, an analysis method selection pull-down 804, an analysis execution button 805, and an execution result display area 806. The notification information read button 801 is a button pressed with the input device 103. When the notification information read button 801 is pressed, the notification information 300 of the contract applicant stored in the storage device 102 is read.

Besides pressing the notification information read button 801, the notification information 300 can also be entered with the input device 103 on a notification information input screen, which is displayed by pressing a notification information input button 807. FIG. 9 is an explanatory diagram showing an example of the notification information input screen. The notification information input screen 900 includes a medical examination result input area 901 and a medical interview result input area 902. In the medical examination result input area 901, values such as systolic blood pressure 323, diastolic blood pressure 324, and fasting blood glucose 325 can be set with the input device 103. In the medical interview result input area 902, the presence or absence of a smoking habit 331, a drinking habit 332, an exercise habit 333, and so on can be set with the input device 103.

Returning to FIG. 8, the domain knowledge read button 802 is a button pressed with the input device 103. When the domain knowledge read button 802 is pressed, the domain knowledge 400 stored in the storage device 102 is read. Alternatively, pressing a domain knowledge input button 808 displays a setting screen (not shown) for entering domain knowledge. On the setting screen, the contents of the temporal data determination knowledge 410, the temporal feature knowledge 420, and the temporal feature division knowledge 430 can be added, changed, and deleted with the input device 103.

The feature extraction method selection pull-down 803 is a control that displays a plurality of feature extraction methods as a pull-down list and lets the user select one of them with the input device 103. The feature extraction methods include, for example, known dimensional compression methods such as the PCA and stacked autoencoder described above. For example, when PCA is selected, the generation device 100 performs dimensional compression using PCA.

The analysis method selection pull-down 804 is a control that displays a plurality of analysis methods as a pull-down list and lets the user select one of them with the input device 103. The analysis methods include, for example, known methods such as the statistical methods (for example, multiple regression analysis) and machine learning methods (for example, neural networks) described above. For example, when multiple regression analysis is selected, the generation device 100 performs data analysis using multiple regression analysis.

The analysis execution button 805 is a button pressed with the input device 103. When the analysis execution button 805 is pressed, the generation device 100 loads the notification information 300 and the domain knowledge 400 from the storage device 102, performs dimensional compression using the method selected in the feature extraction method selection pull-down 803, and executes data analysis using the method selected in the analysis method selection pull-down 804. The execution result display area 806 is the area that displays the result of the data analysis executed when the analysis execution button 805 was pressed.

FIG. 10 is an explanatory diagram showing input/output screen example 2. In FIG. 10, an analysis result 1000 for each name ID 311 is displayed in the execution result display area 806.

<Example of generation processing procedure>
FIG. 11 is a flowchart showing an example of a generation processing procedure performed by the generation device 100. When the analysis execution button 805 is pressed, the generation device 100 reads the notification information 300 and the domain knowledge 400 (step S1101), and the determination unit 201 determines, for each item of the contract applicant's analysis target data in the notification information 300, whether it is temporal data 232 or non-temporal data 231 (step S1102). For data determined to be temporal data 232 (step S1102: Yes), the generation device 100 generates the temporal feature data 500 with the generation unit 202 (step S1103) and divides the plurality of temporal feature data into a plurality of groups with the division unit 203 (step S1104).

For data determined to be non-temporal data 231 (step S1102: No), the generation device 100 extracts the higher-order feature data 701 for the non-temporal data 231 with the dimensional compression unit 204 (step S1105). Similarly, the generation device 100 extracts the higher-order feature data 702-1 to 702-n for the temporal data 232 with the dimensional compression unit 204 (steps S1106-1 to S1106-n).

The generation device 100 then combines the higher-order feature data 701 and 702-1 to 702-n with the combining unit 205 to generate the combined higher-order feature data 710 (step S1107), executes data analysis with the analysis unit 206 (step S1108), and displays the analysis result 1000 in the execution result display area 806 of the input/output screen 800. This completes the series of generation processes.

As described above, according to the first embodiment, the temporal feature data 500 is grouped according to the temporal feature division knowledge 430, so the generation of features unnecessary for analysis can be suppressed. High-quality explanatory variables can therefore be generated, and the accuracy of data analysis can be improved. Suppressing the generation of features unnecessary for analysis also reduces the calculation cost, improving the computational efficiency of data generation and data analysis.

Next, the second embodiment is described. The description of the second embodiment focuses on the differences from the first embodiment, so components identical to those of the first embodiment are given the same reference numerals and their description is omitted.

FIG. 12 is a block diagram showing an example of the functional configuration of the generation device 100 according to the second embodiment. The generation device 100 according to the first embodiment executes the dimensional compression processing by the dimensional compression unit 204, the combination processing by the combining unit 205, and the analysis processing by the analysis unit 206 as independent processes. In the second embodiment, a multimodal neural network 1200 is applied in place of the dimensional compression unit 204, the combining unit 205, and the analysis unit 206, so that the dimensional compression processing, the combination processing, and the analysis processing are executed continuously.

FIG. 13 is an explanatory diagram showing an example of the multimodal neural network 1200. FIG. 14 is an explanatory diagram showing an example of an input/output screen that displays an analysis result 1400 produced by the multimodal neural network 1200.

The multimodal neural network 1200 first performs feature extraction on the input vectors, which have been classified into a plurality of groups, using neural networks f1, f2, ..., fn that branch per group. Next, the multimodal neural network 1200 combines the output vectors of the neural networks f1, f2, ..., fn, performs feature extraction and analysis with a fully connected network g, and outputs the analysis result at an output layer h. The neural networks f1, f2, ..., fn correspond to the dimensional compression processing, and the fully connected network g and the output layer h correspond to the combination processing and the analysis processing.

The multimodal neural network 1200 is trained by supervised learning on the input vectors fed to the neural networks f1, f2, ..., fn and the analysis results that are the output values. The multimodal neural network 1200 can learn three processes at the same time: feature extraction for each group, higher-order feature extraction over all groups combined, and analysis. Highly accurate analysis is therefore possible, depending on the design of the network structure and parameters. The operation of the multimodal neural network 1200 is expressed, for example, by equation (1) shown in FIG. 13.

h = g(f(Ua), ..., f(Ub1-1), f(Ub1-2), ..., f(Ub3-1), f(Ub3-2), ..., f(Ub4-1), f(Ub4-2), ...)   (1)

In equation (1), the vector Ua is the vector representation of the non-temporal data 231. Giving the vector Ua to the function f generates the vectors Va1, Va2, ... of the higher-order feature data 701 for the non-temporal data 231. Giving the vectors Ub1-1, Ub1-2, ... to the function f generates the vectors Vb1-1, Vb1-2, ... of the higher-order feature data 702-1 for the temporal data 232.

Likewise, giving the vectors Ub2-1, Ub2-2, ... to the function f generates the vectors Vb2-1, Vb2-2, ... of the higher-order feature data 702-2 for the temporal data 232. Giving the vectors Ub3-1, Ub3-2, ... to the function f generates the vectors Vb3-1, Vb3-2, ... of the higher-order feature data 702-3 for the temporal data 232. Giving the vectors Ub4-1, Ub4-2, ... to the function f generates the vectors Vb4-1, Vb4-2, ... of the higher-order feature data 702-4 for the temporal data 232.
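The structure of equation (1) can be illustrated with a forward pass through a small branched network. This is only a sketch of the data flow (per-group branches, concatenation, a fully connected layer g, and an output layer h); the layer sizes, ReLU activations, sigmoid output, randomly initialized weights, and normalized sample inputs are all assumptions, and no training is performed.

```python
import math
import random


def dense(x, weights, bias):
    """Apply one fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def init_layer(rng, n_in, n_out):
    """Create (weights, bias) with small random weights; sizes are illustrative."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(groups, branch_layers, g_layer, h_layer):
    # Branch networks f1..fn: one per group, each compressing its own input vector.
    branch_outputs = [relu(dense(x, *layer)) for x, layer in zip(groups, branch_layers)]
    # Combination: concatenate the branch outputs.
    joined = [a for out in branch_outputs for a in out]
    # Fully connected network g, then output layer h producing one probability.
    hidden = relu(dense(joined, *g_layer))
    logit = dense(hidden, *h_layer)[0]
    return joined, 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
# Hypothetical normalized inputs: body shape (3 values), blood pressure (2), blood glucose (2).
groups = [[0.6, 0.7, 0.2], [0.5, 0.4], [0.3, 0.9]]
branch_layers = [init_layer(rng, len(x), 2) for x in groups]
g_layer = init_layer(rng, 2 * len(groups), 4)
h_layer = init_layer(rng, 4, 1)
joined, risk = forward(groups, branch_layers, g_layer, h_layer)
```

Because the branches, g, and h form one differentiable chain, a single supervised training loop could update all of them together, which is the simultaneous learning of the three processes described above.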

As described above, according to the second embodiment, the dimensional compression processing by the dimensional compression unit 204, the combination processing by the combining unit 205, and the analysis processing by the analysis unit 206 are executed continuously, so the generation processing and the analysis processing can be made faster and more accurate.

また、上述した実施例１および実施例２では、生命保険の引受査定における保険金支払リスク予測を例にあげて説明したが、企業の財務分析にも適用可能である。この場合、告知情報３００に替えて有価証券報告書に記載されたデータまたは当該データから算出される指標データとする。また、経時特徴分割知識４３０には、たとえば、上記データを、収益性、安全性、活動性、生産性および成長性の５つの観点でグループ分けした情報となる。 Further, in the above-mentioned Examples 1 and 2, the insurance claim payment risk prediction in the underwriting assessment of life insurance has been described as an example, but it can also be applied to the financial analysis of a company. In this case, instead of the notification information 300, the data described in the securities report or the index data calculated from the data is used. Further, the temporal feature division knowledge 430 is, for example, information obtained by grouping the above data from the five viewpoints of profitability, safety, activity, productivity and growth potential.

Profitability indicates how much profit a company is making, and includes the gross profit margin on sales, the operating profit margin on sales, the ratio of ordinary profit to total capital (ROA), and the return on equity (ROE). Safety indicates a company's ability to pay, such as its ability to repay bank loans, and includes the current ratio, the quick ratio, operating cash flow, and investing cash flow. Activity indicates whether capital is being used efficiently to generate large sales, and includes total capital turnover, fixed asset turnover, and inventory turnover.

Productivity indicates whether a company is using its employees, equipment, and other resources efficiently, and includes the ratio of value added to sales, the labor share, and labor productivity. Growth potential indicates a company's prospects for future growth, and includes the sales growth rate, the ordinary profit growth rate, and the net income growth rate. As described above, the generator 100 according to the first and second embodiments can be applied to various kinds of data analysis.
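For the financial-analysis case, the grouping knowledge and the division step can be sketched as below. The dictionary layout, the snake_case indicator names, the `divide` helper, and the sample values are all hypothetical. Only the five viewpoints and the indicators assigned to each come from the description above.

```python
from typing import Dict, List

# Hypothetical encoding of the grouping knowledge: viewpoint -> indicator names.
GROUPING: Dict[str, List[str]] = {
    "profitability": ["gross_profit_margin", "operating_profit_margin", "roa", "roe"],
    "safety": ["current_ratio", "quick_ratio", "operating_cash_flow",
               "investing_cash_flow"],
    "activity": ["total_capital_turnover", "fixed_asset_turnover",
                 "inventory_turnover"],
    "productivity": ["value_added_ratio", "labor_share", "labor_productivity"],
    "growth": ["sales_growth_rate", "ordinary_profit_growth_rate",
               "net_income_growth_rate"],
}


def divide(features: Dict[str, List[float]],
           grouping: Dict[str, List[str]]) -> Dict[str, Dict[str, List[float]]]:
    """Split per-indicator yearly series into the groups named by the grouping."""
    divided: Dict[str, Dict[str, List[float]]] = {group: {} for group in grouping}
    for group, names in grouping.items():
        for name in names:
            if name in features:  # indicators absent from the input are skipped
                divided[group][name] = features[name]
    return divided


# Illustrative three-year series for a few indicators (made-up numbers).
features = {
    "roa": [0.04, 0.05, 0.06],
    "roe": [0.08, 0.09, 0.11],
    "current_ratio": [1.5, 1.4, 1.6],
    "sales_growth_rate": [0.02, 0.03, 0.01],
}
divided = divide(features, GROUPING)
print({group: sorted(series) for group, series in divided.items()})
```

Each resulting group can then be passed to the dimensional compression step independently, in the same way as the health-related groups in the insurance example.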

The present invention is not limited to the embodiments described above, and includes various modifications and equivalent configurations within the spirit of the appended claims. For example, the embodiments above have been described in detail in order to explain the present invention clearly, and the present invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations may also be added, deleted, or substituted.

Each of the configurations, functions, processing units, processing means, and the like described above may be realized partly or wholly in hardware, for example by designing them as an integrated circuit, or in software, by having the processor 101 interpret and execute a program that realizes each function.

Information such as programs, tables, and files that realize each function can be stored in a storage device such as a memory, a hard disk, or an SSD (Solid State Drive), or on a recording medium such as an IC (Integrated Circuit) card, an SD card, or a DVD (Digital Versatile Disc).

The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines required for implementation are necessarily shown. In practice, almost all components may be considered to be interconnected.

100 Generator
101 Processor
102 Storage device
201 Determination unit
202 Generation unit
203 Division unit
204 Dimensional compression unit
205 Combining unit
206 Analysis unit
300 Notification information
400 Domain knowledge
500 Time-lapse feature data
600 Divided time-lapse feature data
700 Higher-order feature data
710 Combined higher-order feature data
1200 Multimodal neural network

Claims

A generator comprising a processor that executes a program and a storage device that stores the program,
wherein the generator can access time-lapse feature information indicating time-lapse features obtained from time-lapse data, and grouping information defining a plurality of groups to which the time-lapse data should belong, and
wherein the processor executes:
a generation process of generating, based on the time-lapse feature information, a plurality of pieces of time-lapse feature data indicating the time-lapse features from time-lapse data to be analyzed;
a division process of dividing the plurality of pieces of time-lapse feature data generated by the generation process into the plurality of groups based on the grouping information; and
a dimensional compression process of dimensionally compressing each of the plurality of groups divided by the division process.

The generator according to claim 1,
wherein, in the dimensional compression process, the processor dimensionally compresses non-time-lapse data to be analyzed.

The generator according to claim 2,
wherein the generator can access determination information for determining whether data corresponds to the time-lapse data or to the non-time-lapse data,
wherein the processor executes a determination process of determining, based on the determination information, whether analysis target data corresponds to the time-lapse data or to the non-time-lapse data,
wherein, in the generation process, the processor treats the analysis target data determined by the determination process to be the time-lapse data as the time-lapse data to be analyzed, and generates the plurality of pieces of time-lapse feature data from the time-lapse data to be analyzed based on the time-lapse feature information, and
wherein, in the dimensional compression process, the processor treats the analysis target data determined by the determination process to be the non-time-lapse data as the non-time-lapse data to be analyzed, and dimensionally compresses it.

The generator according to claim 1,
wherein the processor executes a combining process of combining the plurality of groups after dimensional compression by the dimensional compression process.

The generator according to claim 3,
wherein the processor executes a combining process of combining the non-time-lapse data to be analyzed after dimensional compression by the dimensional compression process with the plurality of groups after dimensional compression.

The generator according to claim 4,
wherein the processor executes an analysis process of outputting a corresponding objective variable, using the combination result of the combining process as an explanatory variable.

The generator according to claim 5,
wherein the processor executes an analysis process of outputting a corresponding objective variable, using the combination result of the combining process as an explanatory variable.

The generator according to claim 7,
wherein the dimensional compression process, the combining process, and the analysis process are executed by a multimodal neural network.

A generation method executed by a generator comprising a processor that executes a program and a storage device that stores the program,
wherein the generator can access time-lapse feature information indicating time-lapse features obtained from time-lapse data, and grouping information defining a plurality of groups to which the time-lapse data should belong, and
wherein the processor executes:
a generation process of generating, based on the time-lapse feature information, a plurality of pieces of time-lapse feature data indicating the time-lapse features from time-lapse data to be analyzed;
a division process of dividing the plurality of pieces of time-lapse feature data generated by the generation process into the plurality of groups based on the grouping information; and
a dimensional compression process of dimensionally compressing each of the plurality of groups divided by the division process.

A generation program for causing a processor to generate data,
wherein the processor can access time-lapse feature information indicating time-lapse features obtained from time-lapse data, and grouping information defining a plurality of groups to which the time-lapse data should belong, and
wherein the generation program causes the processor to execute:
a generation process of generating, based on the time-lapse feature information, a plurality of pieces of time-lapse feature data indicating the time-lapse features from time-lapse data to be analyzed;
a division process of dividing the plurality of pieces of time-lapse feature data generated by the generation process into the plurality of groups based on the grouping information; and
a dimensional compression process of dimensionally compressing each of the plurality of groups divided by the division process.