JP7481909B2 - Feature generation method and feature generation device

Info

Publication number: JP7481909B2
Application number: JP2020095384A
Authority: JP
Inventors: 常之今木; 大輔田代
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2020-06-01
Filing date: 2020-06-01
Publication date: 2024-05-13
Anticipated expiration: 2040-06-01
Also published as: JP2021189833A

Description

The present invention relates to a feature generation device and a feature generation method that generate features from time-series data and perform machine learning with those features.

One known technology for generating a machine learning model from time-series data is disclosed in Patent Document 1. For MIL (Multiple Instance Learning) aimed at predicting failures from the time-series data of manufacturing equipment (e.g., sensor values and event logs), Patent Document 1 extracts multiple subsets of the negative bags, trains one classifier per subset together with the positive bags, preferentially selects features whose weights (applied to each feature) have a large average across the resulting group of classifiers, and trains a failure prediction model with the selected features as input.

Patent Document 2 discloses a technology for detecting adverse drug reactions. Using each patient's history of medical events, it learns combinations of diseases that occurred within a specific period after medication, time-series patterns of other medical events (e.g., hospitalization and medical expenses), and known combinations of medications and side effects (positive/negative), and scores whether a given medical event history is a case in which a side effect occurred.

Patent Document 3 discloses a technology for labeling training data. Starting from a few main features, it repeatedly presents additional features that are useful for labeling to an expert, who selects among them. By gradually adding features in this way, the recall of the labels improves until all positive examples are labeled using an appropriate number of features.

Patent Document 1: US Patent Application Publication No. 2015/0227838
Patent Document 2: US Patent Application Publication No. 2017/0083670
Patent Document 3: International Publication No. 2019/045759

The conventional examples above do not consider narrowing down, without manual intervention, the features used for training. Consequently, when explanatory variables are synthesized, for example by multiplying features together, the number of combinations of explanatory variables can become enormous.
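To illustrate the scale of the problem, the sketch below counts the distinct products that can be formed from two or three different features. The feature counts (100 and 60) are hypothetical and chosen only for illustration; the count itself is the binomial coefficient, computed with Python's standard `math.comb`.

```python
from math import comb


def product_term_count(n_features: int, order: int) -> int:
    """Number of distinct products formed from `order` different features."""
    return comb(n_features, order)


# Hypothetical feature counts, e.g. before and after narrowing down.
for n in (100, 60):
    print(f"{n} features: {product_term_count(n, 2)} pairwise, "
          f"{product_term_count(n, 3)} three-way products")
# 100 features: 4950 pairwise, 161700 three-way products
# 60 features: 1770 pairwise, 34220 three-way products
```

Because the count grows polynomially with the order of the product, even a moderate reduction in the number of features shrinks the candidate set considerably.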

When generating a machine learning model that predicts the occurrence of a target event (a positive example) from time-series data, the features that serve as input data for machine learning are generated from both positive and negative examples. For the time-series data of a positive example, the date (or date and time) on which the target event occurred is used as the reference date, and a specified period counted from the reference date is used as the analysis period.

For the time-series data of a negative example, on the other hand, the analysis period is the same as for positive examples, but because the target event never occurs there is no natural reference date. The conventional examples above do not consider how the reference date of a negative example should be determined.

The present invention has been made in view of the above problems. Its object is, when generating a machine learning model that predicts the occurrence of a target event from time-series data, to prevent the amount of machine learning input data from becoming enormous and to determine reference dates for the time-series data of negative examples.

The present invention is a feature generation method in which a computer having a processor and a memory receives time-series data and generates features serving as input data to a machine learning unit that predicts the occurrence of a target event. The method includes: a time-series data input step in which the computer receives a plurality of pieces of time-series data including values and timestamps; a target event occurrence data input step in which the computer receives target event occurrence data including the timestamp at which the target event occurred; a feature calculation definition input step in which the computer receives a feature calculation definition that defines how the features of the time-series data are calculated; a division step in which the computer divides the time-series data into positive example time-series data and negative example time-series data by referring to the target event occurrence data; a positive example reference date determination step in which the computer determines a positive example reference date, which is the reference date of the positive example time-series data; a positive example feature calculation step in which the computer calculates positive example features from the combination of the positive example time-series data and the positive example reference date based on the feature calculation definition; a negative example reference date determination step in which the computer determines a negative example reference date using the positive example reference date, the positive example features, and the negative example time-series data as inputs; and a negative example feature calculation step in which the computer calculates negative example features from the combination of the negative example time-series data and the negative example reference date based on the feature calculation definition.

Accordingly, the present invention calculates the cumulative value of feature importance starting from the most important feature and, based on a threshold on that cumulative value, gradually eliminates the less important features. Narrowing the features down to the important ones in this way prevents the number of combinations of features (explanatory variables) from becoming enormous. In addition, the reference date of a negative example can be determined from the negative example's time-series data, using proximity to the features of the positive examples as the criterion.

Details of at least one implementation of the subject matter disclosed herein are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the disclosed subject matter will become apparent from the following disclosure, drawings, and claims.

FIG. 1 is a block diagram showing an example of the configuration of a longitudinal data analysis device according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing an example of processing performed by the longitudinal data analysis device according to Embodiment 1.
FIG. 3 is an example of a cumulative feature importance graph according to Embodiment 1.
FIG. 4 is a flowchart showing an example of processing performed by the feature selection unit according to Embodiment 1.
FIG. 5 is a diagram showing an example of the reference date sliding process performed by the temporal feature generation unit according to Embodiment 1.
FIG. 6 is a diagram showing an example of the negative example reference date determination process performed by the negative example reference date determination unit according to Embodiment 1.
FIG. 7 is a diagram showing an example of the configuration of the negative example reference date determination unit according to Embodiment 1.
FIG. 8 is a flowchart showing an example of processing performed by the negative example reference date determination unit according to Embodiment 1.
FIG. 9 is a diagram showing an example of processing performed by the longitudinal data analysis device according to Embodiment 2.
FIG. 10 is a diagram showing an example of the importance feedback process performed by the longitudinal data analysis device according to Embodiment 2.
FIG. 11 is a flowchart showing an example of processing performed by the temporal feature generation unit and the feature selection unit according to Embodiment 2.
FIG. 12 is a diagram showing an example of processing performed by the negative example reference date determination unit according to Embodiment 3.
FIG. 13 is a graph showing the relationship between time-series data and a precursor period according to Embodiment 4.
FIG. 14 is a diagram showing an example of processing performed by the negative example reference date determination unit according to Embodiment 4.
FIG. 15 is a flowchart showing a modification of the processing performed by the negative example reference date determination unit according to Embodiment 4.

An embodiment of the present invention is described below with reference to the accompanying drawings.

FIG. 1 shows Embodiment 1 of the present invention and is a block diagram showing an example of the configuration of a longitudinal data analysis device 1.

The longitudinal data analysis device 1 is a computer including a processor 2, a memory 3, a storage device 4, an input device 5, an output device 6, and a communication device 7.

In this embodiment, the longitudinal data analysis device 1 uses, as an example, account balances at a financial institution as the time-series data 102 for learning and the occurrence of a default (bad debt) as the target event, and generates a machine learning model that predicts the occurrence of a default from the trend of the account balance over time.

The time-series data 102 is not limited to account balances, and the target event is not limited to a default. For example, the occurrence of a target event such as a failure may be predicted from a time series of a physical quantity.

In this embodiment, each record of the time-series data 102 contains, for each account identifier, a value (balance), a timestamp (date), and a preset identifier.

The memory 3 of the longitudinal data analysis device 1 stores a temporal feature generation unit 110 that calculates features from the previously collected time-series data 102, a feature selection unit 150 that narrows down the features, and a machine learning unit 160 that performs machine learning to generate a prediction model.

The temporal feature generation unit 110, the feature selection unit 150, and the machine learning unit 160 are loaded into the memory 3 as programs.

The processor 2 operates as functional units that provide predetermined functions by executing processing according to the program of each functional unit. For example, the processor 2 functions as the feature selection unit 150 by executing processing according to a feature selection program; the same applies to the other programs. The processor 2 also operates as functional units that provide the respective functions of the multiple processes executed by each program. A computer and a computer system are an apparatus and a system that include these functional units.

The temporal feature generation unit 110 includes a time-series data division unit 111, a positive example reference date determination unit 114, a negative example reference date determination unit 119, and a feature calculation unit 116. The time-series data division unit 111 divides the time-series data 102 into time-series data of positive examples, in which the target event (default) occurred, and time-series data of negative examples, in which it did not.

The positive example reference date determination unit 114 refers to the target event occurrence time data 101 and determines the date and time at which the target event occurred as the positive example reference date. As described later, the negative example reference date determination unit 119 determines the reference date of the negative example time-series data based on features calculated from the negative example time-series data, and the feature calculation unit 116 calculates features from the positive example time-series data and from the negative example time-series data.

The feature selection unit 150 includes a feature importance calculation unit 151 and a feature accumulation threshold determination unit 153. As described later, the feature importance calculation unit 151 calculates, as the importance of each feature, an index indicating how much an increase or decrease in the feature's value affects the predicted value of the model generated by the machine learning unit 160. The feature importance calculation unit 151 can adopt, for example, a configuration that combines LightGBM and SHAP (SHapley Additive exPlanations).

In the feature importance calculation unit 151, a prediction model generated with LightGBM predicts the presence or absence (1 or 0) of the target event, and SHAP calculates, as the importance, how much each feature influenced the prediction result.
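The text does not state how per-sample SHAP values are reduced to one importance per feature. A common choice, assumed here, is the mean absolute SHAP value of each feature over the samples. The sketch below takes an already computed SHAP matrix (rows = samples, columns = features); in practice such a matrix might come from the `shap` package's `TreeExplainer` applied to the LightGBM model, although the exact output layout for binary classifiers depends on the `shap` version. The feature names and values are hypothetical.

```python
from typing import Dict, List, Sequence


def importance_from_shap(shap_values: Sequence[Sequence[float]],
                         feature_names: Sequence[str]) -> Dict[str, float]:
    """Mean absolute SHAP value per feature (rows = samples, columns = features)."""
    if not shap_values:
        raise ValueError("need at least one sample")
    totals: List[float] = [0.0] * len(feature_names)
    for row in shap_values:
        if len(row) != len(feature_names):
            raise ValueError("row length does not match the number of features")
        for j, value in enumerate(row):
            totals[j] += abs(value)  # sign only tells the direction of the effect
    n_samples = len(shap_values)
    return {name: total / n_samples for name, total in zip(feature_names, totals)}


# Hypothetical SHAP values for two samples and three features.
shap_matrix = [[0.4, -0.1, 0.0],
               [-0.2, 0.3, 0.1]]
importance = importance_from_shap(shap_matrix, ["mean_90d", "std_90d", "min_30d"])
print(importance)  # larger value = larger influence on the prediction
```

Taking the absolute value means a feature that pushes predictions strongly in either direction counts as important, which matches the "degree of influence" reading of importance in the text.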

The feature accumulation threshold determination unit 153 uses the cumulative value of the importances calculated by the feature importance calculation unit 151 and a predetermined threshold Th1 to determine which features to exclude from training. This narrows down the number of combinations of explanatory variables when explanatory variables are synthesized from products of features.

The machine learning unit 160 performs machine learning using the positive and negative example features narrowed down by the feature selection unit 150 as input data, and generates a model that predicts with high accuracy the occurrence of a target event that occurs infrequently.

The storage device 4 stores the target event occurrence time data 101, the time-series data 102, the positive example reference dates 115, the negative example reference dates 120, the positive example features 118, the negative example features 121, the feature calculation definition 117, the first feature list 122, the feature importance 152, and the second feature list 154.

Each record (not shown) of the target event occurrence time data 101 contains the date on which a preset target event occurred in the time-series data 102, the account identifier, and the account balance. Each record (not shown) of the time-series data 102 contains, for each account identifier, the date, the balance, the identifier, and so on.

Each record of the positive example reference dates 115 stores an account identifier output by the positive example reference date determination unit 114 and the reference date of that account's positive example time-series data. Each record of the negative example reference dates 120 stores an account identifier output by the negative example reference date determination unit 119 and the reference date of that account's negative example time-series data. How each reference date is determined is described later.

Each record of the positive example features 118 stores the features of the positive example time-series data 112 calculated by the feature calculation unit 116 together with the identifier of the time-series data 102. The positive example features 118 consist of the features of the time-series data 102 specified in the feature calculation definition 117.

Each record of the negative example features 121 stores the features of the negative example time-series data 113 calculated by the feature calculation unit 116 together with the identifier of the time-series data 102. The negative example features 121 consist of the features of the time-series data 102 specified in the feature calculation definition 117.

The feature calculation definition 117 stores the processing period of the time-series data 102, the types of features, and their calculation methods, as specified in the feature calculation method user settings 103. In this embodiment, statistics such as the mean, maximum, minimum, variance, standard deviation, range (maximum minus minimum), and coefficient of variation are used as the features of the time-series data 102.
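A minimal sketch of computing these statistics for one account and one processing period follows. It assumes the window is the `period_days` days immediately before the reference date, `[reference - period_days, reference)`, and uses population (not sample) variance; both choices, and the `_Nd` suffix in the feature names, are assumptions for illustration, not details taken from the feature calculation definition 117.

```python
import statistics
from datetime import date, timedelta
from typing import Dict, List, Tuple

Series = List[Tuple[date, float]]  # (timestamp, value) records of one account


def window_features(series: Series, reference: date, period_days: int) -> Dict[str, float]:
    """Statistics of the values observed in [reference - period_days, reference)."""
    start = reference - timedelta(days=period_days)
    values = [v for t, v in series if start <= t < reference]
    if len(values) < 2:
        raise ValueError("need at least two observations in the window")
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    suffix = f"_{period_days}d"
    return {
        "mean" + suffix: mean,
        "max" + suffix: max(values),
        "min" + suffix: min(values),
        "var" + suffix: statistics.pvariance(values),
        "std" + suffix: std,
        "range" + suffix: max(values) - min(values),
        "cv" + suffix: std / mean if mean != 0 else float("nan"),
    }


# Hypothetical daily balances for one account.
balances = [(date(2020, 1, 1) + timedelta(days=i), v)
            for i, v in enumerate([100.0, 120.0, 80.0, 100.0])]
print(window_features(balances, date(2020, 1, 5), period_days=4))
```

Calling the function once per preset period (for example 30, 90, and 180 days) would yield the multi-period feature set the definition describes.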

The first feature list 122 contains a list of the time-series data 102 to be learned, as specified in the feature calculation method user settings 103; for example, it includes the identifiers and timestamps of the time-series data 102. The first feature list 122 is not limited to this and may be any data that identifies the correspondence between the calculated positive example features 118 and negative example features 121 and the time-series data 102 specified in the feature calculation definition 117.

In this embodiment, the first feature list 122 also stores a list of the positive example features 118 and negative example features 121 before they are narrowed down by the feature selection unit 150.

Each record of the feature importance 152 stores a feature identifier and the importance of that feature calculated by the feature importance calculation unit 151. The feature identifier corresponds to the feature identifiers of the positive example features 118 and negative example features 121.

The second feature list 154 holds the list of features narrowed down by the feature selection unit 150. The positive example features 118 and negative example features 121 listed in the second feature list 154 are input to the machine learning unit 160.

The machine learning unit 160 can employ, for example, AT/PRC (AI Technology/Prediction of Rare Case) or any other known machine learning method.

The input device 5 consists of, for example, a keyboard, a mouse, or a touch panel. The output device 6 consists of a display. The communication device 7 is connected to a network (not shown) and exchanges information.

FIG. 2 is a diagram showing an example of processing performed by the longitudinal data analysis device 1. The longitudinal data analysis device 1 receives the feature calculation method user settings 103 via the input device 5 or the communication device 7.

The feature calculation method user settings 103 include, for example, the types of statistics used as features, the designation of the time-series data 102 subject to machine learning, and the designation of the target event occurrence time data 101. The designation of the time-series data 102 can include the period to be learned (hereinafter, the learning period), account attributes (such as industry), and account identifiers.

In the temporal feature generation unit 110, the time-series data division unit 111 first reads the target event occurrence time data 101 and divides the time-series data 102 into positive example time-series data 112, in which the target event occurred, and negative example time-series data 113, in which it did not.

Next, the positive example reference date determination unit 114 of the temporal feature generation unit 110 acquires the account identifiers and the occurrence times (or dates or timestamps) of the target event from the target event occurrence time data 101, and outputs the occurrence date of the target event as the positive example reference date 115.
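The division step and the positive example reference date step can be sketched together as follows. This assumes the time-series data is keyed by account identifier and that each positive account has exactly one target-event date; the function name `split_by_event` and the sample account IDs are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

Series = List[Tuple[date, float]]  # (timestamp, value) records of one account


def split_by_event(series_by_account: Dict[str, Series],
                   event_dates: Dict[str, date]):
    """Divide accounts into positive/negative series and derive positive reference dates.

    Assumes at most one target-event date per account identifier.
    """
    positive = {acc: s for acc, s in series_by_account.items() if acc in event_dates}
    negative = {acc: s for acc, s in series_by_account.items() if acc not in event_dates}
    # The positive example reference date is simply the event date.
    positive_reference = {acc: event_dates[acc] for acc in positive}
    return positive, negative, positive_reference


series = {"A001": [(date(2020, 1, 1), 500.0)], "A002": [(date(2020, 1, 1), 900.0)]}
pos, neg, ref = split_by_event(series, {"A001": date(2020, 3, 31)})
print(sorted(pos), sorted(neg), ref)
```

Negative accounts receive no reference date here; that gap is what the negative example reference date determination unit 119 fills.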

The negative example reference date determination unit 119 of the temporal feature generation unit 110 acquires the positive example reference dates 115, the positive example features 118, the negative example time-series data 113, and the learning period in the feature calculation method user settings 103, and determines the negative example reference dates 120 as described below.

In this embodiment, the negative example reference date 120 is determined as follows. The negative example reference date determination unit 119 slides a candidate negative example reference date away from the positive example reference date 115 in predetermined units, extracts the negative example time-series data 113 within the learning period, and causes the feature calculation unit 116 to calculate the features within the learning period as the per-reference-date features of the negative example.

While sliding the candidate negative example reference date by days, weeks, or months, the negative example reference date determination unit 119 causes the feature calculation unit 116 to calculate features for each of several preset statistical periods, and clusters the features for each slid candidate date to obtain the per-reference-date features of the negative example.

Likewise, for one or more positive example reference dates 115, the negative example reference date determination unit 119 calculates the features of the positive example time-series data 112 over the same statistical periods as for the negative candidates, and clusters the features for each positive example reference date 115 to obtain the per-reference-date features of the positive examples.

The negative example reference date determination unit 119 then places the per-reference-date features of the negative and positive examples in a given feature space and selects, as the negative example reference date 120, the candidate date whose per-reference-date features are closest to the features of the positive example reference date 115.

The negative example reference date determination unit 119 generates each set of per-reference-date features in multiple dimensions, and the distance between the positive and negative per-reference-date features can be, for example, a geometric distance such as the Euclidean distance. The processing of the negative example reference date determination unit 119 is described in detail later.
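A simplified sketch of the sliding search and nearest-neighbour choice described above follows. Several details are assumptions not fixed by the text: candidates are slid both before and after the positive reference date in 7-day steps (up to 12 steps each way); each candidate's feature vector is `[mean, std, min, max]` over the 30 days before it; the clustering step is omitted and the distance is taken to the nearest positive feature vector; and features are not rescaled, although in practice they usually would be. All names are hypothetical.

```python
import math
import statistics
from datetime import date, timedelta
from typing import List, Optional, Sequence, Tuple

Series = List[Tuple[date, float]]  # (timestamp, value) records of one account


def feature_vector(series: Series, reference: date,
                   period_days: int) -> Optional[List[float]]:
    """[mean, std, min, max] of the values in [reference - period_days, reference)."""
    start = reference - timedelta(days=period_days)
    values = [v for t, v in series if start <= t < reference]
    if len(values) < 2:
        return None  # too little history before this candidate date
    return [statistics.fmean(values), statistics.pstdev(values), min(values), max(values)]


def choose_negative_reference(neg_series: Series, positive_reference: date,
                              positive_vectors: Sequence[Sequence[float]],
                              period_days: int = 30, step_days: int = 7,
                              max_steps: int = 12) -> Optional[date]:
    """Slide a candidate date around the positive reference date and keep the one
    whose feature vector is closest (Euclidean) to any positive feature vector."""
    best_date: Optional[date] = None
    best_dist = math.inf
    for k in range(-max_steps, max_steps + 1):
        candidate = positive_reference + timedelta(days=k * step_days)
        vec = feature_vector(neg_series, candidate, period_days)
        if vec is None:
            continue
        dist = min(math.dist(vec, p) for p in positive_vectors)
        if dist < best_dist:  # ties keep the earliest candidate
            best_date, best_dist = candidate, dist
    return best_date


# Hypothetical negative account: balance 100 for 90 days, then 50.
day0 = date(2020, 1, 1)
neg = [(day0 + timedelta(days=i), 100.0 if i < 90 else 50.0) for i in range(180)]
# Hypothetical positive profile: a flat balance of 50 over the window.
chosen = choose_negative_reference(neg, day0 + timedelta(days=90), [[50.0, 0.0, 50.0, 50.0]])
print(chosen)  # 2020-05-05
```

In this example the first candidate whose 30-day window lies entirely in the low-balance segment matches the positive profile exactly, so it is chosen as the negative example reference date.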

Next, the feature calculation unit 116 receives the positive example reference dates 115 and the positive example time-series data 112, calculates the positive example features 118 according to the feature calculation definition 117, and outputs them to the negative example reference date determination unit 119, the feature selection unit 150, and the machine learning unit 160.

The feature calculation unit 116 also receives the negative example reference dates 120 and the negative example time-series data 113, calculates the negative example features 121 according to the feature calculation definition 117, and outputs them to the feature selection unit 150 and the machine learning unit 160.

The temporal feature generation unit 110 also generates a list of the positive example time-series data 112 and negative example time-series data 113 included in the positive example features 118 and negative example features 121, and outputs it as the first feature list 122.

Next, in the feature selection unit 150, the feature importance calculation unit 151 receives the positive example features 118, the negative example features 121, and the first feature list 122, and calculates the feature importance 152 for each feature. In this embodiment, features that have a larger influence on the predictions of the prediction model generated by LightGBM are assigned larger importance values.

Next, in the feature selection unit 150, the feature accumulation threshold determination unit 153 sorts the first feature list 122 in descending order of the feature importance 152. Starting from the feature with the largest importance, it accumulates the importance values and stores in the second feature list 154, as the features to be learned, the features (positive example features 118 and negative example features 121) accumulated until the cumulative value reaches a predetermined threshold Th1. It then deletes the remaining features (the positive example features 118 and negative example features 121 that were not accumulated) from the first feature list 122.

In this way, the feature accumulation threshold determination unit 153 narrows down the positive example features 118 and negative example features 121 calculated by the temporal feature generation unit 110 to those with high importance, reducing the number of features to be learned. The threshold Th1 can be a preset value such as a ratio of the cumulative value of the feature importance 152 or a ratio of the number of features.
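A minimal sketch of the cumulative-threshold selection, assuming Th1 is expressed as a ratio of the total importance and that the feature whose importance makes the cumulative value reach Th1 is itself kept. The function name and the sample importances are hypothetical.

```python
from typing import Dict, List


def select_by_cumulative_importance(importance: Dict[str, float],
                                    th1_ratio: float) -> List[str]:
    """Keep the most important features until their cumulative importance
    reaches th1_ratio of the total importance."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(value for _, value in ranked)
    if total <= 0:
        return [name for name, _ in ranked]  # nothing to rank by; keep all
    selected: List[str] = []
    cumulative = 0.0
    for name, value in ranked:
        selected.append(name)
        cumulative += value
        if cumulative >= th1_ratio * total:
            break  # the feature that crosses Th1 is kept (assumption)
    return selected


# Hypothetical importances; with Th1 = 75 % of the total, a and b are kept.
print(select_by_cumulative_importance({"a": 50, "b": 30, "c": 15, "d": 5}, 0.75))
# ['a', 'b']
```

Here the two most important features already account for 80 % of the total importance, so the remaining two are excluded from training.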

また、特徴量累積閾値判定部１５３は生成した第２特徴量リスト１５４を特徴量重要度算出部１５１へ入力して、特徴量重要度１５２を再度生成し、特徴量累積閾値判定部１５３でさらに特徴量の絞り込みを行うループ処理を行う。 The feature accumulation threshold determination unit 153 also inputs the generated second feature list 154 to the feature importance calculation unit 151 so that the feature importance 152 is generated again. This forms a loop in which the feature accumulation threshold determination unit 153 narrows down the features further.

このような、特徴量重要度算出部１５１から第２特徴量リスト１５４を生成するまでのループ処理は、特徴量累積閾値判定部１５３が、特徴量重要度１５２の累積値が閾値Ｔｈ１に達した時点で、第１特徴量リスト１２２に残り（未処理）のデータが存在する場合には、残り（未処理）のデータを削除してから再度特徴量重要度１５２の算出を行って、累積値が閾値Ｔｈ１に達した時点で第１特徴量リスト１２２に残りのデータが無くなるまで絞り込みのループを行うことができる。 This loop runs from the feature importance calculation unit 151 to the generation of the second feature list 154. The feature accumulation threshold determination unit 153 may detect that the cumulative feature importance 152 has reached the threshold Th1 while remaining (unprocessed) data still exists in the first feature list 122. In that case, the remaining data is deleted and the feature importance 152 is calculated again. The narrowing loop can be repeated until no data remains in the first feature list 122 at the point where the cumulative value reaches the threshold Th1.

あるいは、特徴量累積閾値判定部１５３が、第２特徴量リスト１５４の特徴量の数が、所定の閾値Ｔｈ２となるまで繰り返すことができる。所定の閾値Ｔｈ２は、例えば、正例特徴量１１８と負例特徴量１２１の特徴量の数の総和に対する比率（例えば、６０％以下）など、予め設定された値であればよい。 Alternatively, the feature accumulation threshold determination unit 153 can repeat the process until the number of features in the second feature list 154 reaches a predetermined threshold Th2. The predetermined threshold Th2 may be a preset value, such as a ratio (e.g., 60% or less) to the sum of the number of features of the positive example features 118 and the negative example features 121.

以上のように、本実施例の経時データ分析装置１は、正例特徴量１１８の基準日別特徴量に最も近い負例の基準日別特徴量となる基準日を負例基準日１２０として決定することで、目的事象が発生していない負例時系列データ１１３の基準日を的確に設定することが可能となる。 As described above, the longitudinal data analysis device 1 of this embodiment determines the negative example reference date 120 as follows. It selects the reference date whose negative example reference date feature is closest to the reference date feature of the positive example features 118. This makes it possible to set an appropriate reference date for the negative example time series data 113, in which the target event has not occurred.

換言すれば、経時データ分析装置１は、説明変数の組み合わせが類似する正例の特徴量と負例の特徴量を機械学習部１６０で比較させることで、有意な特徴量で学習を実施することができる。 In other words, the longitudinal data analysis device 1 can perform learning using significant features by having the machine learning unit 160 compare features of positive examples and features of negative examples that have similar combinations of explanatory variables.

そして、経時データ分析装置１は、正例特徴量１１８と負例特徴量１２１の特徴量重要度を算出して、特徴量重要度が最大の値から所定の閾値Ｔｈ１までの特徴量を学習対象とし、その他の特徴量を削除することで機械学習部１６０へ入力する特徴量の数を低減し、かつ、有意な特徴量を機械学習部１６０へ与えることが可能となる。 The longitudinal data analysis device 1 then calculates the feature importance of the positive example features 118 and the negative example features 121. It treats as learning targets the features accumulated from the one with the highest importance until the cumulative importance reaches the predetermined threshold Th1. Deleting the other features reduces the number of features input to the machine learning unit 160 while still providing significant features to the machine learning unit 160.

図３は、特徴量重要度累積値グラフ３０１の一例である。特徴量累積閾値判定部１５３は、特徴量重要度の大きい順に第１特徴量リスト１２２をソートして、特徴量重要度の累積値を特徴量重要度累積値として算出し、累積した特徴量の数を特徴数として算出する。 Figure 3 is an example of a feature importance cumulative value graph 301. The feature accumulation threshold determination unit 153 sorts the first feature list 122 in descending order of feature importance. It calculates the running sum of the feature importance as the feature importance cumulative value and counts the accumulated features as the number of features.

図３の特徴量重要度累積値グラフ３０１は、縦軸を特徴量重要度累積値とし、横軸を特徴数とした例を示し、閾値Ｔｈ１は、特徴量重要度累積値の比率（例えば、９０％）とした例を示す。図示の例では、閾値Ｔｈ１を超えた重要度に対応する特徴量が削除され、閾値Ｔｈ１以下の重要度に対応する特徴量が第２特徴量リスト１５４へ格納される。なお、閾値Ｔｈ１は、特徴量重要度累積値に限定されるものではなく、特徴数に対する比率としてもよい。 In the feature importance cumulative value graph 301 of Figure 3, the vertical axis represents the feature importance cumulative value and the horizontal axis represents the number of features. The threshold Th1 is set as a ratio of the feature importance cumulative value (e.g., 90%). In the example shown, features in the region where the cumulative importance exceeds the threshold Th1 are deleted. Features in the region up to the threshold Th1 are stored in the second feature list 154. Note that the threshold Th1 is not limited to a ratio of the feature importance cumulative value and may instead be a ratio of the number of features.
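A curve like the one in Figure 3 can be reproduced from a set of importances with the short sketch below. It returns (number of features, cumulative importance ratio) pairs and the feature count at which a ratio-type Th1 is first reached. The function names are hypothetical. Normalizing by the total importance is an assumption, made to be consistent with the 90% example.

```python
from typing import Dict, List, Tuple


def cumulative_importance_curve(importance: Dict[str, float]) -> List[Tuple[int, float]]:
    """(number of features, cumulative importance ratio) points, as in Fig. 3."""
    values = sorted(importance.values(), reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("total importance must be positive")
    points: List[Tuple[int, float]] = []
    running = 0.0
    for count, value in enumerate(values, start=1):
        running += value
        points.append((count, running / total))
    return points


def cutoff_count(points: List[Tuple[int, float]], th1: float) -> int:
    """Number of features kept when Th1 is a ratio of cumulative importance."""
    for count, ratio in points:
        if ratio >= th1:
            return count
    return len(points)  # Th1 never reached: keep everything
```

The points after the returned count correspond to the deleted region of the graph. If Th1 is instead defined as a ratio of the number of features, the cutoff is simply that fraction of the sorted list, and the curve is not needed.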

図４は、特徴選択部１５０で行われる処理の一例を示すフローチャートである。この処理は、経時特徴量生成部１１０から正例特徴量１１８と負例特徴量１２１及び第１特徴量リスト１２２が出力されてから開始される（４０１）。 Figure 4 is a flowchart showing an example of the processing performed by the feature selection unit 150. This processing starts after the positive example features 118, the negative example features 121, and the first feature list 122 are output from the temporal feature generation unit 110 (401).

まず、特徴量重要度算出部１５１は、経時特徴量生成部１１０から正例特徴量１１８と負例特徴量１２１及び第１特徴量リスト１２２を取得する（４０２）。特徴量重要度算出部１５１は、第１特徴量リスト１２２に記載されている正例特徴量１１８及び負例特徴量１２１の重要度を算出する（４０３）。 First, the feature importance calculation unit 151 acquires the positive example features 118, the negative example features 121, and the first feature list 122 from the temporal feature generation unit 110 (402). The feature importance calculation unit 151 calculates the importance of the positive example features 118 and the negative example features 121 listed in the first feature list 122 (403).

特徴量重要度算出部１５１は、上述したようにＬｉｇｈｔＧＢＭとＳＨＡＰを組み合わせて、ＬｉｇｈｔＧＢＭで生成した予測モデルに第１特徴量リスト１２２の特徴量を与えて目的事象の有無を予測し、ＳＨＡＰは各特徴量が予測結果に対してどの程度影響を与えたかを重要度として算出する。そして、特徴量重要度算出部１５１は、算出された重要度と特徴量の識別子を特徴量重要度１５２へ格納する。 As described above, the feature importance calculation unit 151 combines LightGBM and SHAP. The features in the first feature list 122 are given to the prediction model generated by LightGBM to predict the presence or absence of the target event. SHAP then calculates, as the importance, the extent to which each feature affected the prediction result. The feature importance calculation unit 151 stores the calculated importance together with the feature identifier in the feature importance 152.
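The text does not say how per-sample SHAP values are reduced to one importance per feature. This sketch assumes a common convention: the mean absolute SHAP value of each feature, normalized so that the importances sum to 1.

In practice, the matrix could come from `shap.TreeExplainer(model).shap_values(X)`. It could also come from LightGBM's `Booster.predict(X, pred_contrib=True)`, whose output has one extra column for the expected value that would need to be dropped. The sketch takes plain nested lists so that it runs without either library. `shap_importance` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def shap_importance(
    shap_values: Sequence[Sequence[float]], feature_names: List[str]
) -> Dict[str, float]:
    """Mean |SHAP| per feature, normalized so the importances sum to 1.

    shap_values is a (samples x features) matrix whose columns follow
    feature_names.
    """
    n_samples = len(shap_values)
    if n_samples == 0:
        raise ValueError("need at least one sample")
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    if total == 0:
        return {name: 0.0 for name in feature_names}  # no feature had any effect
    return {name: value / total for name, value in zip(feature_names, mean_abs)}
```

Taking absolute values matters here. A feature that pushes some predictions up and others down still influences the model, even though its signed SHAP values may average out to roughly zero.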

次に、特徴量累積閾値判定部１５３の処理に遷移する（４０４）。特徴量累積閾値判定部１５３は、特徴量重要度１５２と第１特徴量リスト１２２を取得して、特徴量重要度１５２の値の降順で第１特徴量リスト１２２をソートする（４０５）。 Next, the process proceeds to the feature accumulation threshold determination unit 153 (404). The feature accumulation threshold determination unit 153 acquires the feature importance 152 and the first feature list 122, and sorts the first feature list 122 in descending order of the feature importance 152 (405).

次に、特徴量累積閾値判定部１５３は、ステップ４０６～４０９で、第１特徴量リスト１２２の先頭から順に特徴量重要度１５２の値を累積して、累積値が所定の閾値Ｔｈ１に達するまでループ処理を実行する。 Next, in steps 406 to 409, the feature accumulation threshold determination unit 153 accumulates the values of the feature importance 152 in order from the top of the first feature list 122, and executes a loop process until the accumulated value reaches a predetermined threshold Th1.

特徴量累積閾値判定部１５３は、重要度の大きい順にソート済みの第１特徴量リスト１２２の先頭から特徴量重要度１５２の重要度を取得して、順次累積する（４０７）。 The first feature list 122 has been sorted in descending order of importance. Starting from its top, the feature accumulation threshold determination unit 153 obtains each feature's importance from the feature importance 152 and adds it to the running total (407).

特徴量累積閾値判定部１５３は、累積値が所定の閾値Ｔｈ１に達したか否かを判定して（４０８）、閾値Ｔｈ１に達していればループ処理を終了してステップ４１０へ進み、達していなければステップ４０９に進んでループ処理を繰り返す。 The feature accumulation threshold determination unit 153 determines whether the accumulated value has reached a predetermined threshold Th1 (408), and if it has reached the threshold Th1, ends the loop processing and proceeds to step 410, and if it has not reached the threshold Th1, proceeds to step 409 and repeats the loop processing.

次に、特徴量累積閾値判定部１５３は、第１特徴量リスト１２２の特徴量の数に残り（未処理）があるか否かを判定し（４１０）、残りがある場合にはステップ４１１へ進み、残りがない場合にはステップ４１２へ進む。なお、第１特徴量リスト１２２の特徴量の数の残りは、図３に示した削除する特徴量を示し、特徴量重要度累積値が閾値Ｔｈ１を超える部分に相当する。 Next, the feature accumulation threshold determination unit 153 determines whether any features in the first feature list 122 remain (unprocessed) (410). If so, it proceeds to step 411; if not, it proceeds to step 412. The remaining features in the first feature list 122 are the features to be deleted shown in Figure 3, that is, those in the portion where the feature importance cumulative value exceeds the threshold Th1.

ステップ４１１では、特徴量累積閾値判定部１５３が、第１特徴量リスト１２２の閾値Ｔｈ１を超える部分の特徴量を削除して、第１特徴量リスト１２２を更新する。そして、特徴量累積閾値判定部１５３は、ステップ４０３に戻って上記処理を繰り返す。 In step 411, the feature accumulation threshold determination unit 153 deletes the features in the portion of the first feature list 122 that exceeds the threshold Th1, thereby updating the first feature list 122. The feature accumulation threshold determination unit 153 then returns to step 403 and repeats the above process.

一方、ステップ４１２では、特徴量重要度累積値が閾値Ｔｈ１以下となって第１特徴量リスト１２２の特徴量の数が削減されたので、第１特徴量リスト１２２の内容（特徴量の識別子）を第２特徴量リスト１５４として出力する。 In step 412, on the other hand, the first feature list 122 has been reduced to features whose cumulative importance is within the threshold Th1. The contents of the first feature list 122 (the feature identifiers) are therefore output as the second feature list 154.
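Steps 403 to 412 of Figure 4 form a loop that can be sketched as follows. The sketch rests on these assumptions:

- The importance calculation of step 403 is passed in as a callable, `compute_importance`. This is a hypothetical stand-in for LightGBM plus SHAP.
- Th1 is a ratio of the total importance.
- The optional `th2_count` parameter models the alternative stop rule based on threshold Th2. It is expressed here as an absolute feature count, for example 60% of the original count computed by the caller.
- `max_rounds` is an added safety limit, not part of the described method.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signature: feature names in, {feature name: importance} out.
ImportanceFn = Callable[[List[str]], Dict[str, float]]


def select_features(
    first_list: List[str],
    compute_importance: ImportanceFn,
    th1: float,
    th2_count: Optional[int] = None,
    max_rounds: int = 100,
) -> List[str]:
    """Repeat the narrowing of Fig. 4 and return the second feature list."""
    current = list(first_list)
    for _ in range(max_rounds):
        importance = compute_importance(current)                     # step 403
        ranked = sorted(current, key=lambda f: importance[f], reverse=True)  # 405
        total = sum(importance[f] for f in ranked)
        kept: List[str] = []
        cumulative = 0.0
        for name in ranked:                                          # steps 406-409
            kept.append(name)
            cumulative += importance[name]
            if cumulative >= th1 * total:
                break
        if len(kept) == len(ranked):                                 # step 410: none left
            return kept                                              # step 412
        current = kept                                               # step 411: drop the rest
        if th2_count is not None and len(current) <= th2_count:      # optional Th2 stop
            return current
    return current
```

The loop always terminates: each round either deletes at least one feature or returns. If Th1 is never reached, every feature is appended to `kept` and the function returns the full list.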

上記処理によって、特徴選択部１５０は、特徴量重要度累積値が閾値Ｔｈ１を超える部分の特徴量が削減され、かつ、重要度の大きい特徴量で構成された第２特徴量リスト１５４を生成して、機械学習部１６０へ入力することが可能となる。 By the above process, the feature selection unit 150 can reduce the features whose cumulative feature importance value exceeds the threshold value Th1, generate a second feature list 154 composed of features with high importance, and input it to the machine learning unit 160.

次に、図５Ａ、図５Ｂを用いて、経時特徴量生成部１１０の負例基準日決定部１１９の処理について説明する。 Next, the processing of the negative example reference date determination unit 119 of the temporal feature generation unit 110 will be described with reference to Figures 5A and 5B.

図５Ａは、経時特徴量生成部１１０の負例基準日決定部１１９で行われる基準日のスライディング処理の一例を示す図である。図５Ｂは、負例基準日決定部１１９で行われる負例基準日の決定処理の一例を示す図である。 Figure 5A is a diagram showing an example of the sliding process of the reference date performed by the negative example reference date determination unit 119 of the temporal feature generation unit 110. Figure 5B is a diagram showing an example of the process of determining the negative example reference date performed by the negative example reference date determination unit 119.

図５Ａは、負例時系列データ１１３として、観測値（例えば、残高）と時間（又は日付）の関係を示す。 Figure 5A shows the relationship between observed values (e.g., balances) and time (or dates) as negative example time series data 113.

負例基準日決定部１１９は、予め設定された日付（例えば、正例基準日１１５）を最初の基準日１として設定して、基準日１から過去１ヶ月、３ヶ月、６ヶ月、１２ヶ月などの予め設定された複数の統計期間を設定する。なお、基準日１を決定する条件は、複数の正例基準日１１５からユーザが特徴量算出方法ユーザ設定１０３で指定してもよいし、その他の条件を用いてもよい。 The negative example reference date determination unit 119 sets a preset date (e.g., positive example reference date 115) as the first reference date 1, and sets multiple preset statistical periods such as the past one month, three months, six months, and twelve months from reference date 1. The conditions for determining reference date 1 may be specified by the user in the feature calculation method user settings 103 from multiple positive example reference dates 115, or other conditions may be used.

そして、負例基準日決定部１１９は、予め設定されたスライド幅（所定の日付間隔）を基準日１に加えた（又は減算した）日付を基準日２として設定し、１ヶ月～１２ヶ月などの予め設定された複数の統計期間を設定する。 The negative example reference date determination unit 119 then sets reference date 2 by adding a preset sliding width (a predetermined date interval) to reference date 1, or by subtracting it from reference date 1. It also sets multiple preset statistical periods, such as 1 month to 12 months.

同様に、負例基準日決定部１１９は、所定のスライド幅でずらした基準日３～基準日Ｎを設定し、上記と同様に複数の統計期間を設定する。負例基準日決定部１１９は、負例時系列データ１１３の全期間を上記統計期間で網羅するように基準日１から基準日Ｎを設定する。 Similarly, the negative example reference date determination unit 119 sets reference dates 3 to N shifted by a predetermined sliding width, and sets multiple statistical periods in the same manner as above. The negative example reference date determination unit 119 sets reference dates 1 to N so that the above statistical periods cover the entire period of the negative example time series data 113.
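The sliding of reference dates in Figures 5A and 6 can be sketched with the standard `datetime` module. This sketch makes the following assumptions, made for illustration only:

- The statistical periods are approximated as fixed numbers of days (30, 90, 180, 365).
- Each look-back window runs from the reference date minus the period up to, but not including, the reference date.
- The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Sequence, Tuple

# Hypothetical statistical periods in days (roughly 1, 3, 6 and 12 months).
DEFAULT_PERIODS_DAYS: Tuple[int, ...] = (30, 90, 180, 365)


def slide_reference_dates(first: date, series_end: date, slide_days: int) -> List[date]:
    """Reference dates 1..N, shifted forward by a fixed sliding width."""
    if slide_days <= 0:
        raise ValueError("sliding width must be positive")
    dates: List[date] = []
    current = first
    while current <= series_end:  # stop once the end of the series is passed
        dates.append(current)
        current += timedelta(days=slide_days)
    return dates


def statistical_windows(
    reference: date, periods_days: Sequence[int] = DEFAULT_PERIODS_DAYS
) -> Dict[int, Tuple[date, date]]:
    """Look-back window [start, reference) for each statistical period."""
    return {d: (reference - timedelta(days=d), reference) for d in periods_days}
```

Sliding backward in time, which the text also permits, would only require subtracting the sliding width and stopping at the start of the series.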

図示の例では、過去の基準日１から現在へ向けて基準日１～基準日Ｎをずらした例を示したが、これに限定されるものではなく、逆方向であってもよい。また、負例基準日決定部１１９は、複数の統計期間と基準日で一つの負例時系列データの全期間をカバーするように、基準日１～基準日Ｎと複数の統計期間を設定する。 In the illustrated example, reference dates 1 to N are shifted from the past reference date 1 toward the present. The present invention is not limited to this, and the dates may be shifted in the opposite direction. The negative example reference date determination unit 119 sets reference dates 1 to N and the multiple statistical periods so that, together, they cover the entire period of one piece of negative example time series data.

次に、負例基準日決定部１１９は、基準日１～基準日Ｎの各統計期間で負例時系列データ１１３の特徴量を特徴量算出部１１６に算出させて、各基準日毎に複数の統計期間の特徴量をクラスタリングして負例の基準日別統計量を算出して基準日１～基準日Ｎに対応付ける。 Next, the negative example reference date determination unit 119 causes the feature calculation unit 116 to calculate the features of the negative example time series data 113 for each statistical period of reference dates 1 to N. For each reference date, it clusters the features of the multiple statistical periods to calculate the reference date statistics of the negative examples, and associates them with reference dates 1 to N.

また、負例基準日決定部１１９は、正例基準日１１５のそれぞれについて予め設定された複数の統計期間を設定して、正例時系列データ１１２の特徴量を特徴量算出部１１６に算出させ、各正例基準日１１５毎に各統計期間の特徴量を集計した正例の基準日別特徴量を算出させる。 The negative example reference date determination unit 119 also sets multiple predefined statistical periods for each positive example reference date 115 and causes the feature calculation unit 116 to calculate the features of the positive example time series data 112. For each positive example reference date 115, the features of the statistical periods are then aggregated into the positive example reference date feature.

負例基準日決定部１１９は、負例の基準日別特徴量と、正例の基準日別特徴量を図５Ｂに示す特徴空間６０２に配置して、負例の基準日別特徴量と正例の基準日別特徴量（図中基準日又は正例）の幾何学的距離を算出する。なお、図示の例では、特徴量Ａと特徴量Ｂの２次元空間を示すが、特徴量の次元数に応じた特徴空間を設定すればよい。 The negative example reference date determination unit 119 places the reference date feature amounts of the negative examples and the reference date feature amounts of the positive examples in the feature space 602 shown in FIG. 5B, and calculates the geometric distance between the reference date feature amounts of the negative examples and the reference date feature amounts of the positive examples (reference date or positive example in the figure). Note that in the illustrated example, a two-dimensional space of feature amounts A and B is shown, but a feature space may be set according to the number of dimensions of the feature amounts.

そして、負例基準日決定部１１９は、負例の基準日１～基準日Ｎに対応する基準日別特徴量のうち、正例の基準日別特徴量に最も距離が近い負例の基準日別特徴量を選択し、当該負例の基準日別特徴量に対応する基準日を負例基準日１２０として決定する。 Then, from among the negative example reference date features corresponding to reference dates 1 to N, the negative example reference date determination unit 119 selects the one closest to the positive example reference date feature. It determines the reference date corresponding to that negative example reference date feature as the negative example reference date 120.
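The shortest-distance search of Figure 5B can be sketched as a brute-force nearest-neighbour search. The text speaks only of a geometric distance, so using Euclidean distance (`math.dist`, Python 3.8+) is an assumption. So is the reading that a negative reference date is chosen by its distance to the nearest positive example. In practice, the feature axes would probably need to be put on comparable scales first. The function name is hypothetical.

```python
import math
from datetime import date
from typing import Dict, Optional, Sequence, Tuple

Vector = Sequence[float]


def nearest_negative_reference_date(
    negative: Dict[date, Vector], positive: Sequence[Vector]
) -> Tuple[date, float]:
    """Return the negative reference date whose feature vector lies closest
    (Euclidean distance) to any positive reference date feature vector."""
    best_date: Optional[date] = None
    best_dist = math.inf
    for ref_date, neg_vec in negative.items():
        for pos_vec in positive:
            dist = math.dist(neg_vec, pos_vec)
            if dist < best_dist:
                best_date, best_dist = ref_date, dist
    if best_date is None:
        raise ValueError("need at least one negative and one positive vector")
    return best_date, best_dist
```

A brute-force search is adequate for the modest number of reference dates per series that the sliding procedure produces. For very long series, a spatial index could replace the double loop.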

図示の特徴空間６０２では、正例２（正例の基準日２）に対応する正例の基準日別特徴量と、基準日５に対応する負例の基準日別特徴量の幾何学的距離が最も近いため、基準日５が負例基準日１２０として決定される例を示す。 In the illustrated feature space 602, the geometric distance between the reference date feature of the positive example corresponding to positive example 2 (reference date 2 of the positive example) and the reference date feature of the negative example corresponding to reference date 5 is the shortest, so an example is shown in which reference date 5 is determined as the negative example reference date 120.

図６は、負例基準日決定部１１９の構成の一例を示す図である。負例基準日決定部１１９は、負例の基準日候補として基準日１から基準日Ｎまでの複数の基準日を生成する基準日スライド部８０２と、基準日毎の特徴量から基準日別特徴量８０４を算出して、正例の基準日別特徴量に最も近い負例の基準日別特徴量８０４の基準日を負例基準日１２０として決定する特徴量空間最短距離探索部８１０と、を含む。 Figure 6 is a diagram showing an example of the configuration of the negative example reference date determination unit 119. The negative example reference date determination unit 119 includes a reference date sliding unit 802 that generates multiple reference dates from reference date 1 to reference date N as reference date candidates for negative examples, and a feature space shortest distance search unit 810 that calculates reference date feature values 804 from the feature values for each reference date, and determines the reference date of the negative example reference date feature values 804 that is closest to the positive example reference date feature values as the negative example reference date 120.

負例及び正例の基準日の統計期間は、例えば、上述の１ヶ月、３ヶ月、６ヶ月、１２ヶ月など所定の複数の統計期間とする。 The statistical periods for the negative and positive example reference dates are a plurality of predetermined periods, such as the above-mentioned 1, 3, 6, and 12 months.

負例基準日決定部１１９は、負例時系列データ１１３から特徴量算出方法ユーザ設定１０３で指定された負例時系列データ１１３から一つの負例時系列データ８０１を取得して、上述した所定の条件から基準日１を決定して、基準日スライド部８０２へ基準日１を入力する。 The negative example reference date determination unit 119 acquires one piece of negative example time series data 801 from the negative example time series data 113 specified in the feature calculation method user setting 103, determines reference date 1 from the above-mentioned specified conditions, and inputs reference date 1 to the reference date sliding unit 802.

基準日スライド部８０２は、予め設定されたスライド幅で所定数の基準日２～基準日Ｎを生成する。負例基準日決定部１１９は、生成された基準日１～基準日Ｎについて、それぞれ予め設定された複数の統計期間を設定し、負例の基準日毎に各統計期間の負例時系列データ８０１を特徴量算出部１１６へ入力して負例の特徴量を算出させる。 The reference date sliding unit 802 generates a predetermined number of reference dates 2 to N with a preset sliding width. The negative example reference date determination unit 119 sets multiple preset statistical periods for each of the generated reference dates 1 to N, and inputs the negative example time series data 801 for each statistical period for each negative example reference date to the feature calculation unit 116 to calculate the negative example features.

負例基準日決定部１１９は、特徴量算出部１１６が算出した基準日１～基準日Ｎの負例の特徴量を基準日別特徴量８０４として受け付けて、特徴量空間最短距離探索部８１０へ入力する。 The negative example reference date determination unit 119 accepts the negative example features for reference dates 1 to N calculated by the feature calculation unit 116 as reference date-specific features 804, and inputs them to the feature space shortest distance search unit 810.

特徴量空間最短距離探索部８１０は、正例特徴量１１８と目的事象発生時刻データ１０１を入力として負例の基準日別特徴量８０４と同様に、複数の統計期間で正例の基準日別特徴量を特徴量算出部１１６に算出させる。特徴量空間最短距離探索部８１０は、負例の基準日別特徴量８０４と、上記算出した正例の基準日別特徴量それぞれ特徴空間６０２（図５Ｂ参照）に配置し、各基準日別特徴量間の幾何学的距離を算出する。 The feature space shortest distance search unit 810 receives the positive example features 118 and the target event occurrence time data 101 as input, and causes the feature calculation unit 116 to calculate the reference date features of the positive examples for multiple statistical periods, similar to the reference date features 804 of the negative examples. The feature space shortest distance search unit 810 places the reference date features 804 of the negative examples and the calculated reference date features of the positive examples in the feature space 602 (see FIG. 5B), and calculates the geometric distance between each of the reference date features.

そして、特徴量空間最短距離探索部８１０は、負例の基準日１～基準日Ｎ（８０５）に対応する基準日別特徴量のうち、正例の基準日別特徴量に最も距離が近い負例の基準日別特徴量（負例特徴量８０６）を選択し、当該基準日別特徴量８０４に対応する基準日を負例基準日１２０として決定する。 Then, the feature space shortest distance search unit 810 selects the negative example reference date feature (negative example feature 806) that is closest to the positive example reference date feature from among the negative example reference date features corresponding to reference date 1 to reference date N (805), and determines the reference date corresponding to the reference date feature 804 as the negative example reference date 120.

また、特徴量空間最短距離探索部８１０は、処理対象の負例時系列データ８０１のそれぞれについて、負例基準日１２０と負例特徴量１２１を出力することができる。 The feature space shortest distance search unit 810 can also output the negative example reference date 120 and the negative example feature 121 for each piece of negative example time series data 801 to be processed.

図７は、負例基準日決定部１１９で行われる処理の一例を示すフローチャートである。この処理は、負例基準日決定部１１９が、負例時系列データ１１３と、正例基準日１１５及び正例特徴量１１８を受け付けてから開始される。 Figure 7 is a flowchart showing an example of processing performed by the negative example reference date determination unit 119. This processing starts after the negative example reference date determination unit 119 receives the negative example time series data 113, the positive example reference dates 115, and the positive example features 118.

負例基準日決定部１１９は、負例時系列データ１１３の中から一つを選択して負例時系列データ８０１とし、正例基準日１１５を最初の基準日１として決定する（９０１）。そして、負例基準日決定部１１９は、ステップ９０２～９０５のループで、所定のスライド幅ずつ基準日をずらして負例の特徴量を特徴量算出部１１６に算出させる。 The negative example reference date determination unit 119 selects one of the negative example time series data 113 to set it as the negative example time series data 801, and determines the positive example reference date 115 as the initial reference date 1 (901). Then, in a loop of steps 902 to 905, the negative example reference date determination unit 119 shifts the reference date by a predetermined sliding width and causes the feature calculation unit 116 to calculate the feature of the negative example.

負例基準日決定部１１９は、ステップ９０３で、現在の基準日Ｎと、予め設定された複数の統計期間と、負例時系列データ８０１を特徴量算出部１１６へ入力して、負例特徴量を算出させる。 In step 903, the negative example reference date determination unit 119 inputs the current reference date N, multiple pre-set statistical periods, and the negative example time series data 801 to the feature calculation unit 116 to calculate the negative example features.

負例基準日決定部１１９は、ステップ９０４で、複数の統計期間毎の負例特徴量を特徴量算出部１１６から取得して、所定の統計処理（例えば、平均）を行って、基準日別特徴量８０４として記憶する。 In step 904, the negative example reference date determination unit 119 acquires the negative example features for each of the multiple statistical periods from the feature calculation unit 116, performs a predetermined statistical process (e.g., averaging), and stores the results as the reference date feature 804.
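Steps 903 and 904 can be illustrated as follows. For one reference date, the mean of each variable is taken over each statistical period. The per-period means are then averaged, which is the statistical process the text gives as an example.

The observation format, the use of the mean as the per-period feature, and the half-open window are assumptions. Each returned vector is one point in a feature space like the one in Figure 5B, with one axis per variable.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Hypothetical observation format: (day, {variable name: value}).
Observation = Tuple[date, Dict[str, float]]


def reference_date_feature(
    series: Sequence[Observation],
    reference: date,
    periods_days: Sequence[int],
    variables: Sequence[str],
) -> List[float]:
    """Reference date feature vector: one averaged value per variable."""
    vector: List[float] = []
    for var in variables:
        per_period: List[float] = []
        for days in periods_days:  # step 903: one feature per statistical period
            start = reference - timedelta(days=days)
            window = [obs[var] for day, obs in series if start <= day < reference]
            if window:
                per_period.append(mean(window))
        # step 904: average the per-period features (NaN if no data at all)
        vector.append(mean(per_period) if per_period else float("nan"))
    return vector
```

Repeating this for reference dates 1 to N yields the reference date features 804 that are passed to the shortest-distance search.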

次に、負例基準日決定部１１９は、ステップ９０２へ戻って（９０５）、基準日Ｎをスライド幅だけずらして負例時系列データ８０１の終端まで上記処理を繰り返し、基準日１～基準日Ｎの基準日別特徴量８０４をそれぞれ算出する。 Next, the negative example reference date determination unit 119 returns to step 902 (905), shifts the reference date N by the sliding width, and repeats the above process until the end of the negative example time series data 801, thereby calculating the reference date-specific feature quantities 804 for each of reference dates 1 to N.

負例時系列データ８０１の終端に達すると、負例基準日決定部１１９はステップ９０２～９０５のループを終了してステップ９０６に進む。 When the end of the negative example time series data 801 is reached, the negative example reference date determination unit 119 ends the loop of steps 902 to 905 and proceeds to step 906.

ステップ９０６では、負例基準日決定部１１９の特徴量空間最短距離探索部８１０が、上述したように、正例基準日１１５から所定の複数の統計期間で正例時系列データ１１２の特徴量を特徴量算出部１１６に算出させ、各正例基準日１１５毎に統計期間の特徴量をクラスタリングして正例の基準日別特徴量とする。 In step 906, the feature space shortest distance search unit 810 of the negative example reference date determination unit 119 causes the feature calculation unit 116 to calculate the feature of the positive example time series data 112 for a predetermined number of statistical periods from the positive example reference date 115, as described above, and clusters the feature of the statistical period for each positive example reference date 115 to obtain the feature for each positive example reference date.

そして、特徴量空間最短距離探索部８１０は、負例の基準日別特徴量８０４と正例の基準日別特徴量を特徴量空間に配置して、各基準日別特徴量間の幾何学的距離を算出する。そして、特徴量空間最短距離探索部８１０は、正例基準日１１５の基準日別特徴量との距離が最も小さい負例の基準日別特徴量に対応する基準日を負例基準日１２０として決定して出力する（９０７）。 Then, the feature space shortest distance search unit 810 places the negative example reference date features 804 and the positive example reference date features in the feature space. It calculates the geometric distance between each pair of reference date features. The unit then finds the negative example reference date feature whose distance from the reference date feature of the positive example reference date 115 is smallest. It determines and outputs the corresponding reference date as the negative example reference date 120 (907).

上記処理によって、経時特徴量生成部１１０は、負例時系列データ１１３から基準日Ｎをずらして複数の負例の基準日別特徴量を算出し、正例時系列データ１１２の基準日別特徴量に幾何学的距離が近いことを指標として特徴量を算出する起点となる負例基準日１２０を決定する。 By the above process, the temporal feature generation unit 110 calculates the feature amounts for each reference date of multiple negative examples by shifting the reference date N from the negative example time series data 113, and determines the negative example reference date 120 that is the starting point for calculating the feature amounts, using the close geometric distance to the reference date feature amounts of the positive example time series data 112 as an indicator.

これにより、経時データ分析装置１は、目的事象が発生していない負例時系列データ１１３において、説明変数の組み合わせが類似する正例時系列データ１１２と負例時系列データ１１３を機械学習部１６０で学習させることで、高精度なリスク推定モデルを提供することが可能となる。 As a result, the longitudinal data analysis device 1 has the machine learning unit 160 learn from positive example time series data 112 and negative example time series data 113 that have similar combinations of explanatory variables. The negative example time series data 113 is data in which the target event has not occurred. This makes it possible to provide a highly accurate risk estimation model.

以上のように、実施例１の経時データ分析装置１は、負例時系列データ１１３から正例特徴量１１８の基準日別特徴量に近いことを指標として負例基準日１２０を決定し、特徴量の重要度が高い方から累積値を算出して重要度の低い特徴量から徐々に排除する処理を繰り返すことで、重要な特徴量を選別して、機械学習部１６０の学習データを生成する。 As described above, the longitudinal data analysis device 1 of Example 1 determines the negative example reference date 120 from the negative example time series data 113. It uses closeness to the reference date features of the positive example features 118 as the criterion. It then selects the important features by repeatedly calculating cumulative values, starting from the most important features, and gradually eliminating the less important ones. This generates the learning data for the machine learning unit 160.

これにより、機械学習部１６０に学習させる特徴量の数を低減しながらも重要度の高い特徴量（第２特徴量リスト１５４）と、正例特徴量１１８の正例基準日別特徴量に近い指標を有する負例基準日１２０によって、計算負荷を抑制しながら精度の高い機械学習モデルを生成させることができる。本実施例の経時データ分析装置１では、発生頻度の低い目的事象の発生を高精度に予測するモデルを生成することが可能となる。 This makes it possible to reduce the number of features to be learned by the machine learning unit 160, while generating a highly accurate machine learning model with reduced computational load by using highly important features (second feature list 154) and negative example reference dates 120 that have indices close to the positive example reference date features of the positive example features 118. The longitudinal data analysis device 1 of this embodiment makes it possible to generate a model that predicts with high accuracy the occurrence of a target event that occurs infrequently.

図８は、本発明の実施例２を示し、経時データ分析装置１で行われる処理の一例を示す図である。前記実施例１では特徴選択部１５０の内部で重要度を利用する例を示したが、実施例２では、特徴選択部１５０が算出した特徴量の重要度を経時特徴量生成部１１０へフィードバックさせて、経時特徴量生成部１１０が特徴量の重要度に基づいて特徴量算出定義１１７の更新を通知する例を示す。 Figure 8 shows Example 2 of the present invention and illustrates an example of processing performed by the longitudinal data analysis device 1. Example 1 used the importance only inside the feature selection unit 150. In Example 2, the feature importance calculated by the feature selection unit 150 is fed back to the temporal feature generation unit 110. The temporal feature generation unit 110 then issues a notification to update the feature calculation definition 117 based on that importance.

実施例２の経時データ分析装置１は、前記実施例１の構成に対して、経時特徴量生成部１１０に特徴量算出定義更新部２０１を加え、特徴選択部１５０に最小特徴数判定部２０２と前回出力特徴量リスト２０３を加えて、特徴量重要度算出部１５１が算出した特徴量重要度１５２を経時特徴量生成部１１０の特徴量算出定義更新部２０１へフィードバックするもので、その他の構成は前記実施例１と同様である。 Compared with the configuration of Example 1, the longitudinal data analysis device 1 of Example 2 makes the following changes:

- It adds a feature calculation definition update unit 201 to the temporal feature generation unit 110.
- It adds a minimum feature number determination unit 202 and a previously output feature list 203 to the feature selection unit 150.
- It feeds the feature importance 152 calculated by the feature importance calculation unit 151 back to the feature calculation definition update unit 201 of the temporal feature generation unit 110.

The other configurations are the same as those of Example 1.

特徴量算出定義更新部２０１は、各統計期間の重要度の大きさに偏りがある場合、予め設定された統計期間の変更を通知する。例えば、統計期間が１ヶ月と３ヶ月の重要度が、６ヶ月や１２ヶ月の重要度よりも相対的に大きい場合には、新たに「２ヶ月」と「４ヶ月」を統計期間に追加するように通知する。 If the importance is unevenly distributed across the statistical periods, the feature calculation definition update unit 201 issues a notification to change the preset statistical periods. For example, the 1-month and 3-month statistical periods may be relatively more important than the 6-month and 12-month periods. In that case, it issues a notification to add new statistical periods of "2 months" and "4 months".

換言すれば、特徴量算出定義更新部２０１は、統計期間の数や間隔を変更することで、より大きな重要度を検出することを可能にする。なお、特徴量算出定義更新部２０１は、各統計期間の重要度を出力装置６に表示して、統計期間の数や間隔の変更を促す通知を出力してもよいし、あるいは、重要度の偏りを検出した場合に、経時データ分析装置１の利用者に統計期間の見直しを通知してもよい。 In other words, the feature calculation definition update unit 201 makes it possible to detect greater importance by changing the number of statistical periods and the interval. The feature calculation definition update unit 201 may display the importance of each statistical period on the output device 6 and output a notification encouraging the user to change the number of statistical periods and the interval, or may notify the user of the longitudinal data analysis device 1 to review the statistical periods if a bias in importance is detected.

あるいは、特徴量算出定義更新部２０１が、複数の統計期間で重要度の偏りを検出すると、自動的に統計期間を変更するように特徴量算出定義１１７を更新してもよい。 Alternatively, when the feature calculation definition update unit 201 detects a bias in importance across multiple statistical periods, it may update the feature calculation definition 117 to automatically change the statistical period.

特徴選択部１５０の最小特徴数判定部２０２は、特徴量累積閾値判定部１５３から第２特徴量リスト１５４が出力されると、前回出力特徴量リスト２０３に格納された前回の第２特徴量リスト１５４の特徴量の数（レコード数）と今回の第２特徴量リスト１５４の特徴量の数（レコード数）を比較する。 When the second feature list 154 is output from the feature accumulation threshold determination unit 153, the minimum feature number determination unit 202 of the feature selection unit 150 compares the number of features (number of records) in the previous second feature list 154 stored in the previously output feature list 203 with the number of features (number of records) in the current second feature list 154.

今回の第２特徴量リスト１５４の特徴量の数の方が小さい場合には、最小特徴数判定部２０２は、まだ、特徴量の数を低減する余地があると判定して、特徴量算出定義更新部２０１に特徴量算出定義１１７を更新して、新たな特徴量を算出させるよう指令する。また、最小特徴数判定部２０２は、最新の第２特徴量リスト１５４を前回出力特徴量リスト２０３へ格納しておく。 If the current second feature list 154 contains fewer features, the minimum feature number determination unit 202 determines that there is still room to reduce the number of features. It instructs the feature calculation definition update unit 201 to update the feature calculation definition 117 so that new features are calculated. The minimum feature number determination unit 202 also stores the latest second feature list 154 in the previously output feature list 203.
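The comparison made by the minimum feature number determination unit 202 amounts to a small piece of state, sketched below. The class and method names are hypothetical. Returning `True` on the first call, when no previous list exists yet, is an assumption, since the text does not cover that case.

```python
from typing import List, Optional


class MinimumFeatureCountJudge:
    """Hypothetical sketch of the minimum feature number determination unit 202."""

    def __init__(self) -> None:
        # Plays the role of the previously output feature list 203.
        self.previous: Optional[List[str]] = None

    def request_definition_update(self, current: List[str]) -> bool:
        """Return True if the feature calculation definition should be updated."""
        previous = self.previous
        self.previous = list(current)  # always keep the latest second feature list
        if previous is None:
            return True  # assumption: no baseline yet, so allow another round
        # Fewer features than last time: there may still be room to reduce them.
        return len(current) < len(previous)
```

Once a round no longer shrinks the second feature list, the method returns `False`. This is the natural point to stop updating the statistical periods.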

図９は、経時データ分析装置１で行われる重要度のフィードバック処理の一例を示す図である。 Figure 9 shows an example of the importance feedback process performed by the longitudinal data analysis device 1.

図示の例では、経時特徴量生成部１１０の特徴量算出定義１１７に、所定の統計期間として１ヶ月、３ヶ月、６ヶ月、１２ヶ月の４つの期間が予め設定されている。また、時系列データ１０２の特徴量を算出する条件として、統計量として平均値が設定されている例を示す。 In the illustrated example, four periods of 1 month, 3 months, 6 months, and 12 months are preset as predetermined statistical periods in the feature calculation definition 117 of the temporal feature generation unit 110. In addition, an example is shown in which the average value is set as a statistical quantity as a condition for calculating the feature quantities of the time-series data 102.

The feature calculation unit 116 receives the positive example time series data 112 and the negative example time series data 113. For each statistical period in the feature calculation definition 117, it calculates the positive example features 118 and the negative example features 121. It then outputs the first feature list 122 and the statistical periods 1171 to the feature selection unit 150.
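The per-period averaging described above can be sketched roughly as follows. This is only an illustration. The `(date, value)` list layout, the function name `period_means`, and the approximation of one month as 30 days are assumptions and do not come from the patent.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical preset statistical periods (months), mirroring the FIG. 9 example.
PERIODS_MONTHS = [1, 3, 6, 12]

def period_means(series, reference_date, periods_months=PERIODS_MONTHS):
    """Average of the values observed in each n-month window ending at reference_date.

    series: list of (date, value) pairs for one entity (assumed representation).
    A month is approximated as 30 days, which is an assumption of this sketch.
    """
    features = {}
    for n in periods_months:
        window_start = reference_date - timedelta(days=30 * n)
        window = [v for d, v in series if window_start < d <= reference_date]
        # An empty window yields None instead of raising StatisticsError.
        features[f"mean_{n}m"] = mean(window) if window else None
    return features
```

Calling `period_means(series, date(2020, 12, 31))` returns one average per preset period, for example `{"mean_1m": ..., "mean_3m": ..., ...}`. For a positive example, the reference date would be its positive example reference date 115.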

When the feature importance calculation unit 151 of the feature selection unit 150 receives the positive example features 118, the negative example features 121, the first feature list 122, and the statistical periods 1171, it calculates the importance of each feature and outputs the results as the feature importance 152.

The feature importance 152 stores the importance of each of the multiple statistical periods for a single reference date. In the illustrated example, the 1-month average has an importance of 0.4, the 3-month average 0.5, and the 6-month and 12-month averages 0.1.

The feature calculation definition update unit 201 receives the feature importance 152 fed back from the feature importance calculation unit 151 and detects that the 1-month and 3-month averages have high importance.

The feature calculation definition update unit 201 then subdivides the neighborhood of the high-importance statistical periods. It updates the feature calculation definition 117 by adding a 2-month average, which falls between the 1-month and 3-month averages, and a 4-month average, which is one month longer than the 3-month average.
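The patent does not give a general rule for "subdividing the neighborhood". One possible rule reproduces the FIG. 9 result: for each period whose importance reaches a cutoff, add its one-month neighbors on either side. Both the rule and the cutoff value `high_threshold=0.3` are hypothetical choices made for this sketch.

```python
def refine_periods(importance, high_threshold=0.3):
    """Add the +/-1 month neighbors of every high-importance statistical period.

    importance: {period_in_months: importance} (assumed layout).
    high_threshold: hypothetical cutoff for "high" importance.
    Returns the refined, sorted list of statistical periods.
    """
    periods = set(importance)
    for period, value in importance.items():
        if value >= high_threshold:
            for neighbor in (period - 1, period + 1):
                if neighbor > 0:  # a 0-month window is meaningless
                    periods.add(neighbor)
    return sorted(periods)
```

With the FIG. 9 importances `{1: 0.4, 3: 0.5, 6: 0.1, 12: 0.1}`, this rule adds the 2-month and 4-month periods and returns `[1, 2, 3, 4, 6, 12]`.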

Using the updated feature calculation definition 117, the feature calculation unit 116 recalculates the positive example features 118, the negative example features 121, and the first feature list 122, and outputs them to the feature selection unit 150.

FIG. 10 is a flowchart showing an example of the processing performed by the temporal feature generation unit 110 and the feature selection unit 150. The processing starts when the temporal feature generation unit 110 receives the positive example time series data 112, the negative example time series data 113, the target event occurrence time data 101, and the feature calculation definition 117 (501).

From these inputs (the positive example time series data 112, the negative example time series data 113, the target event occurrence time data 101, and the feature calculation definition 117), the temporal feature generation unit 110 generates the positive example features 118, the negative example features 121, and the first feature list 122 (502).

The feature importance calculation unit 151 of the feature selection unit 150 calculates the importance of each feature from the positive example features 118 and the negative example features 121, and outputs the results as the feature importance 152. Next, the feature accumulation threshold determination unit 153 sorts the first feature list 122 in descending order of importance. It selects features in that order until the cumulative importance reaches the threshold Th1 described above, and then generates and outputs the second feature list 154 (503).
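A minimal sketch of the cumulative-threshold selection in step 503 follows. It assumes that importances are given as a `{feature_name: importance}` dict. It also assumes that the feature whose addition first brings the running sum to Th1 is kept. The text does not specify this boundary case.

```python
def select_by_cumulative_importance(importance, th1):
    """Take features in descending order of importance until the running sum reaches th1.

    importance: {feature_name: importance} (assumed layout).
    The feature that brings the sum to th1 is included (an assumption).
    """
    selected, cumulative = [], 0.0
    for name, value in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
        if cumulative >= th1:
            break
        selected.append(name)
        cumulative += value
    return selected
```

For importances `{"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}` and `th1=0.75`, features `a` and `b` are kept. Together they account for 80% of the total importance.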

The minimum feature number determination unit 202 then checks whether the previous second feature list 154, stored in the previously output feature list 203, contains more features than the new second feature list 154 (504).

If the previously output feature list 203 contains more features, the minimum feature number determination unit 202 concludes that the number of features can still be reduced and proceeds to step 505. Otherwise, it proceeds to step 506.

In step 505, the feature calculation definition update unit 201 updates the feature calculation definition 117 based on the feature importance 152, as described above. The processing then returns to step 502, calculates new features, and repeats.

In step 506, the minimum feature number determination unit 202 outputs the second feature list 154 held in the previously output feature list 203 as the result, and the processing ends.
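The FIG. 10 loop (steps 502 to 506) can be sketched as a driver that takes the individual stages as callables. The callable parameters and `max_rounds` are hypothetical. The loop also assumes that no previous list exists on the first pass, so the first comparison always continues.

```python
def search_feature_list(definition, generate, importance_of, select, update, max_rounds=20):
    """Refine the feature calculation definition while the second feature list keeps shrinking.

    generate(definition) -> features           (step 502, hypothetical callable)
    importance_of(features) -> importance      (step 503, hypothetical callable)
    select(importance) -> second feature list  (step 503, hypothetical callable)
    update(definition, importance) -> definition  (step 505, hypothetical callable)
    max_rounds is a safety cap that the patent does not mention.
    """
    previous = None  # stands in for the previously output feature list 203
    for _ in range(max_rounds):
        features = generate(definition)
        importance = importance_of(features)
        current = select(importance)
        # Step 504: continue only while the previous list was strictly longer.
        if previous is not None and len(previous) <= len(current):
            return previous  # step 506
        previous = current
        definition = update(definition, importance)  # step 505
    return previous
```

Passing the stages in as callables keeps the control flow of the flowchart separate from any particular feature or importance calculation.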

As described above, in the longitudinal data analysis device 1 of Embodiment 2, the importance calculated by the feature importance calculation unit 151 is fed back to the feature calculation definition update unit 201 of the temporal feature generation unit 110. This feedback makes it possible to suggest updates to the feature calculation definition 117 so that new features are calculated.

The above describes an example in which the feature calculation definition update unit 201 changes the statistical period. The invention is not limited to this example: the method of calculating the statistic may be changed instead.

FIG. 11 shows Embodiment 3 of the present invention and illustrates an example of the processing performed by the negative example reference date determination unit 119. In Embodiment 3, the negative example reference date determination unit 119 calculates a selection probability from the occurrence frequency of the target event (the positive example reference dates 115), and then determines the negative example reference date 120 based on that probability.

In this embodiment, the negative example reference date determination unit 119 includes a negative example reference date selection unit 1102 and a positive example reference date frequency distribution 1103.

The negative example reference date determination unit 119 receives the positive example reference dates 115 output by the positive example reference date determination unit 114 and computes their frequency distribution as the positive example reference date frequency distribution 1103. The negative example reference date selection unit 1102 takes as input one piece of negative example time series data 1101, selected from the negative example time series data 113. It calculates a selection probability from the occurrence frequency of the positive example reference dates 115 (the positive example reference date frequency distribution 1103), and uses that probability to determine the negative example reference date 1104 of the negative example time series data 1101. The selection probability may be approximated with a well-known distribution such as the binomial or Poisson distribution.

The negative example reference date determination unit 119 writes the determined negative example reference date 1104 to the negative example reference dates 120 in the storage device 4. It repeats this calculation and storage for each piece of negative example time series data 113 to be processed.
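A rough sketch of frequency-weighted selection, using the empirical distribution of positive reference dates directly instead of a binomial or Poisson approximation. The sketch assumes that candidate dates are restricted to the observation window of the negative series. That restriction and the function name are hypothetical.

```python
import random
from collections import Counter
from datetime import date

def sample_negative_reference_date(positive_ref_dates, series_start, series_end, rng=random):
    """Pick a reference date for one negative series, weighted by positive-date frequency.

    Only positive reference dates inside [series_start, series_end] are candidates
    (an assumption of this sketch). Returns None when no candidate exists.
    """
    freq = Counter(d for d in positive_ref_dates if series_start <= d <= series_end)
    if not freq:
        return None
    candidates = sorted(freq)
    weights = [freq[d] for d in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible. With three positive reference dates on 2020-01-31 and one on 2020-06-30, roughly three quarters of the draws fall on 2020-01-31.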

With this processing, the longitudinal data analysis device 1 can choose the negative example reference dates 120 of the negative example time series data 113 so that they follow the same probability distribution as the occurrence frequency of the positive example reference dates 115. The machine learning unit 160 can then generate a model that accurately predicts target events that occur infrequently.

FIGS. 12 to 14 show Embodiment 4 of the present invention. In this embodiment, the negative example reference date 120 is determined from an event related to the occurrence of the target event (hereinafter, an important event). The negative example reference date determination unit 119 of this embodiment defines the warning period as the period from the occurrence date of the important event to the positive example reference date 115 on which the target event occurred. It calculates the warning period for each piece of positive example time series data 112 and determines the negative example reference date 120 from the result of statistical processing, such as the frequency distribution of the calculated warning periods.

When the target event is a default, the important events in this embodiment are preset events. Examples include the execution date of a loan contract, the execution date of a large borrowing, and the date on which an overdraft exceeds a specified amount.

For each of the multiple pieces of positive example time series data 112, the longitudinal data analysis device 1 calculates the warning period as the period from the date of such an important event to the positive example reference date 115. It then calculates the frequency distribution of these positive example warning periods. Based on this distribution, the longitudinal data analysis device 1 calculates the negative example reference dates 120 and the negative example warning periods for the negative example time series data 113.

FIG. 12 is a graph showing the relationship between a feature calculated from the positive example time series data 112 (labeled "important feature" in the figure) and the warning period. The positive example time series data 701 plots feature value against time for data selected from the positive example time series data 112. In this example, the feature is a statistic of the borrowing balance or overdraft balance, such as the mean, maximum, minimum, variance, standard deviation, range (maximum minus minimum), or coefficient of variation.

In the illustrated example, as described above, the date on which the target event occurred is the positive example reference date 115. The date on which an important event related to the target event occurred is the important event occurrence date. The period between the important event occurrence date and the positive example reference date 115 is the warning period. In addition, in this embodiment, a predetermined period extending back from the positive example reference date 115 is set as the statistical period.

In the illustrated example, the day on which the feature exceeds a threshold Th3 is taken as the important event occurrence date. However, as noted above, important events sometimes have explicitly recorded dates or times, such as a loan execution date or a borrowing date. In that case, the recorded date may be used as the important event occurrence date. The threshold Th3 can be a preset value or ratio, such as 90% of the maximum feature value.
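A minimal sketch of the warning period for one positive example, with Th3 set to a ratio of the maximum feature value. The sketch assumes non-negative feature values such as balances and chronologically ordered input. It also takes the first day above Th3 as the important event date, which is an assumption: the text says only that the feature "exceeds" Th3.

```python
from datetime import date

def positive_warning_period(feature_series, reference_date, ratio=0.9):
    """Days between the important event occurrence date and the positive reference date.

    feature_series: (date, feature_value) pairs in chronological order (assumed),
    with non-negative values such as balances (assumed).
    Th3 = ratio * max(feature); the first day above Th3 is treated as the
    important event occurrence date (an assumption of this sketch).
    """
    history = [(d, v) for d, v in feature_series if d <= reference_date]
    if not history:
        return None
    th3 = ratio * max(v for _, v in history)
    for d, v in history:
        if v > th3:
            return (reference_date - d).days
    return None
```

Consider a balance that reaches 95 on 2020-03-01 and peaks at 100 on the reference date 2020-04-01. Here Th3 is 90, so the warning period is the 31 days from March 1 to April 1.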

FIG. 13 shows an example of the processing performed by the negative example reference date determination unit 119. In Embodiment 4, the negative example reference date determination unit 119 includes an important feature threshold excess search unit 1002, a warning period determination unit 1004, and an addition unit 1007. The rest of the configuration of the longitudinal data analysis device 1 of Embodiment 4 is the same as in Embodiment 1 or Embodiment 2.

The important feature threshold excess search unit 1002 takes data received from the negative example time series data 113 as negative example time series data 1001. It refers to the item used to identify important events in the feature calculation definition 117 and has the feature calculation unit 116 calculate the negative example feature for that item.

The important feature threshold excess search unit 1002 scans the negative example time series data 1001 chronologically, from past to present. At each point, it compares the negative example feature calculated by the feature calculation unit 116 with a predetermined threshold Th4. It outputs the first day on which the negative example feature exceeds Th4 as the important event occurrence date 1003.
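The past-to-present search for the first exceedance of Th4 might look like this in outline. The function name and the `(date, value)` input layout are assumptions. The input is sorted by date first, so the scan order does not depend on the order in which the data arrives.

```python
from datetime import date

def first_exceedance_date(feature_series, th4):
    """Return the first date, scanning past to present, on which the feature exceeds th4.

    feature_series: (date, feature_value) pairs for one negative example (assumed layout).
    Returns None when the feature never exceeds th4.
    """
    for d, v in sorted(feature_series, key=lambda item: item[0]):
        if v > th4:
            return d
    return None
```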

The warning period determination unit 1004 receives the positive example features 118 and the positive example reference dates 115. It compares each feature with the preset threshold Th3 to extract the important event occurrence date of each positive example. It then calculates the period between that date and the positive example reference date 115 as the warning period.

The warning period determination unit 1004 calculates the warning period for each of the multiple positive example features 118. It then computes the frequency distribution of these warning periods and holds it as the positive example warning period frequency distribution 1005.

Next, the warning period determination unit 1004 draws a warning period 1006 at random so that the draws follow the frequency distribution of positive example warning periods in the positive example warning period frequency distribution 1005. It outputs the warning period 1006 to the addition unit 1007.

The addition unit 1007 generates the negative example reference date 1008 by adding the warning period 1006 from the warning period determination unit 1004 to the negative example important event occurrence date 1003. The addition unit 1007 calculates the negative example reference date 1008 for each piece of input negative example time series data 1001 and stores it in the negative example reference dates 120.
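A sketch of the last two stages follows: building the positive warning period distribution, then drawing one period and adding it to a negative important event date. Warning periods are assumed to be whole days, and the helper names are hypothetical. Treating the empirical counts as sampling weights is one straightforward reading of "match the frequency distribution".

```python
import random
from collections import Counter
from datetime import date, timedelta

def build_warning_period_distribution(positive_periods_days):
    """Frequency distribution of positive example warning periods (in days)."""
    return Counter(positive_periods_days)

def negative_reference_date(important_event_date, distribution, rng=random):
    """Draw a warning period from the positive distribution and add it to the event date.

    distribution: Counter {period_days: count}; must be non-empty.
    """
    periods = sorted(distribution)
    weights = [distribution[p] for p in periods]
    period = rng.choices(periods, weights=weights, k=1)[0]
    return important_event_date + timedelta(days=period)
```

For example, if two positives have a 30-day warning period and one has 60 days, a negative whose important event falls on 2020-01-01 gets 2020-01-31 as its reference date about two thirds of the time, and 2020-03-01 otherwise.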

The warning period determination unit 1004 may also use, as a new objective function, the reciprocal of the period from the occurrence date of the important event to the occurrence date of the target event in the positive examples.

FIG. 14 is a flowchart showing a modified example of the processing performed by the negative example reference date determination unit 119. The illustrated example covers the case in which the temporal feature generation unit 110 receives feedback of the feature importance 152 from the feature selection unit 150, as in FIG. 8 of Embodiment 2, but the processing is not limited to this case.

The negative example reference date determination unit 119 starts processing when it receives the positive example features 118 of the specified positive example time series data 112 (S1301). It then checks whether any of the feature importances 152 fed back from the feature selection unit 150 exceeds a predetermined threshold Th5 (S1302). If so, the processing proceeds to step S1303. Otherwise, the processing ends.

In step S1303, the negative example reference date determination unit 119 determines the threshold Th3 for the current important feature among the received features. As shown in FIG. 12, Th3 can be set as a predetermined ratio of the maximum value of the important feature.

In steps S1304 to S1307, the negative example reference date determination unit 119 repeats the processing for each piece of received positive example time series data 112, treating the positive example feature 118 as the important feature.

In step S1305, if the positive example feature 118 exceeds the threshold Th3 on some day, the negative example reference date determination unit 119 takes that day as the important event occurrence date. Once it has the important event occurrence date, the negative example reference date determination unit 119 obtains the positive example reference date 115 and calculates the period between the important event occurrence date and the positive example reference date 115 as the warning period (S1306).

When steps S1304 to S1307 are complete for all received positive example time series data 112, the negative example reference date determination unit 119 calculates the frequency distribution of the positive example warning periods in step S1308 and generates the positive example warning period frequency distribution 1005.

In steps S1309 to S1313, the negative example reference date determination unit 119 repeats the processing for each piece of received negative example time series data 113. In step S1310, it selects one negative example feature 121 and takes the day on which the important feature (a statistic of the borrowing balance or overdraft balance) exceeds the predetermined threshold Th3 as the important event occurrence date.

In step S1311, the negative example reference date determination unit 119 refers to the positive example warning period frequency distribution 1005 and determines the warning period of the negative example so that it matches the frequency distribution of the positive example warning periods. In other words, the warning period is treated as a random variable, the frequency distribution is regarded as its probability distribution, and the warning period of each negative example is drawn at random from that distribution. In step S1312, the negative example reference date determination unit 119 adds the negative example warning period to the important event occurrence date to calculate the negative example reference date 120.

The negative example reference date determination unit 119 repeats steps S1309 to S1313 for all of the received negative example time series data 113.

With this processing, the longitudinal data analysis device 1 can determine the negative example reference dates 120 of the negative example time series data 113 using closeness to the positive example features as the criterion. The machine learning unit 160 can then generate a model that accurately predicts target events that occur infrequently.

<Conclusion>
As described above, each of the above embodiments can be configured as follows.

(1) A feature generation method in which a computer (longitudinal data analysis device 1) having a processor 2 and a memory 3 receives time series data (102) and generates features that serve as input data to a machine learning unit (160) that predicts the occurrence of a target event, the method comprising: a time series data input step in which the computer receives multiple pieces of time series data (102), each including values and timestamps; a target event occurrence data input step in which the computer receives target event occurrence data (target event occurrence time data 101) including the timestamps at which the target event occurred; a feature calculation definition input step in which the computer receives a feature calculation definition (117) that defines how the features of the time series data (102) are calculated; a division step in which the computer refers to the target event occurrence data (101) and divides the time series data (102) into positive example time series data (112) and negative example time series data (113); a positive example reference date determination step in which the computer determines a positive example reference date (115), which is a reference date in the positive example time series data (112); a positive example feature calculation step in which the computer calculates a positive example feature (118) from each combination of the positive example time series data (112) and the positive example reference date (115), based on the feature calculation definition (117); a negative example reference date determination step in which the computer determines a negative example reference date (120) using the positive example reference date (115), the positive example feature (118), and the negative example time series data (113) as input; and a negative example feature calculation step in which the computer calculates a negative example feature (121) from each combination of the negative example time series data (113) and the negative example reference date (120), based on the feature calculation definition (117).
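The division step and the positive example reference date step of configuration (1) can be sketched as follows. The dict-of-series layout and the function name are assumptions. The positive reference date is taken to be the target event date, as in the FIG. 12 description.

```python
from datetime import date

def split_by_target_event(series_by_id, event_dates):
    """Divide time series into positive and negative examples.

    series_by_id: {entity_id: [(date, value), ...]} (assumed layout)
    event_dates:  {entity_id: date the target event occurred} (target event occurrence data)
    Returns (positive, negative, positive_reference_dates).
    """
    positive = {k: s for k, s in series_by_id.items() if k in event_dates}
    negative = {k: s for k, s in series_by_id.items() if k not in event_dates}
    # Positive reference date = date of the target event (per the FIG. 12 description).
    positive_reference_dates = {k: event_dates[k] for k in positive}
    return positive, negative, positive_reference_dates
```

The negative reference dates are then chosen separately, for example with one of the strategies of Embodiments 3 and 4 described above.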

With the above configuration, the longitudinal data analysis device 1 determines the negative example reference date from the negative example time series data 113, using closeness to the reference-date-specific features of the positive example features 118 as the criterion. This makes it possible to determine a reference date in negative example time series data 113 in which the target event has not occurred. As a result, the longitudinal data analysis device 1 can generate a model that accurately predicts target events that occur infrequently.

(2) The feature generation method according to (1) above, further comprising: a temporal feature generation step in which the computer generates a list of the positive example features (118) and the negative example features (121) as a first feature list (122) and outputs the positive example features (118), the negative example features (121), and the first feature list (122); a feature importance calculation step in which the computer calculates the feature importance (152) of the positive example features (118) and the negative example features (121) listed in the first feature list (122); and a feature accumulation threshold determination step in which the computer accumulates the feature importance (152) in descending order and stores, in a second feature list (154) as the features to be learned, the positive example features (118) and negative example features (121) taken before the cumulative value reaches a predetermined threshold Th1.

With the above configuration, the cumulative importance is calculated starting from the feature with the highest importance, and features with lower importance are progressively excluded. Repeating this process selects the important features. The machine learning unit 160 is therefore trained on fewer features that are still highly important (the second feature list 154), which makes it possible to generate a highly accurate prediction model.

(3) The feature generation method according to (1) above, wherein the feature accumulation threshold determination step proceeds as follows. If unprocessed data remains in the first feature list (122) when the cumulative value reaches the predetermined threshold Th1, that unprocessed data is deleted and the feature importance (152) is recalculated in the feature importance calculation step. The feature importance calculation step and the narrowing by the feature accumulation threshold determination step are repeated until no unprocessed data remains in the first feature list (122) at the point where the cumulative feature importance (152) reaches Th1.
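The repeated narrowing of configuration (3) can be sketched as a fixed-point loop. Importance is recomputed on the surviving features until the cumulative cut removes nothing more. `compute_importance` is a hypothetical callable, `max_rounds` is a safety cap not present in the text, and the rule for the boundary feature matches the earlier cumulative-selection assumption.

```python
def iterative_narrowing(features, compute_importance, th1, max_rounds=50):
    """Repeat importance calculation and cumulative-threshold selection until stable.

    compute_importance(features) -> {feature: importance} (hypothetical callable)
    Stops when every remaining feature is needed to reach th1, i.e. when no
    unprocessed data is left in the list at the moment th1 is reached.
    """
    for _ in range(max_rounds):
        importance = compute_importance(features)
        ranked = sorted(features, key=lambda f: importance[f], reverse=True)
        selected, cumulative = [], 0.0
        for f in ranked:
            if cumulative >= th1:
                break  # the remaining (unprocessed) features are dropped
            selected.append(f)
            cumulative += importance[f]
        if len(selected) == len(features):
            return selected
        features = selected
    return features
```

Recomputing importance after each cut matters because normalized importances shift once weaker features are removed. A feature that looked marginal in the full set may carry a larger share in the reduced set.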

With the above configuration, the longitudinal data analysis device 1 selects the important features. This reduces the number of features that the machine learning unit 160 learns, while training still uses highly important features (the second feature list 154), so a highly accurate prediction model can be generated.

(4) The feature generation method according to (2) above, further comprising a feature calculation update step in which the computer receives the calculated feature importance (152) as input and changes the feature calculation definition (117) according to the values of the feature importance (152).

With the above configuration, the longitudinal data analysis device 1 feeds the importance calculated by the feature importance calculation unit 151 back to the feature calculation definition update unit 201 of the temporal feature generation unit 110. This feedback makes it possible to suggest updates to the feature calculation definition 117 so that new features are calculated.

(5) The feature generation method according to (1) above, wherein the negative example reference date determination step includes: a reference sliding step of setting a first reference date to a preset reference date and setting multiple reference dates, from the first reference date to an Nth reference date, shifted at intervals of a predetermined number of days; a statistical period setting step of setting multiple preset statistical periods for each of the first to Nth reference dates; a negative example reference-date feature calculation step of calculating features of the negative example time series data (113) over each statistical period for each of the first to Nth reference dates, to obtain negative example reference-date features (804) for each reference date; a positive example reference-date feature calculation step of calculating features of the positive example time series data (112) over each of the multiple statistical periods for each positive example reference date (115), to obtain positive example reference-date features for each positive example reference date (115); and a determination step of placing the negative example reference-date features and the positive example reference-date features in a predetermined feature space, calculating the distances between reference dates, and determining, as the negative example reference date (120), the reference date whose negative example reference-date features are closest to any of the positive example reference-date features.

上記構成により、経時データ分析装置１は、負例時系列データ１１３から正例特徴量１１８の基準日別特徴量に近いことを指標として負例基準日決定部１１９を決定することが可能となる。これにより、経時データ分析装置１は、発生頻度の低い目的事象の発生を高精度に予測するモデルを生成することが可能となる。 With the above configuration, the negative example reference date determination unit 119 of the longitudinal data analysis device 1 can determine the negative example reference date 120 from the negative example time series data 113. It uses, as its index, how close a candidate is to the reference-date features of the positive example features 118. This enables the longitudinal data analysis device 1 to generate a model that predicts, with high accuracy, the occurrence of a target event that occurs infrequently.
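The nearest-neighbour selection in (5) might be sketched as follows. The sketch rests on four assumptions:

- Dates are integer day indices.
- A per-period mean is the only feature.
- Euclidean distance is the metric for the "predetermined feature space".
- One negative reference date is chosen per negative series.

All function names are hypothetical.

```python
import math


def window_mean(series, reference_day, period):
    """Mean of (day, value) pairs in [reference_day - period, reference_day);
    0.0 for an empty window (an assumption made for this sketch)."""
    values = [v for d, v in series if reference_day - period <= d < reference_day]
    return sum(values) / len(values) if values else 0.0


def feature_vector(series, reference_day, periods):
    """One feature per statistical period, i.e. a reference-date feature."""
    return [window_mean(series, reference_day, p) for p in periods]


def choose_negative_reference_day(negative_series, first_day, step, n, periods, positive_vectors):
    """Slide n candidate reference days from first_day in `step`-day increments
    and return the one whose feature vector is nearest (Euclidean) to any
    positive example reference-date feature."""
    best_day, best_dist = None, math.inf
    for i in range(n):
        day = first_day + i * step
        vec = feature_vector(negative_series, day, periods)
        dist = min(math.dist(vec, p) for p in positive_vectors)
        if dist < best_dist:  # ties keep the earliest candidate
            best_day, best_dist = day, dist
    return best_day
```

Features on different scales would likely need normalising before distances are compared, because the patent leaves the feature space and metric unspecified.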

（６）上記（５）に記載の特徴量生成方法であって、前記負例基準日決定ステップは、前記複数の統計期間が負例時系列データ（１１３）の全期間を網羅するように、前記第１基準日から第Ｎ基準日と前記統計期間を設定することを特徴とする特徴量生成方法。 (6) The feature generation method according to (5) above, wherein, in the negative example reference date determination step, the first to Nth reference dates and the statistical periods are set so that the multiple statistical periods cover the entire period of the negative example time series data (113).
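One way to satisfy the coverage condition in (6) is to derive the reference dates from the data range. The sketch below assumes that every statistical period ends at its reference date and extends backwards, so the longest period determines each date's reach. `plan_reference_days` is a hypothetical helper. Anchoring the first window at the start of the data is an illustrative choice, not something the patent specifies.

```python
import math


def plan_reference_days(start_day, end_day, step, periods):
    """Return reference days r_1..r_N such that the windows
    [r_i - max(periods), r_i) together cover [start_day, end_day)."""
    longest = max(periods)
    if step > longest:
        raise ValueError("a step longer than the longest period leaves gaps")
    first = start_day + longest  # the first window starts exactly at start_day
    # Smallest N with first + (N - 1) * step >= end_day (at least one date).
    n = max(1, math.ceil((end_day - first) / step) + 1)
    return [first + i * step for i in range(n)]
```

For data spanning days 0 to 99, a 30-day step, and periods of 7 and 30 days, it returns `[30, 60, 90, 120]`. The last window extends past the data, so it simply contributes no extra observations.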

上記構成により、経時データ分析装置１は、負例時系列データ１１３から正例特徴量１１８の基準日別特徴量に近いことを指標として負例基準日決定部１１９を決定することが可能となる。これにより、経時データ分析装置１は、発生頻度の低い目的事象の発生を高精度に予測するモデルを生成することが可能となる。 With the above configuration, the negative example reference date determination unit 119 of the longitudinal data analysis device 1 can determine the negative example reference date 120 from the negative example time series data 113. It uses, as its index, how close a candidate is to the reference-date features of the positive example features 118. As a result, the longitudinal data analysis device 1 can generate a model that predicts, with high accuracy, the occurrence of a target event that occurs infrequently.

（７）上記（１）に記載の特徴量生成方法であって、前記負例基準日決定ステップは、前記正例時系列データ（１１２）のそれぞれについて目的事象の発生に関連する重要事象の発生日を重要事象発生日（１００３）として取得して、前記重要事象発生日（１００３）から前記正例基準日（１１５）までの期間を予兆期間とし、前記正例時系列データ（１１２）のそれぞれについて正例予兆期間頻度分布を算出し、前記正例予兆期間頻度分布と同一の確率分布となるように、前記負例時系列データ（１１３）の負例基準日（１２０）を決定することを特徴とする特徴量生成方法。 (7) The feature generation method according to (1) above, wherein the negative example reference date determination step performs the following operations:
- For each piece of the positive example time series data (112), it acquires the occurrence date of an important event related to the occurrence of the target event as an important event occurrence date (1003).
- It sets the period from the important event occurrence date (1003) to the positive example reference date (115) as a predictive period.
- It calculates a positive example predictive period frequency distribution from the predictive periods obtained for the positive example time series data (112).
- It determines the negative example reference date (120) of the negative example time series data (113) so that it follows the same probability distribution as the positive example predictive period frequency distribution.

上記構成により、経時データ分析装置１は、正例特徴量１１８の重要事象発生日から目的事象発生時刻までの予兆期間を算出し、目的事象の発生に関連する重要事象発生日を負例時系列データ１１３に設定して、予兆期間を加算することで負例基準日１２０を決定することが可能となる。 With the above configuration, the longitudinal data analysis device 1 can calculate the predictive period from the important event occurrence date to the target event occurrence time for the positive example features 118. It can then set an important event occurrence date related to the occurrence of the target event in the negative example time series data 113, and determine the negative example reference date 120 by adding the predictive period to that date.
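The reference-date rule in (7) can be sketched by sampling predictive periods from the positive examples. The sketch assumes that the important event occurrence dates of the negative series are already known, for instance via the Th4 rule in (10). It also assumes that dates are integer day indices. The function names are hypothetical.

```python
import random


def predictive_periods(positive_pairs):
    """Predictive period = positive reference day - important event day,
    for each (important_event_day, reference_day) pair."""
    return [reference - event for event, reference in positive_pairs]


def sample_negative_reference_days(negative_event_days, periods, rng):
    """Add a predictive period to each negative important-event day. Drawing
    uniformly from the observed periods (with replacement) reproduces their
    empirical frequency distribution."""
    return [event + rng.choice(periods) for event in negative_event_days]
```

Passing a seeded `random.Random` instance as `rng` keeps the assignment reproducible between runs.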

（８）上記（１）に記載の特徴量生成方法であって、前記負例基準日決定ステップは、前記正例基準日（１１５）の頻度分布（１００５）を算出し、前記頻度分布（１００５）と同一の確率分布で、負例時系列データ（１１３）のそれぞれについて負例基準日（１２０）を決定することを特徴とする特徴量生成方法。 (8) The feature generation method according to (1) above, wherein the negative example reference date determination step calculates a frequency distribution (1005) of the positive example reference dates (115) and determines a negative example reference date (120) for each piece of negative example time series data (113) with the same probability distribution as the frequency distribution (1005).

上記構成により、経時データ分析装置１は、正例基準日１１５の発生頻度と同一の確率分布で、負例時系列データ１１３の負例基準日１２０を決定することが可能となり、機械学習部１６０では、発生頻度の低い目的事象の発生を高精度に予測するモデルを生成することが可能となる。 The above configuration enables the longitudinal data analysis device 1 to determine the negative example reference date 120 of the negative example time series data 113 with the same probability distribution as the occurrence frequency of the positive example reference date 115, and the machine learning unit 160 to generate a model that predicts with high accuracy the occurrence of a target event that occurs infrequently.
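Here is a sketch of (8). It assumes that reference dates are integer day indices on a timeline shared by all series. Expressing them relative to each series' start would be an equally plausible reading. `reference_day_distribution` and `assign_negative_reference_days` are hypothetical names. `random.Random.choices` is the standard-library weighted sampler.

```python
import random
from collections import Counter


def reference_day_distribution(positive_reference_days):
    """Relative frequency of each positive example reference day."""
    counts = Counter(positive_reference_days)
    total = sum(counts.values())
    return {day: c / total for day, c in sorted(counts.items())}


def assign_negative_reference_days(n_negative, distribution, rng):
    """Draw one reference day per negative series, weighted by the positive
    frequency distribution so both sides share the same distribution."""
    days = list(distribution)
    weights = list(distribution.values())
    return rng.choices(days, weights=weights, k=n_negative)
```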

（９）上記（５）に記載の特徴量生成方法であって、前記計算機が、前記特徴量重要度（１５２）を受け付けて前記特徴量算出定義（１１７）を更新する特徴量算出定義（１１７）更新ステップを、さらに含み、前記特徴量重要度（１５２）算出ステップは、前記複数の異なる統計期間毎の前記特徴量から前記特徴量重要度（１５２）を算出し、前記特徴量算出定義（１１７）更新ステップは、前記複数の異なる統計期間毎の前記特徴量重要度（１５２）を受け付けて、前記特徴量重要度（１５２）が他の統計期間よりも大きい統計期間が存在する場合には、新たな統計期間の追加を通知する。 (9) The feature generation method according to (5) above, further including a feature calculation definition (117) update step in which the computer receives the feature importance (152) and updates the feature calculation definition (117). In this method, the feature importance (152) calculation step calculates the feature importance (152) from the features for each of the multiple different statistical periods. The feature calculation definition (117) update step receives the feature importance (152) for each of those statistical periods. If one statistical period has a greater feature importance (152) than the other statistical periods, the update step issues a notification that a new statistical period should be added.

上記構成により、経時データ分析装置１は、特徴量重要度算出部１５１で算出した重要度を、経時特徴量生成部１１０の特徴量算出定義更新部２０１へフィードバックすることで、新たな特徴量を算出するために特徴量算出定義１１７の更新を示唆することが可能となる。 With the above configuration, the longitudinal data analysis device 1 can suggest updating the feature calculation definition 117 to calculate new features by feeding back the importance calculated by the feature importance calculation unit 151 to the feature calculation definition update unit 201 of the longitudinal feature generation unit 110.
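The notification rule in (9) leaves open what "greater than the other statistical periods" means and which new period to propose. The sketch below makes three hypothetical choices:

- Importance is summed per period.
- A period counts as dominant when its total is at least `ratio` times the runner-up.
- The proposed new periods are half and double the dominant period.

The name `suggest_new_period` is also hypothetical.

```python
from collections import defaultdict


def suggest_new_period(importance_by_feature, ratio=2.0):
    """importance_by_feature maps (feature name, period in days) -> importance.
    Returns a notification string if one period's total importance is at least
    `ratio` times every other period's total, else None."""
    totals = defaultdict(float)
    for (_, period), value in importance_by_feature.items():
        totals[period] += value
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return None  # nothing to compare against
    (top_period, top), (_, second) = ranked[0], ranked[1]
    if top >= ratio * second:
        # Hypothetical heuristic: propose a shorter (half) and a longer (double) period.
        return (f"period {top_period}d dominates; consider adding "
                f"{top_period // 2}d and {top_period * 2}d")
    return None
```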

（１０）上記（７）に記載の特徴量生成手法であって、前記負例基準日決定ステップは、前記正例時系列データ（１１２）のそれぞれについて目的事象の発生に関連する重要事象の発生日を重要事象発生日（１００３）として取得して、前記重要事象発生日（１００３）から前記正例基準日（１１５）までの期間を予兆期間として算出し、前記正例時系列データ（１１２）のそれぞれについて正例予兆期間頻度分布を算出し、前記正例予兆期間頻度分布（１００５）から予兆期間を決定するステップと、前記負例時系列データ（１１３）のそれぞれについて特徴量を算出し、当該特徴量から特徴量重要度（１５２）を算出し、前記特徴量重要度（１５２）を値の大きい順に累積を行って、累積値が所定の閾値Ｔｈ１に達するまでの負例特徴量（１２１）を算出するステップと、前記負例時系列データ（１１３）の時系列の過去から現在へ向けて特徴量が所定の閾値Ｔｈ４を初めて超えた日を重要事象発生日（１００３）として算出するステップと、前記正例予兆期間頻度分布（１００５）から算出した予兆期間を前記重要事象発生日（１００３）に加算して負例基準日（１２０）を算出するステップと、を含むことを特徴とする特徴量生成方法。 (10) The feature generation method according to (7) above, wherein the negative example reference date determination step includes the following steps:
- a step of acquiring, for each piece of the positive example time series data (112), the occurrence date of an important event related to the occurrence of the target event as an important event occurrence date (1003); calculating the period from the important event occurrence date (1003) to the positive example reference date (115) as a predictive period; calculating a positive example predictive period frequency distribution (1005) over the positive example time series data (112); and determining a predictive period from the positive example predictive period frequency distribution (1005);
- a step of calculating features for each piece of the negative example time series data (113); calculating feature importance (152) from those features; accumulating the feature importance (152) in descending order of value; and obtaining the negative example features (121) up to the point at which the cumulative value reaches a predetermined threshold Th1;
- a step of calculating, as the important event occurrence date (1003), the first day on which the feature exceeds a predetermined threshold Th4 when the negative example time series data (113) is scanned from past to present; and
- a step of calculating the negative example reference date (120) by adding the predictive period calculated from the positive example predictive period frequency distribution (1005) to the important event occurrence date (1003).

上記構成により、経時データ分析装置１は、正例特徴量１１８の重要事象発生日から目的事象発生時刻までの予兆期間を算出し、目的事象の発生に関連する重要事象発生日を負例特徴量１２１から算出し、負例の重要事象発生日に予兆期間を加算することで負例基準日１２０を決定することが可能となる。 With the above configuration, the longitudinal data analysis device 1 can calculate the predictive period from the important event occurrence date to the target event occurrence time for the positive example features 118. It can also calculate the important event occurrence date related to the occurrence of the target event from the negative example features 121. It then determines the negative example reference date 120 by adding the predictive period to the negative example's important event occurrence date.
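The negative-side procedure in (10) can be sketched as a forward scan for the first Th4 crossing, followed by adding a predictive period. The sketch makes two assumptions:

- The series passed in holds the single most important negative feature as (day, value) pairs, already selected with the Th1 rule.
- The most frequent positive predictive period stands in for "the period determined from the frequency distribution".

The function names are hypothetical.

```python
from statistics import mode


def first_exceed_day(series, th4):
    """Scan (day, value) pairs from past to present; return the first day whose
    value exceeds th4, or None if it never does."""
    for day, value in sorted(series):
        if value > th4:
            return day
    return None


def negative_reference_day(series, th4, positive_periods):
    """Important event day (first Th4 crossing) plus a predictive period."""
    event_day = first_exceed_day(series, th4)
    if event_day is None:
        return None  # no important event found in this negative series
    # Assumption: use the most frequent positive predictive period.
    return event_day + mode(positive_periods)
```

Sampling a period per series, as in the sketch for (7), would be an equally valid way to use the frequency distribution.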

なお、本発明は上記した実施例に限定されるものではなく、様々な変形例が含まれる。例えば、上記した実施例は本発明を分かりやすく説明するために詳細に記載したものであり、必ずしも説明した全ての構成を備えるものに限定されるものではない。また、ある実施例の構成の一部を他の実施例の構成に置き換えることが可能であり、また、ある実施例の構成に他の実施例の構成を加えることも可能である。また、各実施例の構成の一部について、他の構成の追加、削除、又は置換のいずれもが、単独で、又は組み合わせても適用可能である。 The present invention is not limited to the above-described embodiments, but includes various modified examples. For example, the above-described embodiments are described in detail to clearly explain the present invention, and are not necessarily limited to those having all of the configurations described. It is also possible to replace part of the configuration of one embodiment with the configuration of another embodiment, and it is also possible to add the configuration of another embodiment to the configuration of one embodiment. In addition, the addition, deletion, or replacement of part of the configuration of each embodiment with other configurations can be applied alone or in combination.

また、上記の各構成、機能、処理部、及び処理手段等は、それらの一部又は全部を、例えば集積回路で設計する等によりハードウェアで実現してもよい。また、上記の各構成、及び機能等は、プロセッサがそれぞれの機能を実現するプログラムを解釈し、実行することによりソフトウェアで実現してもよい。各機能を実現するプログラム、テーブル、ファイル等の情報は、メモリや、ハードディスク、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）等の記録装置、又は、ＩＣカード、ＳＤカード、ＤＶＤ等の記録媒体に置くことができる。 The above configurations, functions, processing units, and processing means may be realized in part or in whole in hardware, for example by designing them as integrated circuits. The above configurations and functions may be realized in software by a processor interpreting and executing a program that realizes each function. Information on the programs, tables, files, etc. that realize each function may be stored in a memory, a recording device such as a hard disk or SSD (Solid State Drive), or a recording medium such as an IC card, SD card, or DVD.

また、制御線や情報線は説明上必要と考えられるものを示しており、製品上必ずしも全ての制御線や情報線を示しているとは限らない。実際には殆ど全ての構成が相互に接続されていると考えてもよい。 In addition, the control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines on the product are necessarily shown. In reality, it can be assumed that almost all components are interconnected.

1 Longitudinal data analysis device
2 Processor
3 Memory
4 Storage device
101 Target event occurrence time data
102 Time series data
103 Feature calculation method user setting
110 Longitudinal feature generation unit
111 Time series data division unit
112 Positive example time series data
113 Negative example time series data
114 Positive example reference date determination unit
115 Positive example reference date
116 Feature calculation unit
117 Feature calculation definition
118 Positive example feature
119 Negative example reference date determination unit
120 Negative example reference date
121 Negative example feature
122 First feature list
150 Feature selection unit
151 Feature importance calculation unit
152 Feature importance
153 Feature accumulation threshold determination unit
154 Second feature list
160 Machine learning unit

Claims

1. A feature generation method in which a computer having a processor and a memory receives time-series data and generates features to be used as input data for a machine learning unit that predicts the occurrence of a target event, the method comprising:
a time-series data input step in which the computer receives a plurality of pieces of time-series data including values and timestamps;
a target event occurrence data input step in which the computer receives target event occurrence data including a timestamp at which the target event occurred;
a feature calculation definition input step in which the computer receives a feature calculation definition that defines how the features of the time-series data are calculated;
a division step in which the computer divides the time-series data into positive example time-series data and negative example time-series data by referring to the target event occurrence data;
a positive example reference date determination step in which the computer determines a positive example reference date, which is a reference date in the positive example time-series data;
a positive example feature calculation step in which the computer calculates positive example features from a combination of the positive example time-series data and the positive example reference date, based on the feature calculation definition;
a negative example reference date determination step in which the computer determines a negative example reference date using the positive example reference date, the positive example features, and the negative example time-series data as inputs; and
a negative example feature calculation step in which the computer calculates negative example features from a combination of the negative example time-series data and the negative example reference date, based on the feature calculation definition.

2. The feature generation method according to claim 1, further comprising:
a time-series feature generation step in which the computer generates a list of the positive example features and the negative example features as a first feature list and outputs the positive example features, the negative example features, and the first feature list;
a feature importance calculation step in which the computer calculates the feature importance of the positive example features and the negative example features listed in the first feature list; and
a feature accumulation threshold determination step in which the computer accumulates the feature importance values in descending order and stores the positive example features and negative example features in a second feature list as features to be learned, up to the point at which the cumulative value reaches a predetermined threshold Th1.

3. The feature generation method according to claim 2, wherein,
in the feature accumulation threshold determination step, if unprocessed data remains in the first feature list when the cumulative value reaches the predetermined threshold Th1, the unprocessed data is deleted and the feature importance is calculated again in the feature importance calculation step. The feature importance calculation step and the narrowing-down step are repeated until no unprocessed data remains in the first feature list at the point at which the cumulative value of the feature importance reaches the predetermined threshold Th1.

4. The feature generation method according to claim 2, further comprising
a feature calculation definition update step in which the computer receives the calculated feature importance as input and changes the feature calculation definition in accordance with the value of the feature importance.

5. The feature generation method according to claim 1, wherein
the negative example reference date determination step includes:
a reference sliding step of setting a first reference date to a preset reference date and setting a plurality of reference dates, from the first reference date to an Nth reference date, at intervals of a predetermined number of days;
a statistical period setting step of setting a plurality of preset statistical periods for each of the first to Nth reference dates;
a negative example reference-date feature calculation step of calculating features of the negative example time-series data for each statistical period and for each of the first to Nth reference dates, thereby obtaining a negative example reference-date feature for each reference date;
a positive example reference-date feature calculation step of calculating features of the positive example time-series data for each of the plurality of statistical periods and for each of the positive example reference dates, thereby obtaining a positive example reference-date feature for each positive example reference date; and
a determination step of arranging the negative example reference-date features and the positive example reference-date features in a predetermined feature space, calculating the distances between reference dates, and determining, as the negative example reference date, the reference date whose negative example reference-date feature is closest to any one of the positive example reference-date features.

6. The feature generation method according to claim 5, wherein,
in the negative example reference date determination step, the first to Nth reference dates and the statistical periods are set so that the statistical periods cover the entire period of the negative example time-series data.

7. The feature generation method according to claim 1, wherein,
in the negative example reference date determination step, for each piece of the positive example time-series data, the occurrence date of an important event related to the occurrence of the target event is obtained as an important event occurrence date. The period from the important event occurrence date to the positive example reference date is set as a predictive period, and a frequency distribution of the positive example predictive periods is calculated over the positive example time-series data. A negative example reference date is then determined for the negative example time-series data so as to follow the same probability distribution as the frequency distribution of the positive example predictive periods.

8. The feature generation method according to claim 1, wherein,
in the negative example reference date determination step, a frequency distribution of the positive example reference dates is calculated, and a negative example reference date is determined for each piece of the negative example time-series data according to the same probability distribution as that frequency distribution.

9. The feature generation method according to claim 5, further comprising:
a time-series feature generation step in which the computer generates a list of the positive example features and the negative example features as a first feature list and outputs the positive example features, the negative example features, and the first feature list;
a feature importance calculation step in which the computer calculates the feature importance of the positive example features and the negative example features listed in the first feature list;
a feature accumulation threshold determination step in which the computer accumulates the feature importance values in descending order and stores the positive example features and negative example features in a second feature list as features to be learned, up to the point at which the cumulative value reaches a predetermined threshold Th1; and
a feature calculation definition update step in which the computer receives the calculated feature importance as input and updates the feature calculation definition, wherein
the feature importance calculation step
calculates the feature importance from the features for each of the plurality of different statistical periods, and
the feature calculation definition update step
receives the feature importance for each of the plurality of different statistical periods and, if one statistical period has a greater feature importance than the other statistical periods, issues a notification that a new statistical period should be added.

10. The feature generation method according to claim 7, wherein
the negative example reference date determination step includes:
acquiring, for each piece of the positive example time-series data, the occurrence date of an important event related to the occurrence of the target event as an important event occurrence date; calculating the period from the important event occurrence date to the positive example reference date as a predictive period; calculating a frequency distribution of the positive example predictive periods over the positive example time-series data; and determining a predictive period from that frequency distribution;
calculating features for each piece of the negative example time-series data; calculating feature importance from those features; accumulating the feature importance in descending order of value; and obtaining negative example features up to the point at which the cumulative value reaches a predetermined threshold Th1;
calculating, as an important event occurrence date, the first date on which the feature exceeds a predetermined threshold Th4 when the negative example time-series data is scanned from past to present; and
calculating a negative example reference date by adding the predictive period determined from the frequency distribution of the positive example predictive periods to the important event occurrence date.

11. A feature generation device that includes a processor and a memory, receives time-series data, and generates features to be used as input data for a machine learning unit that predicts the occurrence of a target event, the device comprising:
a time-series feature generation unit that receives three inputs (a plurality of pieces of time-series data including values and timestamps, target event occurrence data including a timestamp at which the target event occurred, and a feature calculation definition that defines how the features of the time-series data are calculated) and outputs positive example features, negative example features, and a first feature list from the time-series data; and
a feature selection unit that receives the positive example features, the negative example features, and the first feature list, and generates a second feature list that specifies the positive example features and negative example features to be learned, wherein
the time-series feature generation unit includes:
a time-series data division unit that divides the time-series data into positive example time-series data and negative example time-series data by referring to the target event occurrence data;
a positive example reference date determination unit that determines a positive example reference date, which is a reference date in the positive example time-series data;
a feature calculation unit that calculates positive example features from a combination of the positive example time-series data and the positive example reference date, based on the feature calculation definition; and
a negative example reference date determination unit that determines a negative example reference date using the positive example reference date, the positive example features, and the negative example time-series data as inputs, and
the feature calculation unit
calculates negative example features from a combination of the negative example time-series data and the negative example reference date, based on the feature calculation definition.

12. The feature generation device according to claim 11, wherein
the time-series feature generation unit
generates a list of the positive example features and the negative example features as a first feature list, and outputs the positive example features, the negative example features, and the first feature list, and
the feature selection unit includes:
a feature importance calculation unit that calculates the feature importance of the positive example features and the negative example features listed in the first feature list; and
a feature accumulation threshold determination unit that accumulates the feature importance values in descending order and stores the positive example features and negative example features in a second feature list as features to be learned, up to the point at which the cumulative value reaches a predetermined threshold Th1.

13. The feature generation device according to claim 12, wherein,
if unprocessed data remains in the first feature list when the cumulative value reaches the predetermined threshold Th1, the feature accumulation threshold determination unit deletes the unprocessed data and the feature importance calculation unit calculates the feature importance again. The narrowing-down by the feature importance calculation unit and the feature accumulation threshold determination unit is repeated until no unprocessed data remains in the first feature list at the point at which the cumulative value of the feature importance reaches the threshold Th1.

14. The feature generation device according to claim 12, further comprising
a feature calculation definition update unit that receives the calculated feature importance and changes the feature calculation definition in accordance with the value of the feature importance.

15. The feature generation device according to claim 11, wherein
the negative example reference date determination unit performs the following operations:
it sets a first reference date to a preset reference date;
it sets a plurality of reference dates, from the first reference date to an Nth reference date, at intervals of a predetermined number of days;
it sets a plurality of preset statistical periods for each of the first to Nth reference dates;
it calculates features of the negative example time-series data for each statistical period and for each of the first to Nth reference dates, to obtain a negative example reference-date feature for each reference date;
it calculates features of the positive example time-series data for each of the plurality of statistical periods and for each of the positive example reference dates, to obtain a positive example reference-date feature for each positive example reference date; and
it determines, as the negative example reference date, the reference date whose negative example reference-date feature is closest, in a predetermined feature space, to any one of the positive example reference-date features.

16. The feature generation device according to claim 15, wherein
the negative example reference date determination unit sets the first to Nth reference dates and the statistical periods so that the plurality of statistical periods covers the entire period of the negative example time-series data.

17. The feature generation device according to claim 11, wherein
the negative example reference date determination unit performs the following operations:
for each piece of the positive example time-series data, it obtains the occurrence date of an important event related to the occurrence of the target event as an important event occurrence date;
it sets the period from the important event occurrence date to the positive example reference date as a predictive period;
it calculates a frequency distribution of the positive example predictive periods over the positive example time-series data; and
it determines a negative example reference date for the negative example time-series data so as to follow the same probability distribution as the frequency distribution of the positive example predictive periods.

18. The feature generation device according to claim 11, wherein
the negative example reference date determination unit calculates a frequency distribution of the positive example reference dates and determines a negative example reference date for each piece of the negative example time-series data according to the same probability distribution as that frequency distribution.

19. The feature generation device according to claim 15, wherein
the time-series feature generation unit
generates a list of the positive example features and the negative example features as a first feature list, and outputs the positive example features, the negative example features, and the first feature list,
the feature selection unit includes:
a feature importance calculation unit that calculates the feature importance of the positive example features and the negative example features listed in the first feature list; and
a feature accumulation threshold determination unit that accumulates the feature importance values in descending order and stores the positive example features and negative example features in a second feature list as features to be learned, up to the point at which the cumulative value reaches a predetermined threshold Th1,
the feature generation device further comprises a feature calculation definition update unit that receives the calculated feature importance as input and updates the feature calculation definition,
the feature importance calculation unit
calculates the feature importance from the features for each of the plurality of different statistical periods, and
the feature calculation definition update unit
receives the feature importance for each of the plurality of different statistical periods and, if one statistical period has a greater feature importance than the other statistical periods, issues a notification that a new statistical period should be added.

20. The feature generation device according to claim 17, wherein
the negative example reference date determination unit includes:
a predictive period determination unit that performs the following operations:
for each piece of the positive example time-series data, it obtains the occurrence date of an important event related to the occurrence of the target event as an important event occurrence date;
it calculates the period from the important event occurrence date to the positive example reference date as a predictive period;
it calculates a positive example predictive period frequency distribution over the positive example time-series data; and
it determines a predictive period from that frequency distribution; and
an important feature search unit that performs the following operations:
it calculates features for each piece of the negative example time-series data and calculates feature importance from those features;
it accumulates the feature importance in descending order of value and obtains negative example features up to the point at which the cumulative value reaches a predetermined threshold Th1;
it calculates, as an important event occurrence date, the first day on which the feature exceeds a predetermined threshold Th4 when the negative example time-series data is scanned from past to present; and
it calculates a negative example reference date by adding the predictive period determined from the positive example predictive period frequency distribution to the important event occurrence date.
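The cumulative-importance selection of claims 2 and 12, and the repeated narrowing-down of claims 3 and 13, might look as follows. The sketch rests on three interpretations:

- Importance is normalised so that Th1 is a fraction of the total.
- "Unprocessed data" means the features left over after the Th1 cut.
- The caller supplies a `compute_importance` function, for example one that returns importances from a retrained model.

These interpretations and all names are hypothetical.

```python
def select_by_cumulative_importance(importance, th1):
    """Accumulate normalised importance in descending order and return the
    feature names needed for the running total to reach th1 (0 < th1 <= 1)."""
    total = sum(importance.values())
    if total == 0:
        return list(importance)  # nothing to rank; keep everything
    selected, running = [], 0.0
    for name, value in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(name)
        running += value / total
        if running >= th1:
            break
    return selected


def narrow_down(features, compute_importance, th1, max_rounds=100):
    """Recompute importance on the surviving features and drop those left over
    after the th1 cut, until nothing is left over (one reading of claim 3).
    `compute_importance` maps a feature list to {name: importance}."""
    current = list(features)
    for _ in range(max_rounds):
        kept = select_by_cumulative_importance(compute_importance(current), th1)
        if len(kept) == len(current):
            return kept
        current = kept
    return current
```

Consider importances `{"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}` and Th1 = 0.75. The first pass keeps `a` and `b`. After renormalising over those two features, both are still needed to reach 0.75, so the loop stops.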