JP7354844B2

JP7354844B2 - Impact determination program, device, and method

Info

Publication number: JP7354844B2
Application number: JP2020001670A
Authority: JP
Inventors: 大明松本; 雄介大木
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2020-01-08
Filing date: 2020-01-08
Publication date: 2023-10-03
Anticipated expiration: 2040-01-08
Also published as: JP2021111060A

Description

The disclosed technology relates to impact determination.

Time-series data is input into a model trained by supervised machine learning in order to estimate (or, equivalently, infer) a state or the like at a time later than the input time-series data. Estimation in the human resources and financial fields may require that the estimation result be interpretable. For example, when past attendance data is input to estimate the possibility of a future leave of absence, it may be necessary to present which of the input attendance data most strongly influenced the estimate, that is, the reason for estimating that a leave of absence is or is not likely.

LIME (Local Interpretable Model-agnostic Explanations) has been proposed as a technique for interpreting estimation results. For a trained model that handles time-series data, LIME uses the training data around the data to be evaluated to generate a multiple regression model that locally approximates the trained model. Then, based on the magnitude of the partial regression coefficient corresponding to each explanatory variable of the regression equation representing the multiple regression model, the explanatory variables that had more influence on the estimation are identified.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin, "'Why Should I Trust You?': Explaining the Predictions of Any Classifier", arXiv:1602.04938v3 [cs.LG], 9 Aug 2016.

However, the conventional technique cannot capture the time-series characteristics of the data, so the interpretability of the estimation result is reduced.

In one aspect, the disclosed technology aims to determine, in estimation using time-series data, the portion of the time-series data that had more influence on the estimation result.

In one embodiment, the disclosed technology inputs the data at each time point of time-series data, in chronological order, into the corresponding term of a multiple regression model that approximates a machine learning model that outputs an estimation result based on the time-series data. In doing so, for the data at each time point, the disclosed technology calculates a multiple regression value with the partial regression coefficients of the terms of the multiple regression model that correspond to data at later time points set to 0. The disclosed technology then identifies, according to the change in the multiple regression values calculated for the data at each time point, a period that satisfies a specific condition among the periods indicated by the time-series data. Finally, the disclosed technology outputs information about the identified period that satisfies the specific condition as a factor that influenced the estimation result.

In one aspect, the disclosed technology has the effect that, in estimation using time-series data, the portion of the time-series data that had more influence on the estimation result can be determined.

FIG. 1 is a block diagram showing a schematic functional configuration of an estimation system according to the present embodiment.
FIG. 2 is a diagram for explaining learning time-series data.
FIG. 3 is a diagram for explaining estimation using time-series data.
FIG. 4 is a diagram schematically showing a learning model.
FIG. 5 is a diagram showing an example of a list output as an estimation result.
FIG. 6 is a diagram for explaining generation of a multiple regression model.
FIG. 7 is a diagram for explaining a problem in interpreting estimation results with an existing method.
FIG. 8 is a diagram for explaining calculation of the degree of influence in the present embodiment.
FIG. 9 is a diagram for explaining identification of an important period.
FIG. 10 is a diagram for explaining identification of an important period.
FIG. 11 is a diagram showing an output example of an estimation reason.
FIG. 12 is a diagram showing another output example of an estimation reason.
FIG. 13 is a block diagram showing a schematic configuration of a computer that functions as an impact determination device according to the present embodiment.
FIG. 14 is a flowchart showing an example of impact determination processing in the present embodiment.

Hereinafter, an example of an embodiment of the disclosed technology will be described with reference to the drawings. The following embodiment describes a case in which the impact determination device of the disclosed technology is applied to an estimation system that estimates, based on employee attendance data, the occurrence several months ahead of a medical absence that leads to a leave of absence due to mental health problems.

As shown in FIG. 1, an estimation system 100 according to the present embodiment includes an impact determination device 10 and a learning estimation device 30.

Functionally, as shown in FIG. 1, the learning estimation device 30 includes a learning unit 31 and an estimation unit 32. A learning model 40 is stored in a predetermined storage area of the learning estimation device 30.

The learning unit 31 receives learning time-series data. As shown in FIG. 2, the learning time-series data is data for a predetermined period in which features such as the presence or absence of overtime, early leave, and lateness, and whether a day was a late arrival or a day off, are extracted for each date from the attendance data of each employee. In the example of FIG. 2, the feature values for the items overtime, early leave, lateness, leave, and attendance are represented by one block per date, and a shaded block indicates that the date falls under that item. The features used as learning data are not limited to this example, and other features such as the presence or absence of a business trip or the length of overtime may be used. Each piece of learning time-series data is associated with the correct answer for the estimation result.

For example, as shown in FIG. 3, suppose that attendance data for a reference period (for example, 180 days) is used to estimate whether a medical absence leading to a leave of absence due to mental health problems will occur within an estimation period (for example, 90 days) following the reference period. In this case, the learning time-series data is the time-series data for the reference period, and the correct answer for the estimation result is the presence or absence of a medical absence during the estimation period.
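The pairing of a reference period with a label from the following estimation period can be sketched as below. This is a minimal illustration, not the patent's implementation: the function name `make_example`, the array layout (one row per date), and the boolean `absence_days` array are all assumptions introduced here.

```python
import numpy as np

def make_example(features, absence_days, start, ref_days=180, est_days=90):
    """Cut one (reference window, label) pair out of a per-day record.

    features:     array of shape (total_days, n_features), one row per date
                  (hypothetical layout).
    absence_days: boolean array of shape (total_days,), True on days with a
                  medical absence (hypothetical encoding).
    start:        index of the first day of the reference period.
    """
    ref_end = start + ref_days          # first day of the estimation period
    est_end = ref_end + est_days        # one past the last estimation day
    if est_end > len(features):
        raise ValueError("record too short for reference + estimation period")
    window = features[start:ref_end]
    # Label is 1 (positive example) if any medical absence falls in the
    # estimation period, otherwise 0 (negative example).
    label = int(absence_days[ref_end:est_end].any())
    return window, label
```

Sliding `start` forward by about a month at a time would mimic the repeated estimations 1 to 4 drawn in FIG. 3.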

Of the received learning time-series data, the learning unit 31 treats data associated with the correct answer "medical absence" as positive examples and data associated with the correct answer "no medical absence" as negative examples, and trains the learning model 40 using an existing method. FIG. 4 schematically shows the learning model 40. In FIG. 4, "1" indicates positive-example learning data, "0" indicates negative-example learning data, and the broken line indicates the decision boundary of the learning model 40. The learning model 40 outputs the probability that a medical absence will occur during the estimation period (90 days in the above example).

The estimation unit 32 receives estimation time-series data. The estimation time-series data has the same data structure as the learning time-series data shown conceptually in FIG. 2, but the correct answer for the estimation result is unknown. The estimation unit 32 inputs the estimation time-series data into the learning model 40 trained by the learning unit 31 and obtains, as the estimation result, the probability that a medical absence will occur. For example, as shown in FIG. 3, once a month (for example, on the first day of each month) the estimation unit 32 inputs each employee's attendance data for the preceding 180 days into the learning model 40 as estimation time-series data, thereby estimating the probability that a medical absence will occur within the following 90 days.

In the example of FIG. 3, the estimation is successful if the probability of a medical absence is estimated to be low at estimation 1 and high at estimations 2 to 4.

For example, as shown in FIG. 5, the estimation unit 32 outputs an estimation result in which the relevant employees are listed in descending order of the probability that a medical absence will occur. In the example of FIG. 5, the list showing the estimation results includes items such as the "employee number" of the employee, the "reference period" corresponding to the period of the estimation time-series data, the "estimation period", the "probability" that a medical absence will occur, and "reason presentation". The "reason presentation" column displays a reason presentation button for requesting an interpretation of the estimation result for each employee, that is, presentation of the reason for the estimation.

Functionally, as shown in FIG. 1, the impact determination device 10 includes a multiple regression model learning unit 11, a calculation unit 12, an identification unit 13, and an output unit 14. A multiple regression model 20 is stored in a predetermined storage area of the impact determination device 10.

As shown in FIG. 6, in the feature space corresponding to the learning model 40, the multiple regression model learning unit 11 uses the learning data around the estimation time-series data for the relevant employee to generate a multiple regression model 20 that locally approximates the decision boundary of the learning model 40. In FIG. 6, "1" indicates positive-example learning data, "0" indicates negative-example learning data, "a" indicates the target estimation time-series data, the broken line indicates the decision boundary of the learning model 40, and the dash-dot line indicates the decision boundary of the multiple regression model 20. The multiple regression model 20 is expressed by the following equation (1).

$$y = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n + \beta \qquad (1)$$

In equation (1), $y$ is the probability that a medical absence will occur, $x_i$ is the $i$-th explanatory variable and corresponds to the $i$-th date from the beginning of the time-series data, and $n$ is the number of data items included in the time-series data, that is, the number of days. $\alpha_i$ is the partial regression coefficient for the explanatory variable $x_i$, and $\beta$ is the intercept.

When the LIME technique described above is used to interpret an estimation result, as shown in FIG. 7, the partial regression coefficient $\alpha_i$ is taken as the degree of influence, which indicates how strongly the explanatory variable $x_i$, that is, the attendance data of the $i$-th date, affected the estimation result. The attendance data of dates whose degree of influence is at or above a threshold is then presented as the attendance data that had more influence on the estimation result. In this case, dates (explanatory variables) with the same feature values are given the same degree of influence, and only the degree of influence of each date in isolation is known. Meanwhile, a person in charge in, for example, the human resources department or the health management department takes measures such as interviewing the employees listed in an estimation result such as that shown in FIG. 5. In doing so, the person in charge may want to understand the time-series characteristics of the data that influenced the estimation result, such as how the degree of influence changes over time and in which period the degree of influence is large.

The impact determination device 10 according to the present embodiment therefore determines the portion of the time-series data that influenced the estimation result based on an index that captures the time-series characteristics of the data. The calculation unit 12, the identification unit 13, and the output unit 14 are each described in detail below.

In the multiple regression model 20 generated by the multiple regression model learning unit 11, the calculation unit 12 calculates, as the degree of influence for each date (explanatory variable), the multiple regression value obtained when the partial regression coefficients of the terms corresponding to dates later than the date of each data item included in the estimation time-series data are set to 0.

Specifically, the calculation unit 12 calculates the degree of influence $y_i$ for the $i$-th date (explanatory variable) as follows.

$$
\begin{aligned}
y_0 &= 0 + 0 + \cdots + 0 + \beta \\
y_1 &= \alpha_1 x_1 + 0 + \cdots + 0 + \beta \\
y_2 &= \alpha_1 x_1 + \alpha_2 x_2 + 0 + \cdots + 0 + \beta \\
&\;\;\vdots \\
y_n &= \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n + \beta
\end{aligned}
$$

As a result, as shown in FIG. 8, a degree of influence is obtained in which the partial regression coefficients of the dates are accumulated in chronological order. Because the degree of influence is calculated by accumulating the partial regression coefficients of the dates in chronological order, the periods in which the degree of influence is rising can be read from the slope of the waveform that plots the degree of influence in chronological order, as shown in the upper part of FIG. 9.
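The accumulation described above amounts to a running sum of the regression terms plus the intercept. The following sketch assumes each date is represented by a single scalar explanatory variable, as in equation (1); the function name `cumulative_influence` is hypothetical.

```python
import numpy as np

def cumulative_influence(alpha, x, beta):
    """Return [y_0, y_1, ..., y_n].

    y_i keeps the first i terms alpha_k * x_k of the regression equation and
    treats the coefficients of all later dates as 0, so y_0 is the intercept
    alone and y_n is the full multiple regression value.
    """
    alpha = np.asarray(alpha, dtype=float)
    x = np.asarray(x, dtype=float)
    partial = np.cumsum(alpha * x)                 # running sum over dates
    return beta + np.concatenate(([0.0], partial))
```

A running sum is used here instead of re-evaluating the regression n + 1 times; both give the same values because zeroing the later coefficients simply drops the later terms.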

The calculation unit 12 passes the calculated degree of influence $y_i$ for each date (explanatory variable $x_i$) to the identification unit 13.

According to the change of the degree of influence $y_i$ over the time series, the identification unit 13 identifies an important period, which is a period that has a greater influence on the estimation result among the periods indicated by the estimation time-series data. The identification unit 13 can identify, as an important period, a period in which the degree of influence $y_i$ increases continuously in chronological order. More specifically, the identification unit 13 can calculate the slope of the degree of influence $y_i$ for each date (explanatory variable $x_i$) and identify, as an important period, a period in which a predetermined number of consecutive dates (explanatory variables $x_i$) have a slope at or above a predetermined threshold.

For example, for the explanatory variable $x_i$, the identification unit 13 calculates the slope of the degree of influence at $x_i$ using the degrees of influence of the surrounding explanatory variables, for example $x_{i-2}$, $x_{i-1}$, $x_{i+1}$, and $x_{i+2}$. The identification unit 13 can calculate the slope $b_i$ for the explanatory variable $x_i$ using equation (2), for example.

In equation (2), $\bar{x}$ is the mean of the explanatory variable $x_i$ and its surrounding explanatory variables, and $\bar{y}$ is the mean of the degrees of influence of the explanatory variable $x_i$ and its surrounding explanatory variables. The middle part of FIG. 9 shows the waveform of the slope $b_i$ in chronological order.
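Equation (2) itself does not survive in this text. The definitions of $\bar{x}$ and $\bar{y}$ are consistent with an ordinary least-squares slope fitted over a small window, so the sketch below assumes that form, $b_i = \sum_j (x_j - \bar{x})(y_j - \bar{y}) / \sum_j (x_j - \bar{x})^2$, with $x_j$ taken to be the date index and $j$ running from $i-2$ to $i+2$. The least-squares form, the use of the date index, the truncation of the window at the ends of the series, and the name `local_slopes` are all assumptions, not statements of the patent's formula.

```python
import numpy as np

def local_slopes(y, half_width=2):
    """Least-squares slope of the influence curve y around each position.

    For position i, the points i-half_width .. i+half_width are used
    (fewer at the ends of the series, an assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    slopes = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        xs = np.arange(lo, hi, dtype=float)    # date indices in the window
        ys = y[lo:hi]                          # influence values in the window
        dx = xs - xs.mean()
        denom = (dx ** 2).sum()
        # A one-point window has no defined slope; report 0 in that case.
        slopes[i] = (dx * (ys - ys.mean())).sum() / denom if denom > 0 else 0.0
    return slopes
```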

As shown in the lower part of FIG. 9 and in FIG. 10, the identification unit 13 identifies, as an important period, a period in which the slope $b_i$ of the degree of influence is at or above a predetermined threshold th (for example, 1.5) for at least a predetermined number of consecutive dates. If the period indicated by the estimation time-series data contains several periods that qualify as important periods, the identification unit 13 identifies all of them as important periods. In this case, the identification unit 13 may rank the important periods in descending order of the average slope $b_i$ within each important period. The identification unit 13 notifies the output unit 14 of the identified important periods.
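The run detection and ranking described above can be sketched as follows. The threshold of 1.5 is the example value from the text; the minimum run length of 3 and the name `important_periods` are assumptions made for illustration.

```python
import numpy as np

def important_periods(slopes, th=1.5, min_run=3):
    """Find runs of at least min_run consecutive slopes >= th.

    Returns (first_index, last_index, mean_slope) tuples, ranked by mean
    slope in descending order.
    """
    slopes = np.asarray(slopes, dtype=float)
    periods = []
    start = None
    for i, above in enumerate(slopes >= th):
        if above and start is None:
            start = i                              # a run begins
        elif not above and start is not None:
            if i - start >= min_run:               # run ended; keep if long enough
                periods.append((start, i - 1, float(slopes[start:i].mean())))
            start = None
    # Handle a run that extends to the end of the series.
    if start is not None and len(slopes) - start >= min_run:
        periods.append((start, len(slopes) - 1, float(slopes[start:].mean())))
    return sorted(periods, key=lambda p: p[2], reverse=True)
```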

For the employee corresponding to the reason presentation button selected in an estimation result output from the learning estimation device 30, such as that shown in FIG. 5, the output unit 14 outputs the important period identified by the identification unit 13 as the estimation reason. The output unit 14 may also include in the estimation reason, as a factor that influenced the estimation result, at least one of the data of the estimation time-series data that falls within the important period and an aggregation of that data. FIG. 11 shows an output example of the estimation reason. In the example of FIG. 11, one important period has been identified, and the aggregation of the data within the important period is output together with the important period.

FIG. 12 shows another output example of the estimation reason. In the example of FIG. 12, the output unit 14 outputs, together with the important period, the result of comparing the data of the important period with the data of the rest of the period indicated by the estimation time-series data, as a factor that influenced the estimation result.
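One way to produce the aggregation of FIG. 11 and the comparison of FIG. 12 is to count each attendance item inside the important period and compare its daily rate with the rest of the reference period. The report fields, the boolean per-day encoding, and the name `compare_period` are assumptions of this sketch; the patent does not specify the statistics used.

```python
import numpy as np

def compare_period(flags, items, start, end):
    """Summarise attendance items inside and outside an important period.

    flags: boolean array of shape (n_days, n_items), True where the day
           falls under the item (hypothetical encoding, as in FIG. 2).
    items: item names, one per column.
    start, end: first and last day index of the important period (inclusive).
    """
    flags = np.asarray(flags, dtype=bool)
    inside = np.zeros(len(flags), dtype=bool)
    inside[start:end + 1] = True
    report = {}
    for j, name in enumerate(items):
        in_rate = flags[inside, j].mean()
        # If the important period covers every day, there is nothing to compare.
        out_rate = flags[~inside, j].mean() if (~inside).any() else float("nan")
        report[name] = {
            "count_in_period": int(flags[inside, j].sum()),
            "rate_in_period": float(in_rate),
            "rate_elsewhere": float(out_rate),
        }
    return report
```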

The impact determination device 10 can be realized, for example, by a computer 50 shown in FIG. 13. The computer 50 includes a CPU (Central Processing Unit) 51, a memory 52 as a temporary storage area, and a nonvolatile storage unit 53. The computer 50 also includes an input/output device 54 such as an input unit and a display unit, and an R/W (Read/Write) unit 55 that controls reading and writing of data from and to a storage medium 59. The computer 50 further includes a communication I/F (Interface) 56 connected to a network such as the Internet. The CPU 51, the memory 52, the storage unit 53, the input/output device 54, the R/W unit 55, and the communication I/F 56 are connected to one another via a bus 57.

The storage unit 53 can be realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like. The storage unit 53, as a storage medium, stores an impact determination program 60 for causing the computer 50 to function as the impact determination device 10. The impact determination program 60 has a multiple regression model learning process 61, a calculation process 62, an identification process 63, and an output process 64.

The CPU 51 reads the impact determination program 60 from the storage unit 53, loads it into the memory 52, and sequentially executes the processes of the impact determination program 60. By executing the multiple regression model learning process 61, the CPU 51 operates as the multiple regression model learning unit 11 shown in FIG. 1. By executing the calculation process 62, the CPU 51 operates as the calculation unit 12 shown in FIG. 1. By executing the identification process 63, the CPU 51 operates as the identification unit 13 shown in FIG. 1. By executing the output process 64, the CPU 51 operates as the output unit 14 shown in FIG. 1. The CPU 51 also loads the generated multiple regression model 20 into the memory 52. The computer 50 that executes the impact determination program 60 thereby functions as the impact determination device 10. The CPU 51 that executes the program is hardware.

The functions realized by the impact determination program 60 can also be realized by, for example, a semiconductor integrated circuit, more specifically an ASIC (Application Specific Integrated Circuit) or the like.

Like the impact determination device 10, the learning estimation device 30 can also be realized by a computer including a CPU, a memory, a storage unit, an input/output device, an R/W unit, a communication I/F, and the like, so a detailed description is omitted.

Next, the operation of the estimation system 100 according to the present embodiment will be described. First, when learning time-series data is input to the learning estimation device 30, the learning unit 31 receives the input learning time-series data. The learning unit 31 then generates the learning model 40 for estimating, based on time-series data for the reference period (for example, 180 days), the probability that a medical absence will occur within the subsequent estimation period (for example, 90 days). The learning unit 31 stores the generated learning model 40 in a predetermined storage area.

When estimation time-series data is input to the learning estimation device 30, the estimation unit 32 inputs the estimation time-series data into the learning model 40 and obtains, as the estimation result, the probability that a medical absence will occur. For example, as shown in FIG. 5, the estimation unit 32 outputs an estimation result in which the relevant employees are listed in descending order of the probability that a medical absence will occur.

When the list showing the output estimation results is displayed on the display unit of an information processing device used by a person in charge in, for example, the human resources department or the health management department, the impact determination device 10 executes the impact determination processing shown in FIG. 14. The impact determination processing is an example of the impact determination method of the disclosed technology.

In step S11, the multiple regression model learning unit 11 determines whether reason presentation has been requested by determining whether any reason presentation button included in the list showing the estimation results has been selected. If reason presentation has been requested, the processing proceeds to step S12; if not, the processing proceeds to step S18.

In step S12, the multiple regression model learning unit 11 searches the feature space corresponding to the learning model 40 for learning data around the estimation time-series data of the employee corresponding to the selected reason presentation button. For example, the multiple regression model learning unit 11 searches for, as the surrounding learning data, learning data for which the Euclidean distance between the vector representing the estimation time-series data and the vector representing the learning data is at or below a predetermined value.

Next, in step S13, the multiple regression model learning unit 11 uses the surrounding learning data found by the search to generate the multiple regression model 20, which locally approximates the decision boundary of the learning model 40 around the estimation time-series data of the relevant employee.
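Steps S12 and S13 can be sketched as a radius search followed by a least-squares fit. The text does not state which target values the multiple regression is fitted to, nor how samples are weighted, so the sketch assumes an unweighted fit to a per-sample target `train_y` (for instance the output of the learning model 40 for each training sample); the name `fit_local_surrogate` is likewise hypothetical.

```python
import numpy as np

def fit_local_surrogate(target, train_x, train_y, radius):
    """Fit y = alpha . x + beta on training data near the target vector.

    target:  vector for the estimation time-series data, shape (n,).
    train_x: training vectors, shape (m, n).
    train_y: value the surrogate should reproduce for each training vector
             (assumed here; see the lead-in).
    radius:  maximum Euclidean distance for a sample to count as "nearby".
    """
    target = np.asarray(target, dtype=float)
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    # Step S12: keep the training data within the given Euclidean distance.
    dist = np.linalg.norm(train_x - target, axis=1)
    near = dist <= radius
    if not near.any():
        raise ValueError("no training data within radius")
    # Step S13: ordinary least squares with an extra column for the intercept.
    design = np.hstack([train_x[near], np.ones((near.sum(), 1))])
    coef, *_ = np.linalg.lstsq(design, train_y[near], rcond=None)
    return coef[:-1], coef[-1]          # (alpha_1..alpha_n, beta)
```

With 180 explanatory variables, a fit like this is well determined only when the neighbourhood contains more samples than variables; otherwise `lstsq` returns the minimum-norm solution.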

Next, in step S14, the calculation unit 12 calculates, as the degree of influence for each date (explanatory variable), the multiple regression value obtained from the multiple regression model 20 when the partial regression coefficients of the terms corresponding to dates later than the date of each data item included in the estimation time-series data are set to 0.

Next, in step S15, the identification unit 13 calculates the slope of the degree of influence for each date (explanatory variable), for example using equation (2).

Next, in step S16, the identification unit 13 identifies, as an important period, a period in which the slope of the degree of influence is at or above the predetermined threshold th for at least a predetermined number of consecutive dates. The identification unit 13 notifies the output unit 14 of the identified important period.

Next, in step S17, the output unit 14 presents, for the relevant employee, the important period identified in step S16 as the estimation reason. The output unit 14 may include in the estimation reason, as a factor that influenced the estimation result, at least one of the data of the estimation time-series data that falls within the important period and an aggregation of that data. The processing then returns to step S11.

In step S18, it is determined whether an instruction to end the display of the list showing the estimation results has been given. If no such instruction has been given, the processing returns to step S11; if it has, the impact determination processing ends.

以上説明したように、本実施形態に係る推定システムによれば、影響判定装置が、過去の時系列データに基づいて、以降の推定期間における状態の推定結果を出力する学習モデルを局所的に近似した重回帰モデルを生成する。そして、影響判定装置は、重回帰モデルを示す回帰方程式の各項に、時系列データの各日付のデータを時系列順に対応させ、各日付に対応する項より後の項の偏回帰係数を０とした場合の重回帰値を、日付毎の影響度として算出する。さらに、影響判定装置は、日付毎の影響度の傾きが所定値以上で所定数連続する期間を重要期間として特定し、推定結果の理由として出力する。これにより、時系列データを用いた推定において、より推定結果に影響を与えた時系列データの部分を判定することができる。 As described above, in the estimation system according to the present embodiment, the impact determination device generates a multiple regression model that locally approximates a learning model, where the learning model outputs an estimation result of the state in a subsequent estimation period based on past time-series data. The impact determination device then associates the data of each date in the time-series data, in chronological order, with the terms of the regression equation representing the multiple regression model, and calculates, as the degree of influence for each date, the multiple regression value obtained when the partial regression coefficients of the terms after the term corresponding to that date are set to 0. Further, the impact determination device specifies, as an important period, a period in which the slope of the degree of influence for each date is equal to or greater than a predetermined value for a predetermined number of consecutive dates, and outputs it as the reason for the estimation result. This makes it possible, in estimation using time-series data, to determine which portion of the time-series data had the greater influence on the estimation result.
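The summary above starts from a multiple regression model that locally approximates the learning model. One way to obtain such a model is sketched below: take the training samples closest to the series being explained and fit ordinary least squares on them. Everything specific here is an assumption for illustration, including the name `fit_local_regression`, the use of Euclidean distance on raw inputs as the notion of "nearby", the neighbour count `k`, and the choice of `targets` (for example, the learning model's outputs for those samples).

```python
import numpy as np


def fit_local_regression(train_x, targets, query, k):
    """Fit a linear model on the k training samples nearest to `query`.

    Returns (intercept, coefs), where coefs[i] is the partial regression
    coefficient of the term for the i-th time point.
    """
    train_x = np.asarray(train_x, dtype=float)
    targets = np.asarray(targets, dtype=float)
    query = np.asarray(query, dtype=float)
    # Distance of every training sample to the series being explained.
    dist = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dist)[:k]
    # Prepend a column of ones so the first solved parameter is the intercept.
    design = np.column_stack([np.ones(len(nearest)), train_x[nearest]])
    solution, *_ = np.linalg.lstsq(design, targets[nearest], rcond=None)
    return solution[0], solution[1:]


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 1.0 + X @ np.array([0.5, -2.0, 3.0])
intercept, coefs = fit_local_regression(X, y, query=np.zeros(3), k=20)
print(intercept, coefs)
```

The returned coefficients are what the influence calculation consumes, one coefficient per time point in chronological order.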

なお、上記実施形態では、従業員の勤怠データに基づいて、数か月先のメンタル不調による休職に繋がる療養欠勤の発生を推定する例について説明したが、これに限定されない。例えば、株価等の変動を予測するシステム等、時系列データを用いた推定に適用することができる。 In the above embodiment, an example has been described in which the occurrence, several months ahead, of medical absences leading to a leave of absence due to mental health problems is estimated based on employee attendance data, but the disclosed technology is not limited to this. For example, it can be applied to any estimation that uses time-series data, such as a system that predicts fluctuations in stock prices.

また、時系列データの単位も日付単位に限定されず、時間単位、週単位、月単位等でもよい。いずれの場合でも、各時点のデータを、重回帰モデルを示す回帰方程式の各項に時系列順に対応させることにより、開示の技術を適用することができる。 Moreover, the unit of time series data is not limited to the date unit, but may also be an hourly unit, weekly unit, monthly unit, or the like. In either case, the disclosed technique can be applied by associating data at each point in time with each term of a regression equation representing a multiple regression model in chronological order.
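For a weekly unit, daily observations first have to be grouped so that each regression term corresponds to one week in chronological order. The sketch below groups by ISO week and sums the values; the helper name `to_weekly` and the choice of summation (rather than, say, averaging) are assumptions, not something the text prescribes.

```python
from datetime import date


def to_weekly(daily):
    """Group (date, value) pairs into chronologically ordered weekly totals.

    Returns a list of ((iso_year, iso_week), total) pairs.
    """
    weekly = {}
    for d, v in sorted(daily):
        iso = d.isocalendar()
        key = (iso[0], iso[1])  # ISO year and ISO week number
        weekly[key] = weekly.get(key, 0.0) + v
    return sorted(weekly.items())


daily = [
    (date(2020, 1, 13), 1.0),
    (date(2020, 1, 6), 2.0),
    (date(2020, 1, 7), 3.0),
]
print(to_weekly(daily))  # [((2020, 2), 5.0), ((2020, 3), 1.0)]
```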

また、上記実施形態では、影響度の時系列的な変化を示す指標として、影響度の傾きを用いる場合について説明したが、これに限定されない。例えば、各時点の影響度の移動平均をとる等、周辺の時点における影響度を考慮した指標を用いてもよい。 Further, in the above embodiment, a case has been described in which the slope of the degree of influence is used as an index indicating a time-series change in the degree of influence, but the present invention is not limited to this. For example, an index that takes into account the degree of influence at surrounding times may be used, such as by taking a moving average of the degree of influence at each time point.
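The moving-average alternative can be sketched as follows. The window length and the handling of the series edges are not specified in the text; this example assumes a window of 3 and simply drops positions where a full window is unavailable. The name `moving_average` is hypothetical.

```python
import numpy as np


def moving_average(influence, window=3):
    """Moving average of the degree of influence.

    Entry k of the result averages influence[k : k + window], so the
    output is shorter than the input by window - 1 entries.
    """
    influence = np.asarray(influence, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(influence, kernel, mode="valid")


print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3))  # [2. 3. 4.]
```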

また、上記実施形態では、影響度判定装置と学習推定装置とを別々のコンピュータで実現する場合について説明したが、影響度判定装置と学習推定装置とを１つのコンピュータで実現してもよい。 Further, in the above embodiment, the case where the influence degree determination device and the learning estimation device are realized by separate computers has been described, but the influence degree determination device and the learning estimation device may be realized by one computer.

また、上記実施形態では、影響判定プログラムが記憶部に予め記憶（インストール）されている態様を説明したが、これに限定されない。開示の技術に係るプログラムは、ＣＤ－ＲＯＭ、ＤＶＤ－ＲＯＭ、ＵＳＢメモリ等の記憶媒体に記憶された形態で提供することも可能である。 Further, in the above embodiment, a mode has been described in which the influence determination program is stored (installed) in the storage unit in advance, but the present invention is not limited to this. The program according to the disclosed technology can also be provided in a form stored in a storage medium such as a CD-ROM, DVD-ROM, or USB memory.

以上の実施形態に関し、更に以下の付記を開示する。 Regarding the above embodiments, the following additional notes are further disclosed.

（付記１）
時系列データに基づいて推定結果を出力する機械学習モデルを近似した重回帰モデルの各項に、前記時系列データの各時点のデータを時系列順に対応させ入力する際、前記各時点のデータのそれぞれに対して、前記各時点のデータのそれぞれより後の時点のデータに対応する前記重回帰モデルの項の偏回帰係数を０として、重回帰値を算出し、
前記各時点のデータのそれぞれについて算出された前記重回帰値の変化に応じて、前記時系列データが示す期間のうち、特定の条件を満たす期間を特定し、
特定された前記特定の条件を満たす期間に関する情報を、前記推定結果に影響を与えた要因として出力する、
処理をコンピュータに実行させることを特徴とする影響判定プログラム。
(Supplementary Note 1)
An impact determination program that causes a computer to execute a process comprising:
when data at each time point of time-series data is input, in chronological order, to the corresponding terms of a multiple regression model that approximates a machine learning model that outputs an estimation result based on the time-series data, calculating a multiple regression value for the data at each time point by setting to 0 the partial regression coefficients of the terms of the multiple regression model that correspond to data at later time points;
specifying, from the period covered by the time-series data, a period that satisfies a specific condition according to changes in the multiple regression values calculated for the data at the respective time points; and
outputting information regarding the specified period as a factor that influenced the estimation result.

（付記２）
前記特定の条件を満たす期間を特定する処理は、前記各時点のデータのそれぞれに対して算出された前記重回帰値が、時系列順に継続的に増加する期間を、前記特定の条件を満たす期間として特定する処理である、
ことを特徴とする付記１に記載の影響判定プログラム。
(Supplementary Note 2)
The impact determination program according to Supplementary Note 1, wherein the process of specifying the period that satisfies the specific condition specifies, as that period, a period in which the multiple regression values calculated for the data at the respective time points continuously increase in chronological order.

（付記３）
前記重回帰値が継続的に増加する期間として特定する処理は、前記各時点のデータのそれぞれに対して算出された前記重回帰値の時系列における変化の度合いを前記各時点のデータのそれぞれに対して算出し、閾値以上の変化の度合いを示すデータが所定数連続する期間を、前記重回帰値が継続的に増加する期間として特定する処理である、
ことを特徴とする付記２に記載の影響判定プログラム。
(Supplementary Note 3)
The impact determination program according to Supplementary Note 2, wherein the process of specifying the period in which the multiple regression value continuously increases calculates, for the data at each time point, a degree of change over time in the multiple regression values, and specifies, as the period in which the multiple regression value continuously increases, a period in which a predetermined number of consecutive items of data show a degree of change equal to or greater than a threshold.

（付記４）
前記特定の条件を満たす期間に関連する情報は、前記特定の条件を満たす期間を示す情報、前記時系列データのうち前記特定の条件を満たす期間に含まれるデータを示す情報、前記時系列データのうち前記特定の条件を満たす期間に含まれるデータの集計結果、前記時系列データのうち前記特定の条件を満たす期間に含まれるデータとそれ以外のデータとの比較結果の少なくとも一つを含む、
ことを特徴とする付記１～付記３のいずれか１項に記載の影響判定プログラム。
(Supplementary Note 4)
The impact determination program according to any one of Supplementary Notes 1 to 3, wherein the information related to the period that satisfies the specific condition includes at least one of: information indicating that period; information indicating the data of the time-series data included in that period; an aggregation result of the data of the time-series data included in that period; and a result of comparing the data of the time-series data included in that period with the other data.

（付記５）
前記重回帰モデルは、前記機械学習モデルの学習に利用された学習データであって、前記機械学習モデルに対応する特徴空間において、前記時系列データの周辺に位置する前記学習データを用いた機械学習により生成される、
ことを特徴とする付記１～付記４のいずれか１項に記載の影響判定プログラム。
(Supplementary Note 5)
The impact determination program according to any one of Supplementary Notes 1 to 4, wherein the multiple regression model is generated by machine learning using training data that was used to train the machine learning model and that is located around the time-series data in a feature space corresponding to the machine learning model.

（付記６）
時系列データに基づいて推定結果を出力する機械学習モデルを近似した重回帰モデルの各項に、前記時系列データの各時点のデータを時系列順に対応させ入力する際、前記各時点のデータのそれぞれに対して、前記各時点のデータのそれぞれより後の時点のデータに対応する前記重回帰モデルの項の偏回帰係数を０として、重回帰値を算出する算出部と、
前記各時点のデータのそれぞれについて算出された前記重回帰値の変化に応じて、前記時系列データが示す期間のうち、特定の条件を満たす期間を特定する特定部と、
特定された前記特定の条件を満たす期間に関する情報を、前記推定結果に影響を与えた要因として出力する出力部と、
を含むことを特徴とする影響判定装置。
(Supplementary Note 6)
An impact determination device comprising:
a calculation unit that, when data at each time point of time-series data is input, in chronological order, to the corresponding terms of a multiple regression model that approximates a machine learning model that outputs an estimation result based on the time-series data, calculates a multiple regression value for the data at each time point by setting to 0 the partial regression coefficients of the terms of the multiple regression model that correspond to data at later time points;
a specifying unit that specifies, from the period covered by the time-series data, a period that satisfies a specific condition according to changes in the multiple regression values calculated for the data at the respective time points; and
an output unit that outputs information regarding the specified period as a factor that influenced the estimation result.

（付記７）
前記特定部は、前記各時点のデータのそれぞれに対して算出された前記重回帰値が、時系列順に継続的に増加する期間を、前記特定の条件を満たす期間として特定する、
ことを特徴とする付記６に記載の影響判定装置。
(Supplementary Note 7)
The impact determination device according to Supplementary Note 6, wherein the specifying unit specifies, as the period that satisfies the specific condition, a period in which the multiple regression values calculated for the data at the respective time points continuously increase in chronological order.

（付記８）
前記特定部は、前記各時点のデータのそれぞれに対して算出された前記重回帰値の時系列における変化の度合いを前記各時点のデータのそれぞれに対して算出し、閾値以上の変化の度合いを示すデータが所定数連続する期間を、前記重回帰値が継続的に増加する期間として特定する、
ことを特徴とする付記７に記載の影響判定装置。
(Supplementary Note 8)
The impact determination device according to Supplementary Note 7, wherein the specifying unit calculates, for the data at each time point, a degree of change over time in the multiple regression values, and specifies, as the period in which the multiple regression value continuously increases, a period in which a predetermined number of consecutive items of data show a degree of change equal to or greater than a threshold.

（付記９）
前記特定の条件を満たす期間に関連する情報は、前記特定の条件を満たす期間を示す情報、前記時系列データのうち前記特定の条件を満たす期間に含まれるデータを示す情報、前記時系列データのうち前記特定の条件を満たす期間に含まれるデータの集計結果、前記時系列データのうち前記特定の条件を満たす期間に含まれるデータとそれ以外のデータとの比較結果の少なくとも一つを含む、
ことを特徴とする付記６～付記８のいずれか１項に記載の影響判定装置。
(Supplementary Note 9)
The impact determination device according to any one of Supplementary Notes 6 to 8, wherein the information related to the period that satisfies the specific condition includes at least one of: information indicating that period; information indicating the data of the time-series data included in that period; an aggregation result of the data of the time-series data included in that period; and a result of comparing the data of the time-series data included in that period with the other data.

（付記１０）
前記重回帰モデルは、前記機械学習モデルの学習に利用された学習データであって、前記機械学習モデルに対応する特徴空間において、前記時系列データの周辺に位置する前記学習データを用いた機械学習により生成される、
ことを特徴とする付記６～付記９のいずれか１項に記載の影響判定装置。
(Supplementary Note 10)
The impact determination device according to any one of Supplementary Notes 6 to 9, wherein the multiple regression model is generated by machine learning using training data that was used to train the machine learning model and that is located around the time-series data in a feature space corresponding to the machine learning model.

（付記１１）
時系列データに基づいて推定結果を出力する機械学習モデルを近似した重回帰モデルの各項に、前記時系列データの各時点のデータを時系列順に対応させ入力する際、前記各時点のデータのそれぞれに対して、前記各時点のデータのそれぞれより後の時点のデータに対応する前記重回帰モデルの項の偏回帰係数を０として、重回帰値を算出し、
前記各時点のデータのそれぞれについて算出された前記重回帰値の変化に応じて、前記時系列データが示す期間のうち、特定の条件を満たす期間を特定し、
特定された前記特定の条件を満たす期間に関する情報を、前記推定結果に影響を与えた要因として出力する、
処理をコンピュータが実行することを特徴とする影響判定方法。
(Supplementary Note 11)
An impact determination method in which a computer executes a process comprising:
when data at each time point of time-series data is input, in chronological order, to the corresponding terms of a multiple regression model that approximates a machine learning model that outputs an estimation result based on the time-series data, calculating a multiple regression value for the data at each time point by setting to 0 the partial regression coefficients of the terms of the multiple regression model that correspond to data at later time points;
specifying, from the period covered by the time-series data, a period that satisfies a specific condition according to changes in the multiple regression values calculated for the data at the respective time points; and
outputting information regarding the specified period as a factor that influenced the estimation result.

（付記１２）
前記特定の条件を満たす期間を特定する処理は、前記各時点のデータのそれぞれに対して算出された前記重回帰値が、時系列順に継続的に増加する期間を、前記特定の条件を満たす期間として特定する処理である、
ことを特徴とする付記１１に記載の影響判定方法。
(Supplementary Note 12)
The impact determination method according to Supplementary Note 11, wherein the process of specifying the period that satisfies the specific condition specifies, as that period, a period in which the multiple regression values calculated for the data at the respective time points continuously increase in chronological order.

（付記１３）
前記重回帰値が継続的に増加する期間として特定する処理は、前記各時点のデータのそれぞれに対して算出された前記重回帰値の時系列における変化の度合いを前記各時点のデータのそれぞれに対して算出し、閾値以上の変化の度合いを示すデータが所定数連続する期間を、前記重回帰値が継続的に増加する期間として特定する処理である、
ことを特徴とする付記１２に記載の影響判定方法。
(Supplementary Note 13)
The impact determination method according to Supplementary Note 12, wherein the process of specifying the period in which the multiple regression value continuously increases calculates, for the data at each time point, a degree of change over time in the multiple regression values, and specifies, as the period in which the multiple regression value continuously increases, a period in which a predetermined number of consecutive items of data show a degree of change equal to or greater than a threshold.

（付記１４）
前記特定の条件を満たす期間に関連する情報は、前記特定の条件を満たす期間を示す情報、前記時系列データのうち前記特定の条件を満たす期間に含まれるデータを示す情報、前記時系列データのうち前記特定の条件を満たす期間に含まれるデータの集計結果、前記時系列データのうち前記特定の条件を満たす期間に含まれるデータとそれ以外のデータとの比較結果の少なくとも一つを含む、
ことを特徴とする付記１１～付記１３のいずれか１項に記載の影響判定方法。
(Supplementary Note 14)
The impact determination method according to any one of Supplementary Notes 11 to 13, wherein the information related to the period that satisfies the specific condition includes at least one of: information indicating that period; information indicating the data of the time-series data included in that period; an aggregation result of the data of the time-series data included in that period; and a result of comparing the data of the time-series data included in that period with the other data.

（付記１５）
前記重回帰モデルは、前記機械学習モデルの学習に利用された学習データであって、前記機械学習モデルに対応する特徴空間において、前記時系列データの周辺に位置する前記学習データを用いた機械学習により生成される、
ことを特徴とする付記１１～付記１４のいずれか１項に記載の影響判定方法。
(Supplementary Note 15)
The impact determination method according to any one of Supplementary Notes 11 to 14, wherein the multiple regression model is generated by machine learning using training data that was used to train the machine learning model and that is located around the time-series data in a feature space corresponding to the machine learning model.

（付記１６）
時系列データに基づいて推定結果を出力する機械学習モデルを近似した重回帰モデルの各項に、前記時系列データの各時点のデータを時系列順に対応させ入力する際、前記各時点のデータのそれぞれに対して、前記各時点のデータのそれぞれより後の時点のデータに対応する前記重回帰モデルの項の偏回帰係数を０として、重回帰値を算出し、
前記各時点のデータのそれぞれについて算出された前記重回帰値の変化に応じて、前記時系列データが示す期間のうち、特定の条件を満たす期間を特定し、
特定された前記特定の条件を満たす期間に関する情報を、前記推定結果に影響を与えた要因として出力する、
処理をコンピュータに実行させることを特徴とする影響判定プログラムを記憶した記憶媒体。
(Supplementary Note 16)
A storage medium storing an impact determination program that causes a computer to execute a process comprising:
when data at each time point of time-series data is input, in chronological order, to the corresponding terms of a multiple regression model that approximates a machine learning model that outputs an estimation result based on the time-series data, calculating a multiple regression value for the data at each time point by setting to 0 the partial regression coefficients of the terms of the multiple regression model that correspond to data at later time points;
specifying, from the period covered by the time-series data, a period that satisfies a specific condition according to changes in the multiple regression values calculated for the data at the respective time points; and
outputting information regarding the specified period as a factor that influenced the estimation result.

１０影響判定装置
１１重回帰モデル学習部
１２算出部
１３特定部
１４出力部
２０重回帰モデル
３０学習推定装置
３１学習部
３２推定部
４０学習モデル
５０コンピュータ
５１ＣＰＵ
５２メモリ
５３記憶部
５９記憶媒体
６０影響判定プログラム
１００推定システム
10 Impact determination device
11 Multiple regression model learning unit
12 Calculation unit
13 Specifying unit
14 Output unit
20 Multiple regression model
30 Learning estimation device
31 Learning unit
32 Estimation unit
40 Learning model
50 Computer
51 CPU
52 Memory
53 Storage unit
59 Storage medium
60 Impact determination program
100 Estimation system

Claims

An impact determination program that causes a computer to execute a process comprising:
when data at each time point of time-series data is input, in chronological order, to the corresponding terms of a multiple regression model that approximates a machine learning model that outputs an estimation result based on the time-series data, calculating a multiple regression value for the data at each time point by setting to 0 the partial regression coefficients of the terms of the multiple regression model that correspond to data at later time points;
specifying, from the period covered by the time-series data, a period that satisfies a specific condition according to changes in the multiple regression values calculated for the data at the respective time points; and
outputting information regarding the specified period as a factor that influenced the estimation result.

The impact determination program according to claim 1, wherein the process of specifying the period that satisfies the specific condition specifies, as that period, a period in which the multiple regression values calculated for the data at the respective time points continuously increase in chronological order.

The impact determination program according to claim 2, wherein the process of specifying the period in which the multiple regression value continuously increases calculates, for the data at each time point, a degree of change over time in the multiple regression values, and specifies, as the period in which the multiple regression value continuously increases, a period in which a predetermined number of consecutive items of data show a degree of change equal to or greater than a threshold.

The impact determination program according to any one of claims 1 to 3, wherein the information related to the period that satisfies the specific condition includes at least one of: information indicating that period; information indicating the data of the time-series data included in that period; an aggregation result of the data of the time-series data included in that period; and a result of comparing the data of the time-series data included in that period with the other data.

The impact determination program according to any one of claims 1 to 4, wherein the multiple regression model is generated by machine learning using training data that was used to train the machine learning model and that is located around the time-series data in a feature space corresponding to the machine learning model.

An impact determination device comprising:
a calculation unit that, when data at each time point of time-series data is input, in chronological order, to the corresponding terms of a multiple regression model that approximates a machine learning model that outputs an estimation result based on the time-series data, calculates a multiple regression value for the data at each time point by setting to 0 the partial regression coefficients of the terms of the multiple regression model that correspond to data at later time points;
a specifying unit that specifies, from the period covered by the time-series data, a period that satisfies a specific condition according to changes in the multiple regression values calculated for the data at the respective time points; and
an output unit that outputs information regarding the specified period as a factor that influenced the estimation result.

An impact determination method in which a computer executes a process comprising:
when data at each time point of time-series data is input, in chronological order, to the corresponding terms of a multiple regression model that approximates a machine learning model that outputs an estimation result based on the time-series data, calculating a multiple regression value for the data at each time point by setting to 0 the partial regression coefficients of the terms of the multiple regression model that correspond to data at later time points;
specifying, from the period covered by the time-series data, a period that satisfies a specific condition according to changes in the multiple regression values calculated for the data at the respective time points; and
outputting information regarding the specified period as a factor that influenced the estimation result.