JP2021093020A - Information processing apparatus, information processing method, and program

Info

Publication number: JP2021093020A
Application number: JP2019223816A
Authority: JP
Inventors: 舜奥野; Shun Okuno; 弘樹上田; Hiroki Ueda; 信輔櫻木; Shinsuke Sakuragi; 優田中; Masaru Tanaka; 玲是此田; Rei Koreida
Original assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2019-12-11
Filing date: 2019-12-11
Publication date: 2021-06-17
Anticipated expiration: 2039-12-11
Also published as: JP7414502B2

Abstract

To efficiently extract data that adversely affects prediction accuracy and data that improves it. SOLUTION: An information processing apparatus according to an embodiment of the present invention includes an acquisition unit, a learning unit, and an output control unit. The acquisition unit acquires input data and correct answer data. The input data is to be input to a model that outputs an inference result from input data including time-series data whose values change continuously with position; the correct answer data indicates the correct answer of the inference by the model. The learning unit trains the model using first input data selected from the input data and the correct answer data. The output control unit outputs the degree of contribution of the first input data to the inference result of the trained model. The learning unit further trains the model using the correct answer data and third input data based on second input data, the second input data being designated from the input data according to the output degree of contribution. SELECTED DRAWING: Figure 1

Description

Embodiments of the present invention relate to an information processing apparatus, an information processing method, and a program.

A technique has been proposed that predicts electricity demand in a designated area from meteorological data, using a model created by machine learning or the like for each area where electricity is used. As the meteorological data, data provided by the Japan Meteorological Agency for areas where facilities such as meteorological observatories are installed can be used. A technique has also been proposed that predicts, from meteorological data provided for a limited set of areas, the meteorological data of areas for which none is provided. Using the predicted meteorological data as well makes it possible to forecast demand over a wide range of areas with higher accuracy.

In prediction techniques such as the above, it is desirable to extract the data that adversely affects prediction accuracy and the data that improves it, and to configure the system so that the former is not used and the latter is.

Japanese Unexamined Patent Application Publication No. 2019-087027 (JP-A-2019-087027)

However, the prior art has in some cases been unable to efficiently extract data that adversely affects prediction accuracy and data that improves it.

The information processing apparatus of the embodiment includes an acquisition unit, a learning unit, and an output control unit. The acquisition unit acquires input data and correct answer data. The input data is to be input to a model that receives input data including time-series data whose values change continuously with position and outputs an inference result; the correct answer data represents the correct answer of the inference by the model. The learning unit trains the model using first input data selected from the input data and the correct answer data. The output control unit outputs the degree of contribution of the first input data to the inference result of the trained model. The learning unit further trains the model using the correct answer data and third input data based on second input data, where the second input data is designated from the input data according to the output degree of contribution.

FIG. 1 is a block diagram showing an example of the configuration of the information processing apparatus according to the embodiment.
FIG. 2 is a diagram showing an example of the data structure of event data.
FIG. 3 is a diagram showing an example of weather-related items that can be included in the meteorological data.
FIG. 4 is a diagram showing an example of the data structure of geographic data.
FIG. 5 is a diagram showing an example of the data structure of event data.
FIG. 6 is a diagram showing an example of the data structure of additional data.
FIG. 7 is a flowchart showing an example of the learning process in the embodiment.
FIG. 8 is a flowchart showing an example of the inference process in the embodiment.
FIG. 9 is a flowchart showing an example of the explanatory variable extraction process.
FIG. 10 is a diagram for explaining an example of a method of creating the frequency distribution R1.
FIG. 11 is a diagram for explaining an example of a method of creating the frequency distribution R1.
FIG. 12 is a diagram for explaining an example of a method of creating the frequency distribution R1.
FIG. 13 is a diagram showing an example of the frequency distribution R2.
FIG. 14 is a flowchart showing another example of the explanatory variable extraction process.
FIG. 15 is a diagram showing an example of a display screen that displays the inference result.
FIG. 16 is a diagram showing an example of a display screen that displays the inference result.
FIG. 17 is a diagram showing an example of a display screen that displays the inference result.
FIG. 18 is a diagram showing an example of a display screen that displays the inference result.
FIG. 19 is a diagram showing an example of a display screen that displays the inference result.
FIG. 20 is an explanatory diagram showing an example of the hardware configuration of the information processing apparatus according to the embodiment.

A preferred embodiment of the information processing apparatus according to the present invention is described in detail below with reference to the accompanying drawings.

The following description uses as an example an information processing system that infers (predicts) the occurrence of an event from input data including meteorological data (including predicted meteorological data). The input data is not limited to meteorological data. Other time-series data whose values change continuously with position (for example, position on a map) may be used as input data. The inference target may be anything. For example, it may be whether or not an event occurs, or it may be the amount in which an event occurs.

Such an information processing system can be understood as a system that infers (predicts) an objective variable from one or more explanatory variables. In power demand forecasting, the meteorological data corresponds to the explanatory variables and the power demand corresponds to the objective variable.

The more closely the data of the event serving as the objective variable (event data) is related to the meteorological data serving as the explanatory variables, the higher the prediction accuracy of the machine learning model. On the other hand, when time-series regional trends are taken into account, data such as meteorological data from areas far from the prediction target area may adversely affect prediction accuracy. Conventionally, knowledge about which data affects the prediction has been accumulated as the rules of thumb of individual analysts, so demand forecasting work has in some cases depended on particular individuals.

In the present embodiment, among a large number of explanatory variables (such as meteorological data), the explanatory variables that affect prediction accuracy are visualized so that desired explanatory variables can be excluded on the basis of a quantitative index. In addition, explanatory variables that improve prediction accuracy can be designated on the basis of, for example, the analyst's rules of thumb.

For example, when forecasting power demand for the whole of Japan, the amount of data is so large that manual analysis by an analyst is difficult, and the computational cost is very high even when a computer is used. According to the present embodiment, the types and attributes of the explanatory variables that affect the accuracy of inference (prediction) by the machine learning model are visualized, and explanatory variables that improve prediction accuracy can be designated (selected) while also taking conventional knowledge into account. The computational cost can therefore be reduced.

FIG. 1 is a block diagram showing an example of the configuration of the information processing apparatus 100 according to the present embodiment. As shown in FIG. 1, the information processing apparatus 100 includes an acquisition unit 101, an encoding unit 102, an extraction unit 103, a learning unit 104, an output control unit 105, an inference unit 111, a display unit 131, an event data storage unit 121, a meteorological data storage unit 122, an additional data storage unit 123, a geographic data storage unit 124, a feature storage unit 125, a model storage unit 126, and an extraction information storage unit 127.

The acquisition unit 101 acquires the various data used in the processes executed by the information processing apparatus 100. For example, the acquisition unit 101 acquires the input data to be input to the machine learning model used for inference, the correct answer data, and the geographic data.

The acquisition unit 101 may acquire each kind of data by any method. For example, it may acquire data from an external device via a network, or read data stored in a storage medium. The network is, for example, a LAN (local area network) or the Internet, but may be any other network, and may be either wired or wireless. The acquisition method may differ depending on the data to be acquired. For example, the acquisition unit 101 may be configured to acquire event data from a server device, meteorological data from a weather prediction system that provides meteorological data, and additional data (described later) from a web scraping system.

The following description uses as an example a case where the input data includes meteorological data as the time-series data whose values change continuously with position. Besides meteorological data, the input data may include any data that can affect the inference. Data used in addition to the meteorological data is hereinafter sometimes referred to as additional data. The input data (meteorological data and additional data) corresponds to the explanatory variables.

The correct answer data is data used when training the machine learning model, and represents the correct answer of the inference by the machine learning model. For example, data representing the time and position at which the event to be predicted occurred in the past can serve as the correct answer data. The correct answer data is hereinafter sometimes referred to as event data.

The geographic data represents, for example, the positions of the regions in which an event can occur. Examples of the data structures of the meteorological data, the event data, and the geographic data are described later.

The encoding unit 102 encodes the input data (input data, correct answer data, and so on) and outputs features as the result of the encoding. The encoding unit 102 can be understood as converting the input data into a format that is easy to use in the subsequent processing. For example, when temporally sparse data is used, that data must be converted into temporally continuous data so that temporally continuous events can be predicted. The encoding unit 102 encodes data by, for example, the following methods.
- One-hot Encoding: The category value of an arbitrary event is represented by data (features) in a format that the machine learning model can easily interpret.
- Count Encoding: The number of occurrences of a category within an arbitrary period is used as a feature.
- Consolidation Encoding: Data is encoded into features while resolving issues such as spelling and notation variants present in the data.
- Interaction: The relationship between features is used as a new feature.
- Trend lines: The data set is processed into trend data for an arbitrary time series, and the trend values are used as features.

The encoding method is not limited to the above. The encoding unit 102 may use a combination of the above methods.
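The first two encodings in the list can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the encoding unit 102 itself: the function names `one_hot` and `count_encode`, the sample categories, and the choice of one calendar day as the counting period are all hypothetical.

```python
from collections import Counter


def one_hot(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def count_encode(records, period_of):
    """Count occurrences of each (period, category) pair.

    records   : iterable of (timestamp_string, category)
    period_of : function mapping a timestamp string to its period key
    """
    return Counter((period_of(ts), cat) for ts, cat in records)


# Hypothetical sample data (not taken from the patent).
weather = ["sunny", "rain", "sunny", "cloudy"]
categories, rows = one_hot(weather)

events = [
    ("2019-12-01 09:00", "outage"),
    ("2019-12-01 17:30", "outage"),
    ("2019-12-02 08:15", "outage"),
]
# Assumption: the counting period is one calendar day (first 10 characters).
daily_counts = count_encode(events, period_of=lambda ts: ts[:10])
```

Here `rows` holds one indicator vector per input value, and `daily_counts` maps each (date, category) pair to its number of occurrences, which is the kind of temporally continuous feature the text describes.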

The extraction unit 103 extracts, from a plurality of explanatory variables (input data encoded into features), the explanatory variables to be used in the learning process of the machine learning model and in the inference process by the machine learning model. For example, the extraction unit 103 extracts, from among the plurality of explanatory variables, those that are more strongly correlated with the objective variable (event data).
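One way to read "more strongly correlated" is a ranking by the absolute Pearson correlation coefficient. The sketch below assumes that reading; the patent text here does not specify the correlation measure or a threshold, so `pearson`, `extract_variables`, the threshold of 0.5, and the sample series are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear relationship
    return cov / math.sqrt(var_x * var_y)


def extract_variables(features, target, threshold=0.5):
    """Keep explanatory variables whose |r| with the target meets the threshold."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    selected = [name for name, r in scores.items() if abs(r) >= threshold]
    selected.sort(key=lambda name: -abs(scores[name]))
    return selected, scores


# Hypothetical explanatory variables and objective variable.
features = {
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0],
    "wind_speed": [2.0, 1.0, 2.0, 1.0, 2.0],
    "humidity": [5.0, 4.0, 3.0, 2.0, 1.5],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]
selected, scores = extract_variables(features, target)
```

Using the absolute value keeps strongly negatively correlated variables (here the hypothetical `humidity`) as well, since they can be as informative as positively correlated ones.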

The learning unit 104 trains the machine learning model. The machine learning model receives input data (such as meteorological data) and outputs an inference result (such as the occurrence of an event). The machine learning model may be of any form; for example, a random forest, a binary tree, or a neural network can be applied. The learning unit 104 may execute the learning process by any learning method applicable to the machine learning model used. For example, the learning unit 104 trains the machine learning model using explanatory variables selected (extracted) from the plurality of explanatory variables (first input data) and the event data corresponding to the correct answer data.

The inference unit 111 executes inference with the machine learning model. For example, the inference unit 111 inputs new input data to the machine learning model trained by the learning unit 104 and executes inference. The input data used for inference consists of, for example, the same explanatory variables as those selected (extracted) at training time from among the plurality of explanatory variables (input data encoded into features).

The output control unit 105 controls the output of data to an output device such as the display unit 131. For example, the output control unit 105 causes the display unit 131 to display the inference result of the machine learning model trained by the learning unit 104. In the present embodiment, the output control unit 105 also controls, during training by the learning unit 104, the process of visualizing the explanatory variables that contribute to inference by the trained machine learning model. For example, the output control unit 105 displays on the display unit 131 the degree of contribution of each explanatory variable to the inference result of the trained machine learning model.

Various methods of outputting the degree of contribution can be applied depending on the machine learning model used. When a decision tree is used as the machine learning model, the output control unit 105 can apply a visualization method called dtreeviz. dtreeviz is an OSS (Open Source Software) library that can visualize the features inside a decision tree. Visualizing the degree of contribution makes it possible to check how a given explanatory variable behaves within the machine learning model and contributes to the prediction result, and to identify explanatory variables that have an anomalous influence.

The user can refer to the displayed degree of contribution and further designate (select) the explanatory variables to be used for training. The extraction unit 103 further extracts the explanatory variables designated in this way (second input data) and other explanatory variables based on the designated explanatory variables. The learning unit 104 then repeats the process of training the machine learning model using the extracted explanatory variables and the event data. This processing makes it possible to efficiently extract data that adversely affects prediction accuracy and data that improves it.
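The train, display, designate, and retrain cycle described above can be sketched as a loop over pluggable steps. This is a schematic under heavy assumptions rather than the embodiment's implementation: `refine_variables` and its callables `train`, `contribution`, and `designate` are hypothetical, the user's designation is simulated by a fixed threshold, and the toy "model" is just a table of invented weights.

```python
def refine_variables(all_names, train, contribution, designate, max_rounds=5):
    """Repeat: score each variable's contribution, let the user designate, retrain.

    train(names)              -> model trained on the named explanatory variables
    contribution(model, name) -> degree of contribution of one variable
    designate(scores)         -> names chosen by the user after seeing the scores
    """
    selected = list(all_names)
    model = train(selected)                # first training (first input data)
    for _ in range(max_rounds):
        scores = {name: contribution(model, name) for name in selected}
        designated = list(designate(scores))   # user's choice (second input data)
        if sorted(designated) == sorted(selected):
            break                          # nothing changed, so stop iterating
        selected = designated
        model = train(selected)            # further training on the new selection
    return model, selected


# Toy stand-ins; the weights are invented purely for illustration.
HYPOTHETICAL_WEIGHTS = {"temperature": 0.8, "wind_speed": 0.05, "holiday": 0.3}


def toy_train(names):
    return {name: HYPOTHETICAL_WEIGHTS[name] for name in names}


def toy_contribution(model, name):
    return abs(model[name])


def toy_designate(scores):
    # Simulates a user who drops variables with a low displayed contribution.
    return [name for name, score in scores.items() if score >= 0.1]


model, selected = refine_variables(
    list(HYPOTHETICAL_WEIGHTS), toy_train, toy_contribution, toy_designate
)
```

Retraining immediately after each designation keeps the returned model consistent with the returned variable list, even if the loop ends by reaching `max_rounds`.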

The display unit 131 is a display device, such as a liquid crystal display, that displays data. Under the control of the output control unit 105, the display unit 131 displays, for example, the inference result of the machine learning model.

The event data storage unit 121 stores, for example, the event data acquired by the acquisition unit 101. FIG. 2 is a diagram showing an example of the data structure of the event data. As shown in FIG. 2, the event data includes an ID, an occurrence date and time, a latitude, and a longitude. The ID is information that identifies the event data. The occurrence date and time represents when the event occurred (year, month, day, time, and so on). The latitude and longitude are information for identifying the position where the event occurred. Instead of the latitude and longitude, the event data may include other information that can identify the position where the event occurred, such as the name of the region where the event occurred (for example, a city name) or the name of the facility where it occurred.

FIG. 2 is an example of event data indicating whether or not a certain event has occurred. Event data that includes the amount in which an event occurred (for example, the amount of power demand that arose, when forecasting power demand) may be used instead.

Returning to FIG. 1, the meteorological data storage unit 122 stores, for example, the meteorological data acquired by the acquisition unit 101. The meteorological data includes, for example, the values of weather-related items such as temperature, wind speed, and precipitation for each region and each date and time. The meteorological data may be data provided by the Japan Meteorological Agency or the like, or may be meteorological data predicted from the provided data. Using predicted meteorological data avoids problems such as the inability to make precise predictions in regions where meteorological data is sparse, and makes it possible to execute predictions over a wider range of regions with higher accuracy.

FIG. 3 is a diagram showing an example of the weather-related items that can be included in the meteorological data. As shown in FIG. 3, the meteorological data can include not only commonly known items such as temperature, wind speed, and precipitation but also many other items. According to the present embodiment, the items among these that improve inference by the machine learning model can be found efficiently.

Returning to FIG. 1, the additional data storage unit 123 stores, for example, the additional data acquired by the acquisition unit 101. As described above, the additional data is data that can be added as input data besides the meteorological data. The additional data is optional. It may be any data with any data structure. For example, the presence or absence of an event such as a long holiday period (Golden Week, Silver Week, the Obon holidays, the year-end and New Year holidays, and so on) can be used as additional data.

The geographic data storage unit 124 stores, for example, the geographic data acquired by the acquisition unit 101. FIG. 4 is a diagram showing an example of the data structure of the geographic data. The geographic data of FIG. 4 defines position information, such as latitude and longitude, for each region to be predicted. As shown in FIG. 4, the geographic data includes a prefecture code, an ID, a latitude, a longitude, and a region name. The prefecture code is information that identifies a prefecture of Japan. The ID is information that identifies the region. The region name represents the name of the region.

The geographic data is referred to, for example, when the output control unit 105 displays the inference result on a map and when the encoding unit 102 encodes data into per-region data. FIG. 5 is a diagram showing an example of the data structure of the event data after it has been encoded into features. FIG. 5 shows an example of features obtained by encoding event data that indicates the occurrence of events by latitude and longitude, as shown in FIG. 2, so that it represents the number of occurrences of the event in each region. As shown in FIG. 5, the encoded event data (features) includes the number of occurrences of the event for each date and each region.
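The conversion from per-position event records (FIG. 2) to per-date, per-region counts (FIG. 5) can be sketched as follows. The patent text here does not state how an event's latitude and longitude are mapped to a region, so this sketch assumes the nearest region by squared difference in degrees; the function names, field names, and coordinates are hypothetical.

```python
from collections import Counter


def nearest_region(lat, lon, regions):
    """Name of the region whose reference point is closest to (lat, lon).

    Assumption: closeness is the squared difference in degrees, which is
    adequate for a sketch but is not a true geodesic distance.
    """
    closest = min(
        regions, key=lambda r: (r["lat"] - lat) ** 2 + (r["lon"] - lon) ** 2
    )
    return closest["name"]


def count_events_by_region(events, regions):
    """Count events per (date, region name)."""
    counts = Counter()
    for event in events:
        day = event["occurred_at"][:10]  # keep the YYYY-MM-DD part
        counts[(day, nearest_region(event["lat"], event["lon"], regions))] += 1
    return counts


# Hypothetical geographic data and event data.
regions = [
    {"id": 1, "name": "Tokyo", "lat": 35.68, "lon": 139.77},
    {"id": 2, "name": "Osaka", "lat": 34.69, "lon": 135.50},
]
events = [
    {"id": 1, "occurred_at": "2019-12-01 09:00", "lat": 35.70, "lon": 139.70},
    {"id": 2, "occurred_at": "2019-12-01 18:00", "lat": 35.60, "lon": 139.80},
    {"id": 3, "occurred_at": "2019-12-01 12:00", "lat": 34.70, "lon": 135.40},
]
counts = count_events_by_region(events, regions)
```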

Returning to FIG. 1, the feature storage unit 125 stores, for example, the features encoded by the encoding unit 102. For example, the feature storage unit 125 stores the encoded event data described with reference to FIG. 5 and encoded additional data such as that shown in FIG. 6. FIG. 6 is a diagram showing an example of the data structure of the additional data after it has been encoded into features.

FIG. 6 shows an example in which, when additional data indicating the period (date range) of a holiday (an example of an event) has been acquired, that additional data is encoded into a format indicating, for each region and each date, whether the event occurs (1: occurs, 0: does not occur).
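The date-range-to-flag encoding of FIG. 6 can be sketched in a few lines. The sketch assumes that the holiday applies equally to every region and that the flags are generated over an explicit output period; `holiday_flags`, the region names, and the dates are hypothetical.

```python
from datetime import date, timedelta


def holiday_flags(holiday_start, holiday_end, period_start, period_end, regions):
    """Map each (ISO date, region) in the period to 1 if the holiday covers it, else 0."""
    flags = {}
    day = period_start
    while day <= period_end:
        flag = 1 if holiday_start <= day <= holiday_end else 0
        for region in regions:
            flags[(day.isoformat(), region)] = flag
        day += timedelta(days=1)
    return flags


# Hypothetical year-end holiday from 2019-12-28 to 2020-01-05.
flags = holiday_flags(
    date(2019, 12, 28), date(2020, 1, 5),
    date(2019, 12, 27), date(2020, 1, 6),
    ["Tokyo", "Osaka"],
)
```

If a holiday applied only to some regions, the same structure would hold with a per-region date range in place of the single range assumed here.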

Returning to FIG. 1, the model storage unit 126 stores information representing the machine learning model.

The extraction information storage unit 127 stores extraction information indicating the conditions for extracting, from the plurality of explanatory variables, the explanatory variables to be used for inference. For example, the extraction information storage unit 127 stores, as the extraction information, information identifying the explanatory variables that were extracted when a machine learning model capable of more accurate prediction was trained.

Each of the above units (the acquisition unit 101, the encoding unit 102, the extraction unit 103, the learning unit 104, the output control unit 105, and the inference unit 111) is implemented by, for example, one or more processors. For example, each unit may be implemented by causing a processor such as a CPU (Central Processing Unit) to execute a program, that is, by software. Each unit may be implemented by a processor such as a dedicated IC (Integrated Circuit), that is, by hardware. Each unit may also be implemented by a combination of software and hardware. When a plurality of processors are used, each processor may implement one of the units or two or more of them.

Each of the above storage units (the event data storage unit 121, the meteorological data storage unit 122, the additional data storage unit 123, the geographic data storage unit 124, the feature storage unit 125, the model storage unit 126, and the extraction information storage unit 127) can be configured from any commonly used storage medium, such as a flash memory, a memory card, a RAM (Random Access Memory), an HDD (Hard Disk Drive), or an optical disk. The storage units may be physically different storage media, or may be implemented as different storage areas of the same physical storage medium. Each storage unit may also be implemented by a plurality of physically different storage media.

Although FIG. 1 shows an example in which a single information processing apparatus 100 has both the function of executing the learning process and the function of executing the inference process, the two functions may be executed by different apparatuses. The information processing apparatus 100 may be an apparatus that operates in a cloud environment. Some of the units shown in FIG. 1 may be executed by a device external to the information processing apparatus 100. For example, the display unit 131 may be provided in a terminal device such as a personal computer, a smartphone, or a tablet, and the output control unit 105 may be configured to output information to that terminal device. The external device may be a device that operates in a cloud environment.

Next, the learning process of the machine learning model performed by the information processing apparatus 100 configured as described above is described. FIG. 7 is a flowchart showing an example of the learning process in the present embodiment.

The acquisition unit 101 acquires event data, additional data, meteorological data, and geographic data (step S101). The acquisition unit 101 need not acquire the additional data when it is unnecessary. The coding unit 102 executes a coding process using the acquired data (step S102). For example, the coding unit 102 encodes at least the correct answer data (event data) and the input data (meteorological data and additional data) into feature quantities. The input data encoded into feature quantities is used as the explanatory variables, and the event data encoded into feature quantities is used as the objective variable.

Next, the explanatory variable extraction process is executed (step S103). The explanatory variable extraction process extracts, from the acquired explanatory variables (input data encoded into feature quantities), the explanatory variables that contribute to improving the inference accuracy of the machine learning model, and trains the machine learning model with the extracted explanatory variables. Details of the explanatory variable extraction process will be described later.

The extraction unit 103 stores, in the extraction information storage unit 127, extraction information identifying the explanatory variables that were extracted when a machine learning model capable of more accurate prediction was trained during the explanatory variable extraction process. The learning unit 104 stores information representing the trained machine learning model obtained by the explanatory variable extraction process in the model storage unit 126 (step S104).

The output control unit 105 displays information about the trained machine learning model on, for example, the display unit 131 (step S105). For example, the output control unit 105 refers to the extraction information and displays information indicating the explanatory variables used to train the machine learning model. The output control unit 105 may also display information indicating the parameters of the machine learning model.

Next, the inference process using the machine learning model performed by the information processing device 100 according to the present embodiment will be described. FIG. 8 is a flowchart showing an example of the inference process in the present embodiment.

Steps S201 and S202 are the same as steps S101 and S102 in FIG. 7, and their description is omitted.

The acquisition unit 101 reads the extraction information from the extraction information storage unit 127 and reads the information of the trained machine learning model from the model storage unit 126 (step S203). Using the read extraction information, the extraction unit 103 extracts (selects) the explanatory variables to be used for inference from the plurality of encoded feature quantities (explanatory variables) (step S204).

The inference unit 111 executes the inference process by inputting the encoded input data (meteorological data and additional data) into the read machine learning model (step S205). The output control unit 105 displays the inference result on, for example, the display unit 131 (step S206). Specific examples of how the inference result is displayed will be described later.

Next, details of the explanatory variable extraction process in step S103 will be described. FIG. 9 is a flowchart showing an example of the explanatory variable extraction process.

Note that the explanatory variable extraction process may be executed only when the event to be predicted correlates with the meteorological data. For example, the extraction unit 103 calculates the correlation coefficient between the event data and the meteorological data, and determines that the two are correlated when the absolute value of the calculated correlation coefficient is equal to or greater than a threshold value (for example, 0.2). The extraction unit 103 sets an arbitrary unit time and calculates the correlation coefficient using the moving average of each data series within this unit time. A unit time for which the correlation coefficient can reach the threshold value or more serves as a guide to the time granularity at which prediction is effective. For example, this unit time can also be used as the time granularity when correlation coefficients are calculated in subsequent processing.

In the explanatory variable extraction process, the extraction unit 103 first selects the explanatory variables to be used for learning from the plurality of explanatory variables (input data encoded into feature quantities) (step S301). For example, the extraction unit 103 randomly selects an appropriate number of explanatory variables from the plurality of explanatory variables using uniform random numbers or the like.

Next, the learning unit 104 trains a machine learning model for inferring the input event data (objective variable) from the selected explanatory variables (step S302). The inference unit 111 executes an inference process using the trained machine learning model (step S303). The inference process may use input data different from the input data used in the learning process.

In parallel, an analyst may make a manual prediction using the selected explanatory variables. As described later, the analyst's prediction result can be used as a reference when adding explanatory variables for learning.

The output control unit 105 displays, on the display unit 131, the explanatory variables that contribute highly to the inference result of the machine learning model, using a visualization method such as dtreeviz (step S304).

The user can designate the explanatory variables to be used for the next round of learning by referring to the displayed degrees of contribution. The user may also designate them with reference to the analyst's prediction result and information on the explanatory variables on which the analyst based the prediction. Instead of the user designating explanatory variables, the extraction unit 103 may extract a fixed number of explanatory variables (for example, the top 10%) in descending order of contribution.
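The automatic alternative (taking a fixed share of the variables in descending order of contribution) might look like the following sketch. The function name, the dictionary layout, and the rule of always keeping at least one variable are assumptions, not details given in the description.

```python
def top_contributors(contributions, fraction=0.1):
    """Return the names of the top `fraction` of variables by contribution.

    `contributions` maps variable name -> degree of contribution.
    At least one variable is always returned (an assumption of this sketch).
    """
    count = max(1, int(len(contributions) * fraction))
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    return ranked[:count]


# Hypothetical contribution scores for ten explanatory variables.
scores = {
    "temperature": 0.42, "wind_speed": 0.05, "rainfall": 0.19, "humidity": 0.08,
    "pressure": 0.03, "snowfall": 0.11, "sunshine": 0.04, "holiday": 0.06,
    "event": 0.01, "visibility": 0.02,
}
print(top_contributors(scores))
```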

Not only the designated (extracted) explanatory variables but also other explanatory variables that correlate with them may be used in the next round of learning. For example, the extraction unit 103 extracts other explanatory variables that correlate with the designated explanatory variables (step S305). The extraction unit 103, for example, compares the absolute value of the correlation coefficient between two variables with a threshold value and determines that they are correlated when the absolute value exceeds the threshold value. The explanatory variables designated and extracted in this way are hereinafter referred to as the explanatory variable group.

The extraction unit 103 further extracts explanatory variables similar to each explanatory variable in the explanatory variable group, based on the similarity of their relationship to the meteorological data.

The extraction unit 103 first creates relationship information (first relationship information) representing the relationship between each explanatory variable in the explanatory variable group and the meteorological data. The relationship information is, for example, a frequency distribution R1 representing the number of times an index value representing the relationship between the explanatory variable and the meteorological data appears in each of a plurality of periods included in a designated period (step S306).

FIGS. 10 to 12 are diagrams for explaining an example of a method of creating the frequency distribution R1. FIG. 10 shows how the correlation coefficient (an example of an index value representing the relationship) between one explanatory variable and the meteorological data changes within the designated period. In FIG. 10, the horizontal axis represents time and the vertical axis represents the correlation coefficient. FIG. 11 shows an example of a frequency distribution representing the number of occurrences of the correlation coefficient within the designated period. FIG. 12 is an example in which the frequency distributions for 10 explanatory variables are superimposed. In FIG. 12, the frequency distribution for each explanatory variable is drawn as one polyline.

Note that FIG. 12 corresponds to a case where the explanatory variable group contains 10 explanatory variables, but the number of explanatory variables in the group is not limited to 10. When the number of explanatory variables is large (for example, when it exceeds a threshold value), the extraction unit 103 may perform clustering that merges mutually similar frequency distributions into one distribution so that an appropriate number is obtained. When clustering is performed, the extraction unit 103 uses the representative vector of each cluster as the frequency distribution R1. The representative vector is, for example, a vector whose elements are the averages of the corresponding elements of the vectors in the cluster.
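The representative vector described above is an element-wise mean. A minimal sketch, assuming the cluster membership has already been decided by some clustering method (which the description does not specify):

```python
def representative_vector(cluster):
    """Element-wise mean of the equally sized frequency-distribution vectors in a cluster."""
    size = len(cluster)
    return [sum(column) / size for column in zip(*cluster)]


# Two hypothetical frequency distributions assigned to the same cluster.
print(representative_vector([[1, 2, 3], [3, 4, 5]]))
```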

Returning to FIG. 9, the extraction unit 103 creates relationship information (second relationship information) representing the relationship between the full set of explanatory variables and the meteorological data. As above, the relationship information is, for example, a frequency distribution R2 representing the number of times an index value (for example, a correlation coefficient) representing the relationship between the explanatory variable and the meteorological data appears in each of a plurality of periods included in the designated period (step S307).

FIG. 13 is a diagram showing an example of the frequency distribution R2, created for all the explanatory variables by the same method as the frequency distribution R1.

Returning to FIG. 9, the extraction unit 103 identifies (extracts) the explanatory variables corresponding to a frequency distribution R2 that matches or is similar to the frequency distribution R1 as the explanatory variables (third input data) to be used for the next round of learning. For example, the extraction unit 103 identifies the explanatory variables corresponding to a frequency distribution R2 that approximates the frequency distribution R1 (step S308). The extraction unit 103 uses, for example, the k-nearest neighbor algorithm (KNN) to find the frequency distribution R2 with the smallest vector distance to the frequency distribution R1, and identifies the explanatory variable corresponding to that frequency distribution R2.
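The nearest-distribution search of step S308 can be sketched as a brute-force nearest-neighbor lookup. Euclidean distance is an assumption here (the description says only "vector distance"), as are the function name and the sample distributions.

```python
import math


def nearest_variables(r1, r2_by_variable, k=1):
    """Names of the k variables whose distribution R2 is closest (Euclidean) to r1."""
    ranked = sorted(r2_by_variable, key=lambda name: math.dist(r1, r2_by_variable[name]))
    return ranked[:k]


# Hypothetical frequency distributions with three bins each.
r1 = [0, 2, 8]
r2_by_variable = {
    "wind_speed": [7, 2, 1],
    "humidity": [1, 2, 7],
    "pressure": [3, 4, 3],
}
print(nearest_variables(r1, r2_by_variable, k=2))
```

Running the lookup once per distribution R1 in the explanatory variable group and taking the union of the results would give the variables for the next round of learning.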

The machine learning model is then trained further using the explanatory variables identified in this way. That is, the learning unit 104 trains a machine learning model for inferring the input event data (objective variable) from the identified explanatory variables (step S309). The inference unit 111 executes an inference process using the trained machine learning model (step S310).

The learning unit 104 determines whether the inference accuracy has improved (step S311). For example, the learning unit 104 determines whether the accuracy of the inference made with the machine learning model in step S310 is higher than the accuracy of the analyst's prediction result. The learning unit 104 may instead determine whether the accuracy has improved over the result of the previous inference by the machine learning model (for example, step S303, or, when step S310 is executed repeatedly, step S309 executed immediately before).

When the inference accuracy has improved (step S311: Yes), the explanatory variable extraction process ends. When it has not improved (step S311: No), the learning unit 104 determines whether the number of repetitions of the processing from step S306 to step S311 (the processing count) has reached an upper limit (step S312). Instead of determining whether the processing count has reached the upper limit, it may be determined whether the user has requested that the processing end. When the upper limit has not been reached (step S312: No), the extraction unit 103 adds further explanatory variables (step S313).

For example, when the analyst bases the prediction on an explanatory variable that is not among those currently in use, the user designates that explanatory variable as one to be added. The extraction unit 103 adds the designated explanatory variable. The processing from step S306 onward is then repeated for the explanatory variable group including the added explanatory variable.
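The repetition of steps S306 to S313 can be summarized as a control-flow sketch. The callables `train_and_score` and `propose_additional` are hypothetical stand-ins for the training and inference steps and for the user's designation of extra variables; the default upper limit of 5 is also an assumption.

```python
def extraction_loop(variables, train_and_score, baseline_score,
                    propose_additional, max_rounds=5):
    """Repeat training until accuracy beats the baseline or the round limit is hit.

    Returns (variables, score, improved). When `improved` is False, the caller
    would fall back to another selection method (step S314 in the text).
    """
    variables = list(variables)
    score = None
    for round_index in range(1, max_rounds + 1):
        score = train_and_score(variables)       # steps S306 to S310
        if score > baseline_score:               # step S311: Yes
            return variables, score, True
        if round_index == max_rounds:            # step S312: Yes
            break
        for extra in propose_additional(variables):   # step S313
            if extra not in variables:
                variables.append(extra)
    return variables, score, False


# Toy stand-ins: the score grows with the number of variables in use.
pool = ["c", "d", "e"]
result = extraction_loop(
    ["a", "b"],
    train_and_score=lambda vs: len(vs) / 10,
    baseline_score=0.35,
    propose_additional=lambda vs: [v for v in pool if v not in vs][:1],
)
print(result)
```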

When it is determined that the upper limit has been reached (step S312: Yes), the extraction unit 103 selects explanatory variables by yet another method. For example, the extraction unit 103 selects explanatory variables by a method called Backward Elimination (step S314). The learning unit 104 trains a machine learning model for inferring the input event data (objective variable) from the selected explanatory variables (step S315).

Backward Elimination is a method that first creates a model containing all features (explanatory variables) and then successively deletes the features judged to be unimportant. In Backward Elimination, the explanatory variables to be deleted are made clear by broadly visualizing the relationship between explanatory variables, such as meteorological data and event information (an example of additional data), and the objective variable. Explanatory variables can be deleted by, for example, deleting those designated by the user based on their degree of influence (importance), or having the extraction unit 103 delete them based on a comparison between the degree of influence and a threshold value (Boruta, described below).

Boruta, an example of Backward Elimination, is a method that creates fake features and compares importances in order to pick out effective features from a large number of features. In Boruta, for example, the existing features (Original Data) are copied, the samples in each column are shuffled to create fake features (Shadow Features), and a random forest is trained on the combination of the existing and fake features. The largest importance among the fake features gives a guide to the importance of existing features that do not contribute. That is, an existing feature whose importance is smaller than the largest importance among the fake features is judged not to be an effective feature.

By its nature, a random forest yields feature importances that vary each time it is trained. It is therefore necessary to collect a large number of samples and perform a statistical test, which increases the computational cost when the method is applied to big data.

FIG. 9 described an example in which the accuracy is compared against the analyst's prediction result. Explanatory variables may also be extracted without using an analyst's prediction result, for example when no analyst is available. FIG. 14 is a flowchart showing an example of the explanatory variable extraction process in such a configuration.

Steps S401 to S410 are the same as steps S301 to S310 in FIG. 9, and their description is omitted. In the example of FIG. 14, there is no need for an analyst to make a prediction in parallel with, for example, step S403.

The user can designate the explanatory variables to be used for the next round of learning by referring to the degrees of contribution displayed in step S404. Note that in the example of FIG. 14, the user cannot designate explanatory variables with reference to an analyst's prediction result.

The learning unit 104 determines whether the inference accuracy has improved (step S411). For example, the learning unit 104 determines whether the accuracy of the inference made with the machine learning model in step S410 is higher than the accuracy of the inference result in step S403.

When the inference accuracy has improved (step S411: Yes), the explanatory variable extraction process ends. When it has not improved (step S411: No), the extraction unit 103 selects explanatory variables by another method. For example, the extraction unit 103 selects explanatory variables by Backward Elimination (step S412). The learning unit 104 trains a machine learning model for inferring the input event data (objective variable) from the selected explanatory variables (step S413).

As described above, the index value representing the relationship between an explanatory variable and the meteorological data, which is calculated when the frequency distributions are created in steps S306 and S307, is, for example, the correlation coefficient between the explanatory variable and the meteorological data. The index value is not limited to the correlation coefficient, and index values such as the following may be used. The relationship between the explanatory variables and the meteorological data may also be evaluated using a plurality of index values. Increasing the number of indices and the number of explanatory variables improves the accuracy with which the relationship is evaluated but also increases the computation time. It is therefore desirable to use an appropriate number of indices with these factors in mind.
・Error (root mean square error (RMSE), mean absolute error (MAE), etc.):
The Euclidean distance between the variables is evaluated, and the relationship is evaluated by the magnitude of the error.
・Data shaping:
Moving-average lines or envelopes that follow the fluctuation of the variables are drawn, and the relationship between them is evaluated by correlation coefficient, error, or other methods.
・Phase change frequency:
The relationship is evaluated by the number of extrema at which demand turns from increasing to decreasing, and the number of extrema at which it turns from decreasing to increasing, within a certain time width.
・Phase change interval:
The relationship is evaluated by the distributions of the time intervals between the extrema at which demand turns from increasing to decreasing, and between the extrema at which it turns from decreasing to increasing, within a certain time width.
・Histogram density estimation (Peristimulus Time Histogram):
The frequency of events within a certain time width is expressed as a count, and the relationship is evaluated by focusing on the number of events that occurred within a certain period. For example, a frequency distribution is created from how many spikes occurred during each unit period.
・Firing time interval (interspike interval):
A frequency distribution is created from how often sudden fluctuations occur over the entire period (for example, a frequency distribution of the lengths of the intervals between spikes), and the relationship is evaluated by focusing on the period from one spike to the next.
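Three of the alternative index values listed above can be sketched as follows. The definition of a spike as a sample above a fixed threshold, the function names, and the sample series are assumptions; the description does not say how spikes or extrema are detected.

```python
import math
from collections import Counter


def rmse(a, b):
    """Root mean square error between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def count_turning_points(series):
    """Return (peaks, troughs): increase-to-decrease and decrease-to-increase extrema."""
    peaks = troughs = 0
    for prev, cur, nxt in zip(series, series[1:], series[2:]):
        if prev < cur > nxt:
            peaks += 1
        elif prev > cur < nxt:
            troughs += 1
    return peaks, troughs


def interspike_histogram(series, spike_threshold):
    """Frequency distribution of gaps (in samples) between consecutive spikes."""
    spikes = [i for i, value in enumerate(series) if value > spike_threshold]
    return Counter(later - earlier for earlier, later in zip(spikes, spikes[1:]))


# Hypothetical demand series.
demand = [1, 3, 2, 5, 4, 4, 6, 1]
print(count_turning_points(demand))
print(interspike_histogram([0, 9, 0, 0, 9, 9, 0], spike_threshold=5))
```

The counters and counts produced here play the same role as the correlation-based frequency distributions: each can be turned into a vector and compared between variables.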

Next, examples of how the inference result is displayed in step S206 and the like will be described. FIGS. 15 to 19 are examples of display screens for displaying the inference result.

As shown in FIG. 15, the display screen includes a selection field 1501 and marks 1511, 1512, and 1513 displayed on a map. The selection field 1501 is a field for selecting the explanatory variables that contributed to the inference result. In the selection field 1501 of FIG. 15, temperature, wind speed, and rainfall/snowfall can be selected as explanatory variables. These explanatory variables are examples, and it may be possible to add others. For example, a separate designation screen may allow the user to designate the explanatory variables to be shown in the selection field 1501.

The mark 1511 is a symbol indicating the position where an event is predicted to occur. Each position where a mark of the same shape as the mark 1511 is displayed is a position where one event is predicted to occur.

The marks 1512 and 1513 are circular symbols with a number written inside. A mark of this shape means that the number of events indicated by the number is predicted to occur within a range including the position where the mark is displayed. In other words, a mark of this shape corresponds to an aggregation of a plurality of marks 1511. The output control unit 105 may change the display mode (color, etc.) of a mark according to the number of occurrences. For example, the output control unit 105 may display the marks in green, yellow, and red when the number of occurrences has one digit, two digits, and three or more digits, respectively.
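The color rule in the example above reduces to a comparison on the number of digits. A minimal sketch, with the function name and the returned color strings chosen for illustration:

```python
def mark_color(count):
    """Color of an aggregated mark by the digit count of the predicted occurrences."""
    if count < 10:
        return "green"    # one digit
    if count < 100:
        return "yellow"   # two digits
    return "red"          # three or more digits


print([mark_color(n) for n in (7, 42, 315)])
```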

FIG. 16 shows an example of the display screen displayed when only temperature is selected as the explanatory variable. As shown in the selection field 1601, only temperature (temperature-derived) is selected as the explanatory variable in this example. In this case, the prediction results for the events predicted from the selected explanatory variable (temperature) are displayed on the map. In the example of FIG. 16, the marks 1511 and 1512 are displayed, while the mark 1513 is no longer displayed because it is not temperature-derived.

The display screen may be enlarged or reduced according to the user's designation or the like. FIG. 17 shows an example of an enlarged display screen. As shown in FIG. 17, when a mark 1701 is selected, the output control unit 105 may display detailed information about the event corresponding to the mark 1701.

The above display screens are examples, and the way the inference result is displayed is not limited to them. In FIGS. 15 and 16 the number of occurrences is displayed as a number, but a display screen may be used in which the display mode (size, etc.) of the mark is changed according to the number of occurrences. FIGS. 18 and 19 are diagrams showing examples of display screens configured in this way.

In the selection field 1801 of FIG. 18, all three explanatory variables are selected. In this case, a display screen is displayed in which a circular mark with a radius corresponding to the number of occurrences is shown at each position where an event is predicted from the three explanatory variables. In the selection field 1901 of FIG. 19, one explanatory variable (temperature-derived) is selected. In this case, a display screen is displayed in which a circular mark with a radius corresponding to the number of occurrences is shown at each position where an event is predicted from the one selected explanatory variable.

When data for comparison is available, such as a prediction result from an existing method or an analyst's prediction result, the output control unit 105 may display a display screen that contrasts the comparison data with the prediction result of the present embodiment.

By using display screens such as those described above, the user can grasp which explanatory variables contribute to the prediction.

(Application examples)
The information processing device of the present embodiment can be applied to systems such as the following.
(Application example 1) System for predicting traffic congestion on roads
Traffic congestion has various causes, one of which is traffic demand. Traffic demand is the number of vehicles passing along a road in each time slot, and refers to the traffic volume that would occur if there were no limit on the volume the road can carry (its traffic capacity). For example, when more vehicles than assumed rush into a road section designed for a traffic capacity of 50 vehicles per minute, the section in many cases becomes a bottleneck and congestion occurs.

It has been found statistically that traffic demand is strongly related to meteorological data. Traffic demand can therefore be predicted from meteorological data by the above embodiment. In addition, by applying numerical corrections based on conventional traffic simulation to account for changes in traffic demand caused by newly opened roads, local events, and the like, traffic demand can be predicted for each date and time and for each road section (for example, "At hour z on month x, day y, the traffic demand on road section j is ○ vehicles per minute").

Predicting which sections will have high or low traffic demand makes it possible to present optimal routes and to analyze the factors behind high and low traffic demand. This in turn helps road operators decide on congestion mitigation measures and supports drivers in their decisions.

(Application example 2) Incoming call prediction system for an insurer's call center
For example, in the call center operations of a business that handles automobile insurance, operator staffing needs to be optimized. Predicting the number of incoming calls that may occur by time period and by region makes it possible to cut costs by reducing idle staff and to improve the accuracy of business plans through more detailed hiring plans. Furthermore, predicting the types of incoming calls that may occur in each region makes it possible to position, in advance and with precision, after-sales service personnel who can handle those cases. As a result, the time from an incoming call to the delivery of service can be shortened and customer satisfaction can be improved.

(Application example 3) Prediction information provision system for an insurer
Predicting the types of incoming calls that may occur by time period and by region makes it possible to alert automobile insurance customers driving in the relevant region at that time with a message such as "Please be aware of the risk of the engine failing to stop." In addition, for events predicted to have a high probability of occurrence (punctures, overheating, collisions, and so on), end users can be told what action to take to avoid them, for example "Watch out for foreign objects on the road while driving."

(Application example 4) Incoming call prediction system in which an insurer continuously improves the model
Weather information for regions with many accidents and breakdowns, and the vehicle models commonly used in those regions, can be accumulated from incoming call information. Data obtained by interviewing end users about matters such as the purpose for which they use their vehicles in those regions and time periods can also be added as new additional data. The additional data added in this way can be used to improve accuracy, so that the prediction accuracy of the machine learning model is improved continuously.

As described above, the above embodiment makes it possible to extract more efficiently both the data that adversely affects prediction accuracy and the data that benefits it. As a result, a prediction system that can predict events with higher accuracy can be realized.

Next, the hardware configuration of the information processing device according to the embodiment will be described with reference to FIG. 20. FIG. 20 is an explanatory diagram showing an example hardware configuration of the information processing device according to the embodiment.

The information processing device according to the embodiment includes a control device such as a CPU (Central Processing Unit) 51, storage devices such as a ROM (Read Only Memory) 52 and a RAM (Random Access Memory) 53, a communication I/F 54 that connects to a network and performs communication, and a bus 61 that connects these units.

The program executed by the information processing device according to the embodiment is provided by being incorporated in advance in the ROM 52 or the like.

The program executed by the information processing device according to the embodiment may instead be provided as a computer program product by being recorded, as a file in an installable or executable format, on a computer-readable recording medium such as a CD-ROM (Compact Disk Read Only Memory), a flexible disk (FD), a CD-R (Compact Disk Recordable), or a DVD (Digital Versatile Disk).

Further, the program executed by the information processing device according to the embodiment may be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network. The program executed by the information processing device according to the embodiment may also be provided or distributed via a network such as the Internet.

The program executed by the information processing device according to the embodiment can cause a computer to function as each unit of the information processing device described above. In this computer, the CPU 51 can read the program from a computer-readable storage medium onto the main storage device and execute it.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

100 Information processing device
101 Acquisition unit
102 Encoding unit
103 Extraction unit
104 Learning unit
105 Output control unit
111 Inference unit
121 Event data storage unit
122 Meteorological data storage unit
123 Additional data storage unit
124 Geographic data storage unit
125 Feature quantity storage unit
126 Model storage unit
127 Extraction information storage unit
131 Display unit

Claims

An information processing device comprising:
an acquisition unit that acquires input data to be input to a model, the model receiving input data that includes time-series data whose values vary continuously with position and outputting an inference result, and correct answer data representing the correct answer of the inference by the model;
a learning unit that trains the model using first input data selected from the input data and the correct answer data; and
an output control unit that outputs a degree of contribution of the first input data to the inference result of the trained model,
wherein the learning unit further trains the model using third input data, which is based on second input data designated from among the input data according to the output degree of contribution, and the correct answer data.

The information processing device according to claim 1, wherein first relationship information representing the relationship between the second input data and the time-series data matches or is similar to second relationship information representing the relationship between the third input data and the time-series data.

The information processing device according to claim 2, wherein the first relationship information, which represents the relationship between the time-series data and data that includes the second input data and data correlated with the second input data, matches or is similar to the second relationship information representing the relationship between the third input data and the time-series data.

The information processing device according to claim 2 or 3, wherein
the first relationship information is a frequency distribution representing the number of times an index value representing the relationship between the second input data and the time-series data appears in each of a plurality of periods included in a designated period, and
the second relationship information is a frequency distribution representing the number of times an index value representing the relationship between the third input data and the time-series data appears in each of the plurality of periods included in the designated period.

The information processing device according to claim 4, wherein the learning unit determines whether the first relationship information and the second relationship information match or are similar based on the distance between the frequency distributions.
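The comparison of frequency distributions recited in the two preceding claims can be illustrated with a minimal sketch. The claims do not fix the index value, the binning, the distance measure, or the threshold, so everything specific here is an assumption: the index value is taken to be the Pearson correlation computed within each period, the frequency distribution is a histogram of those values over four equal bins of [-1, 1], the distance is the L1 distance between relative frequencies, and the threshold of 0.5 is arbitrary. All function names and data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def frequency_distribution(data, series, period_len, n_bins=4):
    """Histogram of per-period index values (assumed here: correlation in [-1, 1])."""
    counts = [0] * n_bins
    for start in range(0, len(series) - period_len + 1, period_len):
        end = start + period_len
        r = pearson(data[start:end], series[start:end])
        # Map r in [-1, 1] to a bin index; r == 1 falls into the last bin.
        bin_index = min(int((r + 1.0) / 2.0 * n_bins), n_bins - 1)
        counts[bin_index] += 1
    return counts

def l1_distance(hist_a, hist_b):
    """L1 distance between two histograms normalised to relative frequencies."""
    total_a, total_b = sum(hist_a), sum(hist_b)
    return sum(abs(a / total_a - b / total_b) for a, b in zip(hist_a, hist_b))

def is_similar(hist_a, hist_b, threshold=0.5):
    """Treat two distributions as matching or similar when their distance is small."""
    return l1_distance(hist_a, hist_b) <= threshold

# Hypothetical time-series data (e.g. temperature) covering three periods of length 4.
temperature = [1, 2, 3, 4, 4, 3, 2, 1, 1, 3, 2, 4]
second = [2, 4, 6, 8, 1, 2, 3, 4, 2, 6, 4, 8]         # designated input data
candidate = [10, 20, 30, 40, 5, 6, 7, 8, 3, 9, 6, 12]  # behaves like `second`
unrelated = [4, 3, 2, 1, 4, 3, 2, 1, 4, 2, 3, 1]       # behaves differently

hist_second = frequency_distribution(second, temperature, 4)
hist_candidate = frequency_distribution(candidate, temperature, 4)
hist_unrelated = frequency_distribution(unrelated, temperature, 4)
```

Under these assumptions, `is_similar(hist_second, hist_candidate)` would mark `candidate` as usable third input data, while `unrelated` would be rejected. Other index values or distance measures could be substituted without changing the overall structure.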

The information processing device according to claim 1, further comprising an extraction unit that randomly selects the first input data from the input data,
wherein the learning unit trains the model using the first input data selected by the extraction unit and the correct answer data.

The information processing device according to claim 1, wherein the time-series data is meteorological data for a predetermined region.

The information processing device according to claim 1, wherein the output control unit further displays the inference result of the trained model on a display device.

An information processing method comprising:
an acquisition step of acquiring input data to be input to a model, the model receiving input data that includes time-series data whose values vary continuously with position and outputting an inference result, and correct answer data representing the correct answer of the inference by the model;
a first learning step of training the model using first input data selected from the input data and the correct answer data;
an output control step of outputting a degree of contribution of the first input data to the inference result of the trained model; and
a second learning step of further training the model using third input data, which is based on second input data designated from among the input data according to the output degree of contribution, and the correct answer data.

A program for causing a computer to function as:
an acquisition unit that acquires input data to be input to a model, the model receiving input data that includes time-series data whose values vary continuously with position and outputting an inference result, and correct answer data representing the correct answer of the inference by the model;
a learning unit that trains the model using first input data selected from the input data and the correct answer data; and
an output control unit that outputs a degree of contribution of the first input data to the inference result of the trained model,
wherein the learning unit further trains the model using third input data, which is based on second input data designated from among the input data according to the output degree of contribution, and the correct answer data.