JPWO2017221856A1 - Analysis device, analysis method, and storage medium

Info

Publication number: JPWO2017221856A1
Application number: JP2018524061A
Authority: JP
Inventors: 三橋　秀男; 秀男三橋
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2016-06-21
Filing date: 2017-06-19
Publication date: 2019-04-11
Anticipated expiration: 2037-06-19
Also published as: WO2017221856A1; JP6943242B2

Abstract

Data analysis with reduced influence of sample-count imbalance is made possible without removing or adding sample data. An analysis device includes: analysis means that executes, for each grouping, a machine learning analysis deriving the relationship between the explanatory variables and the objective variable of a plurality of groups, the groups being generated by a grouping that classifies a plurality of analysis targets with which explanatory variables and an objective variable are associated; prediction means that calculates, for each grouping, predicted values, which are values of the objective variable of the groups, based on the values of the explanatory variables of the groups and the derived relationship; and calculation means that calculates a score for an analysis target by an operation based on the predicted values, calculated for each grouping, of the groups to which that analysis target belongs.

Description

The present disclosure relates to data analysis and prediction.

In data prediction based on machine learning with teacher data, prediction accuracy drops when the number of samples in the teacher data differs greatly from one value of the objective variable to another.

For example, when machine learning is performed with the number of occurrences per day of some event (for example, accidents) in an area as the objective variable, pairs of past daily occurrence counts and explanatory variables are generally used as teacher data. In most of that teacher data the objective variable is likely to be 0 or 1, and few samples have a value of 2 or more. In other words, the number of samples differs greatly for each value of the objective variable. Teacher data whose sample counts are skewed across values of the objective variable in this way is sometimes called imbalanced data. When machine learning and prediction are based on imbalanced data, samples whose objective-variable values appear relatively rarely have little influence, and prediction accuracy suffers.

In particular, when predicting occurrence counts as described above, the goal is usually to identify areas with many occurrences, yet samples with large counts are relatively few, so their characteristics tend to be ignored in machine learning. As a result, a formula for predicting the number of occurrences may fail to reflect the characteristics of areas where occurrences are frequent, and accurate prediction may not be possible.

Examples of techniques for addressing this problem are Random Over Sampling (ROS) and Random Under Sampling (RUS), described in Patent Document 1.

Of the two classes contained in the teacher data, ROS increases the number of samples in the class whose objective-variable value appears less frequently until it matches the number of samples in the more frequent class. RUS reduces the number of samples in the more frequent class until it matches the number of samples in the less frequent class.

Patent Document 1: JP 2010-204966 A

ROS may generate meaningless noise data, while RUS may discard useful sample data. This is because both methods artificially add or remove sample data.
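The two resampling schemes and their side effects can be illustrated with a short sketch. This is not taken from Patent Document 1; the function names, the sample tuples, and the fixed random seed are assumptions made only for illustration.

```python
import random


def random_over_sample(majority, minority, rng):
    """ROS: duplicate randomly chosen minority samples until class sizes match."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra


def random_under_sample(majority, minority, rng):
    """RUS: keep a random subset of the majority class, as large as the minority."""
    return rng.sample(majority, len(minority)), minority


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical samples: (sample id, number of occurrences).
majority = [(i, 0) for i in range(8)]      # eight samples with 0 occurrences
minority = [(8, 2), (9, 2)]                # two samples with 2 occurrences

ros_major, ros_minor = random_over_sample(majority, minority, rng)
rus_major, rus_minor = random_under_sample(majority, minority, rng)

print(len(ros_major), len(ros_minor))  # classes are balanced by adding duplicates
print(len(rus_major), len(rus_minor))  # classes are balanced by discarding samples
```

In this sketch ROS only repeats the two existing minority samples, and RUS throws away six of the eight majority samples, which corresponds to the two drawbacks noted above.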

One object of the present invention is to provide an analysis device and method capable of data analysis with reduced influence of sample-count imbalance, without removing or adding sample data.

An analysis device according to one aspect of the present invention includes: analysis means for executing, for each grouping, a machine learning analysis that derives the relationship between the explanatory variables and the objective variable of a plurality of groups, the groups being generated by a grouping that classifies a plurality of analysis targets with which explanatory variables and an objective variable are associated; prediction means for executing, for each grouping, calculation of predicted values, which are values of the objective variable of the groups, based on the values of the explanatory variables of the groups and the relationship; and calculation means for calculating a score for an analysis target by an operation based on the predicted values, calculated for each grouping, of the groups to which the analysis target belongs.

An analysis method according to one aspect of the present invention includes: executing, for each grouping, a machine learning analysis that derives the relationship between the explanatory variables and the objective variable of a plurality of groups, the groups being generated by a grouping that classifies a plurality of analysis targets with which explanatory variables and an objective variable are associated; executing, for each grouping, calculation of predicted values, which are values of the objective variable of the groups, based on the values of the explanatory variables of the groups and the relationship; and calculating a score for an analysis target by an operation based on the predicted values, calculated for each grouping, of the groups to which the analysis target belongs.

A program according to one aspect of the present invention causes a computer to execute: an analysis process of executing, for each grouping, a machine learning analysis that derives the relationship between the explanatory variables and the objective variable of a plurality of groups, the groups being generated by a grouping that classifies a plurality of analysis targets with which explanatory variables and an objective variable are associated; a prediction process of executing, for each grouping, calculation of predicted values, which are values of the objective variable of the groups, based on the values of the explanatory variables of the groups and the relationship; and a calculation process of calculating a score for an analysis target by an operation based on the predicted values, calculated for each grouping, of the groups to which the analysis target belongs.

According to the present invention, data analysis with reduced influence of sample-count imbalance can be performed without removing or adding sample data.

FIG. 1 is a diagram of a data structure showing an example of sample data.
FIG. 2 is a block diagram showing the configuration of the analysis device according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing the flow of the main operations of the analysis device according to the first embodiment.
FIG. 4 is a block diagram showing the configuration of the analysis device according to the second embodiment of the present invention.
FIG. 5 is a diagram showing an example of the number of accidents in each cell on a certain day.
FIG. 6 is a histogram showing the distribution of the number of occurrences in the example data of FIG. 5.
FIG. 7 is a histogram showing the distribution of the number of occurrences in data covering several hundred days of daily accident counts in each cell.
FIG. 8 is a diagram showing the concept of creating groups in the vertical direction.
FIG. 9 is a histogram showing the distribution of the objective-variable values of the groups.
FIG. 10 is a diagram showing the concept of creating groups in the horizontal direction.
FIG. 11 is a diagram explaining example objective-variable values of the groups generated by the vertical grouping and the horizontal grouping.
FIG. 12 is a diagram showing example predicted values for each group and example scores of each cell calculated from those predicted values.
FIG. 13 is a flowchart showing the flow of operation of the analysis device according to the second embodiment.
FIG. 14 is a diagram showing a third example of the grouping method.
FIG. 15 is a diagram showing a fourth example of the grouping method.
FIG. 16 is a diagram showing a fifth example of the grouping method.
FIG. 17 is a diagram showing a sixth example of the grouping method.
FIG. 18 is a diagram showing an example of deriving, from cell data and a grouping method, the objective-variable value of each group generated by that grouping method.
FIG. 19 is a diagram showing an example of calculating the score of each cell using three grouping methods.
FIG. 20 is a diagram showing example predicted values of the objective variable for each group generated by five grouping methods.
FIG. 21 is a diagram showing example scores of each cell calculated according to Modification 3.
FIG. 22 is a block diagram showing an example of hardware constituting each part of each embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

<< First Embodiment >>
First, a first embodiment of the present invention will be described.

The analysis device 11 according to the first embodiment handles sample data, that is, sets of explanatory-variable and objective-variable data accumulated for analysis targets. An analysis target is something for which the value of the objective variable, or an indication of that value, is to be derived. An analysis target may be identified by an identification number or by its position on a screen displayed by the analysis device 11. Each analysis target is associated with values of explanatory variables. Through the processing described later, the analysis device 11 calculates, from those explanatory-variable values, a score that indicates the likely value of the objective variable of the analysis target.

The objective variable is, for example, a variable that the user has selected as the one whose value is to be predicted. It may be, for example, the number of accidents per day, the number of incidents per week, or the number of ambulance dispatches per day in a certain area. For example, when the user wants to predict the number of accidents in an area on the following day, the objective variable may be set to the number of accidents per day in that area. In this case the analysis target is that area.

An explanatory variable is a variable considered to be a factor that affects the value of the objective variable. For example, if the objective variable is the number of accidents per day in an area, possible explanatory variables include the daily traffic volume, the car ownership rate, the number of bicycles owned, the number of traffic signals, the number of road signs, the number of intersections, the number of past accidents, the weather, the average road width, and whether the day is a weekday or a holiday.

FIG. 1 shows an example of sample data. As shown in FIG. 1, the sample data is, for each analysis target (an area in this example), a set of the explanatory variables and the objective variable related to that analysis target. In the example of FIG. 1, the sample data includes the number of accidents on day Z of month Y of year X together with information such as the traffic volume and weather on that day.

The analysis device 11 acquires sample data on the analysis targets from, for example, a storage device (not shown). For each of the plurality of analysis targets, the analysis device 11 then calculates a score that indicates the likely value of the objective variable, based on the explanatory variables associated with that analysis target. The control structure of the analysis device 11 is described below.

<Configuration>
FIG. 2 is a block diagram showing the configuration of the analysis device 11 according to the first embodiment. The analysis device 11 includes an analysis unit 113, a prediction unit 114, and a calculation unit 115.

For the plurality of groups generated by a grouping of the analysis targets, the analysis unit 113 executes a machine learning analysis that derives the relationship between the explanatory variables and the objective variable of the groups. A grouping is a classification of the analysis targets into a plurality of groups.

A grouping may be performed, for example, by a grouping unit (not shown) that associates each analysis target with a number or other identifier of a group. Through a grouping, for example, each of 100 analysis targets is associated with one of 10 groups.

The analysis unit 113 performs the machine learning analysis using the groups generated by this grouping as the unit of teacher data.

Specifically, the analysis unit 113 first creates group data for each group, based on the sample data associated with the analysis targets in that group. Group data is the combination of explanatory variables and objective variable obtained when the group is treated as a single unit.

For example, the analysis unit 113 integrates the objective-variable values of the sample data associated with the analysis targets in a group. Integrating values means setting a representative value based on the individual values. That is, the analysis unit 113 sets a representative value based on the objective-variable values of the sample data associated with the analysis targets in the group, and treats that representative value as the objective-variable value of the group. Concretely, integrating values means, for example, summing them. Alternatively, it may mean calculating their average.

Similarly, the analysis unit 113 integrates the values of each explanatory variable. When integrating values of an explanatory variable that is not expressed numerically, the analysis unit 113 may determine a representative value for the group anew, based on, for example, how the value of that explanatory variable is derived.

The analysis unit 113 may integrate objective variables (or explanatory variables) for different days separately. Thus, for example, when sample data covering several hundred days is available, the analysis unit 113 can create (number of groups into which the analysis targets are classified) × (several hundred) items of group data.
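The creation of group data described above can be sketched as follows. This is a minimal illustration, not the implementation of the analysis unit 113: the tuple layout, the mapping `group_of`, the function name `build_group_data`, and all values are hypothetical, and summation is chosen as the way of integrating values (the text also allows averaging).

```python
from collections import defaultdict

# Hypothetical sample data: (day, analysis target, traffic volume, accident count).
samples = [
    ("day1", "c1", 120, 0), ("day1", "c2", 300, 1),
    ("day1", "c3", 80, 0), ("day1", "c4", 250, 2),
    ("day2", "c1", 100, 1), ("day2", "c2", 280, 0),
    ("day2", "c3", 90, 0), ("day2", "c4", 260, 1),
]

# One grouping: every analysis target is associated with exactly one group.
group_of = {"c1": "A_1", "c2": "A_1", "c3": "A_2", "c4": "A_2"}


def build_group_data(samples, group_of):
    """Sum explanatory and objective values per (group, day)."""
    totals = defaultdict(lambda: [0, 0])
    for day, target, traffic, accidents in samples:
        key = (group_of[target], day)
        totals[key][0] += traffic     # integrated explanatory variable
        totals[key][1] += accidents   # integrated objective variable
    return {key: tuple(value) for key, value in totals.items()}


group_data = build_group_data(samples, group_of)
for key in sorted(group_data):
    print(key, group_data[key])
```

With two groups and two days, the sketch yields 2 × 2 = 4 items of group data, and no sample is dropped or duplicated along the way.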

The analysis unit 113 then executes a machine learning analysis using the created group data as teacher data. A machine learning analysis is, for example, an analysis that derives the relationship between explanatory variables and an objective variable from teacher data. For example, a machine learning analysis derives a function representing the relationship between the explanatory variables and the objective variable. Because this function is derived inductively, it is in effect a function that predicts the value of the objective variable from the values of the explanatory variables. Hereinafter, a function derived by machine learning analysis is called a "prediction formula".

That is, as one example, the analysis unit 113 derives a prediction formula for predicting the value of the objective variable through an analysis that uses the teacher data described above.

The analysis unit 113 derives a prediction formula for each of two or more groupings. For example, the analysis unit 113 first derives one prediction formula for one grouping, through a machine learning analysis that uses the group data of all groups generated by that grouping as teacher data. The analysis unit 113 then derives another prediction formula through a machine learning analysis that uses the group data of all groups generated by a different grouping as teacher data.
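A small sketch of deriving one prediction formula per grouping follows. It rests on assumptions that are not in the text: ordinary least squares with a single explanatory variable stands in for the machine learning analysis (the text does not restrict the method), and the grouping names, the function `fit_line`, and the group data values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical group data per grouping: (integrated traffic, integrated accidents).
group_data_by_grouping = {
    "columns": [(400, 2), (600, 3), (800, 4), (1000, 5)],
    "rows": [(500, 1), (700, 3), (900, 5), (1100, 7)],
}

# One prediction formula per grouping, each trained on all groups of that grouping.
formulas = {
    name: fit_line([x for x, _ in data], [y for _, y in data])
    for name, data in group_data_by_grouping.items()
}

for name, (intercept, slope) in formulas.items():
    print(f"{name}: y = {intercept:.3f} + {slope:.4f} * x")
```

The point of the sketch is the structure: the same analysis is run once per grouping, so two groupings give two independent prediction formulas.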

For each grouping, the prediction unit 114 calculates predicted values, which are the objective-variable values of the groups, based on the prediction formula derived by the analysis unit 113 and the explanatory-variable values of each group. For example, the prediction unit 114 calculates the objective-variable value of a group for the following day from the prediction formula and the explanatory variables of that group for the following day. The explanatory-variable values of a group may be entered by the user, or may be set based on information contained in the database 320.

The calculation unit 115 calculates a value for an analysis target by an operation based on the objective-variable values of the groups calculated for each grouping. The value calculated by the calculation unit 115 is called a "score". The score indicates the likely magnitude of the objective-variable value.

For example, suppose there are objective-variable values of the groups generated by one grouping (a first grouping) and objective-variable values of the groups generated by a different grouping (a second grouping). As the score of an analysis target, the calculation unit 115 calculates the product of the predicted value of the group to which the analysis target belongs in the first grouping and the predicted value of the group to which it belongs in the second grouping. Alternatively, the calculation unit 115 may calculate, as the score of the analysis target, the average of the predicted values of the groups to which the analysis target belongs.
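The score calculation just described can be sketched as follows. The function name `score`, the dictionary layout, and the predicted values are hypothetical; only the two operations (product and average) come from the text.

```python
from math import prod
from statistics import mean

# Hypothetical predicted values of each group, per grouping.
predictions = {
    "first": {"A_1": 3.0, "A_2": 1.0},
    "second": {"B_1": 2.0, "B_2": 0.5},
}

# Group membership of each analysis target, per grouping.
memberships = {
    "first": {"cell_a": "A_1", "cell_b": "A_2"},
    "second": {"cell_a": "B_1", "cell_b": "B_1"},
}


def score(target, memberships, predictions, combine=prod):
    """Combine the predicted values of the groups that `target` belongs to."""
    values = [
        predictions[grouping][memberships[grouping][target]]
        for grouping in memberships
    ]
    return combine(values)


print(score("cell_a", memberships, predictions))                # product: 6.0
print(score("cell_b", memberships, predictions))                # product: 2.0
print(score("cell_a", memberships, predictions, combine=mean))  # average: 2.5
```

Passing the operation as the `combine` argument is a design choice of this sketch; it keeps the lookup of group predictions separate from the choice between multiplication and averaging.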

<Main operations>
The flow of the main operations of the analysis device 11 is described with reference to the flowchart of FIG. 3.

In step S21, the analysis unit 113 executes, for each grouping, a machine learning analysis that derives the relationship between the explanatory variables and the objective variable of the plurality of groups generated by that grouping of the analysis targets.

In step S22, the prediction unit 114 executes, for each grouping, calculation of the predicted values, which are the objective-variable values of the groups, based on the explanatory-variable values of the groups and the relationship derived by the analysis unit 113.

In step S23, the calculation unit 115 calculates a score for each analysis target by an operation based on the predicted values, calculated for each grouping, of the groups to which that analysis target belongs.

<Effect>
With the configuration of the first embodiment, data analysis with reduced influence of the imbalance in sample counts across objective-variable values can be performed without removing or adding sample data. This is because combining the sample data into groups reduces the imbalance in the number of samples for each objective-variable value in the teacher data. Moreover, no sample data is removed or added in this analysis.

<< Second Embodiment >>
Next, a second embodiment of the present invention will be described.

<Configuration>
FIG. 4 is a block diagram showing the configuration of the analysis device 12 according to the second embodiment.

The analysis device 12 is communicably connected to a storage device 32 that stores a database 320. The analysis device 12 reads information contained in the database 320 from the storage device 32. The information to be read may be selected, for example, according to a designation by the user.

The database 320 of this embodiment contains information related to accidents in a certain region. For example, the database 320 stores data covering a predetermined period (for example, the past several hundred days), including the date, time, and location of each accident in the region, as well as daily traffic volume, weather, rainfall, number of intersections, number of traffic signals, average road width, and whether each day is a weekday or a holiday.

The analysis device 12 includes a dividing unit 111, a grouping unit 112, an analysis unit 113, a prediction unit 114, a calculation unit 115, and an output unit 116.

=== Dividing Unit 111 ===
The dividing unit 111 identifies the range of the analysis (that is, the extent of the region). To do so, the dividing unit 111 acquires, for example, information designating the extent of the region from the user. The dividing unit 111 may instead read information designating the extent of the region from the database 320. The dividing unit 111 identifies the range of the analysis based on that information.

The dividing unit 111 divides the range of the analysis into a plurality of sections with a mesh. The mesh size may be chosen as appropriate for the purpose. For example, the user may specify the mesh size. In that case, when the user inputs information indicating "1 kilometer square" to the analysis device 12, the dividing unit 111 may divide the region with a mesh size of 1 kilometer square. Alternatively, the analysis device 12 may set the mesh size as appropriate according to the size of the identified region and the amount of data.

Hereinafter, each section generated by the division is called a "cell". A cell may be, for example, several tens of meters square or several kilometers square. Cells need not be rectangular, and not all cells need to have the same size.
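The mesh division can be illustrated with a short sketch that maps a location to its cell. It assumes the simplest case only: a uniform square mesh (the text also permits non-rectangular cells and mixed sizes), planar coordinates in meters, and a hypothetical function name `cell_index` with a default 1 kilometer mesh.

```python
def cell_index(x_m, y_m, origin=(0.0, 0.0), cell_size_m=1000.0):
    """Return (row, col) of the square cell that contains the point (x_m, y_m).

    Coordinates are meters measured from `origin`, the lower-left corner
    of the analysis range; `cell_size_m` is the mesh size.
    """
    col = int((x_m - origin[0]) // cell_size_m)
    row = int((y_m - origin[1]) // cell_size_m)
    return row, col


print(cell_index(2500.0, 300.0))                     # (0, 2) with a 1 km mesh
print(cell_index(2500.0, 300.0, cell_size_m=100.0))  # (3, 25) with a 100 m mesh
```

Changing `cell_size_m` corresponds to the user or the device choosing a different mesh size for the same analysis range.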

The cells generated in this way are the targets for which the calculation unit 115 of the analysis device 12 of this embodiment calculates scores, that is, the analysis targets.

The dividing unit 111 may identify sample data for each cell, that is, the sets of objective-variable and explanatory-variable values measured so far for each cell. The objective variable is set, for example, by the user; as an example, it is the number of accidents per day. The dividing unit 111 identifies the sample values of the objective variable and the explanatory variables based on, for example, the data stored in the database 320. For example, when the analysis device 12 performs an analysis with the number of accidents per day as the objective variable, the dividing unit 111 may calculate the daily number of occurrences and the explanatory-variable values for each cell from the accident data recorded so far in the database 320.
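Deriving the daily number of occurrences per cell from recorded accidents can be sketched as below. The record layout (a date string and a cell index), the dates, and the 2 × 3 grid are hypothetical; the schema of the database 320 is not specified in the text.

```python
from collections import Counter

# Hypothetical accident records, each already mapped to a cell: (date, (row, col)).
accident_records = [
    ("2016-06-01", (0, 0)),
    ("2016-06-01", (0, 0)),
    ("2016-06-01", (1, 2)),
    ("2016-06-02", (0, 0)),
]

days = ["2016-06-01", "2016-06-02"]
cells = [(row, col) for row in range(2) for col in range(3)]

# Counter returns 0 for unseen keys, so cells without accidents get a count of 0.
counts = Counter(accident_records)
daily_counts = {(day, cell): counts[(day, cell)] for day in days for cell in cells}

print(daily_counts[("2016-06-01", (0, 0))])  # 2
print(daily_counts[("2016-06-02", (1, 2))])  # 0
```

Enumerating every (day, cell) pair matters here: the zero-count samples are part of the sample data, and they are what makes the data imbalanced.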

FIG. 5 is a diagram showing an example of the number of accidents in each cell on a certain day. A variable that counts how often an event occurs in one cell (for example, 1 kilometer square) during a unit period (for example, one day), such as the number of accidents, may take a value of 2 or more far less often than a value of 1 or less. When such a variable is the objective variable, as shown in FIG. 5, many cells have an objective-variable value of 0 or 1 and few cells have a value of 2 or more.

FIG. 6 is a histogram showing the frequency distribution of the objective-variable values of the cells in the example of FIG. 5. As FIG. 6 makes clear, the number of samples for each value of the objective variable (number of occurrences) is heavily skewed.

The data handled by the analysis device 12 need not be limited to a single day; the analysis device 12 may handle data for multiple days. FIG. 7 is an example of a histogram showing the frequency distribution of objective-variable values in daily data accumulated in the database 320 over several hundred days. In general, using several hundred days of data increases the number of samples, so the precision of the prediction (that is, how little the prediction results vary) is expected to improve. However, increasing the number of samples does not remove the imbalance in sample counts across objective-variable values, so the accuracy (that is, closeness to the true value) cannot be said to improve.

＝＝＝組分け部１１２＝＝＝
組分け部１１２は、同じ列のセルを１つのグループにまとめる。図８は、組分け部１１２が同じ列のセルを１つのグループのまとめる様子を示す概念図である。図８に示す例では、組分け部１１２は、縦方向に並ぶセルが同じグループになるよう、５つのグループＡ_１，Ａ_２，Ａ_３、Ａ_４，およびＡ_５にまとめる。すなわち、組分け部１１２は、セルを５つのグループに分類する。
=== Grouping Unit 112 ===
The grouping unit 112 collects the cells in the same column into one group. FIG. 8 is a conceptual diagram showing how the grouping unit 112 collects the cells in the same column into one group. In the example illustrated in FIG. 8, the grouping unit 112 sorts the cells into five groups A₁, A₂, A₃, A₄, and A₅ so that cells aligned in the vertical direction belong to the same group. That is, the grouping unit 112 classifies the cells into five groups.

図９は、それぞれのグループにおける目的変数の値の度数分布を示すヒストグラムの一例である。図９で示されるように、図７に示される例に比べ、目的変数の値が広範囲にわたって分布し、目的変数の値ごとのサンプルデータ数のばらつきは抑えられる。 FIG. 9 is an example of a histogram showing the frequency distribution of the value of the objective variable in each group. As shown in FIG. 9, compared to the example shown in FIG. 7, the value of the objective variable is distributed over a wide range, and the variation in the number of sample data for each value of the objective variable is suppressed.

組分け部１１２は、同様に、横方向の行が同じセルを１つのグループにまとめる。すなわち、組分け部１１２は、図１０のように、組分け部１１２は、各セルを、グループＢ_１，Ｂ_２，Ｂ_３、Ｂ_４，およびＢ_５に分類する。 Similarly, the grouping unit 112 collects the cells in the same horizontal row into one group. That is, as shown in FIG. 10, the grouping unit 112 classifies each cell into one of the groups B₁, B₂, B₃, B₄, and B₅.

＝＝＝解析部１１３＝＝＝
解析部１１３は、組分けごとに、グループを教師データの単位として機械学習を行う。具体的には、解析部１１３は、機械学習を以下のように行う。
=== Analysis Unit 113 ===
For each grouping, the analysis unit 113 performs machine learning using the group as the unit of teacher data. Specifically, the analysis unit 113 performs machine learning as follows.

解析部１１３は、まず、グループを１つの単位とした教師データを取得する。すなわち、解析部１１３は、各グループの目的変数の値および説明変数の値を取得する。 First, the analysis unit 113 acquires teacher data with a group as one unit. That is, the analysis unit 113 acquires the value of the objective variable and the value of the explanatory variable for each group.

グループの目的変数の値は、たとえば、グループに含まれるセルの目的変数の総和である。たとえば、図８によれば、グループＡ_１の目的変数の値は、０＋１＋１＋１＋０＝３である。グループの目的変数の値は、グループに含まれるセルの目的変数の平均でもよい。The value of the objective variable of the group is, for example, the sum of the objective variables of the cells included in the group. For example, according to FIG. 8, the value of the objective variable of group A ₁ is 0 + 1 + 1 + 1 + 0 = 3. The value of the objective variable of the group may be an average of the objective variables of the cells included in the group.
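
The aggregation described here can be sketched as follows, assuming the per-cell counts are held in a 5 × 5 NumPy array. The grid values are hypothetical: only the first column (0, 1, 1, 1, 0) is taken from the example for group A₁, and the function name `group_targets` is introduced purely for illustration.

```python
import numpy as np

# Hypothetical 5x5 grid of per-cell accident counts (rows 1..5, columns I..V).
# Only the first column (0, 1, 1, 1, 0) comes from the text; the rest is made up.
cells = np.array([
    [0, 0, 1, 2, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 2, 3, 1],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
])

def group_targets(cells, axis, how="sum"):
    """Aggregate cell values into one objective value per group.

    axis=0 collapses the rows, giving one value per column (groups A_1..A_5);
    axis=1 collapses the columns, giving one value per row (groups B_1..B_5).
    """
    if how == "sum":
        return cells.sum(axis=axis)
    if how == "mean":
        return cells.mean(axis=axis)
    raise ValueError(f"unknown aggregation: {how}")

a_targets = group_targets(cells, axis=0)   # vertical grouping (columns)
b_targets = group_targets(cells, axis=1)   # horizontal grouping (rows)
```

Either aggregation keeps every sample in play: the group totals of the two groupings both add up to the total over all cells.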

図１１は、図５に示したサンプルの例において、セルを縦方向および横方向にまとめることによって生成したグループの、それぞれの目的変数の値の例を示す図である。 FIG. 11 is a diagram illustrating an example of values of respective objective variables of a group generated by grouping cells in the vertical direction and the horizontal direction in the sample example illustrated in FIG. 5.

解析部１１３は、各グループの説明変数の値を算出する。グループの説明変数の値は、たとえば、グループに含まれるセルの説明変数の総和でもよいし、平均でもよい。 The analysis unit 113 calculates the value of the explanatory variable for each group. The value of the explanatory variable of the group may be, for example, the sum of the explanatory variables of the cells included in the group or an average.

こうして、解析部１１３は、グループを１つの単位とした教師データ（すなわち、グループの目的変数の値および説明変数の値の組）を取得する。 In this way, the analysis unit 113 obtains teacher data (that is, a set of objective variable values and explanatory variable values) of the group as one unit.

そして、解析部１１３は、得られた教師データを用いて、機械学習分析を行う。 Then, the analysis unit 113 performs machine learning analysis using the obtained teacher data.

解析部１１３は、たとえば、縦方向にまとめられたグループの、たとえば過去数百日分のデータに基づいて、機械学習分析を行い、１つの予測式を導出する。この予測式は、グループの説明変数の値からグループの目的変数の値を予測する式である。 For example, the analysis unit 113 performs machine learning analysis based on, for example, the past several hundred days of data grouped in the vertical direction, and derives one prediction formula. This prediction formula is a formula for predicting the value of the objective variable of the group from the value of the explanatory variable of the group.

解析部１１３は、同様に、横方向にまとめられたグループの過去数百日分のデータに基づいて、機械学習分析を行い、さらに別の予測式を導出する。なお、この機械学習分析において基となる過去のデータは、縦方向にまとめられたグループに対する機械学習分析で用いたデータと同じ期間のデータであってもよいし、異なる期間のデータであってもよい。 Similarly, the analysis unit 113 performs machine learning analysis based on the past several hundred days of data for the groups formed in the horizontal direction, and derives another prediction formula. The past data on which this machine learning analysis is based may cover the same period as the data used in the machine learning analysis for the groups formed in the vertical direction, or may cover a different period.

これにより、不均衡性が低減された教師データに基づく予測式が算出される。 Thereby, the prediction formula based on the teacher data with reduced imbalance is calculated.
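
The text does not name a particular learning algorithm. As a minimal sketch, assuming ordinary least-squares linear regression as a stand-in for the machine learning analysis, a prediction formula for one grouping could be fitted and applied as below. The function names, the three explanatory variables, and the synthetic training data are all hypothetical.

```python
import numpy as np

def fit_prediction_formula(X, y):
    """Fit y ~ X @ w + b by ordinary least squares and return (w, b)."""
    X1 = np.column_stack([X, np.ones(len(X))])      # append an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[:-1], coef[-1]

def predict(X, w, b):
    """Apply the fitted formula to group-level explanatory variables."""
    return X @ w + b

# Hypothetical teacher data: one row per (group, day) pair, three explanatory
# variables per row, and a noiseless linear target so the fit is easy to check.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(200, 3))
true_w, true_b = np.array([0.5, 0.2, 0.0]), 1.0
y_train = X_train @ true_w + true_b

w, b = fit_prediction_formula(X_train, y_train)
```

One such formula would be fitted per grouping (for example, one for the vertical groups and one for the horizontal groups), each on its own group-level teacher data.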

＝＝＝予測部１１４＝＝＝
予測部１１４は、解析部１１３が導出した予測式に基づいて、各グループの目的変数の予測値を算出する。具体的には、予測部１１４は、各グループの説明変数の値を予測式に代入することにより、そのグループの目的変数の予測値を得る。このとき用いられる各グループの説明変数の値は、たとえば、予測値を算出したい日の、説明変数の実測値もしくは予測値である。たとえば、信号機数や交差点数の値は、前日と同一の値が設定されてよい。自転車保有台数の値は、前日と同一の値または増減率を考慮した値が設定されればよい。天候は、天気予報等の情報から、尤もらしい値が設定されればよい。
=== Prediction Unit 114 ===
The prediction unit 114 calculates the predicted value of the objective variable of each group based on the prediction formula derived by the analysis unit 113. Specifically, the prediction unit 114 obtains the predicted value of the objective variable of a group by substituting the values of the explanatory variables of that group into the prediction formula. The values of the explanatory variables used here are, for example, the measured or predicted values of the explanatory variables on the day for which the predicted value is to be calculated. For example, the number of traffic lights and the number of intersections may be set to the same values as on the previous day. The number of bicycles owned may be set to the same value as on the previous day or to a value that takes the rate of change into account. The weather may be set to a plausible value based on information such as a weather forecast.

以下、グループの目的変数の予測値を、単に「グループの予測値」と呼ぶことがある。 Hereinafter, the predicted value of the objective variable of the group may be simply referred to as “the predicted value of the group”.

＝＝＝算出部１１５＝＝＝
算出部１１５は、予測部１１４が導出したグループの予測値に基づき、各セルの目的変数の値の目安となるスコアを算出する。
=== Calculation Unit 115 ===
The calculation unit 115 calculates a score that serves as a guide to the value of the objective variable of each cell, based on the predicted values of the groups derived by the prediction unit 114.

たとえば、算出部１１５は、それぞれのセルについて、そのセルを含むグループのそれぞれの予測値をかけ合わせた値を、そのセルにおけるスコアとして算出する。 For example, for each cell, the calculation unit 115 calculates, as the score of that cell, the product of the predicted values of the groups that include the cell.

図１２は、予測部１１４が算出した各グループの予測値から、各セルのスコアを算出する一例を示す図である。なお、表の左に付されたアラビア数字および表の上に付されたローマ数字は、説明の便宜上付された記号であり、それぞれ表における行または列を識別する記号である。 FIG. 12 is a diagram illustrating an example of calculating the score of each cell from the predicted value of each group calculated by the prediction unit 114. Note that the Arabic numerals attached to the left of the table and the Roman numerals attached to the top of the table are symbols provided for convenience of explanation, and are symbols for identifying rows or columns in the table, respectively.

図１２に示されるように、予測部１１４は、グループＡ_１〜Ａ_５の予測値としてそれぞれ［３，０，４，７，１］を算出し、グループＢ_１〜Ｂ_５の予測値としてそれぞれ［０，２，７，４，２］を算出したとする。この場合、例えば第１行第Ｉ列に相当するセルのスコアは［０×３］で［０］、第２行第Ｉ列に相当するセルのスコアは［２×３］で［６］、第３行第ＩＶ列に相当するセルのスコアは［７×７］で［４９］となる。ただし、「×」は掛け算を表す演算子である。 As illustrated in FIG. 12, suppose that the prediction unit 114 has calculated [3, 0, 4, 7, 1] as the predicted values of the groups A₁ to A₅, respectively, and [0, 2, 7, 4, 2] as the predicted values of the groups B₁ to B₅, respectively. In this case, for example, the score of the cell at row 1, column I is [0 × 3] = [0], the score of the cell at row 2, column I is [2 × 3] = [6], and the score of the cell at row 3, column IV is [7 × 7] = [49]. Here, "×" is the operator representing multiplication.

本実施形態の例における各セルのスコアの算出の方法は、次のようにも表せる。すなわち、セルの列番号を左から順にｉ（ｉ＝１，２，３，４，５）、行番号を上から順にｊ（ｊ＝１，２，３，４，５）で表すとすると、第ｊ行第ｉ列のセルのスコアはＢ_ｊ×Ａ_ｉである。 The method for calculating the score of each cell in the example of the present embodiment can also be expressed as follows. If the column number of a cell is denoted by i (i = 1, 2, 3, 4, 5) from the left and the row number by j (j = 1, 2, 3, 4, 5) from the top, the score of the cell in the j-th row and the i-th column is B_j × A_i.
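
Written this way, the score table is the outer product of the two prediction vectors. A minimal sketch using the predicted values quoted for FIG. 12 (the variable names are illustrative, and indices in the code are 0-based):

```python
import numpy as np

a_pred = np.array([3, 0, 4, 7, 1])   # predicted values of column groups A_1..A_5
b_pred = np.array([0, 2, 7, 4, 2])   # predicted values of row groups B_1..B_5

# score[j, i] holds B_(j+1) * A_(i+1), i.e. the cell in row j+1, column i+1.
score = np.outer(b_pred, a_pred)
```

With these inputs the cells at row 1 column I, row 2 column I, and row 3 column IV evaluate to 0, 6, and 49, matching the values in the text.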

算出部１１５は、以上のようにして、各セルのスコアを算出し、算出した値をセルに関連づける。 The calculation unit 115 calculates the score of each cell as described above, and associates the calculated value with the cell.

＝＝＝出力部１１６＝＝＝
出力部１１６は、スコアに基づいた情報を出力する。たとえば、出力部１１６は、スコアが算出されたセルのうち、スコアの値が大きいセルを、事故が多く発生すると予測される場所として示す情報を出力する。
=== Output Unit 116 ===
The output unit 116 outputs information based on the scores. For example, the output unit 116 outputs information indicating that, among the cells for which scores have been calculated, the cells with large score values are places where many accidents are predicted to occur.

たとえば、出力部１１６は、スコアの値が最も大きいセルから順に所定の数のセルを抽出し、抽出されたセルを他のセルとは異なる態様で表示してもよい。たとえば、出力部１１６は、分析対象の地域の地図において抽出されたセルに相当するエリアを強調した画像を出力してもよい。出力部１１６は、セルとスコアの値とを関連づけたデータを出力してもよい。 For example, the output unit 116 may extract a predetermined number of cells in order from the cell with the largest score value, and display the extracted cells in a manner different from other cells. For example, the output unit 116 may output an image in which the area corresponding to the cell extracted in the map of the analysis target area is emphasized. The output unit 116 may output data associating cells with score values.
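
Extracting a predetermined number of cells in descending score order can be sketched as below. The helper name `top_k_cells` and the 1-based (row, column, score) output format are assumptions made for illustration; the score table reuses the FIG. 12 values.

```python
import numpy as np

def top_k_cells(score, k):
    """Return [(row, col, score)] for the k highest-scoring cells, best first.

    Rows and columns are reported 1-based to match the table labels.
    """
    flat = np.argsort(score, axis=None, kind="stable")[::-1][:k]
    rows, cols = np.unravel_index(flat, score.shape)
    return [(int(r) + 1, int(c) + 1, score[r, c].item()) for r, c in zip(rows, cols)]

score = np.outer([0, 2, 7, 4, 2], [3, 0, 4, 7, 1])   # rows: B_1..B_5, columns: A_1..A_5
hot = top_k_cells(score, 3)
```

The returned list could then drive a map overlay that highlights the corresponding areas, or be written out directly as cell-to-score data.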

＜動作＞
第２の実施形態に係る分析装置１２の動作の流れを、図１３に沿って説明する。
<Operation>
The flow of operation of the analysis device 12 according to the second embodiment will be described with reference to FIG. 13.

まず、分割部１１１が、分析の範囲を特定する（ステップＳ９１）。そして、分割部１１１は、分析の範囲を複数のセルに分割する（ステップＳ９２）。 First, the dividing unit 111 specifies the analysis range (step S91). Then, the dividing unit 111 divides the analysis range into a plurality of cells (step S92).

次に、組分け部１１２が、縦方向に並ぶセルを同じグループとした組分けを行う（ステップＳ９３）。そして、解析部１１３が、縦方向のグループのデータを教師データとした機械学習分析を実行する（ステップＳ９４）。そして、予測部１１４が、予測式と縦方向のグループのそれぞれの説明変数の値とに基づいて、グループのそれぞれの目的変数の予測値を算出する（ステップＳ９５）。 Next, the grouping unit 112 performs grouping by grouping the cells arranged in the vertical direction into the same group (step S93). Then, the analysis unit 113 performs machine learning analysis using the vertical group data as teacher data (step S94). Then, the prediction unit 114 calculates the predicted value of each objective variable of the group based on the prediction formula and the value of each explanatory variable of the vertical group (step S95).

組分け部１１２は、横方向に並ぶセルを同じグループとした組分けも行う（ステップＳ９６）。解析部１１３は、横方向のグループのデータを教師データとした機械学習分析を実行する（ステップＳ９７）。予測部１１４は、予測式と横方向のグループのそれぞれの説明変数の値とに基づいて、グループのそれぞれの目的変数の予測値を算出する（ステップＳ９８）。 The grouping unit 112 also performs grouping of cells arranged in the horizontal direction as the same group (step S96). The analysis unit 113 performs machine learning analysis using the horizontal group data as teacher data (step S97). The prediction unit 114 calculates the predicted value of each objective variable of the group based on the prediction formula and the value of each explanatory variable of the horizontal group (step S98).

ステップＳ９３からステップＳ９８の処理の順序は、上述の例に限られない。たとえば、ステップＳ９３からステップＳ９５の処理と、ステップＳ９６からステップＳ９８の処理とは、並行して行われてもよい。 The order of processing from step S93 to step S98 is not limited to the above example. For example, the processing from step S93 to step S95 and the processing from step S96 to step S98 may be performed in parallel.

そして、算出部１１５は、縦方向のグループの目的変数の予測値と、横方向のグループの目的変数の予測値とに基づいて、各セルのスコアを算出する（ステップＳ９９）。 Then, the calculation unit 115 calculates the score of each cell based on the predicted value of the objective variable of the vertical group and the predicted value of the objective variable of the horizontal group (step S99).

最後に、出力部１１６が、スコアに基づいた情報を出力する（ステップＳ１００）。 Finally, the output unit 116 outputs information based on the score (step S100).

＜効果＞
第２の実施形態に係る分析装置１２によれば、分析の範囲を複数に分割することにより生成したセルの、予測したい目的変数の値の目安となるスコアを算出することができる。
<Effect>
The analysis device 12 according to the second embodiment can calculate, for each cell generated by dividing the analysis range, a score that serves as a guide to the value of the objective variable to be predicted.

たとえば、上述した具体的な例に従えば、分析装置１２は、翌日における、エリアごとの事故の発生件数の多さの目安となるスコアを算出できる。また、分析装置１２は、分析の範囲の地域のうちの、事故の発生件数が高いと予測されるエリアを特定することができる。また、特定した結果を出力することにより、ユーザは、そのエリアを事故の発生リスクが高い場所として認識することができる。 For example, following the specific example described above, the analysis device 12 can calculate a score that serves as a guide to how many accidents will occur in each area on the next day. The analysis device 12 can also identify, within the region under analysis, the areas where the number of accidents is predicted to be high. Furthermore, by outputting the identified result, the analysis device 12 lets the user recognize those areas as places with a high risk of accidents.

この分析において、分析装置１２は、データベース３２０に含まれる、機械学習分析に用いることができるデータを不必要に選別したり捨てたりする必要がない。また、分析装置１２は、データベース３２０に含まれていないデータを新たに生成したり追加したりする必要はない。 In this analysis, the analysis device 12 does not need to unnecessarily sort out or discard data included in the database 320 that can be used for machine learning analysis. Further, the analysis device 12 does not need to newly generate or add data that is not included in the database 320.

すなわち、分析装置１２は、サンプルデータを削減することも追加することもなく、データ分析を行うことができる。 That is, the analysis device 12 can perform data analysis without reducing or adding sample data.

上記に加え、この分析では目的変数の値ごとのサンプル数の偏りの影響が低減される。その理由は、各サンプルデータがグループにまとめられることによって、教師データにおける目的変数の値ごとのサンプル数の偏りが軽減されるからである。偏りが低減されることにより、目的変数の値が出現頻度の低い値であるサンプルの特徴が、機械学習分析において無視されにくくなる。 In addition to the above, this analysis reduces the effect of sample number bias for each value of the objective variable. The reason for this is that each sample data is grouped together to reduce the deviation in the number of samples for each value of the objective variable in the teacher data. By reducing the bias, the feature of the sample whose objective variable has a low appearance frequency is less likely to be ignored in machine learning analysis.

＜＜変形例＞＞
分析装置１２が扱う説明変数および目的変数は、機械学習の対象となりうる変数であれば何でもよい。目的変数は特定の種類の事件や事故の件数でもよい。その他、目的変数は、落雷件数、落とし物の届け出件数、小動物の死骸の発見件数、または公共物の破損があった数等でもよい。
<< Modifications >>
The explanatory variables and the objective variable handled by the analysis device 12 may be any variables that can be subjected to machine learning. The objective variable may be the number of incidents or accidents of a specific type. Alternatively, the objective variable may be the number of lightning strikes, the number of lost-property reports, the number of small-animal carcasses found, the number of cases of damage to public property, or the like.

分析対象は、地域である必要はない。分析対象は、交差点でもよいし、交番、または建物でもよい。分析対象は、目的変数に応じて設定されればよい。 The analysis target need not be a region. The analysis target may be an intersection, a police box, or a building. The analysis target may be set according to the objective variable.

以下、組分けの方法およびスコアの算出の方法に関する変形例を紹介する。 In the following, variations on the grouping method and the score calculation method will be introduced.

（変形例１）
上記第２の実施形態の説明では、組分け部１１２は、同じ列または行に並ぶセルを同一のグループとする組分けを行うが、組分けの方法（以下、「組分け法」と呼ぶ。）はこれらに限られない。
(Modification 1)
In the description of the second embodiment, the grouping unit 112 performs groupings in which the cells in the same column or row form one group, but the method of grouping (hereinafter referred to as the "grouping method") is not limited to these.

図１４Ｃ、図１４Ｄ、図１４Ｅ、および図１４Ｆは、それぞれ、上に示した組分け法以外の組分け法の例（それぞれ、組分け法Ｃ、組分け法Ｄ、組分け法Ｅ、および組分け法Ｆ）を示す図である。なお、図１４Ｃ〜１４Ｆに示される表のそれぞれの左側に付されたアラビア数字および上側に付されたローマ数字は、説明の便宜上付された記号であり、それぞれ表における行または列を識別する記号である。なお、図８で説明された縦方向の組分け法を組分け法Ａ、図１０で説明された横方向の組分け法を組分け法Ｂ、とする。 FIGS. 14C, 14D, 14E, and 14F are diagrams showing examples of grouping methods other than those shown above (grouping method C, grouping method D, grouping method E, and grouping method F, respectively). The Arabic numerals on the left side and the Roman numerals on the upper side of each table shown in FIGS. 14C to 14F are symbols added for convenience of explanation and identify the rows and columns of the table, respectively. The vertical grouping method described with FIG. 8 is called grouping method A, and the horizontal grouping method described with FIG. 10 is called grouping method B.

図１４Ｃを参照すると、たとえば、組分け法Ｃでは、１行Ｉ列、２行ＩＩ列、３行ＩＩＩ列、４行ＩＶ列、５行Ｖ列、が同じグループＣ_１となる。 Referring to FIG. 14C, for example, in grouping method C, the cells at row 1 column I, row 2 column II, row 3 column III, row 4 column IV, and row 5 column V belong to the same group C₁.

組分け部１１２は、組分け法Ｃ〜Ｆのような組分け法を採用してもよい。たとえば、組分け部１１２は、組分け法Ａを第１の組分けに採用し、組分け法Ｃを第２の組分けに採用してもよい。 The grouping unit 112 may employ a grouping method such as the grouping methods C to F. For example, the grouping unit 112 may employ the grouping method A for the first grouping and the grouping method C for the second grouping.

なお、図１５は、組分け法Ｃによって生成するグループのそれぞれの、統合された目的変数の値の算出例を示す図である。たとえば、図１５の左に示されるセルが、図１５の右に示される組分け法で組分けされる場合、グループＣ_１の説明変数の値は０＋０＋２＋２＋０＝４である。FIG. 15 is a diagram illustrating a calculation example of the value of the integrated objective variable of each group generated by the grouping method C. For example, when the cell shown on the left side of FIG. 15 is grouped by the grouping method shown on the right side of FIG. 15, the value of the explanatory variable of group C ₁ is 0 + 0 + 2 + 2 + 0 = 4.

解析部１１３は、第１の組分けに基づく機械学習分析と、第２の組分けに基づく機械学習分析とを行い、それぞれ予測式を導出する。予測部１１４がそれぞれの予測式に基づく各グループの予測値を算出し、算出部１１５が、各セルが属するグループの乗算値をスコアとして算出する。この方法によっても、各セルのスコアは算出される。算出されるスコアは、セルに特有の計算式に基づく。その理由は、どのセルも、そのセルを含むグループの組み合わせが、他のセルのそれと異なるからである。 The analysis unit 113 performs machine learning analysis based on the first grouping and machine learning analysis based on the second grouping, and derives a prediction formula for each. The prediction unit 114 calculates the predicted value of each group based on each prediction formula, and the calculation unit 115 calculates, as the score of each cell, the product of the predicted values of the groups to which the cell belongs. The score of each cell can be calculated by this method as well. The calculated score is based on a formula specific to the cell, because for every cell the combination of groups containing it differs from that of every other cell.

組分け部１１２は、セルごとにセルを含むグループの組み合わせが異なるような、２つの組分け法であれば、どのような２つの組分け法を用いてもよい。 The grouping unit 112 may use any two grouping methods as long as the two grouping methods have different combinations of groups including cells for each cell.

なお、上述した組分け法Ａ，Ｂ，Ｃ，Ｄ，Ｅ，Ｆは、どの２つのセルも、任意の２つ以上の組分けにおいて異なるグループに属するように、設計されている。このように設計された６種類の組分けに対して、算出部１１５は、任意の２つの組分けに基づくデータを用いて、各セルに対応するスコアを算出してよい。 The grouping methods A, B, C, D, E, and F described above are designed so that any two cells belong to different groups in any two or more groupings (that is, no pair of cells shares a group in two or more of the groupings). Given six groupings designed in this way, the calculation unit 115 may calculate the score of each cell using data based on any two of the groupings.
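
The design property can be checked mechanically. FIGS. 14C to 14F are not reproduced in this text, so the sketch below assumes a modular rule for methods C to F: with 0-based row r and column c, the group index is (c − k·r) mod 5 with k = 1, 2, 3, 4. This rule is an assumption, not the patent's stated definition; it does agree with the memberships quoted in the text (the diagonal cells form C₁, and the cell at row 3, column IV lies in C₂, D₅, E₃, and F₁).

```python
from itertools import combinations

N = 5  # the grid is N x N; the modular rule below relies on N being prime

def group_index(method, row, col):
    """0-based group index of the cell at 0-based (row, col) under a grouping method."""
    if method == "A":                 # cells in the same column
        return col
    if method == "B":                 # cells in the same row
        return row
    k = "CDEF".index(method) + 1      # C -> 1, D -> 2, E -> 3, F -> 4 (assumed rule)
    return (col - k * row) % N

def shared_groupings(cell1, cell2):
    """Grouping methods under which the two cells fall into the same group."""
    return [m for m in "ABCDEF" if group_index(m, *cell1) == group_index(m, *cell2)]

cells = [(r, c) for r in range(N) for c in range(N)]

# For every pair of distinct cells, count the groupings in which they share a group.
counts = {len(shared_groupings(p, q)) for p, q in combinations(cells, 2)}
```

Under this assumed rule `counts` contains the single value 1: every pair of cells shares a group in exactly one of the six groupings, so any two groupings assign every cell a distinct pair of groups.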

（変形例２）
組分け部１１２は３種類以上の組分けを実行してもよい。そして、解析部１１３は、組分け部１１２が行った組分けのそれぞれに対して、予測式を導出してもよい。予測部１１４は、３種類以上の予測値を算出してもよい。
(Modification 2)
The grouping unit 112 may execute three or more types of grouping. The analysis unit 113 may then derive a prediction formula for each of the groupings performed by the grouping unit 112. The prediction unit 114 may calculate three or more types of predicted values.

たとえば、組分け部１１２は、図８、図１０および図１４Ｃ〜１４Ｆで示される組分け法のうち、組分け法Ａ、組分け法Ｂ、および組分け法Ｃを実行したとする。解析部１１３は、それぞれの組分けに基づく予測式を導出する。それぞれのグループの予測値が、図１６の、各グループ名が付されたデータ列の値のように、予測部１１４によって算出されたとする。算出部１１５は、各セルのスコアを、当該セルが属するグループの予測値を乗算することにより算出する。すると、図１６の左上の表に示されるような結果が得られる。 For example, it is assumed that the grouping unit 112 executes the grouping method A, the grouping method B, and the grouping method C among the grouping methods illustrated in FIGS. 8, 10, and 14 C to 14 F. The analysis unit 113 derives a prediction formula based on each grouping. It is assumed that the predicted value of each group is calculated by the prediction unit 114 like the value of the data string to which each group name is assigned in FIG. The calculation unit 115 calculates the score of each cell by multiplying the predicted value of the group to which the cell belongs. Then, the results as shown in the upper left table of FIG. 16 are obtained.

このように、３種類以上の組分けを用いて分析を行うことによっても、分析装置１２は各セルのスコアを算出できる。組分けの種類を増やして分析することにより、スコアの算出に用いる予測値の個数が増え、各セルのスコアの値はより大きくばらつく。たとえば、図１６で示される本変形例のスコアの算出例では、第３行第Ｉ列のセルのスコアが４２、第３行第ＩＩＩ列のセルのスコアが１１２であり、両者の値には明確な差がある。この差は、図１２で示された、２種類の組分けに基づいたスコアの算出例における両者の差（第３行第Ｉ列のセルのスコアが２１、第３行第ＩＩＩ列のセルのスコアが２８）に比べ、はるかに大きい。このように、組分けの種類を増やして分析することにより、各セルのスコアの値はより大きくばらつき、それにより発生リスクが大きい箇所をより特定しやすくなる。なお、この効果は、スコアを乗算によって算出する場合に特に顕著に表れる。 In this way, the analysis device 12 can also calculate the score of each cell by performing analysis with three or more types of grouping. Increasing the number of groupings increases the number of predicted values used to calculate the score, so the score values of the cells spread more widely. For example, in the score calculation example of this modification shown in FIG. 16, the score of the cell at row 3, column I is 42 and the score of the cell at row 3, column III is 112, a clear difference. This difference is far larger than the difference between the same two cells in the score calculation example based on two groupings shown in FIG. 12 (where the cell at row 3, column I scores 21 and the cell at row 3, column III scores 28). Thus, increasing the number of groupings spreads the score values of the cells more widely, which makes it easier to identify locations with a high occurrence risk. This effect is particularly pronounced when the score is calculated by multiplication.
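
The widening of the gap can be reproduced numerically. The sketch below makes two assumptions: membership in grouping method C follows the rule (column − row) mod 5 on 0-based indices, and the predicted values of C₁ to C₅ are [4, 3, 3, 2, 3], the values given for grouping method C in the Modification 3 example. Together they reproduce the scores 42 and 112 quoted above.

```python
import numpy as np

a = np.array([3, 0, 4, 7, 1])   # predicted values of A_1..A_5 (columns)
b = np.array([0, 2, 7, 4, 2])   # predicted values of B_1..B_5 (rows)
c = np.array([4, 3, 3, 2, 3])   # predicted values of C_1..C_5 (assumed, see lead-in)

rows, cols = np.indices((5, 5))
c_index = (cols - rows) % 5      # assumed membership under grouping method C

score2 = b[rows] * a[cols]       # two groupings: A and B
score3 = score2 * c[c_index]     # three groupings: A, B and C

gap2 = score2[2, 2] - score2[2, 0]   # row 3: column III minus column I, two groupings
gap3 = score3[2, 2] - score3[2, 0]   # same pair of cells, three groupings
```

For the two cells discussed in the text, the gap grows from 7 (28 versus 21) to 70 (112 versus 42).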

また、乗算によるスコアの算出では、ある組分けにおける特定のグループの予測値が０であった場合、そのグループに含まれるセルのスコアはすべて０となるため、そのセルにおける事故の発生のリスクが少ないことが明らかになる。 Further, in the calculation of the score by multiplication, when the predicted value of a specific group in a certain grouping is 0, the scores of the cells included in the group are all 0, so there is a risk of occurrence of an accident in that cell. It becomes clear that there are few.

また、３種類以上の組分けに基づいて分析を行うことによって、発生リスクが小さいセルのスコアが偶発的に大きく算出されるというリスクが低減されうる。 Further, by performing analysis based on three or more types of grouping, the risk that a score of a cell with a low occurrence risk is accidentally calculated large can be reduced.

なお、組分け部１１２は、教師データの目的変数の値がなるべく不均衡でない組分け法を採用してもよい。たとえば、組分け部１１２は、行った組分けによって生成したグループの教師データの目的変数の値のばらつきが、所定の基準を外れるか（たとえば、分散が所定の値を下回るか）を判定してもよい。そして、組分け部１１２は、ばらつきが所定の基準を外れる場合に、もう一度異なる組分けを行ってもよい。この構成によって、ばらつきが所定の基準を外れない組分けによるグループでの機械学習分析が可能となる。 Note that the grouping unit 112 may adopt a grouping method under which the objective-variable values of the teacher data are as balanced as possible. For example, the grouping unit 112 may determine whether the variation in the objective-variable values of the teacher data of the groups generated by a grouping falls outside a predetermined criterion (for example, whether the variance falls below a predetermined value). If the variation falls outside the criterion, the grouping unit 112 may perform a different grouping. With this configuration, machine learning analysis can be performed on groups whose variation stays within the predetermined criterion.
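
A minimal sketch of this check, assuming the criterion is a lower bound on the variance of the group-level objective values. The function names, the threshold, and the cell data are hypothetical.

```python
import numpy as np

def passes_spread_check(group_values, min_variance):
    """True if the group-level objective values vary at least as much as required."""
    return float(np.var(group_values)) >= min_variance

def choose_grouping(cells, candidates, min_variance):
    """Return the name of the first candidate grouping that passes the check.

    `candidates` maps a name to an integer array, shaped like `cells`, that
    holds each cell's 0-based group index. Returns None if none passes.
    """
    for name, index in candidates.items():
        sums = np.bincount(index.ravel(), weights=cells.ravel())
        if passes_spread_check(sums, min_variance):
            return name
    return None

# Hypothetical per-cell counts and two candidate groupings (by row, by column).
cells = np.array([
    [0, 0, 1, 2, 0],
    [1, 0, 0, 1, 0],
    [1, 0, 2, 3, 1],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
])
rows, cols = np.indices(cells.shape)
chosen = choose_grouping(cells, {"B": rows, "A": cols}, min_variance=5.5)
```

Here the row grouping is tried first and rejected (variance 5.36), after which the column grouping is accepted (variance 6.0).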

以上の変形例１および２に示した組分けの方法は、地域を複数のセルにメッシュ分割した場合以外にも用いることができる。たとえば、上述した組分けの方法は、既に識別された複数個の分析対象に対して用いてもよい。 The grouping method shown in the first and second modifications can be used in addition to the case where the region is mesh-divided into a plurality of cells. For example, the grouping method described above may be used for a plurality of analysis objects that have already been identified.

また、上述の例では、分析対象の数が５×５個であったが、分析対象の数はこれに限られない。分析対象の個数が整数の２乗でない場合は、組分けごとにグループを構成する分析対象の数が異なっていてもよい。たとえば、分析対象が３０個である場合は、組分け部１１２は、分析対象が５つずつであるグループに分割する第１の組分けと、分析対象が６つずつであるグループに分割する第２の組分けと、分析対象が５つずつであるグループに分割する第３の組分けとを行ってもよい。このように、分析対象の個数が整数の２乗でなくとも、複数の組分けおよびスコアの算出は可能である。 In the above example, the number of analysis targets is 5 × 5, but the number of analysis targets is not limited to this. When the number of analysis targets is not the square of an integer, the number of analysis targets per group may differ from grouping to grouping. For example, when there are 30 analysis targets, the grouping unit 112 may perform a first grouping that divides them into groups of five, a second grouping that divides them into groups of six, and a third grouping that divides them into groups of five. Thus, even if the number of analysis targets is not the square of an integer, multiple groupings and score calculation are possible.

（変形例３）
ｎ×ｎ個（ｎは２以上の整数）の分析対象に対して、変形例１で示されるような、「どの２つの分析対象も、任意の２つ以上の組分けにおいて異なるグループに属するような組分け法」が、ｎ＋１個作れる場合、算出部１１５は、分析対象（以下の説明では、セル）のスコアを、以下に示す方法で算出してもよい。以下、変形例３として、算出部１１５がセルのスコアを前述の方法とは異なる方法で算出する構成を説明する。
(Modification 3)
For n × n analysis targets (n is an integer of 2 or more), when n + 1 grouping methods can be constructed such that, as shown in Modification 1, any two analysis targets belong to different groups in any two or more groupings, the calculation unit 115 may calculate the score of an analysis target (a cell, in the following description) by the method described below. Hereinafter, as Modification 3, a configuration in which the calculation unit 115 calculates the cell score by a method different from the one described above will be described.

まず、分割部１１１が、地域をｎ×ｎのセルに分割したとする。 First, it is assumed that the dividing unit 111 divides an area into n × n cells.

組分け部１１２は、ｎ×ｎ個のセルに対し、どの２つのセルも、任意の２つ以上の組分けにおいて属するグループの組み合わせが異なるような、ｎ＋１種類の組分け法を実行する。（たとえばｎ＝５である場合、上述した組分け法Ａ〜Ｆの６つが、上記ｎ＋１種類の組分け法の一例である。）言い換えれば、組分け部１１２は、任意の２つのセルが、いずれかただ１つの組分けにおいて同一のグループに属するような、ｎ＋１回の組分けを行う。 For the n × n cells, the grouping unit 112 executes n + 1 types of grouping methods such that any two cells have different combinations of groups in any two or more groupings. (For example, when n = 5, the six grouping methods A to F described above are an example of such n + 1 grouping methods.) In other words, the grouping unit 112 performs n + 1 groupings such that any two cells belong to the same group in exactly one of the groupings.

解析部１１３は、それぞれの組分けに基づく機械学習分析を行い、それぞれの予測式を導出する。 The analysis unit 113 performs machine learning analysis based on each grouping and derives each prediction formula.

予測部１１４は、それぞれの予測式に基づく各グループの予測値を算出する。 The prediction unit 114 calculates a predicted value of each group based on each prediction formula.

そして、算出部１１５は、セルのスコアを次のように算出する。すなわち、
・当該セルが属するグループのすべての予測値の総和を算出し、
・算出された総和から、任意の組分けに基づいたグループの予測値の総和（Ｓとする）を減算し、
・減算された値をｎで除する。
Then, the calculation unit 115 calculates the score of a cell as follows. That is, it
・calculates the sum of the predicted values of all the groups to which the cell belongs,
・subtracts from that sum the sum (denoted S) of the predicted values of the groups of an arbitrary grouping, and
・divides the result of the subtraction by n.

なお、総和Ｓは、各セルのスコアの計算において同一であってよい。また、総和Ｓの値は、任意の複数の組分けのそれぞれに基づいたグループの予測値の総和の、平均や中央値でもよい。 The sum S may be the same in calculating the score of each cell. Further, the value of the sum S may be an average or median of sums of predicted values of groups based on each of a plurality of arbitrary groupings.

以下、具体例を、図５に示される例を用いて説明する。図５に示される場合では、ｎ＝５である。 A specific example will be described below using the example shown in FIG. In the case shown in FIG. 5, n = 5.

組分け部１１２は、図８、１０および１４Ｃ〜１４Ｆに示される組分け法Ａ〜Ｆを行い、各グループの説明変数および目的変数を算出する。 The grouping unit 112 performs the grouping methods A to F shown in FIGS. 8, 10 and 14 C to 14 F, and calculates explanatory variables and objective variables for each group.

解析部１１３は、それぞれの組分けに基づく機械学習分析を行い、それぞれの予測式を導出する。予測部１１４は、それぞれの予測式に基づく各グループの予測値を算出する。その結果、それぞれの組分け法に基づくグループの予測値が、図１７に示すような値になったとする。すなわち、組分け法Ａに基づくグループＡ_１〜Ａ_５の予測値［ａ_１，ａ_２，ａ_３，ａ_４，ａ_５］は［３，０，４，７，１］となり、組分け法Ｂに基づくグループＢ_１〜Ｂ_５の予測値［ｂ_１，ｂ_２，ｂ_３，ｂ_４，ｂ_５］は［０，２，７，４，２］となり、組分け法Ｃに基づくグループＣ_１〜Ｃ_５の予測値［ｃ_１，ｃ_２，ｃ_３，ｃ_４，ｃ_５］は［４，３，３，２，３］となり、組分け法Ｄに基づくグループＤ_１〜Ｄ_５の予測値［ｄ_１，ｄ_２，ｄ_３，ｄ_４，ｄ_５］は［２，３，２，３，５］となり、組分け法Ｅに基づくグループＥ_１〜Ｅ_５の予測値［ｅ_１，ｅ_２，ｅ_３，ｅ_４，ｅ_５］は［２，４，４，２，３］となり、組分け法Ｆに基づくグループＦ_１〜Ｆ_５の予測値［ｆ_１，ｆ_２，ｆ_３，ｆ_４，ｆ_５］は［４，５，２，１，３］となったとする。 The analysis unit 113 performs machine learning analysis based on each grouping and derives a prediction formula for each. The prediction unit 114 calculates the predicted value of each group based on each prediction formula. Suppose that, as a result, the predicted values of the groups under each grouping method are as shown in FIG. 17. That is, the predicted values [a₁, a₂, a₃, a₄, a₅] of the groups A₁ to A₅ under grouping method A are [3, 0, 4, 7, 1]; the predicted values [b₁, b₂, b₃, b₄, b₅] of the groups B₁ to B₅ under grouping method B are [0, 2, 7, 4, 2]; the predicted values [c₁, c₂, c₃, c₄, c₅] of the groups C₁ to C₅ under grouping method C are [4, 3, 3, 2, 3]; the predicted values [d₁, d₂, d₃, d₄, d₅] of the groups D₁ to D₅ under grouping method D are [2, 3, 2, 3, 5]; the predicted values [e₁, e₂, e₃, e₄, e₅] of the groups E₁ to E₅ under grouping method E are [2, 4, 4, 2, 3]; and the predicted values [f₁, f₂, f₃, f₄, f₅] of the groups F₁ to F₅ under grouping method F are [4, 5, 2, 1, 3].

算出部１１５は、各セルのスコアを算出する。なお、スコアを算出するにあたり、算出部１１５は、任意の組分け法に基づくグループの予測値の総和Ｓの値を求める。総和Ｓの値は、たとえば、ａ_１＋ａ_２＋ａ_３＋ａ_４＋ａ_５である。The calculation unit 115 calculates the score of each cell. In calculating the score, the calculation unit 115 calculates a value of the sum S of predicted values of the group based on an arbitrary grouping method. The value of the sum S is, for example, a ₁ + a ₂ + a ₃ + a ₄ + a ₅ .

算出部１１５は、ターゲットのセルが属するグループのすべての予測値の総和からＳを減算した値をｎで除した値を、ターゲットのセルのスコアの値として算出する。 The calculation unit 115 calculates a value obtained by dividing, by n, the value obtained by subtracting S from the sum of all predicted values of the group to which the target cell belongs as the score value of the target cell.

たとえば、１行Ｉ列に相当するセルは、グループＡ_１，Ｂ_１，Ｃ_１，Ｄ_１，Ｅ_１，Ｆ_１に属するから、このセルのスコアは、
｛（ａ_１＋ｂ_１＋ｃ_１＋ｄ_１＋ｅ_１＋ｆ_１）−Ｓ｝／５
で算出される。
For example, since the cell at row 1, column I belongs to the groups A₁, B₁, C₁, D₁, E₁, and F₁, the score of this cell is calculated by
{(a₁ + b₁ + c₁ + d₁ + e₁ + f₁) − S} / 5

同様に、たとえば、３行ＩＶ列に相当するセルのスコアは、
｛（ａ_４＋ｂ_３＋ｃ_２＋ｄ_５＋ｅ_３＋ｆ_１）−Ｓ｝／５
で算出される。
Similarly, for example, the score of the cell at row 3, column IV is calculated by
{(a₄ + b₃ + c₂ + d₅ + e₃ + f₁) − S} / 5
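
The two worked formulas can be evaluated for every cell at once. The sketch below uses the predicted values given for FIG. 17 and takes S = a₁ + a₂ + a₃ + a₄ + a₅. Membership in grouping methods C to F is an assumption, since the figures are not reproduced here: the rule (column − k·row) mod 5 with k = 1, 2, 3, 4 on 0-based indices matches the group memberships used in the two formulas.

```python
import numpy as np

n = 5
pred = {
    "A": np.array([3, 0, 4, 7, 1]),
    "B": np.array([0, 2, 7, 4, 2]),
    "C": np.array([4, 3, 3, 2, 3]),
    "D": np.array([2, 3, 2, 3, 5]),
    "E": np.array([2, 4, 4, 2, 3]),
    "F": np.array([4, 5, 2, 1, 3]),
}

rows, cols = np.indices((n, n))
index = {"A": cols, "B": rows}                 # column groups and row groups
for k, m in enumerate("CDEF", start=1):
    index[m] = (cols - k * rows) % n           # assumed membership for C..F

S = pred["A"].sum()                            # sum over one arbitrary grouping
total = sum(pred[m][index[m]] for m in "ABCDEF")
score = (total - S) / n                        # score[j, i]: row j+1, column i+1
```

With these inputs S is 15, the cell at row 1, column I evaluates to (15 − 15) / 5 = 0, and the cell at row 3, column IV evaluates to (30 − 15) / 5 = 3.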

図１８は、図１７に示される予測値に基づいて上記の方法で算出された各セルのスコアの値を示す図である。 FIG. 18 is a diagram illustrating the score value of each cell calculated by the above method based on the predicted value shown in FIG.

このようにして算出されたスコアの値は、そのセルの目的変数の予測値と見なすことができる。その理由は、ターゲットのセルが属するグループのすべての予測値の総和の値は、すべてのセルが１つ分ずつ寄与した値と、ターゲットのセルがｎ個分寄与した値とを足し合わせた値と見なせるからである。 The score value calculated in this way can be regarded as the predicted value of the objective variable of the cell. This is because the sum of the predicted values of all the groups to which the target cell belongs can be regarded as the sum of a value to which every cell contributes once and a value to which the target cell contributes n times.
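
This reasoning can be checked numerically. The target cell appears in all n + 1 of its groups, and every other cell appears in exactly one of them, so if x denotes the per-cell values, the sum of the group totals is (n + 1)·x_target + (sum of all other cells) = n·x_target + S. The sketch below substitutes exact group sums for the predictions and confirms that (total − S) / n recovers every cell value. The cell values are random test data, and the group memberships use the same assumed modular rule, (column − k·row) mod n, which requires n to be prime.

```python
import numpy as np

n = 5
rng = np.random.default_rng(1)
x = rng.integers(0, 4, size=(n, n))            # hypothetical true per-cell values

rows, cols = np.indices((n, n))
# n + 1 groupings: columns, rows, and n - 1 assumed "diagonal" groupings.
memberships = [cols, rows] + [(cols - k * rows) % n for k in range(1, n)]

# Exact group sums stand in for perfect group-level predictions.
group_sums = [
    np.bincount(idx.ravel(), weights=x.ravel(), minlength=n) for idx in memberships
]

S = x.sum()                                    # total over any single grouping
total = sum(g[idx] for g, idx in zip(group_sums, memberships))
recovered = (total - S) / n
```

In practice the group-level predictions are not exact, so the score approximates the per-cell value instead of reproducing it.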

なお、教師データに用いられたグループの目的変数が各セルの目的変数の平均によって算出されていた場合は、上述したスコアの算出の工程において、ｎで除する工程を省略してもよい。 When the objective variable of the group used for the teacher data is calculated by averaging the objective variables of each cell, the step of dividing by n may be omitted in the above-described score calculation step.

With the method described above, the analysis device 12 can calculate a more accurate value as the predicted value of the objective variable.

(Hardware)
In each embodiment of the present invention described above, each component of each device represents a functional block. Some or all of the components of each device are realized by any possible combination of a program and a computer 1900 such as that shown in FIG. 19. The computer 1900 includes, as an example, the following configuration.

- CPU (Central Processing Unit) 1901
- ROM (Read Only Memory) 1902
- RAM (Random Access Memory) 1903
- Program 1904A and stored information 1904B loaded into the RAM 1903
- Storage device 1905 that stores the program 1904A and the stored information 1904B
- Drive device 1907 that reads from and writes to a recording medium 1906
- Communication interface 1908 that connects to a communication network 1909
- Input/output interface 1910 that inputs and outputs data
- Bus 1911 that connects the components

Each component of each device in each embodiment is realized by the CPU 1901 loading the program 1904A that implements these functions into the RAM 1903 and executing it. The program 1904A that implements the functions of the components of each device is, for example, stored in advance in the storage device 1905 or the ROM 1902 and read out by the CPU 1901 as needed. The program 1904A may instead be supplied to the CPU 1901 via the communication network 1909, or may be stored in advance in the recording medium 1906, from which the drive device 1907 reads the program and supplies it to the CPU 1901.

There are various modifications of how each device is realized. For example, each device may be realized by any possible combination of a separate computer 1900 and a program for each component. Alternatively, a plurality of components of a device may be realized by any possible combination of a single computer 1900 and a program.

In addition, some or all of the components of each device are realized by other general-purpose or dedicated circuits, computers, or the like, or by a combination of these. These may be configured as a single chip or as a plurality of chips connected via a bus.

When some or all of the components of each device are realized by a plurality of computers, circuits, or the like, those computers and circuits may be arranged centrally or in a distributed manner. For example, the computers and circuits may be realized in a form in which they are connected to one another via a communication network, such as a client-server system or a cloud computing system.

The present invention is not limited to the embodiments described above. Various changes that a person skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

This application claims priority based on Japanese Patent Application No. 2016-122843, filed on June 21, 2016, the entire disclosure of which is incorporated herein.

Some or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to them.

<<Supplementary Notes>>
[Supplementary Note 1]
An analysis device comprising:
analysis means for executing, for each grouping that classifies a plurality of analysis targets with which explanatory variables and objective variables are associated, a machine learning analysis that derives a relationship between the explanatory variables and the objective variables of the plurality of groups generated by the grouping;
prediction means for executing, for each grouping, calculation of predicted values, which are values of the objective variables of the plurality of groups, based on the values of the explanatory variables of the plurality of groups and the relationship; and
calculation means for calculating a score for an analysis target by an operation based on the predicted values, calculated for each grouping, of the groups to which the analysis target belongs.
[Supplementary Note 2]
The analysis device according to Supplementary Note 1, wherein any two of the analysis targets that belong to the same group in one grouping belong to different groups in the other groupings.
[Supplementary Note 3]
The analysis device according to Supplementary Note 1 or 2, wherein
the analysis means executes the machine learning analysis for each of three or more groupings,
the prediction means executes the calculation of the predicted values for each of the three or more groupings, and
the calculation means calculates the score by multiplying together the predicted values, calculated for each of the three or more groupings, of the groups to which the analysis target belongs.
[Supplementary Note 4]
The analysis device according to any one of Supplementary Notes 1 to 3, further comprising grouping means for determining, for a grouping, whether the variation in the values of the objective variables of the groups generated by the grouping deviates from a predetermined criterion and, when the variation deviates from the predetermined criterion, newly executing a grouping, wherein
the analysis means executes the machine learning analysis on the plurality of groups generated by a grouping for which the variation has been determined not to deviate from the predetermined criterion.
[Supplementary Note 5]
The analysis device according to Supplementary Note 1, further comprising grouping means for performing, on n × n analysis targets (n being an integer of 2 or more), n + 1 groupings such that any two of the analysis targets belong to the same group in exactly one of the groupings, wherein
the calculation means calculates the score of each analysis target using a value obtained by subtracting the sum of the predicted values of the groups calculated on the basis of any one of the groupings from the sum of the predicted values calculated for all the groups to which the analysis target belongs.
[Supplementary Note 6]
The analysis device according to any one of Supplementary Notes 1 to 5, further comprising output means for displaying a predetermined number of the analysis targets, taken in descending order of score, in a manner different from the other analysis targets.
[Supplementary Note 7]
An analysis method comprising:
executing, for each grouping that classifies a plurality of analysis targets with which explanatory variables and objective variables are associated, a machine learning analysis that derives a relationship between the explanatory variables and the objective variables of the plurality of groups generated by the grouping;
executing, for each grouping, calculation of predicted values, which are values of the objective variables of the plurality of groups, based on the values of the explanatory variables of the plurality of groups and the relationship; and
calculating a score for an analysis target by an operation based on the predicted values, calculated for each grouping, of the groups to which the analysis target belongs.
[Supplementary Note 8]
The analysis method according to Supplementary Note 7, wherein any two of the analysis targets that belong to the same group in one grouping belong to different groups in the other groupings.
[Supplementary Note 9]
The analysis method according to Supplementary Note 7 or 8, wherein
the machine learning analysis is executed for each of three or more groupings,
the calculation of the predicted values is executed for each of the three or more groupings, and
the score is calculated by multiplying together the predicted values, calculated for each of the three or more groupings, of the groups to which the analysis target belongs.
[Supplementary Note 10]
The analysis method according to any one of Supplementary Notes 7 to 9, further comprising:
determining, for a grouping, whether the variation in the values of the objective variables of the groups generated by the grouping deviates from a predetermined criterion and, when the variation deviates from the predetermined criterion, newly executing a grouping; and
executing the machine learning analysis on the plurality of groups generated by a grouping for which the variation has been determined not to deviate from the predetermined criterion.
[Supplementary Note 11]
The analysis method according to Supplementary Note 7, further comprising:
performing, on n × n analysis targets (n being an integer of 2 or more), n + 1 groupings such that any two of the analysis targets belong to the same group in exactly one of the groupings; and
calculating the score of each analysis target using a value obtained by subtracting the sum of the predicted values of the groups calculated on the basis of any one of the groupings from the sum of the predicted values calculated for all the groups to which the analysis target belongs.
[Supplementary Note 12]
The analysis method according to any one of Supplementary Notes 7 to 11, further comprising displaying a predetermined number of the analysis targets, taken in descending order of score, in a manner different from the other analysis targets.
[Supplementary Note 13]
A program that causes a computer to execute:
an analysis process of executing, for each grouping that classifies a plurality of analysis targets with which explanatory variables and objective variables are associated, a machine learning analysis that derives a relationship between the explanatory variables and the objective variables of the plurality of groups generated by the grouping;
a prediction process of executing, for each grouping, calculation of predicted values, which are values of the objective variables of the plurality of groups, based on the values of the explanatory variables of the plurality of groups and the relationship; and
a calculation process of calculating a score for an analysis target by an operation based on the predicted values, calculated for each grouping, of the groups to which the analysis target belongs.
[Supplementary Note 14]
The program according to Supplementary Note 13, wherein any two of the analysis targets that belong to the same group in one grouping belong to different groups in the other groupings.
[Supplementary Note 15]
The program according to Supplementary Note 13 or 14, wherein
the analysis process executes the machine learning analysis for each of three or more groupings,
the prediction process executes the calculation of the predicted values for each of the three or more groupings, and
the calculation process calculates the score by multiplying together the predicted values, calculated for each of the three or more groupings, of the groups to which the analysis target belongs.
[Supplementary Note 16]
The program according to any one of Supplementary Notes 13 to 15, further causing the computer to execute a grouping process of determining, for a grouping, whether the variation in the values of the objective variables of the groups generated by the grouping deviates from a predetermined criterion and, when the variation deviates from the predetermined criterion, newly executing a grouping, wherein
the analysis process executes the machine learning analysis on the plurality of groups generated by a grouping for which the variation has been determined not to deviate from the predetermined criterion.
[Supplementary Note 17]
The program according to Supplementary Note 13, further causing the computer to execute a grouping process of performing, on n × n analysis targets (n being an integer of 2 or more), n + 1 groupings such that any two of the analysis targets belong to the same group in exactly one of the groupings, wherein
the calculation process calculates the score of each analysis target using a value obtained by subtracting the sum of the predicted values of the groups calculated on the basis of any one of the groupings from the sum of the predicted values calculated for all the groups to which the analysis target belongs.
[Supplementary Note 18]
The program according to any one of Supplementary Notes 13 to 17, further causing the computer to execute an output process of displaying a predetermined number of the analysis targets, taken in descending order of score, in a manner different from the other analysis targets.
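Two of the variations listed above are the score formed by multiplying predicted values across three or more groupings, and the output that sets the highest-scoring analysis targets apart from the rest. They can be illustrated with the short Python sketch below. The data layout (`memberships`, `group_preds`) and the function names are assumptions made for this illustration and do not come from the document. How the selected targets are displayed differently is left open here.

```python
from math import prod


def product_scores(memberships, group_preds):
    """Multiply, for each target, the predicted values of its groups.

    memberships[t][g] is the index of the group that target t belongs to
    in grouping g; group_preds[g][k] is the predicted value of group k in
    grouping g.
    """
    return [
        prod(group_preds[g][k] for g, k in enumerate(groups))
        for groups in memberships
    ]


def top_targets(scores, count):
    """Return the indices of the `count` highest-scoring targets."""
    order = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return set(order[:count])


# Four targets, three groupings with two groups each (illustrative values).
memberships = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
group_preds = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
scores = product_scores(memberships, group_preds)
highlighted = top_targets(scores, 1)
for t, score in enumerate(scores):
    marker = "*" if t in highlighted else " "
    print(marker, t, score)
```

Multiplication rewards targets whose groups are predicted high in every grouping, so a single low group prediction pulls the score down sharply. That behavior follows from the arithmetic and is not a claim made in the text.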

11, 12 Analysis device
32 Storage device
111 Dividing unit
112 Grouping unit
113 Analysis unit
114 Prediction unit
115 Calculation unit
116 Output unit
320 Database
1900 Computer
1901 CPU
1902 ROM
1903 RAM
1904A Program
1904B Stored information
1905 Storage device
1906 Recording medium
1907 Drive device
1908 Communication interface
1909 Communication network
1910 Input/output interface
1911 Bus

Claims

1. An analysis device comprising:
analysis means for executing, for each grouping that classifies a plurality of analysis targets with which explanatory variables and objective variables are associated, a machine learning analysis that derives a relationship between the explanatory variables and the objective variables of the plurality of groups generated by the grouping;
prediction means for executing, for each grouping, calculation of predicted values, which are values of the objective variables of the plurality of groups, based on the values of the explanatory variables of the plurality of groups and the relationship; and
calculation means for calculating a score for an analysis target by an operation based on the predicted values, calculated for each grouping, of the groups to which the analysis target belongs.

2. The analysis device according to claim 1, wherein any two of the analysis targets that belong to the same group in one grouping belong to different groups in the other groupings.

3. The analysis device according to claim 1 or 2, wherein
the analysis means executes the machine learning analysis for each of three or more groupings,
the prediction means executes the calculation of the predicted values for each of the three or more groupings, and
the calculation means calculates the score by multiplying together the predicted values, calculated for each of the three or more groupings, of the groups to which the analysis target belongs.

4. The analysis device according to any one of claims 1 to 3, further comprising grouping means for determining, for a grouping, whether the variation in the values of the objective variables of the groups generated by the grouping deviates from a predetermined criterion and, when the variation deviates from the predetermined criterion, newly executing a grouping, wherein
the analysis means executes the machine learning analysis on the plurality of groups generated by a grouping for which the variation has been determined not to deviate from the predetermined criterion.

5. The analysis device according to claim 1, further comprising grouping means for performing, on n × n analysis targets (n being an integer of 2 or more), n + 1 groupings such that any two of the analysis targets belong to the same group in exactly one of the groupings, wherein
the calculation means calculates the score of each analysis target using a value obtained by subtracting the sum of the predicted values of the groups calculated on the basis of any one of the groupings from the sum of the predicted values calculated for all the groups to which the analysis target belongs.

6. The analysis device according to any one of claims 1 to 5, further comprising output means for displaying a predetermined number of the analysis targets, taken in descending order of score, in a manner different from the other analysis targets.

7. An analysis method comprising:
executing, for each grouping that classifies a plurality of analysis targets with which explanatory variables and objective variables are associated, a machine learning analysis that derives a relationship between the explanatory variables and the objective variables of the plurality of groups generated by the grouping;
executing, for each grouping, calculation of predicted values, which are values of the objective variables of the plurality of groups, based on the values of the explanatory variables of the plurality of groups and the relationship; and
calculating a score for an analysis target by an operation based on the predicted values, calculated for each grouping, of the groups to which the analysis target belongs.

8. The analysis method according to claim 7, wherein any two of the analysis targets that belong to the same group in one grouping belong to different groups in the other groupings.

9. The analysis method according to claim 7 or 8, wherein
the machine learning analysis is executed for each of three or more groupings,
the calculation of the predicted values is executed for each of the three or more groupings, and
the score is calculated by multiplying together the predicted values, calculated for each of the three or more groupings, of the groups to which the analysis target belongs.

10. The analysis method according to any one of claims 7 to 9, further comprising:
determining, for a grouping, whether the variation in the values of the objective variables of the groups generated by the grouping deviates from a predetermined criterion and, when the variation deviates from the predetermined criterion, newly executing a grouping; and
executing the machine learning analysis on the plurality of groups generated by a grouping for which the variation has been determined not to deviate from the predetermined criterion.

11. The analysis method according to claim 7, further comprising:
performing, on n × n analysis targets (n being an integer of 2 or more), n + 1 groupings such that any two of the analysis targets belong to the same group in exactly one of the groupings; and
calculating the score of each analysis target using a value obtained by subtracting the sum of the predicted values of the groups calculated on the basis of any one of the groupings from the sum of the predicted values calculated for all the groups to which the analysis target belongs.

12. The analysis method according to any one of claims 7 to 11, further comprising displaying a predetermined number of the analysis targets, taken in descending order of score, in a manner different from the other analysis targets.

13. A computer-readable non-transitory storage medium storing a program that causes a computer to execute:
an analysis process of executing, for each grouping that classifies a plurality of analysis targets with which explanatory variables and objective variables are associated, a machine learning analysis that derives a relationship between the explanatory variables and the objective variables of the plurality of groups generated by the grouping;
a prediction process of executing, for each grouping, calculation of predicted values, which are values of the objective variables of the plurality of groups, based on the values of the explanatory variables of the plurality of groups and the relationship; and
a calculation process of calculating a score for an analysis target by an operation based on the predicted values, calculated for each grouping, of the groups to which the analysis target belongs.

14. The storage medium according to claim 13, wherein any two of the analysis targets that belong to the same group in one grouping belong to different groups in the other groupings.

15. The storage medium according to claim 13 or 14, wherein
the analysis process executes the machine learning analysis for each of three or more groupings,
the prediction process executes the calculation of the predicted values for each of the three or more groupings, and
the calculation process calculates the score by multiplying together the predicted values, calculated for each of the three or more groupings, of the groups to which the analysis target belongs.

16. The storage medium according to any one of claims 13 to 15, wherein
the program causes the computer to execute a grouping process of determining, for a grouping, whether the variation in the values of the objective variables of the groups generated by the grouping deviates from a predetermined criterion and, when the variation deviates from the predetermined criterion, newly executing a grouping, and
the analysis process executes the machine learning analysis on the plurality of groups generated by a grouping for which the variation has been determined not to deviate from the predetermined criterion.

17. The storage medium according to claim 13, wherein
the program causes the computer to execute a grouping process of performing, on n × n analysis targets (n being an integer of 2 or more), n + 1 groupings such that any two of the analysis targets belong to the same group in exactly one of the groupings, and
the calculation process calculates the score of each analysis target using a value obtained by subtracting the sum of the predicted values of the groups calculated on the basis of any one of the groupings from the sum of the predicted values calculated for all the groups to which the analysis target belongs.

18. The storage medium according to any one of claims 13 to 17, wherein the program causes the computer to execute an output process of displaying a predetermined number of the analysis targets, taken in descending order of score, in a manner different from the other analysis targets.