JP6993863B2

JP6993863B2 - Information processing system and learning method of information processing system

Info

Publication number: JP6993863B2
Application number: JP2017241430A
Authority: JP
Inventors: 和男矢野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2017-12-18
Filing date: 2017-12-18
Publication date: 2022-01-14
Anticipated expiration: 2037-12-18
Also published as: CN110033113B; CN110033113A; JP2019109648A

Description

The present invention relates to an information processing system that supports prediction and decision-making using data on companies, people, and social activities.

Artificial intelligence technology is attracting attention as a means of utilizing the data that companies and society continuously collect and accumulate.

In particular, deep learning has in recent years brought large accuracy gains in tasks that work by capturing the features of data, such as image recognition, which identifies faces and objects in images, and speech recognition, which identifies language from the features of speech.

With the development of machine learning and artificial intelligence technologies, including deep learning, it is expected that business and social outcomes will become predictable from data. Prediction technology that combines such data with machine learning is expected to be widely used for forecasting corporate performance, demand, accidents, and failures. An example of such prior art is Patent Document 1.

Patent Document 1: JP-A-2017-201526

In machine learning, a prediction model formula is generated from data by extracting the features of events latent in past data. In artificial intelligence (AI) terminology, this is called "learning".

However, learning is more difficult for rare events that occur infrequently, because little past performance data is available.

In conventional machine learning, including deep learning, the prediction parameters contained in the prediction formula are adjusted using past performance data so that the prediction error becomes small. For low-frequency events, however, adjusting the prediction parameters to fit events that happened to occur in a specific situation leads to over-adaptation, and prediction accuracy actually falls in new situations. This phenomenon, called "overfitting", has been a major problem.

In a preferred aspect of the present invention, in an information processing system that receives original data and outputs a prediction result, at least first data and second data are generated from the original data. A first prediction formula that makes a prediction using the first data has at least one parameter, and the system has a first learner that adjusts that parameter using a first prediction result obtained with the first prediction formula. A second prediction formula that makes a prediction using the second data has at least one parameter, and the system has a second learner that adjusts that parameter using a second prediction result obtained with the second prediction formula. The parameters adjusted by the first learner and the parameters adjusted by the second learner have at least one parameter in common.

In another preferred aspect of the present invention, a plurality of teacher data items, each consisting of a set of explanatory variables and first result data, are prepared, and a plurality of first learning data items, each consisting of a set of explanatory variables, are prepared. Using a prediction formula with prediction parameters consisting of a plurality of parameters, first prediction data is obtained from the first learning data, and the prediction parameters are changed so that the error between the first result data and the first prediction data becomes small, yielding first prediction parameters. In addition, a plurality of modified data items, each consisting of a set of explanatory variables and second result data, are prepared, and a plurality of second learning data items, each consisting of a set of explanatory variables, are prepared. Using the prediction formula with the prediction parameters, second prediction data is obtained from the second learning data, and the prediction parameters are changed so that the error between the second result data and the second prediction data becomes small, yielding second prediction parameters. Then, at least one of (a) the change in the error with respect to a change in the second prediction parameters and (b) the change in the correlation coefficient between the second result data and the second prediction data with respect to a change in the second prediction parameters is evaluated, predetermined parameters are extracted from the prediction parameters, and the first prediction parameters are corrected for those parameters that correspond to the extracted predetermined parameters.

This makes it possible to avoid the problem, inherent in conventional machine learning (including deep learning), that prediction accuracy is low for events with little data.

A conceptual diagram showing the information processing system of the embodiment.
A block diagram showing the predictor that constitutes the embodiment.
A block diagram showing the configuration of the information processing system of the embodiment.
A block diagram showing learner 2, which constitutes the information processing system of the embodiment.
A flow chart showing the processing flow of learner 2 of the embodiment.

Embodiments are described in detail with reference to the drawings. However, the present invention is not to be construed as limited to the description of the embodiments below. Those skilled in the art will readily understand that the specific configuration can be changed without departing from the idea or spirit of the present invention.

In the configurations of the invention described below, the same reference numerals are used in common across different drawings for identical parts or parts having similar functions, and duplicate descriptions may be omitted.

Notations such as "first", "second", and "third" in this specification are attached to identify components and do not necessarily limit their number, order, or content. The numbers used to identify components are assigned per context, and a number used in one context does not necessarily denote the same component in another context. Nor do they prevent a component identified by one number from also serving the function of a component identified by another number.

To make the invention easier to understand, the position, size, shape, range, and so on of each component shown in the drawings may not represent the actual position, size, shape, range, and so on. The present invention is therefore not necessarily limited to the position, size, shape, range, and so on disclosed in the drawings.

The publications, patents, and patent applications cited in this specification constitute, as they are, part of the description of this specification.

Components expressed in the singular in this specification include the plural unless the context clearly indicates otherwise.

In the specific embodiment described below, in addition to the conventional first learning cycle, which uses past data to reduce the prediction error, a second learning cycle is provided in which intentionally incorrect data is input to the AI so that it learns not to be influenced by incorrect data. The system thus learns from past data not only the features of the "signal" it should react to, but also not to be influenced by meaningless "noise".

In a more preferred form, so that the basis of the results obtained from the artificial intelligence can be explained, the prediction formula is constructed from a multi-layered network structure whose basic elements are "sum, product, and negation", in place of the "majority vote" used in conventional deep learning.

This avoids the problem, inherent in conventional machine learning (including deep learning), that prediction accuracy is low for events with little data. It provides high predictive capability even with little data, and allows the result to be explained by decomposing it into mutually exclusive parts.

FIG. 1 is a conceptual diagram showing a specific example of the information processing system of the present invention. In this example, original data (101) is input, and an accurate prediction model that predicts the teacher data (correct-answer data) contained in the original data is output. Here, the prediction model specifically consists of predictor 1 (106), which is the algorithm used for prediction, and the prediction parameters (112), which are its parameters.

As a specific example, consider prediction for loan screening. The original data is information about the borrower (for example, condition data contained in an application for a loan such as a mortgage, specifying conditions such as gender, age, years of service, loan amount, and annual income). The teacher data is data on past performance (results), namely whether the loan ended in default; this is the result data. The condition data corresponds to the explanatory variables, and the result data corresponds to the objective variable. For each past borrower, M items of borrower information (explanatory variables) are combined with one item of teacher data (objective variable) indicating whether the loan defaulted, and N such data sets are prepared from the past performance of various borrowers. A single loan is represented by a bundle (that is, a vector) of M + 1 data items. Collecting these (M + 1)-dimensional vectors for N loans gives original data in the form of a table with N rows and M + 1 columns, or an equivalent database or text data. The information processing system outputs a model (prediction formula and prediction parameters) that predicts whether a borrower will default on a loan.

The information processing system is described below using the loan prediction example. First, the original data is preprocessed into a form that is easy for a computer to handle (102). For example, suppose the data includes an employer classification, with categories such as finance, manufacturing, and civil service. This is replaced by a numerical value of 1 or 0: 1 when the applicant works in finance, and 0 otherwise. The value then indicates whether the employer is in the finance industry. Data classified by category can be converted in this way into numerical information consisting of 1s and 0s (giving one data column per category).
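The category-to-column conversion described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name `one_hot` and the category labels are hypothetical.

```python
def one_hot(values, categories):
    """Turn one categorical column into one 0/1 column per category."""
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical employer classifications for four loan applicants.
employer = ["finance", "manufacturing", "civil_service", "finance"]
columns = one_hot(employer, ["finance", "manufacturing", "civil_service"])

# columns["finance"] is 1 exactly where the applicant works in finance.
print(columns["finance"])
```

Each category becomes its own column, so one categorical field with three categories yields three 0/1 columns.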

Next, the case where the original data is numerical is described. For example, when an annual income figure is entered, the value can be classified into five levels. If the highest income class is 100 million yen or more, the value for that class is 1 when the applicant's annual income is 100 million yen or more, and 0 otherwise. In this way, numerical information such as annual income can also be converted into normalized information between 0 and 1. However, if this is done for all five classes, differences within a class are rounded away. For example, in the class from 5 million yen to 10 million yen, an applicant earning 5.01 million yen and one earning 9.99 million yen fall into the same category and are treated identically. To avoid this, the following is done. When the applicant's annual income is 5 million yen or less, the value is 0; when it is 10 million yen or more, the value is 1; and when it is between 5 million and 10 million yen, the value is a continuous (analog) value that varies from 0 to 1 according to the formula (annual income - 5 million yen) ÷ 5 million yen. This gives a normalized number that varies continuously from 0 to 1 with annual income, so the original continuous variation is normalized without being rounded away.
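The clipped linear normalization above can be written as a short function. This is a sketch under the thresholds given in the text (5 million and 10 million yen); the function and parameter names are hypothetical.

```python
def normalize_income(income_yen, low=5_000_000, high=10_000_000):
    """Map annual income to [0, 1]: 0 at or below `low`, 1 at or above `high`,
    and (income - low) / (high - low) in between."""
    if income_yen <= low:
        return 0.0
    if income_yen >= high:
        return 1.0
    return (income_yen - low) / (high - low)


# Two applicants in the same 5-10 million yen class keep distinct values.
print(normalize_income(5_010_000), normalize_income(9_990_000))
```

With the default thresholds, `high - low` is 5 million yen, which matches the divisor in the formula above.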

Data extractor 1 (104) extracts learning data 1 (105) from this processed data (123). If the processed data has N rows, learning is performed on smaller units taken from it. For this purpose, rows are extracted at random from the data using random number generation 1 (103): extracting the data rows that correspond to the generated random numbers gives a random extraction. Such extraction rules can be set in advance by the user (operator) before learning.

Data extractor 1 has two outputs. One is learning data 1 (105), which is the extracted explanatory variables. The other is the teacher data (107), which is the past performance (result) data corresponding to learning data 1 (105). In the loan case, it expresses whether the loan defaulted as 1 or 0 (for example, "1" for a default and "0" otherwise).
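A minimal sketch of such a data extractor follows. It assumes each processed row is a list whose last element is the 0/1 result, and it samples rows without replacement; both choices, and the name `extract_batch`, are assumptions made for illustration.

```python
import random


def extract_batch(rows, batch_size, rng):
    """Pick `batch_size` random rows and split each into explanatory
    variables (learning data) and the outcome (teacher data)."""
    picked = rng.sample(range(len(rows)), batch_size)
    learning_data = [rows[i][:-1] for i in picked]
    teacher_data = [rows[i][-1] for i in picked]
    return learning_data, teacher_data


# Hypothetical processed data: two normalized features plus a default flag.
rows = [
    [0.2, 1.0, 0],
    [0.9, 0.0, 0],
    [0.1, 1.0, 1],
    [0.5, 0.0, 0],
]
learning_data, teacher_data = extract_batch(rows, 2, random.Random(0))
```

Passing a seeded `random.Random` instance makes the extraction reproducible, which is convenient when comparing extraction rules.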

This learning data 1 (105) is input to predictor 1 (106), which predicts the probability of default. Predictor 1 calculates a predicted value based on a prediction formula that incorporates the prediction parameters (112). A specific example of the prediction formula is described in detail later with reference to FIG. 2; in any case, it is a formula that incorporates the prediction parameters. The prediction parameters are initially set to suitable initial values (for example, random numbers generated by random number generation 3 (110)). At first, therefore, prediction data 1 (108), the prediction result, does not match the past teacher data (107) at all; that is, the error is large. This prediction error can nevertheless be calculated. In learner 1 (109), the prediction error is calculated as follows.

Prediction error = (value of teacher data) - (value of prediction data)

When each of the prediction parameters (112) contained in the prediction formula is varied slightly (increased or decreased), the prediction error also changes. By changing the prediction parameters little by little in the direction that reduces the prediction error, the prediction error can be reduced and the accuracy of the prediction formula improved.

Learner 1 (109) performs this adjustment of the prediction parameters (112). Specifically, the prediction error is differentiated with respect to each prediction parameter, and the prediction parameter (112) is changed by an amount proportional to that derivative, which efficiently lowers the prediction error and improves prediction accuracy. The proportionality coefficient is one specific example of learning parameter 1 (111). By adjusting the prediction parameters (112) in this way, learner 1 (109) executes the learning cycle predictor 1 (106) → prediction data 1 (108) → learner 1 (109) → prediction parameters (112) → predictor 1 (106), and prediction accuracy can be improved to a certain extent. Such a learning cycle can be carried out with conventional supervised machine learning techniques.
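The parameter update described above is, in conventional terms, gradient descent. The sketch below assumes a linear prediction formula and a mean squared error built from the prediction error (teacher value minus predicted value); the patent does not fix either choice here, so both are illustrative assumptions, as are all function names.

```python
def predict(params, x):
    """Assumed linear prediction formula: weighted sum of the inputs."""
    return sum(w * xi for w, xi in zip(params, x))


def mean_squared_error(params, X, y):
    return sum((t - predict(params, x)) ** 2 for x, t in zip(X, y)) / len(X)


def learning_step(params, X, y, learning_rate):
    """Move each parameter against the derivative of the squared error.
    `learning_rate` plays the role of the proportionality coefficient."""
    n = len(X)
    grads = [0.0] * len(params)
    for x, t in zip(X, y):
        err = t - predict(params, x)  # prediction error = teacher - prediction
        for j, xj in enumerate(x):
            grads[j] += -2.0 * err * xj / n
    return [w - learning_rate * g for w, g in zip(params, grads)]


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 0.0, 1.0]
params = [0.0, 0.0]
before = mean_squared_error(params, X, y)
for _ in range(200):
    params = learning_step(params, X, y, learning_rate=0.1)
after = mean_squared_error(params, X, y)
```

In this toy data the teacher value equals the first input, so repeated steps drive the parameters toward `[1.0, 0.0]` and the error toward zero.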

However, when the prediction target is an event that occurs only rarely, such as a loan default, this learning alone cannot achieve sufficient prediction accuracy.

In general, for low-frequency events, adjusting the prediction parameters to fit events that happened to occur in a specific situation leads to over-adaptation, so that prediction accuracy actually falls in new situations. This phenomenon, "overfitting", becomes more likely.

In this embodiment, a second learning cycle is provided so that even such rare events can be predicted accurately. It is described below.

Data extractor 2 (114) extracts learning data 2 (115) from the processed data (123). If the processed data has N rows, learning is performed on smaller units taken from it. For this purpose, rows are extracted at random from the data using random number generation 2 (103). Learning data 2 (115) may be the same as learning data 1 (105). In parallel, modified data (119) that deliberately differs from the teacher data (107) is generated automatically. Modified data can be created by deliberately assigning a mixture of 1s and 0s to a group of data that originally defaulted (cases where the default flag was originally 1), or likewise by assigning a mixture of 1s and 0s to a group of data that did not default. Random number generation 4 (122) can also be used to assign such (incorrect) data that differs from the original data. The rules for extracting learning data 2 (115) can be set in advance by the user (operator). The teacher data for learning data 2 (115), that is, the modified data (119), can be generated without using the values from the original data (101), by giving the explanatory variables of the original data different labels or numerical values as the objective variable.

Learner 2 (120) performs supervised learning in the same way as learner 1 (109) and learns the prediction parameters (112). The data used as the teacher, however, is the modified data (119). After this learning, learner 2 (120) evaluates how strongly the prediction parameters respond to the modified data (119) (reactivity evaluation).

In this embodiment, predictor 1 (106) and predictor 2 (116) need not use the same algorithm, but the features they use for prediction must include features in common. This allows features to be matched between predictor 1 (106) and predictor 2 (116).

In the reactivity evaluation, for example, the (incorrect) modified data (119), which is not valid teacher data, is compared with prediction data 2 (117) predicted by predictor 2 (116), and the error is calculated. Learner 2 (120) then calculates and evaluates how much the error between the modified data (119) and prediction data 2 (117) changes in response to a change in each prediction parameter of predictor 2 (116). If the error changes greatly in response to a change in a given prediction parameter, that parameter can be said to react sensitively to the modified data. As a simple measure of the size of the change, the magnitude of the proportionality coefficient between the change in the error and the change in the parameter is used.
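One way to sketch this error-based reactivity evaluation is a finite-difference estimate of how much the error against the modified data changes per unit change in each parameter. The linear predictor, the mean squared error, and the step size `delta` are assumptions for illustration, as are the function names.

```python
def predict(params, x):
    """Assumed linear prediction formula."""
    return sum(w * xi for w, xi in zip(params, x))


def mse(params, X, labels):
    return sum((t - predict(params, x)) ** 2 for x, t in zip(X, labels)) / len(X)


def reactivity(params, X, modified_labels, delta=1e-4):
    """Per-parameter |change in error| / |change in parameter|."""
    base = mse(params, X, modified_labels)
    scores = []
    for j in range(len(params)):
        shifted = list(params)
        shifted[j] += delta
        scores.append(abs(mse(shifted, X, modified_labels) - base) / delta)
    return scores


X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
modified_labels = [1, 0, 0]  # deliberately altered labels
scores = reactivity([0.5, 0.5], X, modified_labels)
```

In this toy case the second parameter has the larger score, so it would be the one flagged as reacting sensitively to the modified data.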

In another reactivity evaluation method, the correlation coefficient between the modified data (119) and prediction data 2 (117) is calculated to quantify their similarity. Reactivity can then be quantified by calculating how the correlation coefficient between prediction data 2 (117) and the modified data (119) changes as a feature used in the prediction formula of predictor 2 is changed. If the correlation coefficient changes greatly in response to a change in a given feature, that feature can be said to react sensitively to the modified data. This method, in other words, looks at the size of the change in the correlation coefficient.
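The correlation-based variant can be sketched in the same finite-difference style. Here the perturbation is applied to the parameter attached to each feature, and the Pearson correlation coefficient is used; the patent does not specify which correlation coefficient is meant, so this and the linear predictor are assumptions, and the data values are made up.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either sequence is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def predict(params, x):
    """Assumed linear prediction formula."""
    return sum(w * xi for w, xi in zip(params, x))


def correlation_reactivity(params, X, modified_labels, delta=1e-3):
    """Per-parameter |change in correlation| / |change in parameter|."""
    def corr(p):
        return pearson([predict(p, x) for x in X], modified_labels)

    base = corr(params)
    scores = []
    for j in range(len(params)):
        shifted = list(params)
        shifted[j] += delta
        scores.append(abs(corr(shifted) - base) / delta)
    return scores


X = [[1.0, 0.2], [0.3, 0.9], [0.8, 0.5], [0.1, 0.4]]
corr_scores = correlation_reactivity([0.6, 0.4], X, [1, 0, 0, 1])
```

A larger score means the agreement between predictions and the modified data shifts more when that parameter moves.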

The parameters of predictor 1 (106) that relate to these sensitively reacting features are therefore brought close to 0. This is because such parameters react sensitively to incorrect information and noise contained in the data, and to the data bias that tends to arise when data is scarce. As a specific method, a weighting coefficient is assigned to each parameter, and parameters that react sensitively to the modified data are given smaller weighting coefficients than the other parameters. One way to make a parameter small is to impose a penalty under which the error appears larger as that parameter grows, which effectively shrinks the parameter.
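The penalty idea can be sketched as a per-parameter quadratic penalty added to the update step, with a stronger penalty for more reactive parameters. The quadratic (L2-style) form, the scaling by the largest reactivity score, and all names are assumptions chosen for illustration; the patent describes the penalty only in general terms.

```python
def penalties_from_reactivity(scores, strength):
    """Scale penalties so the most reactive parameter gets `strength`."""
    top = max(scores)
    if top == 0:
        return [0.0] * len(scores)
    return [strength * s / top for s in scores]


def penalized_step(params, grads, penalties, learning_rate):
    """Gradient step on error + sum_j penalties[j] * params[j] ** 2."""
    return [
        w - learning_rate * (g + 2.0 * lam * w)
        for w, g, lam in zip(params, grads, penalties)
    ]


# Hypothetical reactivity scores: the second parameter is the sensitive one.
penalties = penalties_from_reactivity([0.0, 0.4], strength=0.5)
updated = penalized_step([1.0, 1.0], [0.0, 0.0], penalties, learning_rate=0.1)
```

With zero error gradient, the first parameter stays put while the penalized one shrinks toward 0, which is the intended effect on sensitive parameters.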

As one example of the data extraction method in data extractor 2 (114), the following is effective. Data extractor 2 (114) extracts p cases whose teacher data is 1 from learning data 1 (105), and adds to them q cases whose teacher data is 1 from the processed data (123) that has not yet been used for learning. This data set is extracted as learning data 2 (115). The teacher data for learning data 2 would originally consist of p + q values that are all 1. Here, the q added 1s are flipped to 0, which yields modified data (119) consisting of p 1s and q 0s. This of course differs from reality, but when it is learned, the prediction parameters that react sensitively to it change by a large amount. Such parameters are overly sensitive to data bias and noise, so bringing them close to 0 improves prediction accuracy. Specifically, each parameter is given a weight, and the sensitively reacting prediction parameters are given smaller weights than the other parameters. The values of p, q, and similar settings can be set in advance by the user (operator).
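The construction of the modified data from p kept cases and q flipped cases can be sketched as follows. Rows are assumed to be lists whose last element is the 0/1 teacher value, and the random choice of which cases to take is an assumption; the function name is hypothetical.

```python
import random


def build_modified_data(seen_rows, unseen_rows, p, q, rng):
    """Take p positive cases already used for learning and q positive cases
    not yet used, then label the first group 1 and the second group 0."""
    seen_pos = [r for r in seen_rows if r[-1] == 1]
    unseen_pos = [r for r in unseen_rows if r[-1] == 1]
    kept = rng.sample(seen_pos, p)
    flipped = rng.sample(unseen_pos, q)
    learning_data_2 = [r[:-1] for r in kept + flipped]
    modified_labels = [1] * p + [0] * q  # the q true 1s are inverted to 0
    return learning_data_2, modified_labels


seen = [[0.1, 1], [0.2, 1], [0.3, 0], [0.4, 1]]
unseen = [[0.5, 1], [0.6, 0], [0.7, 1]]
learning_data_2, modified_labels = build_modified_data(
    seen, unseen, p=2, q=2, rng=random.Random(0)
)
```

Every row in `learning_data_2` truly defaulted, so the q zeros in `modified_labels` are deliberately wrong labels.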

By running the learning cycle predictor 2 (116) → prediction data 2 (117) → learner 2 (120) → prediction parameters (112) → predictor 2 (116) over the various cases extracted by data extractor 2 (114), the system can learn to be insensitive to data it should not react to.

As described above, the learning cycle on the left side of FIG. 1, predictor 1 (106) → prediction data 1 (108) → learner 1 (109) → prediction parameters (112) → predictor 1 (106), learns to react sensitively to the signs latent in the data that should be reacted to. In contrast, the learning cycle on the right side, predictor 2 (116) → prediction data 2 (117) → learner 2 (120) → prediction parameters (112) → predictor 2 (116), learns insensitivity to signs that should not be reacted to.

Learning the prediction parameters through both of these learning cycles can greatly improve prediction accuracy for rare events that occur infrequently. In this embodiment, the two learning cycles run in synchronization. The learning cycle on the left side of FIG. 1, which includes predictor 1 (106), can follow the conventional learning method of a deep neural network (DNN). The learning cycle on the right side of FIG. 1, which includes predictor 2 (116), also follows the conventional DNN learning method and, based on its learning result, corrects the changes to the prediction parameters made by the left-side learning cycle that includes predictor 1 (106).

Learning parameter 1 (111) and learning parameter 2 (121) are set by the user (operator) for learner 1 (109) and learner 2 (120), respectively, before learning is performed. Because changing the learning parameters changes the learning result (learning speed and prediction accuracy), it is advisable to let the user change the learning parameters with reference to the learning result. Alternatively, the learning parameters may be varied automatically according to a predetermined rule, and preferred learning parameters may be set automatically based on the learning results obtained with each setting.

Likewise, the data extraction rules for data extractor 1 (104) and data extractor 2 (114) are set by the user (operator) before learning is performed. Because changing the data extraction rules changes the learning result (learning speed and prediction accuracy), it is advisable to let the user change the data extraction rules with reference to the learning result. Alternatively, the data extraction rules may be varied automatically according to a predetermined rule, and preferred data extraction rules may be set automatically based on the learning results obtained with each rule.

A conventionally used DNN can be applied as predictor 1 (106) and predictor 2 (116) in FIG. 1. Each layer of a typical DNN usually performs a nonlinear operation. However, when artificial intelligence technology (including machine learning) is applied to decisions that carry social responsibility, such as those in finance and industry, the basis for a prediction must be provided in a form that humans can understand. With ordinary deep learning it is difficult to explain why a prediction result is correct, and this black-box nature is a barrier to adoption.

FIG. 2 shows another example configuration of predictor 1 (106) and predictor 2 (116) in FIG. 1. The input layer (201) in FIG. 2 receives m items of vector data, x1 to xm. This data holds attribute information such as the annual income and gender of a loan applicant.

The arrows in this figure indicate the flow of data. The input data is processed by the processing layer (211), and a predicted value is output to the output layer (212). Prediction data 1 (108) is the set of these predicted values, output for each of the various cases (for loans, each loan application) contained in learning data 1 (105).

The processing layer (211) consists of one or more layers. Processing layer 1 (202) computes products among multiple items of data from the input layer. Let these be p1, p2, ..., pk. In the figure, p1 = x1 × x2 and p2 = x1 × x3, where × denotes an arithmetic product or a logical product (AND). Through this product operation, p1 becomes a compound index meaning "x1 is 1 and x2 is 1", which makes it possible to express more detailed conditions. The same applies to p2 and onward.
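The product operation of processing layer 1 can be sketched as follows. This is only an illustrative sketch: the function name `product_layer`, the choice of forming every pairwise product, and the sample values are assumptions and do not appear in the specification.

```python
from itertools import combinations

def product_layer(x):
    """Return pairwise products x_i * x_j for all i < j (1-based keys).

    For 0/1 inputs the arithmetic product equals the logical AND.
    Forming every pair is an assumption made for illustration.
    """
    return {(i + 1, j + 1): x[i] * x[j]
            for i, j in combinations(range(len(x)), 2)}

# Hypothetical binary attributes x1, x2, x3 of one loan application.
x = [1, 1, 0]
p = product_layer(x)
print(p[(1, 2)])  # compound index "x1 is 1 and x2 is 1"
print(p[(1, 3)])  # compound index "x1 is 1 and x3 is 1"
```

With m inputs this produces m(m-1)/2 compound indices, which is why a selection step is needed afterwards.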

Processing layer 2 (203) selects important indices from the many combined indices generated by processing layer 1. In this example, p1, p3, and pk are selected, but p2 is not (no arrow is connected to it). One concrete way to make this selection is to compute the correlations among the many indices generated by processing layer 1 and to quantify the similarity between indices by the absolute value of the correlation. Similar indices are then grouped into clusters, and for each cluster the index that correlates most strongly with the teacher data is selected. This thins out similar indices, so that the indices actually used are highly independent of one another. The more independent the indices are, the more stable the prediction formula becomes.
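The selection step described above can be sketched as follows. The greedy clustering rule, the threshold of 0.6, the function names, and the sample data are all assumptions chosen for illustration; the specification only states that similar indices are clustered by the absolute value of their correlation and that the index most correlated with the teacher data is kept per cluster.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def select_indices(columns, teacher, threshold=0.6):
    """Greedily cluster similar indices, then keep one index per cluster."""
    clusters = []
    for name in columns:
        for cluster in clusters:
            # Join the first cluster whose representative is similar enough.
            if abs(pearson(columns[name], columns[cluster[0]])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    # Per cluster, keep the index most correlated with the teacher data.
    return [max(c, key=lambda n: abs(pearson(columns[n], teacher)))
            for c in clusters]

# Hypothetical index values over six cases, and hypothetical teacher data.
columns = {
    "p1": [1, 0, 1, 0, 1, 0],
    "p2": [1, 0, 1, 0, 1, 1],   # similar to p1
    "p3": [0, 0, 1, 1, 0, 1],
}
teacher = [1, 0, 1, 0, 1, 0]
print(select_indices(columns, teacher))
```

In this sample, p1 and p2 fall into one cluster and p2 is thinned out, which mirrors the situation in the figure where p2 has no outgoing arrow.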

The indices selected in processing layer 2 (call them q1, q2, ..., qo) become the input to processing layer 3 (204). Processing layer 3 combines these indices to compute a weighted sum. Specifically,

Weighted sum = w1 × q1 + w2 × q2 + ...

where w1, w2, ... are the weights of the respective indices. A large weight means that the corresponding index is emphasized. In FIG. 1, the arrows corresponding to q1, q2, ... are drawn as broken lines to indicate that they are weighted.

The output of processing layer 3 in turn becomes the input to processing layer 4 (205). Processing layer 4 feeds the weighted sum into a nonlinear function, such as a sigmoid function or a ramp function that is 0 below a threshold and rises linearly above it. This makes it possible to express nonlinear dependencies. The weighted sum of processing layer 3 combined with the nonlinear function of processing layer 4 is also called majority logic.
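The weighted sum of processing layer 3 and the two nonlinear functions named for processing layer 4 can be sketched as follows. The function names, the weights, and the threshold value are assumptions for illustration only.

```python
import math

def weighted_sum(w, q):
    """Processing layer 3: w1*q1 + w2*q2 + ..."""
    return sum(wi * qi for wi, qi in zip(w, q))

def sigmoid(z):
    """Smooth nonlinearity mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def ramp(z, threshold=0.0):
    """0 at or below the threshold, rising linearly above it."""
    return max(0.0, z - threshold)

# Hypothetical selected indices q1..q3 and hypothetical weights w1..w3.
q = [1, 1, 1]
w = [0.5, -1.0, 2.0]
z = weighted_sum(w, q)
print(z, sigmoid(z), ramp(z, threshold=0.5))
```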

In this way, complex functions (prediction formulas) can be expressed by combining the operations drawn as circles in FIG. 2 (product, selection, weighted sum, nonlinear function), by changing their order, and by changing how the network is connected. The function can also be varied flexibly by treating as parameters the weights used in the weighted sum (204) and the selection criterion used in the selection layer (203) (for example, the predetermined correlation value when indices whose correlation is at or below that value are placed in independent clusters). What FIG. 1 shows as the prediction parameters (112) refers to these weights, selection criteria, and similar parameters.

This figure shows an example with four or more processing layers, but in the simplest case the indices of the input layer can be output as they are. Conversely, such diverse processing layers can be stacked in many layers to build an extremely complex prediction formula.

Here, if the processing layers are composed only of combinations of product, selection, and weighted sum, and a nonlinear layer is used only for the output layer, the prediction formula can be written in the form

Y = σ[Σw(Πxi)], where σ[・] denotes a nonlinear function (for example, a sigmoid function)
(for example, y = w1(x1)(x2) + w2(x2)(x3)(x8)(x9); in this case σ is the identity function).

In the above example, it can be seen that the result (output) is determined by "x1 and x2" holding and by "x2 and x3 and x8 and x9" holding. In this way, a prediction result can always be decomposed into its factors, and the formula can be explained in words that people understand. This is a feature not found in conventional deep learning and neural networks.
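The decomposition of a prediction into its factors can be sketched as follows for the example y = w1(x1)(x2) + w2(x2)(x3)(x8)(x9). The weight values 0.6 and 0.3, the input values, and the function name `predict` are assumptions for illustration; σ defaults to the identity function as in the example.

```python
def predict(terms, x, sigma=lambda z: z):
    """Evaluate Y = sigma(sum of w * product of x_i) and list each term's share.

    terms: list of (weight, tuple of 1-based input numbers).
    x:     dict mapping input number to its value.
    """
    contributions = []
    for w, idx in terms:
        prod = 1
        for i in idx:
            prod *= x[i]
        contributions.append((idx, w * prod))
    total = sum(c for _, c in contributions)
    return sigma(total), contributions

# Hypothetical weights w1 = 0.6 and w2 = 0.3.
terms = [(0.6, (1, 2)), (0.3, (2, 3, 8, 9))]
# Hypothetical case: x1, x2, x3, x9 are 1 and everything else is 0.
x = {i: 0 for i in range(1, 10)}
x.update({1: 1, 2: 1, 3: 1, 9: 1})

y, parts = predict(terms, x)
print(y)
for idx, share in parts:
    print(" and ".join(f"x{i}" for i in idx), "contributes", share)
```

Because x8 is 0 in this sample, the second term contributes nothing and the output is explained entirely by "x1 and x2".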

FIG. 3 shows the system configuration of this embodiment. The hardware of this embodiment can be a general information processing device such as a server. The information processing device includes a processing device (301) and storage devices. The storage devices include, for example, a database (302), a program storage device (303), and an arithmetic storage device (304). Although not shown, the device also has the input and output devices that are usual for an information processing device.

The processing device (301) executes the various programs stored in the program storage device (303).

The database (302) is, for example, a magnetic disk device, and stores the prediction parameters (112), processed data (123), original data (101), learning data 1 (105), learning data 2 (115), teacher data (107), modified data (119), learning parameter 1 (111), learning parameter 2 (121), and the like.

The program storage device (303) stores programs such as the preprocessor (102), random number generation (103, 113, 110, 122), data extractor 1 (104), data extractor 2 (114), predictor 1 (106), predictor 2 (116), learner 1 (109), and learner 2 (120).

The arithmetic storage device (304) temporarily stores data read from the database (302) or the program storage device (303), and stores data used when the processing device (301) performs computations. Various known semiconductor memories can be used for the program storage device (303) and the arithmetic storage device (304).

In this embodiment, functions such as computation and control are realized by the processing device (301) executing the programs stored in the program storage device (303), so that the defined processing is carried out in cooperation with other hardware. A program executed by a computer, its function, or a means for realizing that function may be referred to as a "function", "means", "unit", "device", "module", or the like. This configuration may be implemented on a single computer, or any part of the input device, output device, processing device, and storage devices may be implemented on other computers connected via a network. Functions equivalent to those implemented with programs in this embodiment can also be realized in hardware such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit). Such forms are also included in the scope of this embodiment.

FIG. 4 is a block diagram showing the details of learner 2 (120). Learner 2 (120) includes a learning unit (1201), a reactivity evaluation unit (1202), and a parameter correction unit (1203).

FIG. 5 is a flowchart of the processing performed by learner 2 (120). In process S501, the learning unit (1201) performs conventional supervised learning using the modified data (119) as teacher data. As already described, however, the modified data (119) is data in which, for example, some of the processed data originally labeled "default (1)" have been changed to "no default (0)". Alternatively, it may be data in which some of the processed data originally labeled "no default (0)" have been changed to "default (1)". As a result of learning in the learning unit (1201), the prediction parameters are computed so that the error with respect to the modified data (119) becomes small.
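Producing modified data by flipping some 0/1 labels can be sketched as follows. Choosing the cases to flip at random with a fixed seed, the function name `make_modified_labels`, and the sample labels are assumptions for illustration; the passage above only states that some labels are changed to the opposite value.

```python
import random

def make_modified_labels(labels, n_flip, seed=0):
    """Return a copy of 0/1 labels with n_flip randomly chosen entries inverted."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    modified = list(labels)            # leave the original teacher data intact
    for i in rng.sample(range(len(labels)), n_flip):
        modified[i] = 1 - modified[i]  # 1 (default) <-> 0 (no default)
    return modified

# Hypothetical teacher labels for eight loan cases.
teacher = [1, 0, 0, 1, 0, 0, 0, 1]
modified = make_modified_labels(teacher, n_flip=2)
changed = [i for i, (a, b) in enumerate(zip(teacher, modified)) if a != b]
print(modified, "changed positions:", changed)
```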

In process S502, the reactivity evaluation unit (1202) evaluates how sensitively each parameter reacts to the modified data (119) (reactivity evaluation). To do so, as already described, it evaluates, for example, the change in prediction error with respect to a change in each prediction parameter. It then extracts the prediction parameters that are sensitive to the modified data.
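One possible reading of the reactivity evaluation is a finite-difference check: perturb one parameter at a time and see how much the prediction error changes. The sketch below assumes a plain linear prediction formula, mean squared error, a perturbation of 0.5, and a threshold of 0.1; all of these, and the function names, are assumptions for illustration.

```python
def mean_squared_error(w, rows, targets):
    """Error of a linear prediction formula sum(w_k * x_k) on the given data."""
    total = 0.0
    for row, t in zip(rows, targets):
        pred = sum(wk * xk for wk, xk in zip(w, row))
        total += (pred - t) ** 2
    return total / len(rows)

def sensitive_parameters(w, rows, targets, delta=0.5, threshold=0.1):
    """Return positions of parameters whose perturbation changes the error
    by more than the predetermined threshold."""
    base = mean_squared_error(w, rows, targets)
    sensitive = []
    for k in range(len(w)):
        bumped = list(w)
        bumped[k] += delta
        if abs(mean_squared_error(bumped, rows, targets) - base) > threshold:
            sensitive.append(k)
    return sensitive

# Hypothetical modified data: the second index is active in only one case.
rows = [[1, 0], [1, 0], [1, 0], [1, 1]]
targets = [1, 1, 1, 1]
w = [1.0, 0.0]
print(sensitive_parameters(w, rows, targets))
```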

In process S502, the parameter correction unit (1203) corrects the sensitive parameters so that they become "insensitive". One way to do this is, for the sensitive parameters extracted in S502, to give the parameter values learned by learner 1 (109) a smaller weight than the other parameters. Alternatively, those parameters are set to zero. To this end, learner 2 (120) corrects the prediction parameters (112).

Another way is, for the sensitive parameters, to have learner 1 (109) learn so that the prediction error becomes larger, which is the opposite of ordinary learning. For this purpose, learner 2 (120) modifies the learning algorithm of learner 1 (109) for the specific parameters. Learning so that the prediction error becomes larger suppresses the influence of the modified data more strongly. The above are concrete examples of making sensitive parameters "insensitive", and these methods may be combined.
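The first correction method (giving sensitive parameters a smaller weight, or setting them to zero) can be sketched as follows. The function name `desensitize`, the scaling factor, and the sample values are assumptions for illustration; the method of reversing the learning direction is not shown.

```python
def desensitize(params, sensitive, factor=0.0):
    """Scale the sensitive parameters by factor; leave the others unchanged.

    factor=0.0 sets them to zero; 0 < factor < 1 only reduces their weight.
    """
    return [p * factor if k in sensitive else p
            for k, p in enumerate(params)]

# Hypothetical parameters learned by learner 1; position 1 was found sensitive.
learned = [0.8, -0.4, 1.2]
print(desensitize(learned, {1}, factor=0.5))
print(desensitize(learned, {2}))
```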

In another example, the same information processing system of FIG. 1 can be used for predictions that support investment decisions simply by changing the original data. In this case, the original data is a group of M numerical values representing the management information and financial information of the investee company and the state of the target market. The teacher data is a single item of actual data on the return obtained from the investee as a result of the investment (for example, the amount of dividends received). For N investees, the investee information and the resulting return information are input, and the system outputs a model of how much return would be obtained by investing in an unknown investee.

The underlying data is a data set of M+1 columns and N rows, which is input as the original data (101) in the form of a table, text, or a database.

The system can also be used to predict inventory and stockouts in a supply chain. In this case, tabular data can be input in which information such as inventory and stockout status, delivery dates, day of the week, and weather serves as the explanatory variables, and the resulting amount of inventory or stockouts (order backlog) serves as the teacher data (objective variable).

It can also be used to predict accidents in a plant. In this case, sensor values collected from the plant, such as temperature and pressure, and employee characteristics (experience and so on) are the explanatory variables, and whether an accident occurred as a result is the teacher data.

Defects on a production line can also be predicted. Conditions such as the operating information and temperature of the manufacturing equipment, together with information such as ambient temperature and material suppliers, are the explanatory variables, and the presence or absence of defects is input as the teacher data (objective variable).

It can also be used to predict whether a new product will be a hit. The attributes of past products (product category, color, characteristics of the name, price, and so on) and their launch timing can be the explanatory variables, and sales after launch can be the teacher data (objective variable).

The present invention can be applied to a wide range of uses beyond those listed here, as long as data consisting of explanatory variables and teacher data can be prepared.

The embodiments described above focus on the following problem: when machine learning is used to generate a predictive model formula from data, rare events that occur infrequently lead to "overfitting", in which the parameters are adjusted to events that merely happened to occur in a specific situation, and prediction accuracy drops as a result. The embodiments therefore propose a configuration that, in addition to a first learning cycle that uses past data to reduce the prediction error, has a second learning cycle in which intentionally incorrect data is input to the AI so that it learns not to be affected by incorrect data.

101 ... Original data
102 ... Preprocessor
103 ... Random number generation 1
104 ... Data extractor 1
105 ... Learning data 1
106 ... Predictor 1
107 ... Teacher data
108 ... Prediction data 1
109 ... Learner 1
110 ... Random number generation 3
111 ... Learning parameter 1
112 ... Prediction parameters
113 ... Random number generation 2
114 ... Data extractor 2
115 ... Learning data 2
116 ... Predictor 2
117 ... Prediction data 2
119 ... Modified data different from teacher data
120 ... Learner 2
121 ... Learning parameter 2
122 ... Random number generation 4
123 ... Processed data

Claims

In an information processing system that inputs original data and outputs prediction results
At least the first data and the second data are generated from the original data,
The first prediction formula for making a prediction using the first data has at least one parameter.
It has a first learner that adjusts the parameters using the first prediction result by the first prediction formula.
The second prediction formula for making a prediction using the second data has at least one parameter.
It has a second learner that adjusts the parameter using the second prediction result by the second prediction formula.
The parameters adjusted by the first learner and the parameters adjusted by the second learner have at least one common parameter.
An information processing system characterized in that the teacher data in the second data is data to which a label or a numerical value different from that of the original data is assigned, instead of using the data from the original data.

In the information processing system of claim 1,
The first prediction formula is an information processing system characterized by including a weighted sum and a non-linear function.

In the information processing system of claim 1,
The first prediction formula is an information processing system characterized by including a product and a weighted sum.

In the information processing system of claim 1,
The second learning device includes a learning unit and a reactivity evaluation unit.
The learning unit adjusts a plurality of parameters including the common parameter.
The plurality of parameters are adjusted so that the error between the second data and the second prediction result is small.
The reactivity evaluation unit
An information processing system that extracts, among the plurality of parameters, a parameter in which the amount of change in the error is larger than a predetermined value with respect to the change in the parameter.

In the information processing system of claim 1,
The second learning device includes a learning unit and a reactivity evaluation unit.
The learning unit adjusts a plurality of parameters including the common parameter.
The plurality of parameters are adjusted so that the error between the second data and the second prediction result is small.
The reactivity evaluation unit
An information processing system that extracts, among the plurality of parameters, a parameter in which the amount of change in the correlation coefficient between the second data and the second prediction result is larger than a predetermined value with respect to the change in the parameter.

In the information processing system of claim 1,
The second learning device includes a learning unit, a reactivity evaluation unit, and a parameter correction unit.
The learning unit adjusts a plurality of parameters including the common parameter.
The plurality of parameters are adjusted so that the error between the second data and the second prediction result is small.
The reactivity evaluation unit
From the plurality of parameters, a parameter in which the error or the amount of change in the correlation coefficient between the second data and the second prediction result is larger than a predetermined value with respect to the change in the parameter is extracted.
The parameter correction unit
An information processing system that corrects the parameters adjusted by the first learner with respect to the extracted parameters.

In the information processing system of claim 6,
The parameter correction unit
An information processing system that corrects the extracted parameters to reduce the weights of the parameters adjusted by the first learner.

In the information processing system of claim 6,
The parameter correction unit
An information processing system that corrects the extracted parameters so that the parameters adjusted by the first learner approach zero.

In the information processing system of claim 6,
The parameter correction unit
An information processing system in which the first learner adjusts the plurality of extracted parameters so that the error between the first data and the first prediction result becomes large.

Prepare multiple teacher data consisting of a set of explanatory variables and the first result data,
Prepare a plurality of first training data consisting of a set of explanatory variables,
The first prediction data is obtained from the first learning data by using the prediction formula using the prediction parameters consisting of a plurality of parameters.
The prediction parameters are changed to obtain the first prediction parameters so that the error between the first result data and the first prediction data becomes small.
Prepare multiple modified data consisting of a set of explanatory variables and the second result data,
Prepare a plurality of second training data consisting of a set of explanatory variables,
The second prediction data is obtained from the second learning data by using the prediction formula using the prediction parameters.
The prediction parameters are changed to obtain the second prediction parameters so that the error between the second result data and the second prediction data becomes small.
At least one of the change in the error with respect to a change in the second prediction parameters and the change in the correlation coefficient between the second result data and the second prediction data with respect to a change in the second prediction parameters is evaluated, and a predetermined parameter is extracted from the prediction parameters,
Among the first prediction parameters, the first prediction parameter is adjusted with respect to the extracted parameter corresponding to the predetermined parameter.
The teacher data is a part of the original data, and the modified data is data obtained by modifying the original data and is different from the original data.
Information processing system learning method.

In the learning method of the information processing system of claim 10,
Of the first prediction parameters, the parameters corresponding to the predetermined parameters are corrected to reduce the weight of the first prediction parameters.
Information processing system learning method.

In the learning method of the information processing system of claim 10,
The first prediction parameters are corrected by changing, for the parameter among the first prediction parameters that corresponds to the predetermined parameter, the prediction parameters so that the error between the first result data and the first prediction data becomes large,
Information processing system learning method.

In the learning method of the information processing system of claim 10,
The teacher data is a part of the original data, and the modified data is data obtained by modifying the original data and is different from the original data.
The method of modifying the original data can be changed.
Information processing system learning method.

Prepare multiple teacher data consisting of a set of explanatory variables and the first result data,
Prepare a plurality of first training data consisting of a set of explanatory variables,
The first prediction data is obtained from the first learning data by using the prediction formula using the prediction parameters consisting of a plurality of parameters.
The prediction parameters are changed to obtain the first prediction parameters so that the error between the first result data and the first prediction data becomes small.
Prepare multiple modified data consisting of a set of explanatory variables and the second result data,
Prepare a plurality of second training data consisting of a set of explanatory variables,
The second prediction data is obtained from the second learning data by using the prediction formula using the prediction parameters.
The prediction parameters are changed to obtain the second prediction parameters so that the error between the second result data and the second prediction data becomes small.
At least one of the change in the error with respect to a change in the second prediction parameters and the change in the correlation coefficient between the second result data and the second prediction data with respect to a change in the second prediction parameters is evaluated, and a predetermined parameter is extracted from the prediction parameters,
Among the first prediction parameters, the first prediction parameter is adjusted with respect to the extracted parameter corresponding to the predetermined parameter.
The teacher data is a part of the original data, and the modified data is data obtained by modifying the original data and is different from the original data.
The method of modifying the original data can be changed.
Information processing system learning method.