JP7386203B2

JP7386203B2 - Information processing device and information processing method

Info

Publication number: JP7386203B2
Application number: JP2021079487A
Authority: JP
Inventors: 文江中屋; 公司田中; 佳範城代; 信治三浦
Original assignee: Hitachi Social Information Services Ltd
Current assignee: Hitachi Social Information Services Ltd
Priority date: 2021-05-10
Filing date: 2021-05-10
Publication date: 2023-11-24
Anticipated expiration: 2041-05-10
Also published as: JP2022173656A

Description

Article 30, Paragraph 2 of the Patent Act applies: (1) disclosed on October 28, 2020 by Yoshinori Shiro, Fumie Nakaya, Daisuke Nishiyama, and Yoshito Kurihara; (2) disclosed on November 24, 2020 by the Sales Management Headquarters.

The present invention relates to an information processing device and an information processing method.

In recent years, technology development to promote digital transformation (DX) has been active, and as part of this, monitoring the status of projects such as software development has come to be regarded as important. Status monitoring can be divided into two types: priority monitoring and wide-area monitoring. Priority monitoring is monitoring of projects by a management department. Wide-area monitoring covers all projects and uses a predetermined tool. In status monitoring, projects that wide-area monitoring, based on data such as costs entered into the tool, judges to have a high probability of failure or a large impact in the event of failure are first made subject to priority monitoring. Then, with support from the management department, the projects subject to priority monitoring are kept from deteriorating and are improved.

Conventional wide-area monitoring extracts a project only after a problem, such as costs exceeding the estimate, has become apparent. Given the goal of leading projects to success, it is preferable to detect signs of deterioration at an early stage, before any problem has become apparent. With conventional wide-area monitoring, however, it is difficult to detect signs of project deterioration at such an early stage.

Furthermore, in conventional wide-area monitoring, an investigation is carried out to decide whether a project of concern should really be made subject to priority monitoring. The investigation covers, for example, the scale of the project and the amount by which the estimated cost has been exceeded. Such investigations incur a large human cost. As a result, priority is given to projects that would have a large impact if their situation worsened, such as projects with large order amounts, and at present monitoring does not cover all projects, including small ones. In other words, conventional wide-area monitoring has the problem that deciding which projects should be subject to priority monitoring carries a high human cost.

Note that Patent Document 1 discloses an invention that makes it possible to identify factors that lead to risks in software development.

Japanese Patent Application Publication No. 2019-148874

In view of these circumstances, an object of the present invention is to detect signs of project deterioration at an early stage through wide-area monitoring, and to reduce the human cost of deciding whether to make a project subject to priority monitoring.

The present invention, which solves the above problem, is an information processing device comprising:
a generation unit that generates a prediction model trained with monitoring information of completed first projects; and
a prediction unit that inputs explanatory variables of an in-progress second project into the prediction model, uses a plurality of decision trees each combining judgment conditions based on the explanatory variables, takes a majority vote to output a predicted value of the predicted estimated-cost overrun of the second project, and outputs the explanatory variables that form the basis of the predicted value according to the contribution rate of each explanatory variable to the predicted value, the contribution rate being obtained using the SHAP (Shapley Additive exPlanations) algorithm.

The present invention is also an information processing method in which an information processing device executes:
a step of generating a prediction model trained with monitoring information of completed first projects; and
a step of inputting explanatory variables of an in-progress second project into the prediction model, using a plurality of decision trees each combining judgment conditions based on the explanatory variables, taking a majority vote to output a predicted value of the predicted estimated-cost overrun of the second project, and outputting the explanatory variables that form the basis of the predicted value according to the contribution rate of each explanatory variable to the predicted value, the contribution rate being obtained using the SHAP algorithm.

According to the present invention, signs of project deterioration can be detected at an early stage through wide-area monitoring, and the human cost of deciding whether to make a project subject to priority monitoring can be reduced.

FIG. 1 is an example of a functional block diagram of the information processing device in this embodiment.
FIG. 2 is an example of a screen showing the output information of the prediction unit.
FIG. 3 is an example of a flowchart showing the processing of this embodiment.

≪First Embodiment≫
[Configuration]
The information processing device 100 shown in FIG. 1 is a computer that detects signs of deterioration in in-progress projects through wide-area monitoring. The information processing device 100 includes hardware such as an input unit, an output unit, a control unit, and a storage unit. For example, when the control unit is a CPU (Central Processing Unit), information processing by the computer that includes the control unit is realized by the CPU executing programs. The storage unit of the computer stores, under instructions from the CPU, the various programs that realize the functions of the computer. Software and hardware thereby work together. The programs can be provided on a recording medium or via a network. The output unit may include the function of a display unit that displays screens.

A project is work carried out to achieve a predetermined purpose. Projects can be classified into past projects that have been completed and projects that are in progress. In this embodiment, a past project is called a "first project" and an in-progress project is called a "second project." When past and in-progress projects need not be distinguished, they are simply called "projects."

As shown in FIG. 1, the information processing device 100 includes a generation unit 1 and a prediction unit 2. The information processing device 100 also stores a first project DB 3 and a second project DB 4.

The generation unit 1 uses machine learning to generate a prediction model for predicting the actual estimated cost of a second project. The actual estimated cost is an estimate of the cost that will have been incurred by the time the second project ends. There are several kinds of cost, such as manufacturing cost and cost of sales. In this embodiment, cost means the economic resources consumed to achieve a specific purpose, measured in monetary terms, and the term covers manufacturing cost, cost of sales, and the like.
The prediction unit 2 uses the prediction model generated by the generation unit 1 to predict the actual estimated cost of a target second project.
The first project DB 3 is a database that stores the monitoring information of first projects, one record set per first project.
The second project DB 4 is a database that stores the monitoring information of second projects, one record set per second project.

<Project monitoring information>
Project monitoring information is information for monitoring the status of a project. The monitoring information of a first project can be composed of, for example, production number information, explanatory variables, and an objective variable.

The production number information is information that identifies the first project. For example, the production number information includes, but is not limited to, a production number, a progress rate (%), and a production number name.
The production number is the identifier of the first project and can be expressed, for example, as an alphanumeric string.
The progress rate is a parameter that quantitatively indicates the progress of the first project.
The production number name is the name of the first project and can be expressed, for example, in meaningful words.

The explanatory variables are variables that express the state of the first project. There are multiple kinds of explanatory variables. They include, but are not limited to, the work start time, work end time, actual cost, forecast cost, estimated cost, persons in charge, work time, and actual man-hours. For example, candidate explanatory variables can be extracted using Boruta and the most suitable explanatory variables selected from among them, but the selection method is not limited to this.
The work start time is the start date (year, month, and day) of the first project.
The work end time is the end date (year, month, and day) of the first project.
The actual cost is the cost incurred from the work start time to a predetermined time.
The forecast cost is the cost expected to be incurred from the predetermined time to the work end time.
The estimated cost is the cost expected to be incurred from the work start time to the work end time.
The persons in charge are the person or persons who were responsible for the first project.
The work time is the time each person in charge spent working on the first project between the work start time and the predetermined time.
The actual man-hours are the man-hours completed between the work start time and the predetermined time, out of all the man-hours that make up the first project.
The predetermined time is any time between the work start time and the work end time.

The objective variable is a variable that depends on the explanatory variables. The objective variable can be, for example, the final actual cost, that is, the actual cost at the end of the first project, but is not limited to this.

(Progress rate)
The progress rate can be calculated, for example, on a time basis. For example, if the period from the work start time to the work end time of a project is 30 days and the target time is the 15th day from the work start time, the progress rate is 50%. Because a first project is a completed past project, its progress rate at the present time is 100%. The explanatory variables of a first project can be values that change with the progress rate, so the monitoring information of a first project can be organized as a set of explanatory variables for each progress rate.
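The time-based calculation can be illustrated with a minimal Python sketch. The function name `progress_rate`, the use of elapsed calendar days, and the example dates are assumptions made for illustration; the text does not specify how days are counted.

```python
from datetime import date


def progress_rate(start: date, end: date, target: date) -> float:
    """Time-based progress rate in percent, clamped to the range 0 to 100."""
    total_days = (end - start).days
    if total_days <= 0:
        raise ValueError("end must be later than start")
    elapsed_days = (target - start).days
    return max(0.0, min(100.0, 100.0 * elapsed_days / total_days))


# A 30-day project observed 15 days after the work start time (hypothetical dates).
rate = progress_rate(date(2021, 4, 1), date(2021, 5, 1), date(2021, 4, 16))
print(rate)
```

With these dates the elapsed period is half of the total period, which matches the 50% example in the text.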

For example, the actual cost as an explanatory variable is the set of actual costs at every progress rate from 0% to 100%. The actual cost at a progress rate of X% is the cost incurred from the work start time to the time corresponding to X%. Likewise, the forecast cost as an explanatory variable is the set of forecast costs at every progress rate from 0% to 100%. The forecast cost at a progress rate of X% is the cost expected to be incurred from the time corresponding to X% to the work end time. Some explanatory variables, such as the work start time and work end time, do not change with the progress rate; such variables are preferably treated as constants that take the same value at every progress rate.
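One possible way to hold this information is sketched below in Python. The class and field names (`ProductionInfo`, `Snapshot`, `FirstProjectMonitoring`, `snapshot_at`) are hypothetical and only mirror the items listed in the text; the actual layout of the first project DB is not described in the source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProductionInfo:
    production_number: str  # project identifier, e.g. an alphanumeric string
    production_name: str    # project name
    progress_rate: float    # percent; 100 for a completed first project


@dataclass
class Snapshot:
    """Explanatory variables that vary with the progress rate."""
    progress_rate: float  # percent
    actual_cost: int      # cost incurred up to this point (yen)
    forecast_cost: int    # cost expected from this point to the end (yen)


@dataclass
class FirstProjectMonitoring:
    info: ProductionInfo
    estimated_cost: int     # constant across progress rates (yen)
    final_actual_cost: int  # objective variable (yen)
    snapshots: List[Snapshot] = field(default_factory=list)

    def snapshot_at(self, progress_rate: float) -> Snapshot:
        """Return the stored snapshot closest to the requested progress rate."""
        return min(self.snapshots, key=lambda s: abs(s.progress_rate - progress_rate))


project = FirstProjectMonitoring(
    info=ProductionInfo("A001", "Sample system renewal", 100.0),
    estimated_cost=10_000_000,
    final_actual_cost=11_500_000,
    snapshots=[
        Snapshot(25.0, 2_000_000, 8_000_000),
        Snapshot(50.0, 6_000_000, 5_000_000),
        Snapshot(75.0, 9_000_000, 2_500_000),
    ],
)
half_way = project.snapshot_at(50.0)
```

Keeping the constant items on the project record and the varying items on per-progress snapshots follows the distinction the text draws between the two kinds of explanatory variable.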

The monitoring information of a second project, on the other hand, can be composed of, for example, production number information and explanatory variables.

The production number information is information that identifies the second project. For example, the production number information includes, but is not limited to, a production number, a progress rate (%), and a production number name.
The production number is the identifier of the second project and can be expressed, for example, as an alphanumeric string.
The progress rate is a parameter that quantitatively indicates the progress of the second project.
The production number name is the name of the second project and can be expressed, for example, in meaningful words.

The explanatory variables are variables that express the state of the second project. There are multiple kinds of explanatory variables. They include, but are not limited to, the work start time, work end time, actual cost, forecast cost, estimated cost, persons in charge, work time, and actual man-hours. For example, candidate explanatory variables can be extracted using Boruta and the most suitable explanatory variables selected from among them, but the selection method is not limited to this.
The work start time is the start date (year, month, and day) of the second project.
The work end time is the end date (year, month, and day) of the second project.
The actual cost is the cost incurred from the work start time to a predetermined time.
The forecast cost is the cost expected to be incurred from the predetermined time to the work end time.
The estimated cost is the cost expected to be incurred from the work start time to the work end time.
The persons in charge are the person or persons responsible for the second project.
The work time is the time each person in charge has spent working on the second project between the work start time and the predetermined time.
The actual man-hours are the man-hours completed between the work start time and the predetermined time, out of all the man-hours that make up the second project.
For a second project, the predetermined time is the present, which lies between the work start time and the work end time.
The actual estimated cost of the second project, described earlier, is the sum of the actual cost and the forecast cost of the second project.

The progress rate of a second project can be calculated as the ratio of the period from the work start time to the present to the period from the work start time to the work end time. The actual cost at the progress rate of X% corresponding to the present is the cost incurred from the work start time to the present. The forecast cost at that progress rate is the cost expected to be incurred from the present to the work end time. Some explanatory variables, such as the work start time and work end time, do not change with the progress rate; such variables are preferably treated as constants that take the same value at every progress rate.

<Prediction model>
(Training)
The generation unit 1 can generate the prediction model by, for example, combining the plurality of decision trees used in a random forest. A random forest is an ensemble machine learning algorithm that uses multiple decision trees and makes its prediction by majority vote. A decision tree can be configured, for example, as tree-shaped logic that combines judgment conditions based on the explanatory variables. The judgment conditions can be designed as appropriate; one example is whether the average daily work time of a person in charge is five hours or more.
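The majority-vote idea can be sketched in a few lines of Python. The three hand-written "trees" below, their thresholds, and the feature names are hypothetical stand-ins; a real random forest learns its trees from bootstrap samples of the training data rather than using fixed rules like these. Only the five-hour condition is taken from the text.

```python
from collections import Counter


def tree_hours(x):
    # Judgment condition from the text: average daily work time of 5 hours or more.
    return "overrun" if x["avg_daily_hours"] >= 5 else "ok"


def tree_cost(x):
    # Hypothetical condition: actual cost plus forecast cost exceeds the estimate.
    if x["actual_cost"] + x["forecast_cost"] > x["estimated_cost"]:
        return "overrun"
    return "ok"


def tree_burn(x):
    # Hypothetical condition: more than 60% of the estimate is spent by 50% progress.
    if x["progress_rate"] >= 50 and x["actual_cost"] > 0.6 * x["estimated_cost"]:
        return "overrun"
    return "ok"


FOREST = [tree_hours, tree_cost, tree_burn]


def predict_majority(x):
    """Collect one vote per tree and return the most common label with the tally."""
    votes = Counter(tree(x) for tree in FOREST)
    label, _ = votes.most_common(1)[0]
    return label, votes


sample = {
    "avg_daily_hours": 6,
    "actual_cost": 4_000_000,
    "forecast_cost": 5_000_000,
    "estimated_cost": 10_000_000,
    "progress_rate": 50,
}
label, votes = predict_majority(sample)
```

For this sample only the work-time tree votes "overrun", so the majority label is "ok". An odd number of trees avoids ties in a two-class vote.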

The generation unit 1 can train the prediction model using the monitoring information of first projects as training data. For example, for each first project, the explanatory variables at the time corresponding to a progress rate of 50% can be used as the input of the prediction model, and a value based on the objective variable of the monitoring information can be used as its output. For example, the estimated-cost overrun, obtained by subtracting the estimated cost from the final actual cost of the first project, can be used as the output of the prediction model. The generation unit 1 trains the prediction model using the monitoring information of a predetermined number of first projects.

The generation unit 1 can also prepare multiple prediction models so that the output is produced in multiple stages. For example, the generation unit 1 prepares a first prediction model and a second prediction model. The input of the first prediction model covers the monitoring information of all first projects and can be the explanatory variables at the time corresponding to a progress rate of 50%. The output of the first prediction model can be whether there was an estimated-cost overrun, that is, whether the final actual cost minus the estimated cost of the first project is greater than 0M (0 yen). The input of the second prediction model covers the monitoring information of the first projects for which the first prediction model's output indicated an estimated-cost overrun, and can again be the explanatory variables at the time corresponding to a progress rate of 50%. The output of the second prediction model can be whether the estimated-cost overrun is 1M (1 million yen) or more. As a result, the output of the prediction model can be classified into three values: an estimated-cost overrun of 0M or less, greater than 0M and less than 1M, and 1M or more.
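A minimal Python sketch of how the two training sets could be assembled follows. The dictionary keys (`features_at_50`, `estimated_cost`, `final_actual_cost`) and the helper names are hypothetical, and the sketch assumes that the second-stage subset is chosen by the known overrun of each completed project.

```python
ONE_M = 1_000_000  # the 1M (1 million yen) threshold given as an example in the text


def overrun(project):
    """Estimated-cost overrun: final actual cost minus estimated cost (yen)."""
    return project["final_actual_cost"] - project["estimated_cost"]


def build_stage_datasets(projects):
    """Return (features, label) pairs for the first and second prediction models."""
    # Stage 1: all first projects, labeled by whether any overrun occurred.
    stage1 = [(p["features_at_50"], overrun(p) > 0) for p in projects]
    # Stage 2: only projects with an overrun, labeled by whether it reached 1M.
    stage2 = [
        (p["features_at_50"], overrun(p) >= ONE_M)
        for p in projects
        if overrun(p) > 0
    ]
    return stage1, stage2


projects = [
    {"features_at_50": {"actual_cost": 4_000_000}, "estimated_cost": 10_000_000,
     "final_actual_cost": 9_800_000},
    {"features_at_50": {"actual_cost": 5_500_000}, "estimated_cost": 10_000_000,
     "final_actual_cost": 10_300_000},
    {"features_at_50": {"actual_cost": 7_000_000}, "estimated_cost": 10_000_000,
     "final_actual_cost": 12_000_000},
]
stage1, stage2 = build_stage_datasets(projects)
```

Splitting the task into two binary problems yields the three-way result without training a single three-class model.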

(Prediction)
The prediction unit 2 uses the trained prediction model to predict the actual estimated cost of the second project to be predicted. For example, the prediction unit 2 inputs into the prediction model the explanatory variables, taken from the monitoring information of the second project, at the present time, that is, at the time corresponding to a predetermined progress rate (preferably 50% or more, though it may be less than 50%). The prediction unit 2 can then obtain a value based on the actual estimated cost as the output of the prediction model. For example, the prediction unit 2 can obtain the predicted estimated-cost overrun, which is the actual estimated cost minus the estimated cost.
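The two quantities defined in the text reduce to simple arithmetic, sketched below in Python. The function names and the yen amounts are hypothetical. In the embodiment the overrun is predicted by the model; this sketch only shows how the quantity is defined.

```python
def actual_estimated_cost(actual_cost: int, forecast_cost: int) -> int:
    """Actual estimated cost: cost incurred so far plus cost forecast to the end."""
    return actual_cost + forecast_cost


def predicted_overrun(actual_cost: int, forecast_cost: int, estimated_cost: int) -> int:
    """Predicted estimated-cost overrun: actual estimated cost minus estimated cost."""
    return actual_estimated_cost(actual_cost, forecast_cost) - estimated_cost


# Hypothetical second project: 6.0M spent, 5.5M still forecast, 10.0M estimated.
excess = predicted_overrun(6_000_000, 5_500_000, 10_000_000)
print(excess)
```

A positive result means the project is expected to finish above its estimate. A zero or negative result means it is expected to finish within it.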

When the prediction model consists of the first and second prediction models described above, the prediction unit 2 inputs the current explanatory variables, taken from the monitoring information of the second project to be predicted, into the first prediction model. The prediction unit 2 can then obtain, as the output of the first prediction model, a value indicating whether a predicted estimated-cost overrun exists (greater than 0M (0 yen)). If it does, the prediction unit 2 inputs the current explanatory variables of that second project into the second prediction model. The prediction unit 2 can then obtain, as the output of the second prediction model, a value indicating whether the predicted estimated-cost overrun is 1M (1 million yen) or more. As a result, the predicted estimated-cost overrun can be classified into three values: 0M or less, greater than 0M and less than 1M, and 1M or more.
Note that 1M is only an example; the threshold may be larger or smaller than 1M.
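The two-stage prediction can be sketched as a small cascade in Python. The predictors passed in are placeholders for the trained first and second prediction models; the rule-based stubs and feature names used here are hypothetical and exist only so the example runs. The numeric codes 1 to 3 follow the three-way classification described in the text.

```python
ONE_M = 1_000_000  # example threshold from the text; other values are possible


def classify_overrun(features, stage1_predict, stage2_predict) -> int:
    """Return 1 (1M or more), 2 (above 0M and below 1M), or 3 (0M or less)."""
    if not stage1_predict(features):
        return 3  # the first model predicts no overrun
    if stage2_predict(features):
        return 1  # the second model predicts an overrun of 1M or more
    return 2      # an overrun is predicted, but below 1M


def stage1_stub(f):
    # Stand-in for the first prediction model (hypothetical rule).
    return f["actual_cost"] + f["forecast_cost"] > f["estimated_cost"]


def stage2_stub(f):
    # Stand-in for the second prediction model (hypothetical rule).
    return f["actual_cost"] + f["forecast_cost"] - f["estimated_cost"] >= ONE_M


def make(actual, forecast, estimated):
    return {"actual_cost": actual, "forecast_cost": forecast, "estimated_cost": estimated}


results = [
    classify_overrun(make(6_000_000, 5_500_000, 10_000_000), stage1_stub, stage2_stub),
    classify_overrun(make(6_000_000, 4_300_000, 10_000_000), stage1_stub, stage2_stub),
    classify_overrun(make(5_000_000, 4_000_000, 10_000_000), stage1_stub, stage2_stub),
]
```

The second model is consulted only when the first model reports an overrun, mirroring the order described above.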

The prediction unit 2 can output information that includes the output of the prediction model. For example, the display unit of the information processing device 100 can display the output information of the prediction unit 2 on a screen, as shown in FIG. 2. As shown in FIG. 2, the output information of the prediction unit 2 can take the form of a table whose columns are "prediction result," "production number information," and "explanatory variables and features of the prediction result," and whose rows are second projects.

The "prediction result" shows the output of the prediction model. It consists of an "item number," a "predicted value," and a "confidence."
The "item number" is the row number assigned to each second project.
The "predicted value" is the result of the three-value classification of the predicted estimated-cost overrun when the prediction model is the combination of the first and second prediction models described above. "1: overrun of 1 million yen or more" corresponds to 1M or more. "2: overrun of less than 1 million yen" corresponds to greater than 0M and less than 1M. "3: no problem" corresponds to 0M or less.
The "confidence" is the reliability of the prediction, expressed as a value from 0% to 100%. For example, the confidence can be obtained using bagging, but is not limited to this.

The "production number information" is the same as the production number information in the monitoring information of the second project.
The "explanatory variables and features of the prediction result" shows the explanatory variables that contribute to the "prediction result." It consists of a "list of explanatory variables" and a "feature ranking."
The "list of explanatory variables" is the same as the explanatory variables in the monitoring information of the second project.
The "feature ranking" shows the ranking of the explanatory variables that contribute to the "prediction result." The higher the rank, the larger that explanatory variable's contribution rate to the output of the predicted value. For example, the contribution rate of each variable can be obtained using the SHAP (Shapley Additive exPlanations) algorithm, but is not limited to this.

According to the first embodiment, second projects whose predicted estimated-cost overrun is 1M or more can be extracted. Signs of project deterioration can therefore be detected at an early stage through wide-area monitoring.

(Deciding whether to make a project subject to priority monitoring)
Having obtained the output information of FIG. 2, the management department decides whether to make a second project whose predicted estimated-cost overrun is 1M or more subject to priority monitoring. Conventionally, even when AI detected signs of deterioration in a project, the basis for the AI's prediction was a black box. It was therefore not easy for the management department to trace the factors behind the signs of deterioration from the AI's prediction, and deciding whether to make the project subject to priority monitoring required a large human cost.

The "feature ranking" in FIG. 2 can be said to present the basis for the prediction that the estimated-cost overrun will be 1M or more. By referring to the "feature ranking," the management department can easily identify the explanatory variables that contribute strongly to that prediction. This makes it easier to decide whether to make the second project subject to priority monitoring, and so reduces the human cost of that decision.
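The contribution-based ranking can be illustrated with an exact Shapley value computation in Python. This is a brute-force sketch for a tiny hypothetical scoring function, not the SHAP library that an implementation would likely use with a tree ensemble. The model `toy_score`, the feature names, and the baseline values are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline input."""
    names = list(x)
    n = len(names)

    def value(subset):
        # Features in the subset take their observed value, the rest the baseline.
        mixed = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_score(f):
    # Hypothetical risk score standing in for the prediction model output.
    return 10 * f["avg_daily_hours"] + 2 * f["cost_burn_pct"] + 0 * f["progress_rate"]


x = {"avg_daily_hours": 8, "cost_burn_pct": 70, "progress_rate": 50}
baseline = {"avg_daily_hours": 5, "cost_burn_pct": 50, "progress_rate": 50}

phi = shapley_values(toy_score, x, baseline)
# Rank features by the magnitude of their contribution, largest first.
ranking = sorted(phi, key=lambda k: abs(phi[k]), reverse=True)
```

The contributions sum to the difference between the score for `x` and the score for the baseline, which is the additivity property that lets a ranking like this serve as the stated basis for a prediction. The brute-force loop grows exponentially with the number of features, so it is only practical for small examples.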

[Processing]
The processing executed by the information processing device 100 is shown in FIG. 3. First, the generation unit 1 generates a prediction model (step S1). Next, the generation unit 1 trains the prediction model using the monitoring information of first projects at a predetermined progress rate (step S2). Next, the prediction unit 2 uses the prediction model to output the predicted value of the predicted estimated-cost overrun of the target second project, together with the explanatory variables that contribute to the predicted value as the basis for the prediction (step S3). From the basis for the prediction, the management department decides whether to make a second project showing signs of deterioration subject to priority monitoring.

≪Second Embodiment≫
In describing the second embodiment, only the points that differ from the first embodiment are explained, and explanations of overlapping points are omitted. In the first embodiment, the explanatory variables of the monitoring information of the first projects used as training data were the explanatory variables at the time corresponding to a progress rate of 50%. In the second embodiment, the times of the explanatory variables of the first projects used as training data are set at regular intervals.

For example, if the period of a first project, that is, the period from the work start date to the work end date, extends over roughly several months, the time of the explanatory variables used as training data, that is, the training date (learning date), is set to the 25th of every month.
Thus, if the work start date of the first project with production number A is 4/15 and its work end date is 6/30, the training dates of the first project with production number A are 4/25 and 5/25. In other words, of the monitoring information of the first project with production number A, the explanatory variables at 4/25 (the explanatory variables at the progress rate corresponding to 4/25) and the explanatory variables at 5/25 (the explanatory variables at the progress rate corresponding to 5/25), two sets in total, are input to the prediction model.
Likewise, if the work start date of the first project with production number B is 5/1 and its work end date is 9/15, the training dates of the first project with production number B are 5/25, 6/25, 7/25, and 8/25. In other words, of the monitoring information of the first project with production number B, the explanatory variables at 5/25, 6/25, 7/25, and 8/25 (in each case, the explanatory variables at the progress rate corresponding to that date), four sets in total, are input to the prediction model.
Further, if the work start date of the first project with production number C is 6/1 and its work end date is 7/10, the training date of the first project with production number C is 6/25. In other words, of the monitoring information of the first project with production number C, the explanatory variables at 6/25 (the explanatory variables at the progress rate corresponding to 6/25), one set in total, are input to the prediction model.
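The selection of training dates in the three examples above can be sketched as follows. The rule coded here is an assumption inferred from those examples: a 25th is used only if it falls on or after the work start date and in a month earlier than the month in which the work ends (6/25 is not listed for production number A, whose work ends on 6/30). The text gives only month and day, so the year 2021 and the function name training_dates are hypothetical.

```python
from datetime import date

TRAINING_DAY = 25  # the 25th of every month, as in the text

def training_dates(start, end, day=TRAINING_DAY):
    """Return the monthly training dates for a project running from start to end.

    Assumed rule: keep a date if it is on or after the start date and its
    month is earlier than the month of the end date.
    """
    dates = []
    year, month = start.year, start.month
    while (year, month) < (end.year, end.month):
        candidate = date(year, month, day)
        if candidate >= start:
            dates.append(candidate)
        # Advance to the next calendar month.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return dates

YEAR = 2021  # hypothetical; the text does not state a year

# Work start and end dates of the three first projects in the text.
projects = {
    "A": (date(YEAR, 4, 15), date(YEAR, 6, 30)),
    "B": (date(YEAR, 5, 1), date(YEAR, 9, 15)),
    "C": (date(YEAR, 6, 1), date(YEAR, 7, 10)),
}

schedule = {name: training_dates(start, end) for name, (start, end) in projects.items()}
```

Under this assumed rule, schedule holds two dates for A, four for B, and one for C, matching the counts given in the text.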

As a result, for most of the first projects, explanatory variables at multiple progress rates are input to the prediction model. Using the prediction model trained in this way, the prediction unit 2 predicts the actual estimated cost of the second project. In this case, it was confirmed that even when the progress rate of the second project is low (for example, about 30%) and the explanatory variables at the time corresponding to that low progress rate are input to the prediction model, the confidence of the predicted value output by the prediction unit 2 (see FIG. 2) is sufficiently high.

According to the second embodiment, by setting the training dates at regular intervals, explanatory variables at multiple progress rates can be input to the prediction model for the same first project. As a result, the actual estimated cost of the second project can be predicted at an earlier stage.
Furthermore, setting the training dates at regular intervals makes it possible to systematize the input to the prediction model across all the first projects and to simplify the processing required for training.

[Modifications]
(a): In the first and second embodiments, the progress rate is calculated on a time basis using the project period. However, the progress rate may instead be calculated, for example, from the degree of accomplishment of the work undertaken in the project.
(b): In the first embodiment, for each first project, the explanatory variables at the time corresponding to a progress rate of 50% in the monitoring information of the first project are input to the prediction model. However, for each first project, the explanatory variables at the time corresponding to any single progress rate other than 50%, the same for all first projects, may be input to the prediction model instead. Alternatively, the explanatory variables at times corresponding to progress rates that differ from one first project to another may be input to the prediction model.
(c): In the second embodiment, setting the training dates at regular intervals in effect selects multiple progress rates for the same first project, and the explanatory variables at the selected progress rates are input to the prediction model. However, the user of the information processing device 100 may instead operate the input unit to select multiple arbitrary progress rates for the same first project and input the explanatory variables at the selected progress rates to the prediction model.

(d): A technique that appropriately combines the various techniques described in the embodiments can also be realized.
(e): The software described in the embodiments can be implemented as hardware, and the hardware can be implemented as software.
(f): In addition, the hardware, software, flowcharts, and the like may be changed as appropriate without departing from the spirit of the present invention.
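The two ways of calculating the progress rate mentioned in modification (a) can be contrasted with a short sketch. The function names are hypothetical, and measuring accomplishment as the share of completed tasks is an assumption; the text does not specify how the degree of accomplishment is measured.

```python
from datetime import date

def time_based_progress(start, end, today):
    """Share of the project period that has elapsed, clamped to [0, 1]."""
    total_days = (end - start).days
    if total_days <= 0:
        raise ValueError("end must be later than start")
    elapsed_days = (today - start).days
    return min(1.0, max(0.0, elapsed_days / total_days))

def achievement_based_progress(completed_tasks, total_tasks):
    """Share of tasks completed (assumed measure of accomplishment)."""
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    return completed_tasks / total_tasks

# Hypothetical project: 90-day period, observed after 45 days,
# with 3 of 10 tasks completed at that point.
by_time = time_based_progress(date(2021, 4, 1), date(2021, 6, 30), date(2021, 5, 16))
by_achievement = achievement_based_progress(3, 10)
```

Here the same project is at 50% by elapsed time but 30% by accomplishment, which shows why the choice of basis changes which explanatory variables correspond to a given progress rate.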

100 Information processing device
1 Generation unit
2 Prediction unit
3 First project DB
4 Second project DB

Claims

1. An information processing device comprising:
a generation unit that generates a prediction model trained using monitoring information of a completed first project; and
a prediction unit that inputs explanatory variables of a second project in progress into the prediction model, outputs a predicted value of the estimated excess over the estimated cost of the second project by taking a majority vote among a plurality of decision trees that combine judgment conditions using the explanatory variables, and outputs the explanatory variables that form the basis of the predicted value in accordance with the contribution rate of each explanatory variable to the predicted value, the contribution rate being determined using the SHAP (Shapley Additive exPlanations) algorithm.

2. The information processing device according to claim 1, wherein the prediction model is a combination of a first prediction model that outputs whether or not there is an estimated excess over the estimated cost, and a second prediction model that receives as input the explanatory variables of the second project for which the first prediction model has determined that there is an estimated excess over the estimated cost, and outputs whether or not the estimated excess over the estimated cost is equal to or greater than a predetermined value.

3. The information processing device according to claim 1, wherein the monitoring information of the first project used for training the prediction model is monitoring information of the first project at a predetermined progress rate.

4. The information processing device according to claim 3, wherein a plurality of the predetermined progress rates are set.

5. The information processing device according to claim 4, wherein, when the predetermined progress rate is determined on a time basis with respect to the period of the first project, the times corresponding to the predetermined progress rates are set at regular intervals.

6. An information processing method in which an information processing device executes:
a step of generating a prediction model trained using monitoring information of a completed first project; and
a step of inputting explanatory variables of a second project in progress into the prediction model, outputting a predicted value of the estimated excess over the estimated cost of the second project by taking a majority vote among a plurality of decision trees that combine judgment conditions using the explanatory variables, and outputting the explanatory variables that form the basis of the predicted value in accordance with the contribution rate of each explanatory variable to the predicted value, the contribution rate being determined using the SHAP algorithm.