JP6881203B2

JP6881203B2 - Classification program, classification method, and classification device

Info

Publication number: JP6881203B2
Application number: JP2017193865A
Authority: JP
Inventors: 孝明浜名; 藤田　大輔; 大輔藤田; 尚小山内; 史穂北本; 孝之佐野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2017-10-03
Filing date: 2017-10-03
Publication date: 2021-06-02
Anticipated expiration: 2037-10-03
Also published as: JP2019067270A

Description

The present invention relates to a classification program, a classification method, and a classification device.

Conventionally, there is a technique that generates, based on training data, a decision tree model including nodes that represent conditions for classifying target data and leaf nodes that represent attributes into which the target data is classified, and that classifies the target data using the generated decision tree model. For example, a decision tree model is used to classify billing data for social security benefits into normal billing data or abnormal billing data relating to fraudulent billing.

International Publication No. 2016/189606
Japanese Unexamined Patent Publication No. 2017-62713
Japanese National Publication of International Patent Application No. 2001-516107

However, with the conventional technique, it is difficult for the user to understand why target data was classified as normal or abnormal. For example, one could list the results of the conditions evaluated in the course of classifying the target data and notify the user of them. However, the more conditions are evaluated, the harder it becomes for the user to grasp intuitively why the target data was classified as normal or abnormal.

In one aspect, an object of the present invention is to provide a classification program, a classification method, and a classification device that generate information allowing a user to understand why target data was classified as normal or abnormal.

According to one embodiment, a classification program, a classification method, and a classification device are proposed that classify target data using a decision tree model including nodes that represent conditions for classifying the target data and leaf nodes that represent that the target data is normal or that the target data is abnormal; that, when the target data is classified into a first leaf node whose name has not been set, select from the decision tree model, based on the positional relationship between nodes, a second leaf node whose name has been set; and that generate a name for the first leaf node based on the name of the selected second leaf node.

According to one aspect, it becomes possible to generate information that allows a user to understand why target data was classified as normal or abnormal.

FIG. 1 is an explanatory diagram showing an example of the classification method according to the embodiment.
FIG. 2 is an explanatory diagram showing an example of the classification system 200.
FIG. 3 is a block diagram showing a hardware configuration example of the classification device 100.
FIG. 4 is an explanatory diagram showing an example of the stored contents of the data structure of the billing data 400.
FIG. 5 is an explanatory diagram showing an example of the stored contents of the billing table 500.
FIG. 6 is a block diagram showing a hardware configuration example of the terminal device 201.
FIG. 7 is a block diagram showing a functional configuration example of the classification device 100.
FIG. 8 is an explanatory diagram showing a specific example of a situation in which the classification device 100 is used.
FIG. 9 is an explanatory diagram showing the flow of generating the decision tree model 805.
FIG. 10 is an explanatory diagram (part 1) showing the flow of setting names for leaf nodes.
FIG. 11 is an explanatory diagram (part 2) showing the flow of setting names for leaf nodes.
FIG. 12 is an explanatory diagram (part 1) showing operation example 1 in which the classification device 100 sets a name for a leaf node.
FIG. 13 is an explanatory diagram (part 2) showing operation example 1 in which the classification device 100 sets a name for a leaf node.
FIG. 14 is an explanatory diagram (part 3) showing operation example 1 in which the classification device 100 sets a name for a leaf node.
FIG. 15 is an explanatory diagram (part 4) showing operation example 1 in which the classification device 100 sets a name for a leaf node.
FIG. 16 is an explanatory diagram (part 5) showing operation example 1 in which the classification device 100 sets a name for a leaf node.
FIG. 17 is an explanatory diagram (part 6) showing operation example 1 in which the classification device 100 sets a name for a leaf node.
FIG. 18 is an explanatory diagram (part 7) showing operation example 1 in which the classification device 100 sets a name for a leaf node.
FIG. 19 is an explanatory diagram (part 8) showing operation example 1 in which the classification device 100 sets a name for a leaf node.
FIG. 20 is an explanatory diagram (part 1) showing operation example 2 in which the classification device 100 sets a name for a leaf node.
FIG. 21 is an explanatory diagram (part 2) showing operation example 2 in which the classification device 100 sets a name for a leaf node.
FIG. 22 is an explanatory diagram (part 3) showing operation example 2 in which the classification device 100 sets a name for a leaf node.
FIG. 23 is an explanatory diagram showing output example 1 on the terminal device 201.
FIG. 24 is an explanatory diagram showing output example 2 on the terminal device 201.
FIG. 25 is a flowchart showing an example of the overall processing procedure.
FIG. 26 is a flowchart showing an example of the selection processing procedure.
FIG. 27 is a flowchart showing an example of the search processing procedure.

Embodiments of the classification program, classification method, and classification device according to the present invention are described in detail below with reference to the drawings.

(Example of the classification method according to the embodiment)
FIG. 1 is an explanatory diagram showing an example of the classification method according to the embodiment. The classification device 100 is a computer that classifies target data based on a decision tree model.

A decision tree model is a model that includes nodes representing conditions for classifying target data and leaf nodes representing attributes into which the target data is classified. A leaf node represents, for example, that the target data is normal or that the target data is abnormal.

In the drawings that follow, the left child of a node representing a condition is the child node taken when the condition evaluates to True, and the right child is the child node taken when the condition evaluates to False. The True and False labels may be omitted from the drawings.
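The structure described above can be sketched as a small data type. This is only an illustrative sketch, not the implementation in the embodiment: the class names `Condition` and `Leaf`, the `Record` alias, and the example conditions `A > 10` and `D > 100` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

Record = dict  # hypothetical: one piece of target data as field -> value


@dataclass
class Leaf:
    label: str                  # "normal" or "abnormal"
    name: Optional[str] = None  # name set for the leaf; None means "not yet set"


@dataclass
class Condition:
    text: str                           # human-readable condition, e.g. "D > 100"
    test: Callable[[Record], bool]      # evaluates the condition on a record
    if_true: Union["Condition", Leaf]   # drawn as the left child
    if_false: Union["Condition", Leaf]  # drawn as the right child


# A tiny hypothetical tree: two named leaves and one leaf without a name.
tree = Condition(
    "A > 10", lambda r: r["A"] > 10,
    if_true=Leaf("abnormal", name="B fraud"),
    if_false=Condition(
        "D > 100", lambda r: r["D"] > 100,
        if_true=Leaf("abnormal", name="XX service"),
        if_false=Leaf("normal"),
    ),
)
```

Keeping `name` separate from `label` mirrors the distinction in the text between the normal/abnormal attribute of a leaf and the finer-grained name that may or may not have been set.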

Ministries, local governments, and similar bodies may receive billing data corresponding to invoices for social security benefits and check whether any of it is abnormal. The larger the volume of billing data, the greater the burden on their staff, and the more likely the staff are to overlook abnormal billing data. Abnormal billing data is, for example, billing data for fraudulent billing. Specific examples are billing data for inflated billing and billing data for fictitious billing.

It is therefore desirable to treat billing data as target data and classify it automatically into normal billing data or abnormal billing data, for example by using a decision tree model. The decision tree model is generated, for example, based on past billing data. A specific situation in which the decision tree model is used will be described later with reference to, for example, FIG. 8.

Similarly, hospitals, pharmacies, and similar institutions may manage medical data corresponding to medical records, prescriptions, medical fee statements, and the like, and check whether any of it is abnormal. The larger the volume of medical data, the greater the burden on their staff, and the more likely the staff are to overlook abnormal medical data. One specific example of abnormal medical data is medical data that includes information on a particular disease but does not include information on the drugs or tests that are essential for that disease. Another is medical data in which a claim that may be filed only once a month appears twice in the same month.

It is therefore desirable to treat medical data as target data and classify it automatically into normal medical data or abnormal medical data, for example by using a decision tree model. The decision tree model is generated, for example, based on past medical data.

However, it is difficult for a user of a decision tree model to identify the detailed type of the result when target data is classified as normal or abnormal, and therefore difficult to understand why the target data was classified that way. Specifically, when the target data is billing data and is classified as fraudulent billing, it is difficult for the user to identify which type of fraudulent billing the target data was classified into, and therefore difficult to understand why it was classified that way.

One possible approach is to list the results of the conditions evaluated in the course of classifying the target data and notify the user of them. In this case, however, the more conditions are evaluated, the harder it becomes for the user to grasp intuitively why the target data was classified as normal or abnormal.

Another possible approach is to have the user set a name for every leaf node of the decision tree model and, when target data is classified into a leaf node, notify the user of the name set for that leaf node in association with the target data. In this case, however, it is difficult to have the user set names for all leaf nodes of the decision tree model. For example, the more leaf nodes there are, the greater the burden on the user.

Furthermore, when the decision tree model is generated using unsupervised learning, or using supervised and unsupervised learning together, it is difficult for the user to know in advance what kind of target data will be classified into the leaf nodes that unsupervised learning adds to the decision tree model. It is therefore difficult for the user to set names for the leaf nodes added by unsupervised learning.

The present embodiment therefore describes a classification method that, when target data is classified into a first leaf node whose name has not been set in the decision tree model, can generate a name for the first leaf node based on a second leaf node whose name has been set. With this classification method, the name of the leaf node into which the target data was classified can be presented to the user, making it easier for the user to understand why the target data was classified as normal or abnormal.

In FIG. 1, the classification device 100 stores a decision tree model 110. For example, the classification device 100 generates the decision tree model 110 based on training data and stores it. Alternatively, the classification device 100 may receive the decision tree model 110 from another device that generates it based on training data, and store it.

The decision tree model 110 includes nodes that represent conditions for classifying target data and leaf nodes that represent that the target data is normal or that the target data is abnormal. The target data is submitted, for example, by some business operator. Specifically, the target data is billing data corresponding to an invoice for social security benefits. In the example of FIG. 1, the decision tree model 110 includes nodes 111 to 116, which represent conditions for classifying the target data, and leaf nodes 121 to 127, which represent that the target data is normal or that the target data is abnormal.

The classification device 100 classifies the target data using the decision tree model 110. For example, starting from the root node of the decision tree model 110, the classification device 100 determines in turn whether the target data satisfies the condition represented by each node and, based on the results, classifies the target data into one of the leaf nodes. Classification into a leaf node corresponds to classification into the attribute that the leaf node represents, for example the normal or abnormal attribute. In the example of FIG. 1, the classification device 100 classifies the target data into the leaf node 127.
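The root-to-leaf traversal described above can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary layout, the helper names `leaf`, `cond`, and `classify`, and the example conditions are hypothetical and are not taken from the embodiment.

```python
def leaf(label, name=None):
    """Hypothetical leaf node: a normal/abnormal label and an optional name."""
    return {"leaf": label, "name": name}


def cond(text, test, true, false):
    """Hypothetical condition node with one child per evaluation result."""
    return {"text": text, "test": test, "true": true, "false": false}


def classify(node, record):
    """Walk from the root to a leaf, evaluating one condition per step.

    Returns the leaf reached and the list of (condition text, result) pairs
    evaluated on the way.
    """
    path = []
    while "leaf" not in node:
        result = node["test"](record)
        path.append((node["text"], result))
        node = node["true"] if result else node["false"]
    return node, path


tree = cond(
    "A > 10", lambda r: r["A"] > 10,
    leaf("abnormal", "B fraud"),
    cond("D > 100", lambda r: r["D"] > 100,
         leaf("abnormal", "XX service"),
         leaf("abnormal")),  # a leaf whose name has not been set
)

found, path = classify(tree, {"A": 3, "D": 50})
```

Recording the evaluated conditions alongside the leaf is a design choice of this sketch; it keeps the information needed later to compare the paths to two leaves.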

When the target data is classified into a first leaf node whose name has not been set, the classification device 100 selects, from the decision tree model 110, a second leaf node whose name has been set, based on the positional relationship between nodes. The positional relationship between nodes is represented, for example, by the parent-child relationships between nodes. For example, based on the positional relationship between nodes, the classification device 100 selects the second leaf node from among the leaf nodes in the vicinity of the first leaf node. The distance between nodes is represented, for example, by the number of edges connecting them. Being in the vicinity corresponds, for example, to the number of edges connecting the nodes being equal to or less than a predetermined number.

In the example of FIG. 1, because the target data was classified into the leaf node 127, whose name has not been set, the classification device 100 selects the leaf node 123 from among the leaf nodes 123 and 126, which are in the vicinity of the leaf node 127. A specific example of selecting the second leaf node from among the leaf nodes in the vicinity of the first leaf node will be described later with reference to FIGS. 17 and 18.
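The neighborhood-based selection can be sketched by measuring the distance between two leaves as the number of edges on the path between them. The sketch below is illustrative only. The dictionary layout and function names are hypothetical, and picking the nearest named leaf among the candidates is an assumption of this sketch; the text here states only that the second leaf node is chosen from the leaves within a predetermined number of edges.

```python
def leaf(label, name=None):
    return {"leaf": label, "name": name}


def cond(text, true, false):
    return {"text": text, "true": true, "false": false}


def leaves_with_paths(node, path=()):
    """Yield (path, leaf) pairs; a path is the tuple of True/False branches
    taken from the root."""
    if "leaf" in node:
        yield path, node
    else:
        yield from leaves_with_paths(node["true"], path + (True,))
        yield from leaves_with_paths(node["false"], path + (False,))


def edge_distance(p, q):
    """Number of edges between two leaves, via their deepest common ancestor."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return (len(p) - common) + (len(q) - common)


def select_named_neighbor(root, first_path, max_edges):
    """Return the nearest named leaf within max_edges of the leaf at
    first_path, or None if there is no such leaf."""
    candidates = [
        (edge_distance(first_path, path), node)
        for path, node in leaves_with_paths(root)
        if node["name"] is not None and path != first_path
    ]
    candidates = [c for c in candidates if c[0] <= max_edges]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]


tree = cond("A > 10",
            leaf("abnormal", "B fraud"),
            cond("D > 100",
                 leaf("abnormal", "XX service"),
                 leaf("abnormal")))  # unnamed leaf at path (False, False)

chosen = select_named_neighbor(tree, (False, False), max_edges=2)
```

In this hypothetical tree the unnamed leaf is two edges from the "XX service" leaf and three edges from the "B fraud" leaf, so a threshold of 2 leaves only one candidate.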

The classification device 100 generates a name for the first leaf node based on the name of the selected second leaf node. The name of the second leaf node corresponds to the detailed type of the classification result and indicates why the target data was classified that way. Examples of the name of the second leaf node are "XX service" and "B fraud". More concretely, the name may be "inflated billing" or "fictitious billing".

In the example of FIG. 1, the classification device 100 generates the name "shows a tendency close to B fraud" for the leaf node 127 based on the name "B fraud" of the selected leaf node 123. If the leaf node 126 had been selected instead, the classification device 100 would generate, based on the name "XX service" of the leaf node 126, the name "considered to be fraudulent billing related to XX service" for the leaf node 127.
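Deriving the new name from the selected leaf's name can be sketched as simple template filling. The template wording, the `style` keys, and the function name `generate_name` are assumptions made for illustration; the text gives two example names but does not state the rule for choosing between phrasings.

```python
# Hypothetical phrasing templates modeled on the two example names in the text.
TEMPLATES = {
    "similar": "shows a tendency close to {name}",
    "related": "considered to be fraudulent billing related to {name}",
}


def generate_name(second_leaf_name, style="similar"):
    """Build a name for the unnamed leaf from the name of the selected leaf."""
    return TEMPLATES[style].format(name=second_leaf_name)


name_a = generate_name("B fraud")
name_b = generate_name("XX service", style="related")
```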

The classification device 100 may further reflect, in the name of the first leaf node, the difference between the results of the conditions evaluated when target data is classified into the selected second leaf node and the results of the conditions evaluated when target data is classified into the first leaf node. The difference is, for example, the difference between the results of the conditions represented by the nodes on the path from the root node to the first leaf node and the results of the conditions represented by the nodes on the path from the root node to the second leaf node.

For example, based on the name "XX service" of the selected leaf node 126 and the difference in condition results "(compared with XX service) D > 100 does not hold", the classification device 100 generates the name "related to XX service, but not D > 100" for the leaf node 127.
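Reflecting the difference between the two root-to-leaf paths in the name can be sketched as below. Each path is assumed to be a list of (condition text, result) pairs; this representation, the function names, and the output wording are hypothetical choices for the sketch.

```python
def path_differences(first_path, second_path):
    """Conditions that appear on both paths but were evaluated differently,
    reported with the result seen on the first (unnamed) leaf's path."""
    second = dict(second_path)
    return [(text, result) for text, result in first_path
            if text in second and second[text] != result]


def name_with_differences(second_name, first_path, second_path):
    """Combine the selected leaf's name with the differing condition results."""
    parts = [text if result else f"not {text}"
             for text, result in path_differences(first_path, second_path)]
    if not parts:
        return f"related to {second_name}"
    return f"related to {second_name}, but {', '.join(parts)}"


# Hypothetical paths: both leaves share "C > 5" and split on "D > 100".
path_to_unnamed = [("C > 5", False), ("D > 100", False)]
path_to_named = [("C > 5", False), ("D > 100", True)]

generated = name_with_differences("XX service", path_to_unnamed, path_to_named)
```

Only conditions shared by both paths are compared here; conditions that appear on just one of the paths are ignored, which is a simplification of this sketch.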

The classification device 100 can thereby output the generated name of the first leaf node in association with the target data. This allows the user to identify the detailed type of the classification result and to understand why the target data was classified that way.

As a result, even when the volume of target data is large, the user can refer to the reason each piece of target data was classified and check efficiently whether any abnormal target data exists, which makes the work more efficient and reduces the burden. The user is also less likely to overlook abnormal target data.

The user can also see which target data was classified as a relatively serious abnormality and decide which target data should be examined closely. In addition, among the business operators that submitted target data, the user can determine which ones submitted target data with relatively serious abnormalities, and can thereby identify the business operators that should be audited.

Because the classification device 100 generates the name of the first leaf node based on the name of the second leaf node, it tends to produce names that the user can understand intuitively. Compared with referring to a list of the results of the conditions evaluated in the course of classifying the target data, the user can more easily understand why the target data was classified that way and can work more efficiently.

The classification device 100 can also generate names for leaf nodes that the user has not named. The user therefore does not have to set names for all leaf nodes of the decision tree model 110, which reduces the burden. The classification device 100 can also set names for leaf nodes that were added to the decision tree model 110 by unsupervised learning, so the user can understand why target data was classified into such a leaf node.

The description above covers the case where the classification device 100 generates a name for the first leaf node even when the first leaf node into which the target data is classified represents that the target data is normal, but the embodiment is not limited to this. For example, the classification device 100 may refrain from generating a name for the first leaf node if that leaf node represents that the target data is normal. This allows the classification device 100 to reduce its processing load. For example, when the user's goal is to find abnormal target data, leaving leaf nodes that represent normal target data unnamed reduces the processing load without lowering the user's work efficiency.
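The optional skipping of normal leaves can be sketched as a small guard placed in front of name generation. The function name `name_for_output`, the `name_normal_leaves` flag, and the dictionary layout are hypothetical; the sketch only illustrates the branching described above.

```python
def name_for_output(leaf, generate, name_normal_leaves=False):
    """Return the name to report for a leaf, generating one only when needed.

    `generate` is any callable that builds a name for an unnamed leaf.
    """
    if leaf["name"] is not None:
        return leaf["name"]  # a name is already set; nothing to generate
    if leaf["leaf"] == "normal" and not name_normal_leaves:
        return None          # skip generation for normal leaves
    return generate(leaf)


calls = []


def fake_generate(leaf):
    # Stand-in for the real name generation; records that it was invoked.
    calls.append(leaf)
    return "generated name"


skipped = name_for_output({"leaf": "normal", "name": None}, fake_generate)
named = name_for_output({"leaf": "abnormal", "name": None}, fake_generate)
```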

The description above covers the case where the classification device 100 selects the second leaf node from among the leaf nodes in the vicinity of the first leaf node based on the positional relationship between nodes, but the embodiment is not limited to this. For example, based on the positional relationship between nodes, the classification device 100 may select the second leaf node from among the leaf nodes included in a second subtree of the decision tree model 110 that is identical or similar to a first subtree containing the first leaf node. A specific example of selecting the second leaf node from among the leaf nodes included in the second subtree will be described later with reference to FIGS. 19 to 21.
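The subtree-based alternative can be sketched for the "identical" case only. In this sketch two subtrees count as identical when they have the same shape and the same condition texts, ignoring leaf names; that definition, the dictionary layout, and the function names are assumptions of the sketch, and the "similar" case is left out because this passage does not define it.

```python
def leaf(label, name=None):
    return {"leaf": label, "name": name}


def cond(text, true, false):
    return {"text": text, "true": true, "false": false}


def signature(node):
    """Structural fingerprint: condition texts and shape, ignoring leaf names."""
    if "leaf" in node:
        return "leaf"
    return (node["text"], signature(node["true"]), signature(node["false"]))


def subtrees(node):
    """Yield every condition node, each being the root of a subtree."""
    if "leaf" not in node:
        yield node
        yield from subtrees(node["true"])
        yield from subtrees(node["false"])


def named_leaves(node):
    """Yield the leaves under node whose name has been set."""
    if "leaf" in node:
        if node["name"] is not None:
            yield node
    else:
        yield from named_leaves(node["true"])
        yield from named_leaves(node["false"])


def candidates_from_identical_subtrees(root, first_subtree):
    """Named leaves found in other subtrees with the same fingerprint."""
    target = signature(first_subtree)
    return [found
            for sub in subtrees(root)
            if sub is not first_subtree and signature(sub) == target
            for found in named_leaves(sub)]


named_sub = cond("D > 100", leaf("abnormal", "XX service"), leaf("normal"))
unnamed_sub = cond("D > 100", leaf("abnormal"), leaf("normal"))
tree = cond("A > 10", named_sub, unnamed_sub)

candidates = candidates_from_identical_subtrees(tree, unnamed_sub)
```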

(Example of the classification system 200)
Next, an example of the classification system 200, to which the classification device 100 shown in FIG. 1 is applied, will be described with reference to FIG. 2.

FIG. 2 is an explanatory diagram showing an example of the classification system 200. In FIG. 2, the classification system 200 includes the classification device 100 and a terminal device 201.

In the classification system 200, the classification device 100 and the terminal device 201 are connected via a wired or wireless network 210. The network 210 is, for example, a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet.

The classification device 100 receives billing data to be used as training data from the terminal device 201 and generates a decision tree model. The classification device 100 receives billing data to be used as target data and classifies it into a leaf node of the generated decision tree model. If no name has been set for the leaf node into which the billing data was classified, the classification device 100 generates a name.

The classification device 100 causes the terminal device 201 to output information based on the result of classifying the billing data, thereby notifying the user of the terminal device 201. For example, the classification device 100 causes the terminal device 201 to display the billing data in association with the name of the leaf node into which it was classified. The classification device 100 is, for example, a server or a PC (Personal Computer).

The terminal device 201 is a computer used by a user of the classification system 200. The user is, for example, a staff member of a ministry, a local government, a hospital, or a pharmacy, and the terminal device 201 is installed, for example, at such an organization. The terminal device 201 transmits billing data to be used as training data to the classification device 100. The terminal device 201 also transmits billing data to be used as target data to the classification device 100, receives from the classification device 100 information based on the result of classifying the billing data, and outputs that information. The terminal device 201 is, for example, a PC, a tablet terminal, or a smartphone.

The description above covers the case where the classification device 100 and the terminal device 201 are separate devices, but the embodiment is not limited to this. For example, the classification device 100 may be integrated with the terminal device 201. In this case, the classification device 100 accepts billing data to be used as training data, billing data to be used as target data, and the like based on the user's operation input.

The description above covers the case where the classification device 100 generates the decision tree model and classifies the target data, but the embodiment is not limited to this. For example, a device other than the classification device 100 may generate the decision tree model and transmit it to the classification device 100. Likewise, a device other than the classification device 100 may transmit the result of classifying the target data to the classification device 100. The other device is, for example, the terminal device 201.

(Hardware configuration example of the classification device 100)
Next, a hardware configuration example of the classification device 100 will be described with reference to FIG. 3.

FIG. 3 is a block diagram showing a hardware configuration example of the classification device 100. In FIG. 3, the classification device 100 includes a CPU (Central Processing Unit) 301, a memory 302, a network I/F (Interface) 303, a recording medium I/F 304, and a recording medium 305. These components are connected to one another by a bus 300.

ここで、ＣＰＵ３０１は、分類装置１００の全体の制御を司る。メモリ３０２は、例えば、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）およびフラッシュＲＯＭなどを有する。具体的には、例えば、フラッシュＲＯＭやＲＯＭが各種プログラムを記憶し、ＲＡＭがＣＰＵ３０１のワークエリアとして使用される。メモリ３０２に記憶されるプログラムは、ＣＰＵ３０１にロードされることで、コーディングされている処理をＣＰＵ３０１に実行させる。 Here, the CPU 301 controls the entire classification device 100. The memory 302 includes, for example, a ROM (Read Only Memory), a RAM (Random Access Memory), a flash ROM, and the like. Specifically, for example, a flash ROM or ROM stores various programs, and the RAM is used as a work area of the CPU 301. The program stored in the memory 302 is loaded into the CPU 301 to cause the CPU 301 to execute the coded process.

ネットワークＩ／Ｆ３０３は、通信回線を通じてネットワーク２１０に接続され、ネットワーク２１０を介して他のコンピュータに接続される。そして、ネットワークＩ／Ｆ３０３は、ネットワーク２１０と内部のインターフェースを司り、他のコンピュータからのデータの入出力を制御する。ネットワークＩ／Ｆ３０３には、例えば、モデムやＬＡＮアダプタなどを採用することができる。 The network I / F 303 is connected to the network 210 through a communication line, and is connected to another computer via the network 210. Then, the network I / F 303 controls the internal interface with the network 210 and controls the input / output of data from another computer. For the network I / F 303, for example, a modem, a LAN adapter, or the like can be adopted.

記録媒体Ｉ／Ｆ３０４は、ＣＰＵ３０１の制御に従って記録媒体３０５に対するデータのリード／ライトを制御する。記録媒体Ｉ／Ｆ３０４は、例えば、ディスクドライブ、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）、ＵＳＢ（ＵｎｉｖｅｒｓａｌＳｅｒｉａｌＢｕｓ）ポートなどである。記録媒体３０５は、記録媒体Ｉ／Ｆ３０４の制御で書き込まれたデータを記憶する不揮発メモリである。記録媒体３０５は、例えば、ディスク、半導体メモリ、ＵＳＢメモリなどである。記録媒体３０５は、分類装置１００から着脱可能であってもよい。メモリ３０２、または、記録媒体３０５は、例えば、請求データや決定木モデルを記憶してもよい。 The recording medium I/F 304 controls reading and writing of data on the recording medium 305 under the control of the CPU 301. The recording medium I/F 304 is, for example, a disk drive, an SSD (Solid State Drive), or a USB (Universal Serial Bus) port. The recording medium 305 is a non-volatile memory that stores data written under the control of the recording medium I/F 304. The recording medium 305 is, for example, a disk, a semiconductor memory, or a USB memory. The recording medium 305 may be detachable from the classification device 100. The memory 302 or the recording medium 305 may store, for example, billing data or a decision tree model.

分類装置１００は、上述した構成部のほか、例えば、キーボード、マウス、ディスプレイ、プリンタ、スキャナ、マイク、スピーカーなどを有してもよい。また、分類装置１００は、記録媒体Ｉ／Ｆ３０４や記録媒体３０５を複数有していてもよい。また、分類装置１００は、記録媒体Ｉ／Ｆ３０４や記録媒体３０５を有していなくてもよい。 In addition to the components described above, the classification device 100 may include, for example, a keyboard, a mouse, a display, a printer, a scanner, a microphone, and a speaker. The classification device 100 may have a plurality of recording medium I/Fs 304 and recording media 305. Alternatively, the classification device 100 need not have the recording medium I/F 304 or the recording medium 305.

（請求データ４００のデータ構造） (Data structure of billing data 400)
次に、図４を用いて、分類装置１００が取得する請求データ４００のデータ構造について説明する。請求データ４００は、例えば、端末装置２０１から分類装置１００に送信される。 Next, the data structure of the billing data 400 acquired by the classification device 100 will be described with reference to FIG. 4. The billing data 400 is transmitted from the terminal device 201 to the classification device 100, for example.

図４は、請求データ４００のデータ構造の記憶内容の一例を示す説明図である。図４に示すように、請求データ４００は、１以上の項目のフィールドを有する。請求データ４００は、各フィールドに情報を設定することにより、社会保障給付費の請求書の項目の値が記憶される。 FIG. 4 is an explanatory diagram showing an example of the stored contents of the data structure of the billing data 400. As shown in FIG. 4, the billing data 400 has one or more item fields. By setting information in each field of the billing data 400, the value of the item of the invoice for social security benefit expenses is stored.

項目のフィールドには、社会保障給付費の請求書の項目の値が設定される。項目は、例えば、項目Ａと、項目Ｂと、項目Ｃとである。項目は、具体的には、日付と、サービス区分と、金額と、利用量と、加算情報となどである。加算情報は、金額を増額して申請可能になる条件である。 In each item field, the value of the corresponding item on the invoice for social security benefits is set. The items are, for example, item A, item B, and item C. Specifically, the items are a date, a service category, an amount, a usage amount, additional information, and the like. The additional information is a condition under which an increased amount can be claimed.

（請求テーブル５００の記憶内容） (Stored contents of billing table 500)
次に、図５を用いて、分類装置１００が取得した請求データ４００を管理する請求テーブル５００の記憶内容について説明する。請求テーブル５００は、例えば、図３に示した分類装置１００のメモリ３０２や記録媒体３０５などの記憶領域により実現される。 Next, the stored contents of the billing table 500, which manages the billing data 400 acquired by the classification device 100, will be described with reference to FIG. 5. The billing table 500 is realized, for example, by a storage area such as the memory 302 or the recording medium 305 of the classification device 100 shown in FIG. 3.

図５は、請求テーブル５００の記憶内容の一例を示す説明図である。図５に示すように、請求テーブル５００は、キーと、１以上の項目と、結果と、不正理由と、正常理由とのフィールドを有する。請求テーブル５００は、各フィールドに情報を設定することにより、請求データ４００と、請求データ４００を分類した結果に関する結果データがレコードとして記憶される。 FIG. 5 is an explanatory diagram showing an example of the stored contents of the billing table 500. As shown in FIG. 5, the billing table 500 has fields for a key, one or more items, a result, a reason for fraud, and a reason for normality. By setting information in each field, the billing table 500 stores the billing data 400 and the result data regarding the result of classifying the billing data 400 as a record.

キーのフィールドには、請求データ４００を識別するキーが設定される。項目のフィールドには、社会保障給付費の請求書の項目の値に対応し、キーによって識別される請求データ４００の項目のフィールドに設定された値が設定される。 A key for identifying the billing data 400 is set in the key field. In the item field, the value set in the item field of the billing data 400 identified by the key corresponding to the value of the item of the social security benefit bill is set.

結果のフィールドには、キーによって識別される請求データ４００を正常または不正請求に分類した結果が設定される。不正理由のフィールドには、請求データ４００が不正請求に分類された場合、請求データ４００が不正請求に分類された不正理由が設定される。不正理由は、例えば、水増し請求や架空請求などである。正常理由のフィールドには、請求データ４００が正常に分類された場合、請求データ４００が正常に分類された正常理由が設定される。正常理由は、例えば、居宅介護などである。 In the result field, the result of classifying the billing data 400 identified by the key as normal or as fraudulent billing is set. In the fraud reason field, when the billing data 400 has been classified as fraudulent billing, the reason for that classification is set. The fraud reason is, for example, inflated billing or fictitious billing. In the normal reason field, when the billing data 400 has been classified as normal, the reason for that classification is set. The normal reason is, for example, home care.
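As an illustration of the record layout described above, the following is a minimal Python sketch of one record of the billing table 500. The class name `BillingRecord`, the field names, the string values "normal" and "fraudulent", and the helper `set_result` are hypothetical and chosen only for this example; the source specifies only that a record holds a key, one or more items, a result, a fraud reason, and a normal reason.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BillingRecord:
    """One record of the billing table (hypothetical field names)."""
    key: str                                   # identifies the billing data
    items: Dict[str, object] = field(default_factory=dict)  # item name -> value
    result: Optional[str] = None               # "normal" or "fraudulent" (assumed labels)
    fraud_reason: Optional[str] = None         # set only when result is "fraudulent"
    normal_reason: Optional[str] = None        # set only when result is "normal"


def set_result(record: BillingRecord, result: str, reason: str) -> None:
    """Store a classification result and file the reason in the matching field."""
    if result not in ("normal", "fraudulent"):
        raise ValueError("result must be 'normal' or 'fraudulent'")
    record.result = result
    if result == "fraudulent":
        record.fraud_reason, record.normal_reason = reason, None
    else:
        record.fraud_reason, record.normal_reason = None, reason


record = BillingRecord(key="0001", items={"amount": 150, "usage amount": 5})
set_result(record, "fraudulent", "inflated billing")
```

Keeping the two reason fields separate mirrors the table layout in FIG. 5, where only one of them is filled for a given record.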

（学習データのデータ構造） (Data structure of training data)
学習データは、過去の請求データ４００を含むため、学習データのデータ構造は、例えば、図４に示した請求データ４００のデータ構造と同様であるが、学習データは、結果のフィールドをさらに有してもよい。結果のフィールドには、過去の請求データ４００を利用者が正常または不正請求と判断した結果が設定される。 Since the training data includes past billing data 400, the data structure of the training data is, for example, the same as that of the billing data 400 shown in FIG. 4, but the training data may further have a result field. In the result field, the result of the user's judgment as to whether the past billing data 400 is normal or fraudulent billing is set.

学習データは、例えば、学習テーブルを用いて記憶される。学習テーブルの一例は、例えば、図１２に示される。学習テーブルの記憶内容は、例えば、図５に示した請求テーブル５００の記憶内容と同様であるが、学習テーブルは、不正理由と正常理由とのフィールドを有さなくてもよい。 The learning data is stored using, for example, a learning table. An example of the learning table is shown in FIG. 12, for example. The stored contents of the learning table are, for example, the same as the stored contents of the billing table 500 shown in FIG. 5, but the learning table does not have to have fields for the reason for fraud and the reason for normal.

（端末装置２０１のハードウェア構成例） (Hardware configuration example of terminal device 201)
次に、図６を用いて、端末装置２０１のハードウェア構成例について説明する。 Next, a hardware configuration example of the terminal device 201 will be described with reference to FIG. 6.

図６は、端末装置２０１のハードウェア構成例を示すブロック図である。図６において、端末装置２０１は、ＣＰＵ６０１と、メモリ６０２と、ネットワークＩ／Ｆ６０３と、記録媒体Ｉ／Ｆ６０４と、記録媒体６０５と、ディスプレイ６０６と、入力装置６０７とを有する。また、各構成部は、バス６００によってそれぞれ接続される。 FIG. 6 is a block diagram showing a hardware configuration example of the terminal device 201. In FIG. 6, the terminal device 201 includes a CPU 601, a memory 602, a network I/F 603, a recording medium I/F 604, a recording medium 605, a display 606, and an input device 607. The components are connected to one another by a bus 600.

ここで、ＣＰＵ６０１は、端末装置２０１の全体の制御を司る。メモリ６０２は、例えば、ＲＯＭ、ＲＡＭおよびフラッシュＲＯＭなどを有する。具体的には、例えば、フラッシュＲＯＭやＲＯＭが各種プログラムを記憶し、ＲＡＭがＣＰＵ６０１のワークエリアとして使用される。メモリ６０２に記憶されるプログラムは、ＣＰＵ６０１にロードされることで、コーディングされている処理をＣＰＵ６０１に実行させる。 Here, the CPU 601 controls the entire terminal device 201. The memory 602 includes, for example, a ROM, a RAM, a flash ROM, and the like. Specifically, for example, a flash ROM or ROM stores various programs, and RAM is used as a work area of CPU 601. The program stored in the memory 602 is loaded into the CPU 601 to cause the CPU 601 to execute the coded process.

ネットワークＩ／Ｆ６０３は、通信回線を通じてネットワーク２１０に接続され、ネットワーク２１０を介して他のコンピュータに接続される。そして、ネットワークＩ／Ｆ６０３は、ネットワーク２１０と内部のインターフェースを司り、他のコンピュータからのデータの入出力を制御する。ネットワークＩ／Ｆ６０３には、例えば、モデムやＬＡＮアダプタなどを採用することができる。 The network I / F 603 is connected to the network 210 through a communication line, and is connected to another computer via the network 210. Then, the network I / F 603 controls the internal interface with the network 210 and controls the input / output of data from another computer. For the network I / F 603, for example, a modem, a LAN adapter, or the like can be adopted.

記録媒体Ｉ／Ｆ６０４は、ＣＰＵ６０１の制御に従って記録媒体６０５に対するデータのリード／ライトを制御する。記録媒体Ｉ／Ｆ６０４は、例えば、ディスクドライブ、ＳＳＤ、ＵＳＢポートなどである。記録媒体６０５は、記録媒体Ｉ／Ｆ６０４の制御で書き込まれたデータを記憶する不揮発メモリである。記録媒体６０５は、例えば、ディスク、半導体メモリ、ＵＳＢメモリなどである。記録媒体６０５は、端末装置２０１から着脱可能であってもよい。 The recording medium I / F 604 controls read / write of data to the recording medium 605 according to the control of the CPU 601. The recording medium I / F 604 is, for example, a disk drive, an SSD, a USB port, or the like. The recording medium 605 is a non-volatile memory that stores data written under the control of the recording medium I / F 604. The recording medium 605 is, for example, a disk, a semiconductor memory, a USB memory, or the like. The recording medium 605 may be detachable from the terminal device 201.

ディスプレイ６０６は、カーソル、アイコンあるいはツールボックスをはじめ、文書、画像、機能情報などのデータを表示する。ディスプレイ６０６は、例えば、ＣＲＴ（ＣａｔｈｏｄｅＲａｙＴｕｂｅ）、液晶ディスプレイ、有機ＥＬ（Ｅｌｅｃｔｒｏｌｕｍｉｎｅｓｃｅｎｃｅ）ディスプレイなどを採用することができる。 The display 606 displays data such as a cursor, an icon, a toolbox, a document, an image, and functional information. As the display 606, for example, a CRT (Cathode Ray Tube), a liquid crystal display, an organic EL (Electroluminescence) display and the like can be adopted.

入力装置６０７は、文字、数字、各種指示などの入力のためのキーを有し、データの入力を行う。入力装置６０７は、キーボードやマウスなどであってもよく、また、タッチパネル式の入力パッドやテンキーなどであってもよい。 The input device 607 has keys for inputting characters, numbers, various instructions, and the like, and inputs data. The input device 607 may be a keyboard, a mouse, or the like, or may be a touch panel type input pad, a numeric keypad, or the like.

端末装置２０１は、上述した構成部のほか、例えば、プリンタ、スキャナ、マイク、スピーカーなどを有してもよい。また、端末装置２０１は、記録媒体Ｉ／Ｆ６０４や記録媒体６０５を複数有していてもよい。また、端末装置２０１は、記録媒体Ｉ／Ｆ６０４や記録媒体６０５を有していなくてもよい。 In addition to the above-described components, the terminal device 201 may include, for example, a printer, a scanner, a microphone, a speaker, and the like. Further, the terminal device 201 may have a plurality of recording media I / F 604 and recording media 605. Further, the terminal device 201 does not have to have the recording medium I / F 604 or the recording medium 605.

（分類装置１００の機能的構成例） (Example of functional configuration of classification device 100)
次に、図７を用いて、分類装置１００の機能的構成例について説明する。 Next, a functional configuration example of the classification device 100 will be described with reference to FIG. 7.

図７は、分類装置１００の機能的構成例を示すブロック図である。分類装置１００は、記憶部７００と、取得部７０１と、学習部７０２と、分類部７０３と、選択部７０４と、生成部７０５と、出力部７０６とを含む。 FIG. 7 is a block diagram showing a functional configuration example of the classification device 100. The classification device 100 includes a storage unit 700, an acquisition unit 701, a learning unit 702, a classification unit 703, a selection unit 704, a generation unit 705, and an output unit 706.

記憶部７００は、例えば、図３に示したメモリ３０２や記録媒体３０５などの記憶領域によって実現される。以下では、記憶部７００が、分類装置１００に含まれる場合について説明するが、これに限らない。例えば、記憶部７００が、分類装置１００とは異なる装置に含まれ、記憶部７００の記憶内容が分類装置１００から参照可能である場合があってもよい。 The storage unit 700 is realized by, for example, a storage area such as the memory 302 or the recording medium 305 shown in FIG. Hereinafter, the case where the storage unit 700 is included in the classification device 100 will be described, but the present invention is not limited to this. For example, the storage unit 700 may be included in a device different from the classification device 100, and the stored contents of the storage unit 700 may be referred to by the classification device 100.

取得部７０１〜出力部７０６は、制御部の一例として機能する。取得部７０１〜出力部７０６は、具体的には、例えば、図３に示したメモリ３０２や記録媒体３０５などの記憶領域に記憶されたプログラムをＣＰＵ３０１に実行させることにより、または、ネットワークＩ／Ｆ３０３により、その機能を実現する。各機能部の処理結果は、例えば、図３に示したメモリ３０２や記録媒体３０５などの記憶領域に記憶される。 The acquisition unit 701 through the output unit 706 function as an example of a control unit. Specifically, the acquisition unit 701 through the output unit 706 realize their functions, for example, by causing the CPU 301 to execute a program stored in a storage area such as the memory 302 or the recording medium 305 shown in FIG. 3, or by means of the network I/F 303. The processing result of each functional unit is stored, for example, in a storage area such as the memory 302 or the recording medium 305 shown in FIG. 3.

記憶部７００は、各機能部の処理において参照され、または更新される各種情報を記憶する。記憶部７００は、例えば、学習データを記憶してもよい。学習データは、例えば、過去のデータと、過去のデータが正常であるか異常であるかを利用者が判断した結果とを対応付けたデータである。過去のデータは、例えば、請求データ４００である。異常は、例えば、不正請求である。 The storage unit 700 stores various information referred to or updated in the processing of each functional unit. The storage unit 700 may store the learning data, for example. The learning data is, for example, data in which the past data is associated with the result of the user's determination as to whether the past data is normal or abnormal. The past data is, for example, billing data 400. The anomaly is, for example, fraudulent billing.

記憶部７００は、例えば、学習データから決定木モデルを生成する生成ルールを記憶してもよい。決定木モデルは、複数のノードを含み、ノード間がエッジで接続されたモデルである。決定木モデルは、例えば、対象データを分類する条件を表すノードと、対象データが正常であること、または、対象データが異常であることを表す葉ノードとを含み、少なくともいずれかの葉ノードに名称が設定される。記憶部７００は、具体的には、生成ルールを有する機械学習ＦＷ（ＦｉｒｍＷａｒｅ）を記憶してもよい。 The storage unit 700 may store, for example, a generation rule for generating a decision tree model from training data. The decision tree model is a model that includes a plurality of nodes connected to one another by edges. The decision tree model includes, for example, nodes each representing a condition for classifying the target data and leaf nodes each indicating that the target data is normal or that the target data is abnormal, and a name is set for at least one of the leaf nodes. Specifically, the storage unit 700 may store a machine learning FW (FirmWare) having the generation rule.

記憶部７００は、例えば、決定木モデルを記憶してもよい。記憶部７００は、例えば、決定木モデルの葉ノードの名称を生成する際に用いられる言語化ルールや名称テンプレートを記憶してもよい。言語化ルールは、例えば、不等号を用いた条件の判定結果を、「以上」、「以下」、「より大きい」、「より小さい」、「未満」、「範囲」などの文言を用いて書き換えるためのルールである。名称テンプレートは、例えば、「既知のＡに近い傾向が見られる」や「既知のＡに関連する不正請求である」などの文章である。Ａは、例えば、他の葉ノードに設定済みの名称が挿入される。 The storage unit 700 may store, for example, a decision tree model. The storage unit 700 may also store, for example, a verbalization rule and a name template used when generating the name of a leaf node of the decision tree model. The verbalization rule is, for example, a rule for rewriting the determination result of a condition expressed with an inequality sign using wording such as "greater than or equal to", "less than or equal to", "greater than", "smaller than", "less than", and "range". The name template is, for example, a sentence such as "a tendency close to the known A is seen" or "a fraudulent claim related to the known A". For example, a name already set for another leaf node is inserted in place of A.

記憶部７００は、例えば、画面テンプレートを記憶してもよい。画面テンプレートは、端末装置２０１に表示させる画面を生成する際に用いられる。画面テンプレートは、例えば、請求テーブル５００の不正理由や正常理由を利用者に設定させる画面を生成する際に用いられる。画面テンプレートは、例えば、請求データ４００と、請求データ４００を分類した葉ノードの名称とを対応付けて出力する画面を生成する際に用いられる。 The storage unit 700 may store, for example, a screen template. The screen template is used when generating a screen to be displayed on the terminal device 201. The screen template is used, for example, when generating a screen for causing the user to set the reason for fraud or the reason for normality of the billing table 500. The screen template is used, for example, when generating a screen for outputting the billing data 400 in association with the name of the leaf node that classifies the billing data 400.

取得部７０１は、各機能部の処理に用いられる各種情報を取得し、各機能部に出力する。取得部７０１は、例えば、各機能部の処理に用いられる各種情報を記憶部７００から取得し、各機能部に出力してもよい。取得部７０１は、例えば、各機能部の処理に用いられる各種情報を、分類装置１００とは異なる装置から取得し、各機能部に出力してもよい。 The acquisition unit 701 acquires various information used for processing of each functional unit and outputs the information to each functional unit. For example, the acquisition unit 701 may acquire various information used for processing of each functional unit from the storage unit 700 and output the information to each functional unit. For example, the acquisition unit 701 may acquire various information used for processing of each functional unit from a device different from the classification device 100 and output the information to each functional unit.

取得部７０１は、具体的には、学習データを端末装置２０１から受信し、記憶部７００に記憶し、各機能部に出力する。取得部７０１は、具体的には、利用者の操作入力、または、記録媒体Ｉ／Ｆ３０４を介して、学習データの入力を受け付けてもよい。これにより、取得部７０１は、学習データを学習部７０２に出力し、学習部７０２で決定木モデルを生成可能にすることができる。 Specifically, the acquisition unit 701 receives the learning data from the terminal device 201, stores it in the storage unit 700, and outputs it to each function unit. Specifically, the acquisition unit 701 may accept the input of the learning data via the operation input of the user or the recording medium I / F 304. As a result, the acquisition unit 701 can output the learning data to the learning unit 702 and enable the learning unit 702 to generate the decision tree model.

取得部７０１は、具体的には、対象データを端末装置２０１から受信し、記憶部７００に記憶し、各機能部に出力する。取得部７０１は、具体的には、利用者の操作入力、または、記録媒体Ｉ／Ｆ３０４を介して、対象データの入力を受け付けてもよい。これにより、取得部７０１は、対象データを分類部７０３に出力し、分類部７０３で対象データを分類開始させることができる。 Specifically, the acquisition unit 701 receives the target data from the terminal device 201, stores it in the storage unit 700, and outputs it to each function unit. Specifically, the acquisition unit 701 may accept the input of the target data via the operation input of the user or the recording medium I / F 304. As a result, the acquisition unit 701 can output the target data to the classification unit 703 and start the classification of the target data in the classification unit 703.

学習部７０２は、生成ルールを参照し、学習データに基づいて決定木モデルを生成する。学習部７０２は、例えば、機械学習ＦＷを用いて、学習データに基づいて決定木モデルを生成する。これにより、学習部７０２は、分類部７０３が決定木モデルを参照可能にし、分類部７０３が対象データを分類可能にすることができる。 The learning unit 702 refers to the generation rule and generates a decision tree model based on the learning data. The learning unit 702 generates a decision tree model based on the learning data by using, for example, a machine learning FW. As a result, in the learning unit 702, the classification unit 703 can refer to the decision tree model, and the classification unit 703 can classify the target data.

分類部７０３は、決定木モデルにより、対象データを分類する。分類部７０３は、例えば、決定木モデルの根ノードから順に、ノードが表す条件を対象データが満たすか否かを判定した結果に基づいて、子ノードを辿り、いずれかの葉ノードに対象データを分類する。これにより、分類部７０３は、対象データをいずれかの葉ノードに分類し、対象データを正常または異常に分類することができる。 The classification unit 703 classifies the target data by the decision tree model. The classification unit 703 traces child nodes based on the result of determining whether or not the target data satisfies the conditions represented by the nodes in order from the root node of the decision tree model, and sets the target data in one of the leaf nodes. Classify. As a result, the classification unit 703 can classify the target data into any leaf node and classify the target data as normal or abnormal.
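The traversal performed by the classification unit 703 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Node` class, its field names, and the assumption that every condition has the form "item value <= threshold" are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Decision tree node (hypothetical layout).

    A condition node tests `data[item] <= threshold`; a leaf node carries a
    label ("normal" or "abnormal") and, optionally, a name.
    """
    item: Optional[str] = None
    threshold: Optional[float] = None
    if_true: Optional["Node"] = None    # child followed when the condition holds
    if_false: Optional["Node"] = None   # child followed otherwise
    label: Optional[str] = None         # set only on leaf nodes
    name: Optional[str] = None          # may be unset on a leaf


def classify(root: Node, data: Dict[str, float]) -> Tuple[Node, List[Tuple[str, float, bool]]]:
    """Walk from the root to a leaf and record each determination result."""
    node = root
    results: List[Tuple[str, float, bool]] = []
    while node.label is None:
        satisfied = data[node.item] <= node.threshold
        results.append((node.item, node.threshold, satisfied))
        node = node.if_true if satisfied else node.if_false
    return node, results


# Small example tree: amount <= 100 -> normal; otherwise check the usage amount.
tree = Node(
    item="amount", threshold=100,
    if_true=Node(label="normal"),
    if_false=Node(
        item="usage amount", threshold=10,
        if_true=Node(label="abnormal", name="inflated billing"),
        if_false=Node(label="normal"),
    ),
)
leaf, results = classify(tree, {"amount": 150, "usage amount": 5})
```

Returning the list of determination results alongside the leaf keeps the information needed later to compare the paths to two different leaves.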

選択部７０４は、名称が未設定である第１の葉ノードに対象データが分類された場合、ノード間の位置関係に基づいて、決定木モデルのうち名称が設定済みである第２の葉ノードを選択する。ノード間の位置関係は、例えば、ノード間の親子関係によって表される。 When the target data is classified into a first leaf node for which no name has been set, the selection unit 704 selects, based on the positional relationship between nodes, a second leaf node of the decision tree model for which a name has already been set. The positional relationship between nodes is represented by, for example, the parent-child relationship between nodes.

選択部７０４は、例えば、第１の葉ノードが、対象データが異常であることを表す葉ノードである場合、第２の葉ノードを選択する。これにより、選択部７０４は、利用者が重要視する傾向がある、対象データが異常であることを表す葉ノードの名称を生成することができる。 For example, when the first leaf node is a leaf node indicating that the target data is abnormal, the selection unit 704 selects the second leaf node. As a result, the selection unit 704 can generate the name of the leaf node indicating that the target data is abnormal, which the user tends to attach importance to.

選択部７０４は、例えば、第１の葉ノードが、対象データが正常であることを表す葉ノードである場合、第２の葉ノードを選択しなくてもよい。これにより、選択部７０４は、対象データが異常に分類された理由を利用者が把握することが求められる場合、対象データが正常に分類された理由に対応する名称を生成しなくてもよいようにすることができる。このため、選択部７０４は、生成部７０５の処理量の低減化を図ることができる。 For example, when the first leaf node is a leaf node indicating that the target data is normal, the selection unit 704 need not select a second leaf node. As a result, when the user only needs to understand why the target data was classified as abnormal, the selection unit 704 makes it unnecessary to generate a name corresponding to the reason why the target data was classified as normal. The selection unit 704 can therefore reduce the processing load of the generation unit 705.

選択部７０４は、例えば、ノード間の位置関係に基づいて、第１の葉ノードの近傍にある葉ノードの中から、第２の葉ノードを選択する。ノード間の距離は、例えば、ノード間を接続するエッジの数によって表される。近傍は、例えば、ノード間を接続するエッジの数が所定数以下であることに対応する。第１の葉ノードの近傍にある葉ノードは、例えば、第１の葉ノードから所定数以下のエッジを経由して到達可能な葉ノードである。 The selection unit 704 selects the second leaf node from the leaf nodes in the vicinity of the first leaf node, for example, based on the positional relationship between the nodes. The distance between nodes is represented, for example, by the number of edges connecting the nodes. The neighborhood corresponds, for example, that the number of edges connecting the nodes is less than or equal to a predetermined number. The leaf node in the vicinity of the first leaf node is, for example, a leaf node that can be reached from the first leaf node via a predetermined number or less of edges.

選択部７０４は、具体的には、第１の葉ノードから所定数以下のエッジを経由して到達可能な葉ノードの中から、第２の葉ノードを選択する。これにより、選択部７０４は、第１の葉ノードと比較的類似する正常または異常を表す第２の葉ノードを選択しやすくすることができる。このため、選択部７０４は、第１の葉ノードについて、利用者が直感的に理解しやすい名称を生成しやすくすることができる。 Specifically, the selection unit 704 selects a second leaf node from among the leaf nodes that can be reached from the first leaf node via a predetermined number or less of edges. This makes it easier for the selection unit 704 to select a second leaf node that represents normal or abnormal, which is relatively similar to the first leaf node. Therefore, the selection unit 704 can easily generate a name for the first leaf node that is intuitively easy for the user to understand.

第１の葉ノードの近傍にある葉ノードは、例えば、第１の葉ノードの上位にあり第１の葉ノードから所定数以下のエッジを経由して到達可能な上位ノードの下位にある葉ノードである。上位は、根ノードに近い方である。下位は、葉ノードに近い方である。 A leaf node in the vicinity of the first leaf node is, for example, a leaf node located below an upper node, where the upper node is above the first leaf node and reachable from the first leaf node via a predetermined number of edges or fewer. "Upper" means closer to the root node. "Lower" means closer to the leaf nodes.

選択部７０４は、具体的には、第１の葉ノードの上位にあり第１の葉ノードから所定数以下のエッジを経由して到達可能な上位ノードの下位にある葉ノードの中から、第２の葉ノードを選択する。これにより、選択部７０４は、第１の葉ノードと比較的類似する正常または異常を表す第２の葉ノードを選択しやすくすることができる。このため、選択部７０４は、第１の葉ノードについて、利用者が直感的に理解しやすい名称を生成しやすくすることができる。 Specifically, the selection unit 704 selects the second leaf node from among the leaf nodes located below an upper node that is above the first leaf node and reachable from the first leaf node via a predetermined number of edges or fewer. This makes it easier for the selection unit 704 to select a second leaf node that represents a normal or abnormal state relatively similar to that of the first leaf node. The selection unit 704 can therefore make it easier to generate a name for the first leaf node that the user can understand intuitively.
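The neighborhood search described above can be sketched as follows, assuming each node keeps a reference to its parent. The class `TreeNode` and the functions `leaves_under` and `candidate_leaves` are hypothetical names for this example. How a single second leaf node is picked from the candidates is not fixed by this passage, so the sketch only returns the candidate list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent links
class TreeNode:
    label: str
    name: Optional[str] = None                      # leaf name, if already set
    children: List["TreeNode"] = field(default_factory=list)
    parent: Optional["TreeNode"] = field(default=None, repr=False)

    def add(self, child: "TreeNode") -> "TreeNode":
        child.parent = self
        self.children.append(child)
        return child


def leaves_under(node: TreeNode) -> List[TreeNode]:
    """Collect every leaf in the subtree rooted at `node`."""
    if not node.children:
        return [node]
    found: List[TreeNode] = []
    for child in node.children:
        found.extend(leaves_under(child))
    return found


def candidate_leaves(first_leaf: TreeNode, max_edges: int) -> List[TreeNode]:
    """Named leaves below the highest ancestor reachable within `max_edges` edges."""
    ancestor = first_leaf
    for _ in range(max_edges):
        if ancestor.parent is None:
            break
        ancestor = ancestor.parent
    return [
        leaf for leaf in leaves_under(ancestor)
        if leaf is not first_leaf and leaf.name is not None
    ]


root = TreeNode("root")
a = root.add(TreeNode("A"))
b = root.add(TreeNode("B"))
unnamed = a.add(TreeNode("abnormal"))
near = a.add(TreeNode("abnormal", name="inflated billing"))
far = b.add(TreeNode("abnormal", name="fictitious billing"))
```

With `max_edges=1` only the sibling leaf under `A` is a candidate; with `max_edges=2` the search reaches the root and the leaf under `B` is included as well.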

選択部７０４は、例えば、所定数を、決定木モデルの深さに基づいて設定する。選択部７０４は、具体的には、決定木モデルの深さが閾値を超える場合、所定数を、決定木モデルの深さ＊２０％に設定する。閾値は、例えば、２０である。選択部７０４は、具体的には、決定木モデルの深さが閾値以下であれば、所定数を固定値に設定してもよい。固定値は、例えば、２である。これにより、選択部７０４は、決定木モデルの深さに応じて、第１の葉ノードと比較的類似する正常または異常を表す第２の葉ノードを選択しやすくすることができる。また、選択部７０４は、決定木モデルの深さに応じて、選択対象の範囲を限定し、処理量の低減化を図ることができる。 The selection unit 704 sets, for example, a predetermined number based on the depth of the decision tree model. Specifically, when the depth of the decision tree model exceeds the threshold value, the selection unit 704 sets a predetermined number to the depth * 20% of the decision tree model. The threshold is, for example, 20. Specifically, the selection unit 704 may set a predetermined number to a fixed value as long as the depth of the decision tree model is equal to or less than the threshold value. The fixed value is, for example, 2. This makes it easier for the selection unit 704 to select a second leaf node that represents normal or abnormal, which is relatively similar to the first leaf node, depending on the depth of the decision tree model. Further, the selection unit 704 can limit the range of selection targets according to the depth of the decision tree model and reduce the processing amount.
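The rule for setting the predetermined number can be written as a short function. The threshold of 20, the 20% ratio, and the fixed value of 2 are the example values given in the text; the function name `neighborhood_limit` and the choice to round the 20% value down to an integer are assumptions, since the source does not state how fractional values are handled.

```python
def neighborhood_limit(depth: int, threshold: int = 20,
                       percent: int = 20, fixed: int = 2) -> int:
    """Return the maximum number of edges used to define the neighborhood.

    If the tree depth exceeds `threshold`, use `percent`% of the depth
    (rounded down, which is an assumption); otherwise use `fixed`.
    """
    if depth > threshold:
        return depth * percent // 100   # integer arithmetic avoids float rounding
    return fixed


# A tree of depth 30 allows 6 edges; a tree of depth 20 or less allows 2.
limit_deep = neighborhood_limit(30)
limit_shallow = neighborhood_limit(20)
```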

選択部７０４は、ノード間の位置関係に基づいて、決定木モデルのうち、第１の部分木とは異なる位置にある第２の部分木に含まれる葉ノードの中から、第２の葉ノードを選択する。第１の部分木は、第１の葉ノードを含む部分木である。第２の部分木は、第１の部分木と異なる位置にあり、第１の部分木と同一の部分木である。第１の部分木と同一の部分木は、例えば、第１の部分木に含まれるノードと同一の条件を表すノードを含み、第１の部分木とノード間の位置関係が同一になる部分木である。 Based on the positional relationship between nodes, the selection unit 704 selects the second leaf node from among the leaf nodes included in a second subtree of the decision tree model located at a position different from that of a first subtree. The first subtree is a subtree that includes the first leaf node. The second subtree is located at a position different from that of the first subtree and is identical to the first subtree. A subtree identical to the first subtree is, for example, a subtree that includes nodes representing the same conditions as the nodes included in the first subtree and in which the positional relationship between nodes is the same as in the first subtree.

第２の部分木は、第１の部分木と異なる位置にあり、第１の部分木と類似する部分木であってもよい。第１の部分木と類似する部分木は、例えば、第１の部分木の一部に含まれるノードと同一の条件を表すノードを含み、第１の部分木の一部とノード間の位置関係が同一になる部分を含む部分木である。これにより、選択部７０４は、第１の葉ノードと比較的類似する正常または異常を表す第２の葉ノードを選択しやすくすることができる。 The second subtree may instead be a subtree that is located at a position different from that of the first subtree and is similar to the first subtree. A subtree similar to the first subtree is, for example, a subtree that includes nodes representing the same conditions as the nodes in a part of the first subtree and that contains a portion in which the positional relationship between nodes is the same as in that part of the first subtree. This makes it easier for the selection unit 704 to select a second leaf node that represents a normal or abnormal state relatively similar to that of the first leaf node.

生成部７０５は、選択部７０４が選択した第２の葉ノードの名称に基づいて、第１の葉ノードの名称を生成する。生成部７０５は、さらに、第１の経路上の各ノードが表す条件に関する判定結果と、第２の経路上の各ノードが表す条件に関する判定結果との差異に基づいて、第１の葉ノードの名称を生成してもよい。 The generation unit 705 generates the name of the first leaf node based on the name of the second leaf node selected by the selection unit 704. The generation unit 705 may further generate the name of the first leaf node based on the difference between the determination results for the conditions represented by the nodes on a first path and the determination results for the conditions represented by the nodes on a second path.

第１の経路は、例えば、決定木モデルの根ノードから第１の葉ノードまでの経路である。第２の経路は、例えば、決定木モデルの根ノードから第２の葉ノードまでの経路である。差異は、例えば、第１の経路上の各ノードが表す条件に関する判定結果のうち、第２の経路上の各ノードが表す条件に関する判定結果と重複しない判定結果である。 The first path is, for example, the path from the root node of the decision tree model to the first leaf node. The second path is, for example, the path from the root node of the decision tree model to the second leaf node. The difference is, for example, a determination result that does not overlap with the determination result regarding the condition represented by each node on the second route among the determination results regarding the condition represented by each node on the first route.
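The difference between the two paths can be sketched as a simple filter. In this illustration each determination result is encoded as a tuple `(item, operator, value)`; that encoding, the function name `path_difference`, and the sample item names are assumptions made for the example.

```python
from typing import List, Tuple

# (item name, comparison operator that held, value), e.g. ("amount", "<=", 100)
Result = Tuple[str, str, float]


def path_difference(first_path: List[Result], second_path: List[Result]) -> List[Result]:
    """Return results on the first path that do not also appear on the second path."""
    on_second = set(second_path)
    return [result for result in first_path if result not in on_second]


first_path = [("service category code", "<=", 3), ("amount", "<=", 100), ("usage amount", ">=", 10)]
second_path = [("service category code", "<=", 3), ("amount", ">", 100)]
difference = path_difference(first_path, second_path)
# difference == [("amount", "<=", 100), ("usage amount", ">=", 10)]
```

The order of the first path is preserved so that the resulting sentences follow the order in which the conditions were evaluated.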

生成部７０５は、例えば、記憶部７００を参照し、差異になる判定結果を、「以上」、「以下」、「より大きい」、「より小さい」、「未満」、「範囲」などの文言を用いて書き換えた文章を生成する。生成部７０５は、例えば、記憶部７００を参照し、「既知のＡに近い傾向が見られる」の文章の「Ａ」に、選択した第２の葉ノードの名称を挿入した文章を生成する。 For example, the generation unit 705 refers to the storage unit 700 and generates a sentence in which the differing determination results are rewritten using wording such as "greater than or equal to", "less than or equal to", "greater than", "smaller than", "less than", and "range". The generation unit 705 also refers to the storage unit 700 and generates, for example, a sentence in which the name of the selected second leaf node is inserted in place of "A" in the sentence "a tendency close to the known A is seen".

そして、生成部７０５は、生成した文章を組み合わせて、第１の葉ノードの名称を生成する。生成部７０５は、具体的には、名称「水増し請求に近い傾向が見られる。金額は１００以下であり、利用料は１０以上である。」などを生成する。これにより、生成部７０５は、第１の葉ノードの名称を、第２の葉ノードの名称に基づいて生成することができ、利用者が直感的に理解しやすい名称を生成しやすくすることができる。 The generation unit 705 then combines the generated sentences to generate the name of the first leaf node. Specifically, the generation unit 705 generates a name such as "A tendency close to inflated billing is seen. The amount is 100 or less and the usage fee is 10 or more." In this way, the generation unit 705 can generate the name of the first leaf node based on the name of the second leaf node, which makes it easier to generate a name that the user can understand intuitively.
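The combination of a name template and verbalized determination results can be sketched as follows. The English phrasings in `PHRASES`, the template string, the tuple encoding `(item, operator, value)`, and the function names are assumptions made for this illustration; the source gives only the Japanese wording and one example of a generated name.

```python
from typing import Dict, List, Tuple

Result = Tuple[str, str, float]  # (item, comparison operator, value)

# Hypothetical verbalization rule: inequality sign -> wording.
PHRASES: Dict[str, str] = {
    ">=": "{v} or more",
    "<=": "{v} or less",
    ">": "greater than {v}",
    "<": "less than {v}",
}

# Hypothetical name template; {A} receives the second leaf node's name.
TEMPLATE = "A tendency close to {A} is seen."


def verbalize(item: str, op: str, value: float) -> str:
    """Rewrite one determination result as a clause."""
    return f"{item} is " + PHRASES[op].format(v=value)


def generate_name(second_leaf_name: str, differences: List[Result]) -> str:
    """Combine the filled-in template with the verbalized differences."""
    sentences = [TEMPLATE.format(A=second_leaf_name)]
    if differences:
        body = " and ".join(verbalize(*d) for d in differences)
        sentences.append(body[0].upper() + body[1:] + ".")
    return " ".join(sentences)


name = generate_name(
    "inflated billing",
    [("the amount", "<=", 100), ("the usage amount", ">=", 10)],
)
# name == "A tendency close to inflated billing is seen. "
#         "The amount is 100 or less and the usage amount is 10 or more."
```

Because only the differing determination results are verbalized, the generated name stays short even when the path to the leaf node is long.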

また、生成部７０５は、差異になる判定結果を書き換えた文章を、利用者が参照可能にすることができる。このため、生成部７０５は、どのような条件の判定結果が異なるかを把握可能にすることができ、対象データが正常または異常に分類された理由を把握しやすい名称を生成することができる。 The generation unit 705 can also make the sentences rewritten from the differing determination results available to the user. The generation unit 705 thus lets the user see which conditions produced different determination results, and can generate a name from which it is easy to understand why the target data was classified as normal or abnormal.

生成部７０５は、第１の葉ノードに、生成した第１の葉ノードの名称を設定する。これにより、生成部７０５は、次回、対象データが第１の葉ノードに分類された場合に、名称を生成し直さなくてもよくすることができ、処理量の低減化を図ることができる。 The generation unit 705 sets the name of the generated first leaf node in the first leaf node. As a result, the generation unit 705 does not have to regenerate the name when the target data is classified into the first leaf node next time, and the processing amount can be reduced.

出力部７０６は、対象データに、生成した第１の葉ノードの名称を対応付けて出力する。出力形式は、例えば、ディスプレイへの表示、プリンタへの印刷出力、ネットワークＩ／Ｆ３０３による外部装置への送信、または、メモリ３０２や記録媒体３０５などの記憶領域への記憶である。これにより、出力部７０６は、利用者が、対象データが分類された結果の細かい種別を把握可能にし、対象データが分類された理由を把握可能にすることができる。 The output unit 706 outputs the target data in association with the name of the generated first leaf node. The output format is, for example, display on a display, print output to a printer, transmission to an external device by the network I / F 303, or storage in a storage area such as a memory 302 or a recording medium 305. As a result, the output unit 706 can enable the user to grasp the detailed type of the result of classifying the target data, and can grasp the reason why the target data is classified.

出力部７０６は、各機能部の処理結果を出力してもよい。これにより、出力部７０６は、各機能部の処理結果を利用者に通知可能にし、分類装置１００の管理や運用、例えば、分類装置１００の設定値の更新などを支援することができ、分類装置１００の利便性の向上を図ることができる。 The output unit 706 may output the processing result of each functional unit. As a result, the output unit 706 can notify the user of the processing result of each functional unit, and can support the management and operation of the classification device 100, for example, the update of the set value of the classification device 100, and the classification device. It is possible to improve the convenience of 100.

（分類装置１００を利用する状況の具体例） (Specific example of a situation in which the classification device 100 is used)
次に、図８を用いて、分類装置１００を利用する状況の具体例について説明する。 Next, a specific example of a situation in which the classification device 100 is used will be described with reference to FIG. 8.

図８は、分類装置１００を利用する状況の具体例を示す説明図である。図８に示すように、分類装置１００は、例えば、省庁や自治体などの職員が、介護施設や養護施設などの事業者から社会保障給付費の請求データ８０６を収集し、事業者の指導監査業務を行うような状況において利用される。収集する請求データ８０６の数は、例えば、百万単位である。以下の説明では、省庁や自治体などの職員を「利用者」と表記する場合がある。 FIG. 8 is an explanatory diagram showing a specific example of a situation in which the classification device 100 is used. As shown in FIG. 8, the classification device 100 is used, for example, in a situation where staff of a ministry or local government collect billing data 806 for social security benefits from business operators such as long-term care facilities and nursing homes, and carry out guidance and audits of those business operators. The number of pieces of billing data 806 collected is, for example, on the order of millions. In the following description, staff of ministries and local governments may be referred to as "users".

Here, it is desirable to enable the user to perform guidance and audit work on operators efficiently within a limited time. For example, it is desirable to enable the user to efficiently find fraudulent billing data 806 among the collected billing data 806.

It is also desirable, for example, to enable the user to judge which operator should be given priority for guidance and auditing. It is therefore desirable to enable the user to grasp the number of pieces of fraudulent billing data 806 for each operator, and further for each operator and each type of fraudulent claim.

It is also desirable, for example, that the user prepare materials on the billing data 806 before visiting an operator's location to guide and audit that operator. It is therefore desirable that the type of fraudulent claim in the operator's billing data 806 be available to the user for preparing those materials.

It is also desirable, for example, to enable the user to judge which types of fraudulent claim should be considered first when guiding and auditing an operator. It is therefore desirable to enable the user to grasp which types of fraudulent billing data 806 are present in the operator's billing data 806 and the number of pieces of fraudulent billing data 806 for each type of fraudulent claim. To address these needs, the classification device 100 operates as follows.

(8-1) The classification device 100 cooperates with a disability welfare system of a ministry, local government, or the like and collects past billing data 801. The classification device 100 also collects case data 802 on past fraudulent claims from staff of ministries, local governments, and the like. Next, the classification device 100 processes and combines the past billing data 801 and the past fraudulent-claim case data 802 to generate and store a learning data set 803. The classification device 100 then uses a machine learning framework (FW) 804 to generate a decision tree model 805 based on the learning data set 803. The flow of generating the decision tree model 805 is described later with reference to, for example, FIG. 9.

(8-2) Using the decision tree model 805, the classification device 100 classifies new billing data 806 into a leaf node and thereby classifies it as normal or fraudulent. If the leaf node into which the new billing data 806 was classified has no name set, the classification device 100 generates a name for that leaf node and sets it. The flow of setting a leaf node name is described later with reference to, for example, FIGS. 10 and 11. The classification device 100 then stores the new billing data 806 in association with the name of the leaf node into which it was classified.

(8-3) The classification device 100 presents to the user correspondence data 807, which associates the new billing data 806 with the name of the leaf node into which it was classified. The classification device 100 may also calculate the number of pieces of fraudulent billing data 806 for each operator and present it to the user, and may likewise calculate and present that number for each operator and each type of fraudulent claim.
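The per-operator and per-type counts described in (8-3) can be sketched as follows. This is only an illustration: the record layout, the operator names, the leaf-node names, and the `count_fraud` helper are assumptions and do not appear in the description.

```python
from collections import Counter

# Hypothetical correspondence data 807: (operator, leaf-node name) pairs.
# Operator names and leaf-node names are illustrative only.
correspondence = [
    ("operator_A", "date error"),
    ("operator_A", "date error"),
    ("operator_A", "normal"),
    ("operator_B", "padded claim"),
    ("operator_B", "date error"),
]

def count_fraud(records, normal_label="normal"):
    """Count fraudulent records per operator and per (operator, type)."""
    per_operator = Counter()
    per_operator_type = Counter()
    for operator, leaf_name in records:
        if leaf_name == normal_label:
            continue  # only fraudulent claims are counted
        per_operator[operator] += 1
        per_operator_type[(operator, leaf_name)] += 1
    return per_operator, per_operator_type

per_operator, per_operator_type = count_fraud(correspondence)
```

Here both operators have two fraudulent records, so only the per-type counts tell them apart, which is the distinction the description draws between date errors and padded claims.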

The classification device 100 can thereby help the user perform guidance and audit work on operators efficiently within a limited time. For example, because the classification device 100 classifies new billing data 806 using the decision tree model 805, the user can efficiently find fraudulent billing data 806, which reduces the burden on the user.

Because the classification device 100 presents to the user, for example, the number of pieces of fraudulent billing data 806 for each operator, the user can judge which operator should be given priority for guidance and auditing.

Because the classification device 100 also presents to the user, for example, the number of pieces of fraudulent billing data 806 for each operator and each type of fraudulent claim, the user can judge which operator should be given priority for guidance and auditing. For example, the user can distinguish an operator with ten pieces of fraudulent billing data 806 caused by date errors from an operator with ten pieces of fraudulent billing data 806 caused by padded claims, which makes it easier to judge which operator to prioritize.

Because the classification device 100 presents to the user, for example, the name of the leaf node into which the new billing data 806 was classified, the user can more easily grasp why the new billing data 806 was classified as it was. The user can then more easily prepare materials on the billing data 806 for guiding and auditing the operator.

Because the classification device 100 presents to the user, for example, the name of the leaf node into which the new billing data 806 was classified, the user can also grasp which types of fraudulent billing data 806 are present in an operator's billing data 806. Next, the description moves to FIG. 9 to explain the flow of generating the decision tree model 805.

(Flow of generating the decision tree model 805)
FIG. 9 is an explanatory diagram showing the flow of generating the decision tree model 805. The decision tree model 805 is preferably relatively small. A relatively small decision tree model 805 tends to require fewer conditions to be evaluated when classifying target data, which tends to reduce both the processing load and the time required.

In addition, when the learning data set 803 is split in two by the condition that a node of the decision tree model 805 represents, the numbers of normal and abnormal learning data in each resulting learning data subset are preferably skewed. A skewed split is preferable because it tends to reduce the number of conditions needed to separate normal learning data from abnormal learning data.
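One common way to quantify how skewed a two-way split is uses Gini impurity. The description does not name a split criterion, so the choice of Gini impurity and the helper names `gini` and `split_impurity` below are assumptions made purely for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 'normal'/'abnormal' labels (0.0 means pure)."""
    if not labels:
        return 0.0
    p = sum(1 for x in labels if x == "abnormal") / len(labels)
    return 2 * p * (1 - p)

def split_impurity(left, right):
    """Size-weighted impurity of a two-way split; lower means more skewed subsets."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

# A split that separates the classes completely (assumed toy data).
skewed = split_impurity(["abnormal"] * 4, ["normal"] * 4)
# A split that leaves both subsets evenly mixed (assumed toy data).
mixed = split_impurity(["abnormal", "normal"] * 2, ["abnormal", "normal"] * 2)
```

Under this assumed measure, a condition that yields the skewed split would be preferred over one that yields the mixed split, since fewer further conditions are needed below it.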

When one node represents a condition on the value of a single item, a condition on a combination of values of multiple items is represented by a combination of multiple nodes. In the upper part of the decision tree model 805, relatively few nodes have been passed through, so the learning data is split in two based on the values of relatively few items. In the lower part of the decision tree model 805, relatively many nodes have been passed through, so the learning data is split in two based on the values of relatively many items.

For this reason, the classification device 100 preferably first generates, in the upper part of the decision tree model 805, a node representing a condition that can split the learning data relatively well based on the value of a single item. The classification device 100 then preferably generates, in the lower part of the decision tree model 805, nodes representing the individual conditions of a set of conditions that can split the learning data relatively well based on the values of multiple items.

As shown in FIG. 9, the classification device 100 first generates a node 901 representing a condition that classifies the learning data set 803 into a learning data subset 921, in which normal and abnormal data are mixed, and a learning data subset of abnormality 1. Because the learning data subset of abnormality 1 can be separated out without mixing normal and abnormal data, the classification device 100 generates a leaf node 911 representing abnormality 1.

Similarly, the classification device 100 generates a node 902 representing a condition that classifies the learning data subset 921 into a learning data subset 922, in which normal and abnormal data are mixed, and a learning data subset of normal data. Because the learning data subset of normal data can be separated out without mixing normal and abnormal data, the classification device 100 generates a leaf node 912 representing normal.

In this way, the classification device 100 can generate the decision tree model 805 including nodes 901 to 904 and leaf nodes 911 to 915. Next, the description moves to FIGS. 10 and 11 to explain the flow of setting a name for an unnamed leaf node into which a piece of billing data 806 has been classified.

(Flow of setting a name for a leaf node)
FIGS. 10 and 11 are explanatory diagrams showing the flow of setting a name for a leaf node. In FIG. 10, the classification device 100 stores a decision tree model 1000 and sets names for leaf nodes when classifying the No. 1 billing data 806 and the No. 2 billing data 806. The decision tree model 1000 includes nodes 1001 to 1006 and leaf nodes 1011 to 1017.

As shown in FIG. 10, the classification device 100 determines whether the No. 1 billing data 806 satisfies the condition represented by the root node 1001. Because that condition is satisfied, the classification device 100 next determines whether the No. 1 billing data 806 satisfies the condition represented by the node 1002, which is the True-side child node of the root node 1001.

Because the condition of the node 1002 is satisfied, the classification device 100 determines whether the No. 1 billing data 806 satisfies the condition represented by the node 1004, which is the True-side child node of the node 1002. Because the condition of the node 1004 is satisfied, the classification device 100 determines whether the No. 1 billing data 806 satisfies the condition represented by the node 1006, which is the True-side child node of the node 1004.

Because the condition of the node 1006 is satisfied, the classification device 100 classifies the No. 1 billing data 806 into the leaf node 1016 on the True side of the node 1006. The leaf node 1016 is a named normal leaf, that is, its name has already been set and it represents normal, so the classification device 100 does not need to generate a name.

The classification device 100 determines whether the No. 2 billing data 806 satisfies the condition represented by the root node 1001. Because that condition is satisfied, the classification device 100 next determines whether the No. 2 billing data 806 satisfies the condition represented by the node 1002, which is the True-side child node of the root node 1001.

Because the condition of the node 1002 is satisfied, the classification device 100 determines whether the No. 2 billing data 806 satisfies the condition represented by the node 1004, which is the True-side child node of the node 1002. Because the condition of the node 1004 is satisfied, the classification device 100 determines whether the No. 2 billing data 806 satisfies the condition represented by the node 1006, which is the True-side child node of the node 1004.

Because the condition of the node 1006 is not satisfied, the classification device 100 classifies the No. 2 billing data 806 into the leaf node 1017 on the False side of the node 1006. The leaf node 1017 has no name set and represents an unknown abnormality, so the classification device 100 generates a name for it. To do so, the classification device 100 selects, for example, a leaf node in the vicinity of the leaf node 1017. The description now moves to FIG. 11 to explain the flow of selecting a leaf node.
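The walk from the root node to a leaf node described above can be sketched as follows. This is a reduced, hypothetical tree rather than the decision tree model 1000 of FIG. 10: only the variable names `node_1006`, `leaf_1016`, and `leaf_1017` echo the reference numerals, while the conditions, field names, and leaf names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

@dataclass
class Leaf:
    label: str                  # "normal" or "abnormal"
    name: Optional[str] = None  # None means the name has not been set

@dataclass
class Node:
    condition: Callable[[dict], bool]
    true_child: Union["Node", Leaf]
    false_child: Union["Node", Leaf]

def classify(node, record):
    """Follow True/False branches from the root until a leaf is reached."""
    path = []
    while isinstance(node, Node):
        result = node.condition(record)
        path.append(result)
        node = node.true_child if result else node.false_child
    return node, path

# Assumed conditions and names, for illustration only.
leaf_1016 = Leaf("normal", name="within usage limit")
leaf_1017 = Leaf("abnormal")  # name not yet set
node_1006 = Node(lambda r: r["days"] <= 31, leaf_1016, leaf_1017)
root = Node(lambda r: r["amount"] > 0, node_1006,
            Leaf("abnormal", name="non-positive amount"))

no1, path1 = classify(root, {"amount": 1000, "days": 20})
no2, path2 = classify(root, {"amount": 1000, "days": 40})
needs_name = no2.name is None  # a name must be generated for this leaf
```

In this sketch the first record ends at a named leaf, so no name is generated, while the second ends at an unnamed leaf and triggers name generation.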

As shown in FIG. 11, the classification device 100 searches for leaf nodes whose names have been set, starting from the lowest leaf nodes, and selects one of them. For example, of the leaf node 1013 and the leaf node 1016, which are in the vicinity of the leaf node 1017 and have names set, the classification device 100 selects the leaf node 1016. The classification device 100 generates the name of the leaf node 1017 based on the name of the selected leaf node 1016.

Here, in a decision tree model, for example, a leaf node with no name set and a leaf node in its vicinity tend to share part of their paths from the root node. In other words, classifying target data into the unnamed leaf node and classifying target data into a nearby leaf node tend to use overlapping conditions. An unnamed leaf node and a leaf node in its vicinity are therefore relatively likely to represent similar content.

Also, in a decision tree model, for example, a leaf node at a relatively low level is one into which target data tends to be classified using relatively many conditions. A leaf node at a relatively low level therefore tends to represent an abnormality that is relatively important to the user and hard to find without referring to the values of multiple items in the target data.

In view of this, as shown in FIG. 11, the classification device 100 searches for leaf nodes whose names have been set, in order starting from leaf nodes that are in the vicinity of the unnamed leaf node and at relatively low levels.

The classification device 100 can therefore generate the name of the unnamed leaf node from the name of a leaf node that is relatively likely to represent similar content. As a result, the classification device 100 is more likely to generate a name that correctly describes what the unnamed leaf node represents and that the user can understand intuitively.

The classification device 100 can also generate the name of the unnamed leaf node from the name of a leaf node that tends to represent an abnormality relatively important to the user. As a result, the classification device 100 makes it easier for the user to grasp how the content represented by the unnamed leaf node relates to an abnormality that is relatively important to the user.

(Operation example 1 in which the classification device 100 sets a name for a leaf node)
Next, operation example 1, in which the classification device 100 sets a name for a leaf node, will be described with reference to FIGS. 12 to 19. Operation example 1 is an example in which a named leaf node in the vicinity of an unnamed leaf node is selected and used to generate the name of the unnamed leaf node.

FIGS. 12 to 19 are explanatory diagrams showing operation example 1, in which the classification device 100 sets a name for a leaf node. In FIG. 12, the classification device 100 receives a learning data set from the terminal device 201 and stores it using the learning table 1200. The data structure of the learning table 1200 is the same as that of the billing table 500 shown in FIG. 5, so its description is omitted. Next, the description moves to FIG. 13.

In FIG. 13, the classification device 100 uses the machine learning FW 804 to generate a decision tree model 1300 based on the learning table. The decision tree model 1300 includes, for example, nodes 1301 to 1303 and leaf nodes 1311 to 1313. Next, the description moves to FIG. 14.

In FIG. 14, the classification device 100 receives billing data from the terminal device 201 and stores it using the billing table 1400. At this point, the result, fraud reason, and normal reason fields of the billing table 1400 are blank. Next, the description moves to FIG. 15.

In FIG. 15, the classification device 100 classifies the target data as normal or abnormal using the decision tree model 1300. The classification device 100 sets the classification result in the result field of the billing table 1400. Next, the description moves to FIG. 16.

In FIG. 16, the classification device 100 causes the display 606 of the terminal device 201 to display a reception screen 1600 that accepts input of fraud reasons and normal reasons. The terminal device 201 accepts the fraud reasons and normal reasons entered through the user's operation of the input device 607 and, when the save button is clicked, transmits them to the classification device 100. The classification device 100 receives the fraud reasons and normal reasons from the terminal device 201.

Specifically, the terminal device 201 transmits the search condition entered in the search condition field of the reception screen 1600 to the classification device 100 and, under the control of the classification device 100, displays the records of the billing table 1400 that match the search condition in the detail information field. The terminal device 201 applies the fraud reason or normal reason selected from the list in the update content field to the records that are displayed in the detail information field and checked in their selection boxes, and, when the save button is clicked, transmits the updated records to the classification device 100. Next, the description moves to FIG. 17.

In FIG. 17, the classification device 100 stores the received fraud reasons and normal reasons using the billing table 1400. Here, the normal reason associated with key 1 is set, and the fraud reason associated with key 2 is set. The fraud reason associated with key 3, on the other hand, was not entered by the user and is unknown. Next, the description moves to FIG. 18.

In FIG. 18, the classification device 100 sets names for the leaf nodes 1311 to 1313. The classification device 100 associates a key with each of the leaf nodes 1311 to 1313 and sets its name. For example, the classification device 100 associates key 1 with the leaf node 1311 and sets the normal reason corresponding to key 1 as its name.

The classification device 100 associates, for example, key 2 with the leaf node 1312 and sets the fraud reason corresponding to key 2 as its name. The classification device 100 also associates, for example, key 3 with the leaf node 1313, but the fraud reason corresponding to key 3 is unknown, so no name is set. Next, the description moves to FIG. 19.

In FIG. 19, the classification device 100 generates a name for the leaf node 1313, which has no name set. The classification device 100 selects, for example, a leaf node in the vicinity of the leaf node 1313 and generates the name of the leaf node 1313 based on the name of the selected leaf node.

The classification device 100 selects, for example, one of the leaf nodes located in the range from the level of the leaf node 1313 up to the level a predetermined number of levels above it. The predetermined number is, for example, the depth of the decision tree model 1300 × 20%.

Specifically, among the leaf nodes located in the range from the level of the leaf node 1313 up to the level a predetermined number of levels above it, the classification device 100 selects the leaf node that can be reached from the leaf node 1313 through the fewest edges.

In the example of FIG. 19, the classification device 100 identifies the leaf nodes 1311 and 1312, which lie in the range from the level of the leaf node 1313 up to the level three levels above it. Next, of the leaf nodes 1311 and 1312, the classification device 100 selects the leaf node 1312, which is closest to the leaf node 1313. The classification device 100 then generates the name of the leaf node 1313 based on the name of the selected leaf node 1312.
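The selection rule of operation example 1 (restrict to a range of levels, then take the named leaf reachable through the fewest edges) can be sketched as follows. The tree layout, the `unnamed_sibling` leaf, the leaf names, and the helper names `ancestors` and `nearest_named_leaf` are assumptions; the description does not give the exact shape of the decision tree model 1300.

```python
def ancestors(parent, node):
    """Return [node, parent, grandparent, ..., root]."""
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def nearest_named_leaf(parent, names, target, levels_up):
    """Named leaf with the fewest edges to `target`, limited to leaves whose
    level lies between the target's level and `levels_up` levels above it."""
    target_chain = ancestors(parent, target)
    target_depth = len(target_chain) - 1
    best, best_dist = None, None
    for leaf, name in names.items():
        if leaf == target or name is None:
            continue
        leaf_chain = ancestors(parent, leaf)
        leaf_depth = len(leaf_chain) - 1
        if not (target_depth - levels_up <= leaf_depth <= target_depth):
            continue
        # Edge count = steps up to the lowest common ancestor from both sides.
        common = next(n for n in target_chain if n in leaf_chain)
        dist = target_chain.index(common) + leaf_chain.index(common)
        if best_dist is None or dist < best_dist:
            best, best_dist = leaf, dist
    return best

# Hypothetical layout: child -> parent. Only the numerals echo FIG. 19.
parent = {
    "1311": "1301", "1302": "1301",
    "1312": "1302", "1303": "1302",
    "1313": "1303", "unnamed_sibling": "1303",
}
names = {"1311": "normal reason A", "1312": "fraud reason B",
         "1313": None, "unnamed_sibling": None}

selected = nearest_named_leaf(parent, names, "1313", levels_up=3)
none_in_range = nearest_named_leaf(parent, names, "1313", levels_up=0)
```

With this assumed layout, the leaf `1312` is three edges away and `1311` is four, so `1312` is selected; with a range of zero levels no named leaf qualifies and the function returns `None`, which is the situation operation example 2 addresses.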

The case described here is one in which the classification device 100 generates leaf node names when classifying billing data that differs from the learning data, but the embodiment is not limited to this. For example, after generating a decision tree model based on the learning data, the classification device 100 may generate leaf node names in advance while trial-classifying the learning data with the decision tree model.

(Operation example 2 in which the classification device 100 sets a name for a leaf node)
Next, operation example 2, in which the classification device 100 sets a name for a leaf node, will be described with reference to FIGS. 20 to 22. Operation example 1 is an example in which a named leaf node in the vicinity of an unnamed leaf node is selected and used to generate the name of the unnamed leaf node. In contrast, operation example 2 is an example in which a named leaf node that is not in the vicinity of the unnamed leaf node is selected and used to generate the name of the unnamed leaf node.

FIGS. 20 to 22 are explanatory diagrams showing operation example 2, in which the classification device 100 sets a name for a leaf node. In FIG. 20, assume that the classification device 100 has classified the target data into a leaf node 2001 using a decision tree model 2000, and that the leaf node 2001 has no name set.

Here, the classification device 100 identifies a subtree 2010 that contains the leaf node 2001. For example, the classification device 100 identifies the subtree 2010 spanning the range from the level of the leaf node 2001 up to the level a predetermined number of levels above it. The predetermined number is, for example, the depth of the decision tree model × 20%. Next, the description moves to FIG. 21 to explain an example of the subtree 2010.

As shown in FIG. 21, the subtree 2010 includes nodes 2101 to 2103, which represent conditions, and leaf nodes 2111 to 2114. The parent-child relationships between the nodes of the subtree 2010 are, for example, such that the node 2101 has the leaf node 2111 and the node 2102 as its child nodes. Next, the description moves to FIG. 22.

In FIG. 22, the classification device 100 searches the decision tree model for subtrees identical to the subtree 2010. A subtree identical to the subtree 2010 is, for example, a subtree that contains nodes representing the same conditions as the nodes 2101 to 2103 and whose parent-child relationships between nodes are the same as those of the subtree 2010.

For example, from among the subtrees 2201 to 2204 and others in the decision tree model, the classification device 100 identifies the subtrees 2202 and 2204 as identical to the subtree 2010. Next, the classification device 100 selects a leaf node that is included in the subtrees 2202 and 2204 and whose name has been set. The classification device 100 then generates the name of the leaf node 2001 based on the name of the selected leaf node.
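The search for identical subtrees can be illustrated with the following sketch. It assumes a hypothetical `Node` type in which a condition of `None` marks a leaf node, and it treats a subtree as everything below its top node; neither detail is specified in the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    condition: Optional[str] = None  # None marks a leaf node (assumption)
    name: Optional[str] = None       # leaf name, if one has been set
    children: List["Node"] = field(default_factory=list)


def same_structure(a, b):
    """True if both subtrees have the same conditions and parent-child layout."""
    if a.condition != b.condition or len(a.children) != len(b.children):
        return False
    return all(same_structure(x, y) for x, y in zip(a.children, b.children))


def find_identical(root, pattern):
    """Collect every subtree of `root` identical to `pattern`, excluding `pattern` itself."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node is not pattern and same_structure(node, pattern):
            found.append(node)
        stack.extend(node.children)
    return found


def named_leaves(subtree):
    """Names already set on the leaf nodes of `subtree`, left to right."""
    if not subtree.children:
        return [subtree.name] if subtree.name else []
    return [n for child in subtree.children for n in named_leaves(child)]
```

Leaf names are deliberately ignored when comparing structure, so a subtree with named leaves can match one whose leaves are still unnamed.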

The classification device 100 may, for example, select the leaf node from a subtree located relatively low in the decision tree model. Specifically, of the subtrees 2202 and 2204, the classification device 100 selects the leaf node from the subtree 2202, which is the lower of the two. In this way, the classification device 100 can select another leaf node even when no named leaf node exists in the vicinity of the leaf node 2001.

Further, because the classification device 100 selects the leaf node from a subtree located relatively low in the model, it can generate the name of the unnamed leaf node from the name of a leaf node that tends to represent an abnormality of relatively high importance to the user. As a result, the classification device 100 can make it easier for the user to grasp how the content represented by the unnamed leaf node relates to an abnormality that is relatively important to the user.

The case where the classification device 100 executes either operation example 1 or operation example 2 has been described here, but the embodiment is not limited to this. For example, the classification device 100 may combine operation example 1 and operation example 2. Specifically, the classification device 100 may execute operation example 2 when it has failed to generate a name by operation example 1. The flowcharts described later with reference to FIGS. 25 to 27 correspond to the case where operation example 1 and operation example 2 are combined.

(Output example 1 in the terminal device 201)
Next, output example 1 in the terminal device 201 is described with reference to FIG. 23.

FIG. 23 is an explanatory diagram showing output example 1 in the terminal device 201. In FIG. 23, the classification device 100 sets the name of a leaf node by operation example 1 or operation example 2, and outputs the name in association with the detail number of the social security benefit invoice from which the billing data classified into that leaf node was derived.

For example, the classification device 100 causes the display 606 of the terminal device 201 to display the display content 2300. The name of the leaf node is set in the "content" field of the display content 2300. In this way, the classification device 100 can help the user carry out guidance and audit work on business operators efficiently within a limited time.

For example, the classification device 100 can make it easier to grasp why the billing data was classified as it was. The classification device 100 can also make it possible to grasp what types of fraudulent billing data are present in a business operator's billing data. This makes it easier for the user to prepare materials on the billing data for guiding and auditing the business operator.

(Output example 2 in the terminal device 201)
Next, output example 2 in the terminal device 201 is described with reference to FIG. 24.

FIG. 24 is an explanatory diagram showing output example 2 in the terminal device 201. In FIG. 24, the classification device 100 calculates and outputs, for each business operator, the number of billing data items classified as fraudulent billing by operation example 1 or operation example 2.

For example, the classification device 100 causes the display 606 of the terminal device 201 to display the display content 2400. The number of billing data items classified as fraudulent billing is set in the "count" field of the display content 2400. In this way, the classification device 100 can help the user carry out guidance and audit work on business operators efficiently within a limited time. For example, the user can refer to the number of billing data items classified as fraudulent billing and judge which business operator should be given priority for guidance and auditing.

The classification device 100 may also display the number of fraudulent billing data items for each business operator and for each type of fraudulent billing. This allows the user to judge which business operator should be given priority for guidance and auditing.
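The two tallies described for output example 2 can be sketched as below. The record keys (`operator`, `label`, `leaf_name`) are hypothetical names chosen for this illustration and do not come from the billing table 500.

```python
from collections import Counter


def count_fraud(records):
    """Tally abnormal billing records per operator and per (operator, fraud type)."""
    per_operator = Counter()
    per_operator_type = Counter()
    for rec in records:
        if rec["label"] != "abnormal":
            continue  # normal billing data is not counted
        per_operator[rec["operator"]] += 1
        # The leaf name stands in for the type of fraudulent billing (assumption).
        per_operator_type[(rec["operator"], rec["leaf_name"])] += 1
    return per_operator, per_operator_type
```

Sorting `per_operator` by count, for example with `per_operator.most_common()`, gives one possible ordering of operators for guidance and auditing.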

(Overall processing procedure)
Next, an example of the overall processing procedure executed by the classification device 100 is described with reference to FIG. 25. The overall processing is realized by, for example, the CPU 301 shown in FIG. 3, a storage area such as the memory 302 or the recording medium 305, and the network I/F 303.

FIG. 25 is a flowchart showing an example of the overall processing procedure. In FIG. 25, the classification device 100 first classifies each of a plurality of target data items into a leaf node representing normal or abnormal, based on the learned decision tree model (step S2501).

Next, the classification device 100 reflects the classification results in the billing table 500 (step S2502). The classification device 100 then accepts input of the reason why each of one or more target data items was classified as it was, and reflects the reasons in the billing table 500 (step S2503).

Next, for each of the one or more target data items, the classification device 100 sets the accepted reason as the name of the leaf node into which that target data item was classified (step S2504). Then, if any target data has been classified into a leaf node for which no name is set, the classification device 100 selects a leaf node whose name has been set by executing the selection process described later with reference to FIG. 26 (step S2505).

Next, the classification device 100 acquires the name of the selected leaf node (step S2506). The classification device 100 then identifies the difference between the conditions represented by the nodes on the path from the root node to the unnamed leaf node and the conditions represented by the nodes on the path from the root node to the selected leaf node (step S2507).

Next, the classification device 100 sets the name of the unnamed leaf node based on the name of the selected leaf node and the identified difference (step S2508). The classification device 100 then ends the overall processing. In this way, the classification device 100 can set the name of the leaf node and make that name available for presentation to the user.
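Steps S2507 and S2508 can be illustrated with a small sketch. A path is represented here as a list of (condition, result) pairs, and the wording of the generated name is invented for the example; the description does not fix a format for the name.

```python
def path_difference(path_unnamed, path_selected):
    """Steps on the unnamed leaf's path that do not occur on the selected leaf's path."""
    seen = set(path_selected)
    return [step for step in path_unnamed if step not in seen]


def generate_name(base_name, diff):
    """Build a name from the selected leaf's name plus the differing conditions."""
    if not diff:
        return f"similar to '{base_name}'"
    details = ", ".join(
        f"{cond} is {'true' if result else 'false'}" for cond, result in diff
    )
    return f"similar to '{base_name}', except {details}"
```

Because both paths start at the root node, the shared prefix drops out and only the conditions that distinguish the two leaf nodes remain in the generated name.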

(Selection processing procedure)
Next, an example of the selection processing procedure executed in step S2505 is described with reference to FIG. 26. The selection process is realized by, for example, the CPU 301 shown in FIG. 3, a storage area such as the memory 302 or the recording medium 305, and the network I/F 303.

FIG. 26 is a flowchart showing an example of the selection processing procedure. In FIG. 26, the classification device 100 first sets the parent node of the unnamed leaf node as the reference node (step S2601).

Next, the classification device 100 determines whether a name is set for the right child node of the reference node (step S2602). If a name is set for the right child node (step S2602: Yes), the classification device 100 proceeds to step S2604. If no name is set for the right child node (step S2602: No), the classification device 100 proceeds to step S2603.

In step S2603, the classification device 100 determines whether a name is set for the left child node of the reference node (step S2603). If a name is set for the left child node (step S2603: Yes), the classification device 100 proceeds to step S2604. If no name is set for the left child node (step S2603: No), the classification device 100 proceeds to step S2605.

In step S2604, the classification device 100 selects, from among the child nodes of the reference node, a leaf node whose name has been set (step S2604). The classification device 100 then ends the selection process.

In step S2605, the classification device 100 determines whether the number of times the reference node has been changed exceeds 10% of the depth of the decision tree (step S2605). If it does not (step S2605: No), the classification device 100 proceeds to step S2606. If it does (step S2605: Yes), the classification device 100 proceeds to step S2607.

In step S2606, the classification device 100 changes the reference node to the parent node of the current reference node (step S2606). The classification device 100 then returns to step S2602.

In step S2607, without changing the reference node, the classification device 100 executes the search process shown in FIG. 27 to search the entire decision tree model for another subtree similar to the subtree in the vicinity of the unnamed leaf node (step S2607).

Next, the classification device 100 selects, from among the leaf nodes of the other subtree found, a leaf node whose name has been set (step S2608). The classification device 100 then ends the selection process. In this way, the classification device 100 can select the leaf node to be used when generating the name of the unnamed leaf node.
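The flow of steps S2601 to S2608 can be sketched as a loop that climbs from the parent of the unnamed leaf node. The `Node` class, the convention that the last child is the right child, and the `fallback` callable standing in for the search process of FIG. 27 are all assumptions of this sketch.

```python
class Node:
    """Hypothetical tree node; children[0] is the left child, children[-1] the right."""
    def __init__(self, name=None):
        self.name = name
        self.parent = None
        self.children = []

    def add(self, *children):
        for child in children:
            child.parent = self
            self.children.append(child)
        return self


def select_named_leaf(unnamed_leaf, tree_depth, fallback=None):
    base = unnamed_leaf.parent                      # S2601: reference node
    changes = 0
    while base is not None:
        for child in reversed(base.children):       # S2602 (right), then S2603 (left)
            if child is not unnamed_leaf and not child.children and child.name:
                return child                        # S2604: named leaf found
        # S2605: stop climbing once changes > depth * 10% (integer form), or at the root
        if changes * 10 > tree_depth or base.parent is None:
            break
        base = base.parent                          # S2606: move the reference node up
        changes += 1
    # S2607-S2608: hand over to a subtree search if one is supplied
    return fallback(unnamed_leaf) if fallback else None
```

Passing a function for `fallback` corresponds to combining operation example 1 with operation example 2: the subtree search runs only when no named leaf node is found near the unnamed one.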

(Search processing procedure)
Next, an example of the search processing procedure, which is executed within the selection process of step S2505 (at step S2607), is described with reference to FIG. 27. The search process is realized by, for example, the CPU 301 shown in FIG. 3, a storage area such as the memory 302 or the recording medium 305, and the network I/F 303.

FIG. 27 is a flowchart showing an example of the search processing procedure. In FIG. 27, the classification device 100 first identifies the subtree that contains the unnamed leaf node and spans from the level that is higher than the level of the unnamed leaf node by 10% of the depth of the decision tree, down to the level of the unnamed leaf node (step S2701).

Next, the classification device 100 searches the entire decision tree model for one or more subtrees containing nodes that represent conditions identical or similar to the conditions represented by the nodes in the identified subtree (step S2702). From among the one or more subtrees found, the classification device 100 then picks out the subtree at the deepest level (step S2703). After that, the classification device 100 ends the search process. In this way, the classification device 100 can find the subtree that serves as the range from which a leaf node is selected.
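Steps S2702 and S2703 can be sketched as a single traversal that tracks depth and keeps the deepest match. The description does not define how "identical or similar" is judged, so the comparison is left to a caller-supplied `matches` predicate; the `Node` class is likewise hypothetical.

```python
class Node:
    """Hypothetical node: `condition` is the test it represents, None for a leaf."""
    def __init__(self, condition=None, children=()):
        self.condition = condition
        self.children = list(children)


def deepest_matching_subtree(root, pattern, matches):
    """Return the deepest subtree of `root` for which matches(node, pattern) holds."""
    best, best_depth = None, -1
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node is not pattern and depth > best_depth and matches(node, pattern):
            best, best_depth = node, depth          # S2703: keep the deepest candidate
        stack.extend((child, depth + 1) for child in node.children)
    return best
```

A strict structural comparison or a looser similarity measure can be passed as `matches` without changing the traversal.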

As described above, the classification device 100 can classify target data using a decision tree model. According to the classification device 100, when target data is classified into a first leaf node for which no name is set, a second leaf node for which a name has been set can be selected from the decision tree model based on the positional relationship between nodes. According to the classification device 100, the name of the first leaf node can be generated based on the name of the selected second leaf node. In this way, the classification device 100 enables the user to grasp the type of the classification result for the target data and the reason why the target data was classified as it was.

According to the classification device 100, the second leaf node can be selected when the first leaf node is a leaf node indicating that the target data is abnormal. In this way, the classification device 100 can reduce the amount of processing.

According to the classification device 100, the second leaf node can be selected from among the leaf nodes in the vicinity of the first leaf node, based on the positional relationship between nodes. This makes it easier for the classification device 100 to select a second leaf node that represents a normal or abnormal state relatively similar to that of the first leaf node. The classification device 100 can therefore more readily generate, for the first leaf node, a name that the user can understand intuitively.

According to the classification device 100, the second leaf node can be selected from among the leaf nodes reachable from the first leaf node via a predetermined number of edges or fewer. In this way, the classification device 100 can more readily generate, for the first leaf node, a name that the user can understand intuitively.
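Selecting from the leaf nodes reachable within a given number of edges can be sketched as a breadth-first search over the tree's parent and child links. The `Node` class and the function name are assumptions made for this illustration.

```python
from collections import deque


class Node:
    """Hypothetical tree node with links in both directions."""
    def __init__(self, name=None):
        self.name = name
        self.parent = None
        self.children = []

    def add(self, *children):
        for child in children:
            child.parent = self
            self.children.append(child)
        return self


def named_leaves_within(start, max_edges):
    """Named leaf nodes reachable from `start` via at most `max_edges` edges, nearest first."""
    seen = {id(start)}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, dist = queue.popleft()
        if node is not start and not node.children and node.name:
            found.append(node)
        if dist == max_edges:
            continue  # do not expand beyond the edge limit
        neighbours = list(node.children)
        if node.parent is not None:
            neighbours.append(node.parent)
        for nxt in neighbours:
            if id(nxt) not in seen:
                seen.add(id(nxt))
                queue.append((nxt, dist + 1))
    return found
```

A sibling leaf is two edges away (up to the parent and down again), so a limit of two restricts the candidates to leaf nodes that share the parent of the first leaf node.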

According to the classification device 100, the second leaf node can be selected from among the leaf nodes below an upper node that is above the first leaf node and reachable from the first leaf node via a predetermined number of edges or fewer. In this way, the classification device 100 can more readily generate, for the first leaf node, a name that the user can understand intuitively.

According to the classification device 100, the predetermined number used when selecting the second leaf node can be set based on the depth of the decision tree model. In this way, the classification device 100 can limit the range of selection candidates according to the depth of the decision tree model and reduce the amount of processing.

According to the classification device 100, the second leaf node can be selected, based on the positional relationship between nodes, from among the leaf nodes included in a second subtree of the decision tree model that is identical or similar to a first subtree containing the first leaf node. This makes it easier for the classification device 100 to select a second leaf node that represents a normal or abnormal state relatively similar to that of the first leaf node. The classification device 100 can also select the second leaf node even when no named leaf node exists in the vicinity of the first leaf node.

According to the classification device 100, a subtree that contains nodes representing the same conditions as the nodes included in the first subtree, and whose positional relationship between nodes is the same as that of the first subtree, can be identified as the second subtree. This makes it easier for the classification device 100 to select a second leaf node that represents a normal or abnormal state relatively similar to that of the first leaf node.

According to the classification device 100, it is further possible to identify the difference between the determination results for the conditions represented by the nodes on the path from the root node to the first leaf node and the determination results for the conditions represented by the nodes on the path from the root node to the second leaf node. According to the classification device 100, the name of the first leaf node can be generated based on the identified difference. In this way, the classification device 100 makes it possible to grasp which conditions differ in their determination results, and can generate a name from which the reason the target data was classified as normal or abnormal is easy to grasp.

According to the classification device 100, the generated name of the first leaf node can be output in association with the target data. In this way, the classification device 100 enables the user to grasp the name of the first leaf node into which the target data was classified.

According to the classification device 100, the generated name can be set for the first leaf node. As a result, the next time target data is classified into the first leaf node, the classification device 100 does not need to generate the name again, which reduces the amount of processing.

The classification method described in the present embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. The classification program described in the present embodiment is recorded on a computer-readable recording medium such as a hard disk, flexible disk, CD-ROM, MO, or DVD, and is executed by being read from the recording medium by the computer. The classification program described in the present embodiment may also be distributed via a network such as the Internet.

The following appendices are further disclosed with respect to the embodiment described above.

(Appendix 1) A classification program characterized by causing a computer to execute a process of:
classifying target data using a decision tree model that includes nodes representing conditions for classifying the target data and leaf nodes indicating that the target data is normal or that the target data is abnormal, at least one of the leaf nodes having a name set;
when the target data is classified into a first leaf node for which no name is set, selecting, based on the positional relationship between nodes, a second leaf node of the decision tree model for which a name has been set; and
generating a name for the first leaf node based on the name of the selected second leaf node.

(Appendix 2) The classification program according to Appendix 1, wherein the first leaf node is a leaf node indicating that the target data is abnormal.

(Appendix 3) The classification program according to Appendix 1 or 2, wherein the selecting process selects the second leaf node from among the leaf nodes in the vicinity of the first leaf node, based on the positional relationship between the nodes.

(Appendix 4) The classification program according to Appendix 3, wherein the leaf nodes in the vicinity of the first leaf node are leaf nodes reachable from the first leaf node via a predetermined number of edges or fewer.

(Appendix 5) The classification program according to Appendix 3, wherein the leaf nodes in the vicinity of the first leaf node are leaf nodes below an upper node that is above the first leaf node and reachable from the first leaf node via a predetermined number of edges or fewer.

(Appendix 6) The classification program according to Appendix 5, wherein the predetermined number is set based on the depth of the decision tree model.

(Appendix 7) The classification program according to any one of Appendices 1 to 6, wherein the selecting process selects the second leaf node, based on the positional relationship between the nodes, from among the leaf nodes included in a second subtree of the decision tree model that is identical or similar to a first subtree containing the first leaf node.

(Appendix 8) The classification program according to Appendix 7, wherein the second subtree contains nodes representing the same conditions as the nodes included in the first subtree, and has the same positional relationship between nodes as the first subtree.

(Appendix 9) The classification program according to any one of Appendices 1 to 8, wherein the generating process further generates the name of the first leaf node based on the difference between the determination results for the conditions represented by the nodes on the path from the root node of the decision tree model to the first leaf node and the determination results for the conditions represented by the nodes on the path from the root node to the second leaf node.

(Appendix 10) The classification program according to any one of Appendices 1 to 9, further causing the computer to execute a process of outputting the generated name of the first leaf node in association with the target data.

(Appendix 11) The classification program according to any one of Appendices 1 to 10, further causing the computer to execute a process of setting the generated name for the first leaf node.

(Appendix 12) A classification method characterized in that a computer executes a process of:
classifying target data using a decision tree model that includes nodes representing conditions for classifying the target data and leaf nodes indicating that the target data is normal or that the target data is abnormal, at least one of the leaf nodes having a name set;
when the target data is classified into a first leaf node for which no name is set, selecting, based on the positional relationship between nodes, a second leaf node of the decision tree model for which a name has been set; and
generating a name for the first leaf node based on the name of the selected second leaf node.

(Appendix 13) A classification device characterized by having a control unit that:
classifies target data using a decision tree model that includes nodes representing conditions for classifying the target data and leaf nodes indicating that the target data is normal or that the target data is abnormal, at least one of the leaf nodes having a name set;
when the target data is classified into a first leaf node for which no name is set, selects, based on the positional relationship between nodes, a second leaf node of the decision tree model for which a name has been set; and
generates a name for the first leaf node based on the name of the selected second leaf node.

100 Classification device
110, 805, 1000, 1300 Decision tree model
111 to 116, 901 to 904, 1001 to 1006, 1301 to 1303, 2101 to 2103 Node
121 to 127, 911 to 915, 1011 to 1017, 1311 to 1313, 2001, 2111 to 2114 Leaf node
200 Classification system
201 Terminal device
210 Network
300, 600 Bus
301, 601 CPU
302, 602 Memory
303, 603 Network I/F
304, 604 Recording medium I/F
305, 605 Recording medium
400, 801, 806 Billing data
500, 1400 Billing table
606 Display
607 Input device
700 Storage unit
701 Acquisition unit
702 Learning unit
703 Classification unit
704 Selection unit
705 Generation unit
706 Output unit
802 Case data
803 Learning data set
804 Machine learning FW
807 Correspondence data
921, 922 Learning data subset
1200 Learning table
2010, 2201 to 2204 Subtree
2300, 2400 Display content

Claims

A classification program characterized by causing a computer to execute a process of:
classifying target data using a decision tree model that includes nodes representing conditions for classifying the target data and leaf nodes indicating that the target data is normal or that the target data is abnormal, at least one of the leaf nodes having a name set;
when the target data is classified into a first leaf node for which no name is set, selecting, based on the positional relationship between nodes, a second leaf node for which a name has been set from among the leaf nodes determined to be in the vicinity of the first leaf node based on the number of edges connecting the first leaf node and other leaf nodes in the decision tree model; and
generating a name for the first leaf node based on the name of the selected second leaf node.

The classification program according to claim 1, wherein the first leaf node is a leaf node indicating that the target data is abnormal.

The classification program according to claim 1 or 2, wherein the selecting process identifies, based on the positional relationship between the nodes, a second subtree of the decision tree model and selects the second leaf node from among the leaf nodes included in the identified second subtree, the second subtree being either a subtree that contains nodes representing the same conditions as the nodes included in a first subtree containing the first leaf node and that has the same positional relationship between nodes as the first subtree, or a subtree that contains a portion whose nodes represent the same conditions as the nodes included in a part of the first subtree and whose positional relationship between nodes is the same as that of the part.

The classification program according to any one of claims 1 to 3, wherein
the selecting process selects one second leaf node, and
the generating process generates the name of the first leaf node further based on the difference between the determination results for the conditions represented by the respective nodes on the path from the root node of the decision tree model to the first leaf node and the determination results for the conditions represented by the respective nodes on the path from the root node to the selected second leaf node.
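The path-difference idea in this claim can be sketched as follows. This is an assumption-laden illustration: each path is represented as a list of hypothetical `(condition, result)` pairs, and the "except ..." wording of the generated name is invented, since the claim does not fix a naming format.

```python
def path_difference(first_path, second_path):
    """Return the determination results on first_path that differ from second_path.

    Each path is a list of (condition_text, result) pairs ordered from the root.
    A condition that appears only on first_path also counts as a difference.
    """
    second = dict(second_path)
    return [(cond, result) for cond, result in first_path
            if cond not in second or second[cond] != result]

def generate_name(second_name, first_path, second_path):
    """Assumed naming rule: reuse the second leaf's name and append the differences."""
    diff = path_difference(first_path, second_path)
    if not diff:
        return second_name
    parts = [f"{cond} is {'true' if result else 'false'}" for cond, result in diff]
    return f"{second_name}, except " + " and ".join(parts)

# Invented example paths for an unnamed leaf and a nearby named leaf.
first_path = [("amount > 1000", True), ("visits > 10", False)]
second_path = [("amount > 1000", True), ("visits > 10", True)]

diff = path_difference(first_path, second_path)
name = generate_name("Excessive visits", first_path, second_path)
print(name)
```

Reporting only the differing conditions keeps the generated name short even when the two paths share many determination results.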

The classification program according to any one of claims 1 to 4, further causing the computer to execute a process of outputting the target data in association with the generated name of the first leaf node.

The classification program according to any one of claims 1 to 5, further causing the computer to execute a process of setting the generated name of the first leaf node in the first leaf node.

A classification method in which a computer executes a process comprising:
classifying target data by a decision tree model that includes nodes representing conditions for classifying the target data and leaf nodes each indicating that the target data is normal or that the target data is abnormal, a name being set for at least one of the leaf nodes;
when the target data is classified into a first leaf node for which no name has been set, selecting, based on the positional relationship between nodes in the decision tree model, a second leaf node for which a name has been set from among the leaf nodes determined to be in the vicinity of the first leaf node based on the number of edges connecting the first leaf node and each other leaf node; and
generating a name for the first leaf node based on the name of the selected second leaf node.

A classification device comprising a control unit configured to:
classify target data by a decision tree model that includes nodes representing conditions for classifying the target data and leaf nodes each indicating that the target data is normal or that the target data is abnormal, a name being set for at least one of the leaf nodes;
when the target data is classified into a first leaf node for which no name has been set, select, based on the positional relationship between nodes in the decision tree model, a second leaf node for which a name has been set from among the leaf nodes determined to be in the vicinity of the first leaf node based on the number of edges connecting the first leaf node and each other leaf node; and
generate a name for the first leaf node based on the name of the selected second leaf node.