JP2022029347A

JP2022029347A - Feature selection program, device, and method

Info

Publication number: JP2022029347A
Application number: JP2020132646A
Authority: JP
Inventors: Naomi Iwayama (岩山尚美); Manabu Nakao (中尾学); Eiji Hasegawa (長谷川英司)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2020-08-04
Filing date: 2020-08-04
Publication date: 2022-02-17

Abstract

PROBLEM TO BE SOLVED: To select, from the items included in a data set, features that enable effective machine learning without requiring correct answer data.
A feature selection device acquires a shaped data set and a shaping process history DB 21 that stores shaping process history data. Using each shaping processing function stored as shaping process history data as a key, the device searches the "shaping processing function" field of a priority item DB 22, which defines the priority items to be preferentially selected as features included in the training data, and acquires the corresponding "priority item target argument". From the shaping process history data, the device extracts as a priority item the item that was passed as that priority item target argument when the shaping processing function was executed, and selects this priority item as a feature.
[Selection diagram] FIG. 11

Description

The disclosed technique relates to a feature selection program, a feature selection device, and a feature selection method.

Systems are increasingly being introduced that collect multiple types of sensor data indicating the operating status of production equipment and the like in a factory, and that use a model generated by machine learning on the collected sensor data set to detect abnormal or normal states of the equipment without human intervention. The collected data set often contains a large number of items, and using it as-is for machine learning makes training time-consuming. In addition, the model becomes complex, which reduces the interpretability of inferences made with it.

Techniques have therefore been proposed that select a predetermined number of items from the many items in a data set as features to be included in the training data used for machine learning. For example, an input variable selection support device has been proposed that supports input variable selection by presenting, through a simple method, an index for narrowing down input variables from input candidate variables. Using model information data standardized to a mean of 0 and a standard deviation of 1, this device generates multiple input/output models of different forms, each based on a different modeling principle, that represent the relationship between all input candidate variables and the output variable. For each input/output model, the device then generates sensitivities, each being the absolute value of the output variable with respect to an input candidate variable, and selects and presents input variables based on the sensitivities of the input/output models.

A selection device has also been proposed that selects variables, each being the range of values that a data group from an observation target can take. This device clusters the variables based on their similarity and selects a specific representative variable from the variables belonging to each cluster. Based on the knowledge of a skilled manager or engineer, labels designating the variables to be monitored are set via an input device and stored in a label specification table.

There are also methods that select features based on properties of the data in the data set, without using a machine-learned model. One such method selects features based on the relationship between the objective variable and the explanatory variables. Another, aimed at avoiding a loss of model accuracy due to multicollinearity, deletes explanatory variables that are highly correlated with other explanatory variables.

Japanese Unexamined Patent Publication No. 2010-282547
International Publication No. 2018/092317

"Summary of feature selection" (特徴量選択のまとめ), [online], July 8, 2020, [retrieved July 10, 2020], Internet <URL: https://qiita.com/shimopino/items/5fee7504c7acf044a521>

The prior art requires an objective variable (output variable), that is, correct answer data, in order to select features. However, in a system that detects machine abnormalities as described above, for example, information indicating abnormalities, which would serve as correct answer data in machine learning, may not exist. Prior art that requires correct answer data therefore cannot be applied. Furthermore, although items that are important to the business should be kept and selected as features, deleting explanatory variables that are highly correlated with each other may remove the important explanatory variable rather than the less important one.

In one aspect, the disclosed technique aims to select, from the items included in a data set, features for effective machine learning without requiring correct answer data.

In one embodiment, the disclosed technique acquires a data set that contains a plurality of features and on which a shaping process has been executed, a history of the shaping process, and specific information defining a specific feature that serves as an argument in a specific shaping process. Based on the acquired history of the shaping process, the disclosed technique then selects, from the plurality of features included in the data set, a feature corresponding to the specific feature defined in the specific information.

In one aspect, features for effective machine learning can be selected from the items included in a data set without requiring correct answer data.

FIG. 1 is a functional block diagram of a feature selection device.
FIG. 2 is a diagram showing an example of data set A before the shaping process.
FIG. 3 is a diagram showing an example of data set B before the shaping process.
FIG. 4 is a diagram showing an example of a shaping process screen.
FIG. 5 is a diagram showing an example of data set A after the shaping process.
FIG. 6 is a diagram showing another example of the shaping process screen.
FIG. 7 is a diagram showing an example of data set B after the shaping process.
FIG. 8 is a diagram showing an example of a shaping process history DB for data set A.
FIG. 9 is a diagram showing an example of a shaping process history DB for data set B.
FIG. 10 is a diagram showing an example of a priority item DB.
FIG. 11 is a diagram for explaining the extraction of priority items.
FIG. 12 is a diagram showing an example of a correlation table between a priority item and other items.
FIG. 13 is a diagram showing an example of a reception screen.
FIG. 14 is a diagram for explaining the update of the priority item DB.
FIG. 15 is a block diagram showing the schematic configuration of a computer that functions as the feature selection device.
FIG. 16 is a flowchart showing an example of the feature selection process.

An example of an embodiment of the disclosed technique is described below with reference to the drawings.

As shown in FIG. 1, the feature selection device 10 functionally includes a shaping unit 11, an acquisition unit 12, an extraction unit 13, a selection unit 14, a reception unit 15, and an update unit 16. The extraction unit 13 and the selection unit 14 are an example of the "selection unit" of the disclosed technique.

The shaping unit 11 acquires the pre-shaping data set input to the feature selection device 10. FIGS. 2 and 3 show examples of data sets before the shaping process. Hereinafter, the data set shown in FIG. 2 is referred to as "data set A" and the data set shown in FIG. 3 as "data set B". A data set contains a plurality of items, and each record has a value for each item. As shown in FIG. 2, data set A contains the items "X1", "X2", "X3", "X4", and "X5". As shown in FIG. 3, data set B contains the items "operation start date", "failure occurrence date", "X1", "X2", and "X3".

The shaping unit 11 executes the shaping process on the acquired pre-shaping data set. Specifically, the shaping unit 11 accepts the designation of a function to be executed as the shaping process on the data set (hereinafter, "shaping processing function") and of the items, among the plurality of items in the pre-shaping data set, that serve as arguments of the shaping processing function. The shaping unit 11 then executes the designated shaping processing function with the designated items as arguments.

A plurality of predetermined types of shaping processing functions can be prepared in advance. Examples include a numeric type check, which checks whether the values of a designated item are numeric, and a numeric value check, which checks whether the values of a designated item fall within a designated range. Other examples include a datetime type conversion, which converts the values of a datetime item into datetime values of another format, and a datetime difference conversion, which converts the difference between the values of two designated datetime items into a value in a predetermined datetime unit. Further examples include a category type conversion, which converts the values of a designated item into values of a different category, and missing value completion, which fills in missing values of a designated item by prediction processing or the like.
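To make the idea of a set of predetermined, selectable shaping processing functions concrete, here is a minimal Python sketch of a registry and of executing one function with designated argument items. Everything in it is an assumption for illustration: a data set is modeled as a dict mapping item names to value lists, the names `SHAPING_FUNCTIONS` and `run_shaping` are hypothetical, and mean imputation stands in for the unspecified "prediction processing" of missing value completion.

```python
from typing import Callable, Dict, List

# Assumed representation: a data set maps each item name to a column of values.
Dataset = Dict[str, List[object]]


def numeric_value_check(ds: Dataset, check_item: str, low: float, high: float) -> List[int]:
    """Return the 0-based row indices whose value lies outside [low, high]."""
    return [i for i, v in enumerate(ds[check_item]) if not low <= float(v) <= high]


def category_conversion(ds: Dataset, item: str, mapping: Dict[object, object]) -> Dataset:
    """Return a copy of ds with the values of item replaced via mapping."""
    out = dict(ds)
    out[item] = [mapping.get(v, v) for v in ds[item]]
    return out


def missing_value_completion(ds: Dataset, item: str) -> Dataset:
    """Fill None values with the column mean (a simple stand-in for prediction)."""
    present = [float(v) for v in ds[item] if v is not None]
    mean = sum(present) / len(present)
    out = dict(ds)
    out[item] = [mean if v is None else v for v in ds[item]]
    return out


# Hypothetical registry of the predetermined shaping processing functions.
SHAPING_FUNCTIONS: Dict[str, Callable[..., object]] = {
    "numeric_value_check": numeric_value_check,
    "category_conversion": category_conversion,
    "missing_value_completion": missing_value_completion,
}


def run_shaping(ds: Dataset, function_name: str, **args: object) -> object:
    """Execute the designated shaping function with the designated arguments."""
    return SHAPING_FUNCTIONS[function_name](ds, **args)
```

Functions that transform values return a new data set rather than mutating the input, which keeps the pre-shaping data available if a shaping step has to be redone.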

For example, the shaping unit 11 displays a shaping process screen 30 such as the one shown in FIG. 4 on a display device. In the example of FIG. 4, the shaping process screen 30 includes a file name area 31 that displays the file name of the data set to be shaped, an execution area 32 for specifying the content of the shaping process and instructing its execution, and a result area 33 that displays the result of the shaping process. The execution area 32 lists the predetermined types of shaping processing functions in a selectable state. When a shaping processing function is selected from the list, the execution area 32 also displays text boxes for specifying the items that serve as arguments of the function and an execution button for instructing execution of the shaping process.

For example, the person in charge of the shaping process loads the file of the target data set, selects the shaping processing function to execute from the list in the execution area 32 of the shaping process screen 30, enters the argument items in the text boxes, and presses the execution button. The shaping unit 11 acquires the shaping processing function and argument items specified on the shaping process screen 30 and executes the shaping process on the data set based on that information. The shaping unit 11 then displays the execution result in the result area 33 of the shaping process screen 30.

FIG. 4 shows an example in which the numeric type check is executed on data set A. In FIG. 4, a check mark indicates the selected shaping processing function; the same applies to FIG. 6, described later. As the argument, "X1" is specified as the item whose numeric type is to be checked. In this example, the value in the third row of item "X1" is "0..1", and the result area 33 displays a result indicating that this value is not numeric. The shaping unit 11 may correct the value automatically based on the check result, or may accept a correction of the value from the person in charge. FIG. 5 shows data set A after the shaping process, including the corrected value.
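The numeric type check in this example can be sketched as follows, assuming values arrive as strings (for instance from a CSV file) and that "numeric" means "parseable by Python's `float`". The function name is hypothetical, and only the third value of `x1` mirrors the text; the other values are invented.

```python
from typing import List


def numeric_type_check(values: List[str]) -> List[int]:
    """Return the 1-based row numbers whose value cannot be parsed as a number."""
    bad_rows = []
    for row, value in enumerate(values, start=1):
        try:
            float(value)
        except ValueError:  # e.g. "0..1" is not a valid float literal
            bad_rows.append(row)
    return bad_rows


# Hypothetical values for item "X1"; only the third mirrors the example in the text.
x1 = ["0.3", "0.2", "0..1", "0.4"]
print(numeric_type_check(x1))  # [3]
```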

FIG. 6 shows another example of the shaping process screen 30, in which the datetime difference conversion is executed on data set B. As arguments, "operation start date" is specified as item name 1, "failure occurrence date" as item name 2, "operating period" as the difference result item name, and "day" as the difference result datetime unit. This means that a shaping process is executed that adds the number of days between the operation start date and the failure occurrence date as an item called "operating period". The shaping unit 11 therefore calculates the period from the operation start date to the failure occurrence date in days and adds the result, for example, as the last column of the data set under the item name "operating period". FIG. 7 shows data set B after the shaping process.
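The datetime difference conversion described above can be sketched in a few lines of Python. This is an illustration only: it assumes the date columns already hold `datetime.date` objects, supports only the "day" unit, uses a hypothetical function name, and the dates in `ds_b` are invented.

```python
from datetime import date
from typing import Dict, List

Dataset = Dict[str, List[object]]


def datetime_difference(ds: Dataset, item1: str, item2: str, result_item: str) -> Dataset:
    """Append result_item = (item2 - item1) in whole days as the last column."""
    out = dict(ds)  # dicts keep insertion order, so the new item becomes the last column
    out[result_item] = [(end - start).days for start, end in zip(ds[item1], ds[item2])]
    return out


# Hypothetical rows for data set B (dates invented for illustration).
ds_b = {
    "operation start date": [date(2019, 4, 1), date(2019, 10, 1)],
    "failure occurrence date": [date(2019, 4, 11), date(2020, 1, 1)],
}
shaped = datetime_difference(ds_b, "operation start date",
                             "failure occurrence date", "operating period")
print(shaped["operating period"])  # [10, 92]
```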

The shaping unit 11 stores the history of the shaping processes executed on a data set in a shaping process history DB 21. The shaping process history DB 21 stores, as shaping process history data, each executed shaping processing function and the items that served as its arguments. When several shaping processing functions are executed on one data set, the shaping unit 11 stores multiple pieces of shaping process history data in the shaping process history DB 21.

FIG. 8 shows an example of the shaping process history DB 21 storing the history of shaping processes for data set A. As shown in FIG. 8, the shaping process history DB 21 stores each "shaping processing function" executed on the data set in association with its "arguments", namely the argument names of that function and the items passed as those arguments. FIG. 8 shows an example in which, in addition to the numeric type check described with reference to the shaping process screen 30 of FIG. 4, a numeric value check was executed on item "X3". FIG. 9 shows an example of the shaping process history DB 21 storing the history for data set B, in which the datetime difference conversion described with reference to the shaping process screen 30 of FIG. 6 was executed.
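One way to picture the shaping process history DB 21 is as a list of (function, named arguments) records per data set. The sketch below is an in-memory stand-in under that assumption; the class names and the snake_case argument names (`check_item_name`, `item_name_1`, and so on) are hypothetical renderings of the argument names mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ShapingRecord:
    """One piece of shaping history data: executed function and its named arguments."""
    function: str
    arguments: Dict[str, str]


@dataclass
class ShapingHistory:
    """Hypothetical in-memory stand-in for the shaping process history DB, keyed by data set."""
    records: Dict[str, List[ShapingRecord]] = field(default_factory=dict)

    def log(self, dataset: str, function: str, **arguments: str) -> None:
        # One record per executed shaping function, appended in execution order.
        self.records.setdefault(dataset, []).append(ShapingRecord(function, dict(arguments)))


history = ShapingHistory()
# Mirrors the histories described for FIG. 8 (data set A) and FIG. 9 (data set B).
history.log("A", "numeric type check", check_item_name="X1")
history.log("A", "numeric value check", check_item_name="X3")
history.log("B", "datetime difference conversion",
            item_name_1="operation start date", item_name_2="failure occurrence date",
            result_item_name="operating period", result_unit="day")
```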

The acquisition unit 12 acquires the shaped data set from the shaping unit 11. The acquisition unit 12 also acquires, from the shaping process history DB 21, the shaping process history data for the acquired data set, and acquires a priority item DB 22 stored in an external storage device. Among the multiple types of shaping processing functions, the priority item DB 22 defines specific items that serve as arguments in specific shaping processing functions. Such a specific item is an item that should be preferentially selected as a feature to be included in the training data (hereinafter, "priority item"). Specifically, an item on which a shaping process was performed on the premise that its values will be used after shaping is defined as a priority item. The information stored in the priority item DB 22 is an example of the "specific information" of the disclosed technique. FIG. 10 shows an example of the priority item DB 22. In this example, each "shaping processing function" representing a specific shaping process is stored in association with a "priority item target argument", which is the argument, among that function's arguments, that designates a priority item. The acquisition unit 12 passes the acquired information to the extraction unit 13.
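As a data structure, the priority item DB 22 can be read as a table of (shaping processing function, priority item target argument) rows. The following sketch indexes such rows by function name so that lookups by function are cheap; the function name and the snake_case argument name are hypothetical, and only the row confirmed by the worked example in this description (numeric type check with check item name) is filled in.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def load_priority_item_db(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index (shaping function, priority item target argument) rows by function name.

    A function may appear in several rows if more than one of its arguments
    designates a priority item.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for function, argument in rows:
        index[function].add(argument)
    return dict(index)


# Only the entry confirmed by the text's example is shown; a real table would hold more.
priority_db = load_priority_item_db([("numeric type check", "check_item_name")])
print(priority_db.get("numeric type check", set()))   # {'check_item_name'}
print(priority_db.get("numeric value check", set()))  # set()
```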

Based on the shaping process history DB 21 passed from the acquisition unit 12, the extraction unit 13 extracts, from the plurality of items in the shaped data set, the items corresponding to the priority item target arguments defined in the priority item DB 22 as the priority items of that data set.

Specifically, as shown in FIG. 11, the extraction unit 13 uses each shaping processing function stored as shaping process history data in the shaping process history DB 21 as a key to search the "shaping processing function" field of the priority item DB 22. The extraction unit 13 acquires the "priority item target argument" associated with the matching "shaping processing function" in the priority item DB 22. From the shaping process history data, the extraction unit 13 then extracts, as a priority item, the item that was passed as the acquired priority item target argument when that shaping processing function was executed. For data set A, as shown in FIG. 11, "check item name" is acquired as the priority item target argument, and item "X1", which corresponds to the check item name, is extracted as a priority item.

The extraction unit 13 performs the above processing on all shaping process history data stored in the shaping process history DB 21. Therefore, if several shaping processing functions in the history data each match one of the "shaping processing functions" in the priority item DB 22, multiple priority items are extracted. In the example of FIG. 11, the shaping processing function "numeric value check" in the history data is not defined in the priority item DB 22, so the corresponding item "X3" is not extracted as a priority item. The extraction unit 13 passes the information on the extracted priority items and the shaped data set to the selection unit 14.
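The extraction steps above (look up each logged function in the priority item DB, then collect the items passed under the priority item target arguments) can be sketched as follows. The record layout (`{"function": ..., "arguments": {...}}`), the function name, and the snake_case argument name are assumptions made for illustration; the history mirrors the data set A example of FIG. 11.

```python
from typing import Dict, List, Set


def extract_priority_items(history: List[dict], priority_db: Dict[str, Set[str]]) -> List[str]:
    """Return items passed as priority item target arguments of logged shaping functions.

    history: records of the form {"function": name, "arguments": {arg_name: item}}.
    priority_db: shaping function name -> priority item target argument names.
    """
    priority_items: List[str] = []
    for record in history:  # every piece of shaping history data is examined
        target_args = priority_db.get(record["function"])  # function name is the search key
        if not target_args:
            continue  # function not defined in the priority item DB
        for arg_name, item in record["arguments"].items():
            if arg_name in target_args and item not in priority_items:
                priority_items.append(item)
    return priority_items


history_a = [  # shaping history of data set A, as described for FIG. 11
    {"function": "numeric type check", "arguments": {"check_item_name": "X1"}},
    {"function": "numeric value check", "arguments": {"check_item_name": "X3"}},
]
priority_db = {"numeric type check": {"check_item_name"}}
print(extract_priority_items(history_a, priority_db))  # ['X1']
```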

The selection unit 14 calculates an index indicating the correlation between the priority item passed from the extraction unit 13 and each item other than the priority item among the plurality of items in the shaped data set. For example, the selection unit 14 creates a correlation table, such as the one shown in FIG. 12, indicating the correlation between the priority item and the other items. In the example of FIG. 12, the correlation table stores correlation coefficients calculated as the index of correlation between the priority item and the other items.
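The correlation table can be sketched with a plain Pearson correlation coefficient, which is an assumption here: the text only says "correlation coefficient" without naming the type. The function names are hypothetical, and the toy columns below are invented so that the coefficients are exactly +1 and -1.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError for a constant column


def correlation_table(ds: Dict[str, List[float]],
                      priority_items: List[str]) -> Dict[str, Dict[str, float]]:
    """For each priority item, its correlation with every non-priority item."""
    others = [item for item in ds if item not in priority_items]
    return {p: {o: pearson(ds[p], ds[o]) for o in others} for p in priority_items}


ds = {"X1": [1.0, 2.0, 3.0, 4.0], "X2": [2.0, 4.0, 6.0, 8.0], "X3": [4.0, 3.0, 2.0, 1.0]}
table = correlation_table(ds, ["X1"])
print({k: round(v, 3) for k, v in table["X1"].items()})  # {'X2': 1.0, 'X3': -1.0}
```

Because a strong negative correlation also causes multicollinearity, an implementation might compare absolute values against the threshold; the text does not say which it uses, so this sketch keeps the signed coefficient.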

The selection unit 14 first selects the priority item as a feature to be included in the training data. The selection unit 14 then selects, as features to be included in the training data, the other items whose calculated correlation index with the priority item is lower than a predetermined threshold. Other items with low correlation to the priority item are selected in order to avoid multicollinearity with the priority item during machine learning. The threshold may be a fixed predetermined value, or the N-th smallest index in the correlation table may be used as the threshold. In the latter case, N other items are selected as features. In the example of FIG. 12, with a threshold of 0.7, the selection unit 14 selects items "X1", "X2", and "X5" as features. The selection unit 14 passes the plurality of items in the data set and information on the items selected as features to the reception unit 15.
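The selection rule above, with either a fixed threshold or the N smallest indices, can be sketched as follows. The function name and table layout are hypothetical; the correlation values are invented only so that the result matches the FIG. 12 outcome stated in the text (X1, X2, and X5 at a threshold of 0.7). How several priority items should be combined is not described, so taking each item's largest correlation with any priority item is an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical table layout: priority item -> other item -> correlation index.
CorrTable = Dict[str, Dict[str, float]]


def select_features(table: CorrTable, threshold: Optional[float] = None,
                    n: Optional[int] = None) -> List[str]:
    """Select every priority item, then the other items weakly correlated with them.

    Pass either a fixed threshold (keep indices strictly below it) or n (keep the
    n other items with the smallest indices). Assumption: with several priority
    items, an item's index is its largest correlation with any priority item.
    """
    priority = list(table)
    others: Dict[str, float] = {}
    for row in table.values():
        for item, value in row.items():
            if item not in table:  # never treat a priority item as an "other" item
                others[item] = max(value, others.get(item, float("-inf")))
    if n is not None:
        chosen = set(sorted(others, key=lambda item: others[item])[:n])
    elif threshold is not None:
        chosen = {item for item, value in others.items() if value < threshold}
    else:
        raise ValueError("pass either threshold or n")
    # Priority items first, then the chosen items in their original column order.
    return priority + [item for item in others if item in chosen]


# Invented indices, chosen only so that the outcome matches the FIG. 12 example.
table = {"X1": {"X2": 0.2, "X3": 0.9, "X4": 0.8, "X5": 0.4}}
print(select_features(table, threshold=0.7))  # ['X1', 'X2', 'X5']
print(select_features(table, n=2))            # ['X1', 'X2', 'X5']
```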

The reception unit 15 presents each of the plurality of items in the data set, for example to a person in charge of model design, together with information indicating whether the item has been selected as a feature. The reception unit 15 then accepts from the person in charge the addition or removal of any item's selection as a feature.

For example, the reception unit 15 displays a reception screen 35 such as the one shown in FIG. 13 on the display device. In the example of FIG. 13, the reception screen 35 includes a file name area 36 that displays the file name of the shaped data set, and a reception area 37 for accepting additions to or removals from the feature selection and for confirming the selection. The reception area 37 displays, in an editable state, whether each of the plurality of items is selected as a feature. The reception area 37 also displays a confirmation button that is pressed to confirm the feature selection. In the example of FIG. 13, a check mark is displayed next to each item selected as a feature.

To add an item to the feature selection, the person in charge puts a check mark on that item, as shown by the broken line in the lower part of FIG. 13. To remove an item from the selection, the person in charge clears its check mark. After finishing the modifications, the person in charge presses the confirmation button. The reception unit 15 thereby accepts and outputs the confirmed feature selection. In addition, the reception unit 15 passes to the update unit 16 information on the items that are selected as features but were not selected by the selection unit 14, that is, the items that the person in charge added to the selection. In the example of FIG. 13, items "X1", "X2", "X3", and "X5" are output as the confirmed feature selection, and information on item "X3" is passed to the update unit 16 as an added item.
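Identifying the items the person in charge added amounts to an order-preserving set difference between the confirmed selection and the automatic one. A minimal sketch with a hypothetical function name, using the FIG. 13 example from the text:

```python
from typing import List


def manually_added(confirmed: List[str], auto_selected: List[str]) -> List[str]:
    """Items in the confirmed selection that the selection unit had not chosen."""
    auto = set(auto_selected)
    return [item for item in confirmed if item not in auto]


print(manually_added(["X1", "X2", "X3", "X5"], ["X1", "X2", "X5"]))  # ['X3']
```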

The update unit 16 adds to the priority item DB 22 the shaping processing functions that take an item added by the person in charge as an argument. Specifically, as shown in FIG. 14, the update unit 16 uses the added item as a key to search the "arguments" of the shaping process history data for the relevant data set, and extracts the shaping processing functions whose arguments include the added item. The update unit 16 then uses each extracted shaping processing function as a key to search the "shaping processing function" field of the priority item DB 22. If that function is not registered in the priority item DB 22, the update unit 16 adds the function to the priority item DB 22 together with its priority item target argument, namely the argument to which the item added by the person in charge was passed. In the example of FIG. 14, the shaping processing function "numeric value check", which takes the added item "X3" as its "check item name" argument, is extracted. Because "numeric value check" is not registered in the priority item DB 22 before the update, "numeric value check" and its priority item target argument "check item name" are added to the priority item DB 22.
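The update rule can be sketched as follows, reusing the same assumed record layout as a shaping history list and a priority DB that maps function names to sets of argument names; the function name and snake_case argument names are hypothetical. As in the text, a function already registered before the update is left unchanged, and the FIG. 14 example (added item "X3") is reproduced.

```python
from typing import Dict, List, Set


def update_priority_db(priority_db: Dict[str, Set[str]], history: List[dict],
                       added_items: List[str]) -> Dict[str, Set[str]]:
    """Register shaping functions that took a manually added item as an argument.

    Only functions not yet in priority_db are added; the argument name(s) under
    which the added item was passed become the priority item target argument(s).
    """
    updated = {fn: set(args) for fn, args in priority_db.items()}  # leave input intact
    for record in history:
        fn = record["function"]
        if fn in priority_db:
            continue  # already registered before this update
        for arg_name, item in record["arguments"].items():
            if item in added_items:
                updated.setdefault(fn, set()).add(arg_name)
    return updated


history_a = [
    {"function": "numeric type check", "arguments": {"check_item_name": "X1"}},
    {"function": "numeric value check", "arguments": {"check_item_name": "X3"}},
]
before = {"numeric type check": {"check_item_name"}}
after = update_priority_db(before, history_a, added_items=["X3"])
print(after["numeric value check"])  # {'check_item_name'}
```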

The feature selection device 10 can be implemented, for example, by a computer 40 shown in FIG. 15. The computer 40 includes a CPU (Central Processing Unit) 41, a memory 42 serving as a temporary storage area, and a non-volatile storage unit 43. The computer 40 also includes an input/output device 44, such as an input unit and a display unit, and an R/W (Read/Write) unit 45 that controls the reading and writing of data to and from a storage medium 49. The computer 40 further includes a communication I/F (Interface) 46 connected to a network such as the Internet. The CPU 41, the memory 42, the storage unit 43, the input/output device 44, the R/W unit 45, and the communication I/F 46 are connected to one another via a bus 47.

The storage unit 43 can be implemented by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like. The storage unit 43, serving as a storage medium, stores a feature selection program 50 that causes the computer 40 to function as the feature selection device 10. The feature selection program 50 includes a shaping process 51, an acquisition process 52, an extraction process 53, a selection process 54, a reception process 55, and an update process 56.

The CPU 41 reads the feature amount selection program 50 from the storage unit 43, loads it into the memory 42, and sequentially executes the processes it contains. By executing the shaping process 51, the CPU 41 operates as the shaping unit 11 shown in FIG. 1. Likewise, by executing the acquisition process 52, the extraction process 53, the selection process 54, the reception process 55, and the update process 56, the CPU 41 operates as the acquisition unit 12, the extraction unit 13, the selection unit 14, the reception unit 15, and the update unit 16 shown in FIG. 1, respectively. When executing each process, the CPU 41 also loads the shaping processing history DB 21 and the priority item DB 22 into the memory 42. The computer 40 executing the feature amount selection program 50 thereby functions as the feature amount selection device 10. The CPU 41 that executes the program is hardware.

The functions implemented by the feature amount selection program 50 can also be implemented by, for example, a semiconductor integrated circuit, more specifically an ASIC (Application Specific Integrated Circuit) or the like.

Next, the operation of the feature amount selection device 10 according to the present embodiment will be described. When selection of the feature amounts to be included in the training data is instructed, the feature amount selection device 10 executes the feature amount selection process shown in FIG. 16. The feature amount selection process is an example of the feature amount selection method of the disclosed technique.

In step S11, the shaping unit 11 reads the data set before the shaping process.

Next, in step S12, the shaping unit 11 receives a specification of the shaping processing function to be executed on the data set and of the items to be passed as arguments to that function, and executes the specified shaping processing function on the loaded, unshaped data set with the specified items as arguments. The shaping unit 11 then stores, in the shaping processing history DB 21, shaping processing history data that associates each shaping processing function executed on the data set with the items passed as its arguments.
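Step S12 can be illustrated with a small Python sketch. It assumes, purely for illustration, that a data set is a dict mapping each item (column) name to a list of values, that shaping functions are looked up by name in a registry, and that one history row is a dict holding the function name and the item passed to each argument. The function `numeric_value_check` and the registry `SHAPING_FUNCTIONS` are hypothetical stand-ins, not functions defined in the disclosure.

```python
def numeric_value_check(data, check_item_name):
    """Hypothetical shaping function: keep only rows whose value in the
    checked column is numeric (bool values are treated as non-numeric)."""
    keep = [isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in data[check_item_name]]
    return {col: [v for v, k in zip(vals, keep) if k] for col, vals in data.items()}


# Hypothetical registry of available shaping functions, keyed by display name.
SHAPING_FUNCTIONS = {"numeric value check": numeric_value_check}


def run_shaping(data, function_name, history, **arguments):
    """Execute a named shaping function and record it in the history."""
    result = SHAPING_FUNCTIONS[function_name](data, **arguments)
    # One history row: which function ran, and which item filled each argument.
    history.append({"function": function_name, "arguments": dict(arguments)})
    return result


history = []
raw = {"X1": [1.0, 2.0, 3.0], "X3": [10, "n/a", 30]}
shaped = run_shaping(raw, "numeric value check", history, check_item_name="X3")
print(shaped)   # {'X1': [1.0, 3.0], 'X3': [10, 30]}
print(history)  # [{'function': 'numeric value check', 'arguments': {'check_item_name': 'X3'}}]
```

Recording the argument names alongside the items is what later lets the extraction step match items against priority item target arguments.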

Next, in step S13, the acquisition unit 12 acquires the shaped data set from the shaping unit 11, acquires the shaping processing history data for that data set from the shaping processing history DB 21, and acquires the priority item DB 22 stored in an external storage device. The acquisition unit 12 passes the acquired information to the extraction unit 13. Based on the shaping processing history data received from the acquisition unit 12, the extraction unit 13 extracts, from the items included in the shaped data set, the items corresponding to the priority item target arguments defined in the priority item DB 22 as the priority items of that data set. The extraction unit 13 passes information on the extracted priority items, together with the shaped data set, to the selection unit 14.
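The extraction in step S13 amounts to a join between the history and the priority item DB. The Python sketch below is a minimal illustration under assumed layouts: each history row is a dict with "function" and "arguments" keys, each argument holds a single item name, and the priority item DB maps a function name to a set of priority item target argument names. The function name `extract_priority_items` and the example entries are hypothetical.

```python
def extract_priority_items(history, priority_db, columns):
    """Return items of the shaped data set that were passed to a priority
    item target argument of a function registered in priority_db.

    history: list of {"function": name, "arguments": {arg_name: item}} (assumed)
    priority_db: {function name: set of priority item target argument names} (assumed)
    columns: item names present in the shaped data set
    """
    priority = []
    for entry in history:
        targets = priority_db.get(entry["function"], set())
        for arg_name, item in entry["arguments"].items():
            # Keep the item only if it filled a target argument and still exists.
            if arg_name in targets and item in columns and item not in priority:
                priority.append(item)
    return priority


history = [
    {"function": "numeric value check", "arguments": {"check_item_name": "X3"}},
    {"function": "fill missing values", "arguments": {"target_item": "X1"}},
]
priority_db = {"numeric value check": {"check_item_name"}}
print(extract_priority_items(history, priority_db, ["X1", "X2", "X3"]))  # ['X3']
```

The check `item in columns` guards against items that a later shaping step may have dropped from the data set.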

Next, in step S14, the selection unit 14 calculates an index (for example, a correlation coefficient) indicating the correlation between the priority items received from the extraction unit 13 and each of the other items included in the shaped data set.

Next, in step S15, the selection unit 14 first selects the priority items as feature amounts to be included in the training data. The selection unit 14 then additionally selects, as feature amounts to be included in the training data, the other items whose calculated correlation index is lower than a predetermined threshold. The selection unit 14 passes information on the items included in the data set and on the items selected as feature amounts to the reception unit 15.
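Steps S14 and S15 can be sketched in Python using the Pearson correlation coefficient as the index. Two choices here are assumptions not stated in the description: the absolute value of the coefficient is compared with the threshold (so strong negative correlation also excludes an item), and when there are several priority items an other item is kept only if it is below the threshold for every one of them. The names `pearson`, `select_features`, and the sample data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; raises ZeroDivisionError for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def select_features(data, priority_items, threshold):
    """Keep every priority item, then add each other item whose |correlation|
    with every priority item is below threshold (assumed aggregation rule)."""
    selected = list(priority_items)
    for item, values in data.items():
        if item in priority_items:
            continue
        if all(abs(pearson(data[p], values)) < threshold for p in priority_items):
            selected.append(item)
    return selected


data = {
    "X1": [1, 2, 3, 4, 5],
    "X2": [2, 4, 6, 8, 10],  # perfectly correlated with X1, so excluded
    "X4": [2, 1, 2, 1, 2],   # essentially uncorrelated with X1, so kept
}
print(select_features(data, ["X1"], threshold=0.8))  # ['X1', 'X4']
```

Note that with an empty list of priority items, `all()` over an empty sequence is true, so every other item would be selected; a real implementation might handle that case differently.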

Next, in step S16, the reception unit 15 displays, on a display device, a reception screen 35 showing, in an editable form, whether each of the items included in the data set is selected as a feature amount. The reception unit 15 then receives from the person in charge, via the reception screen 35, the addition or cancellation of the selection of any item as a feature amount, that is, corrections to the feature amount selection.

Next, in step S17, the reception unit 15 receives information on the items selected as feature amounts at the moment the confirm button on the reception screen 35 is pressed, and outputs it as the confirmed feature amount selection. The reception unit 15 also passes, to the update unit 16, information on the items whose selection as feature amounts was added by the person in charge.
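The bookkeeping in step S17 can be reduced to a comparison between the proposed selection and the checkbox state at confirmation. The Python sketch below is hypothetical: it models the reception screen 35 only as the set of items checked when the confirm button is pressed, and the function name `confirm_selection` is illustrative.

```python
def confirm_selection(all_items, proposed, checked_on_confirm):
    """Compare the checkbox state when confirm is pressed with the proposal.

    Returns (confirmed items in data-set order, items newly added by the
    person in charge); the second list is what goes to the update unit.
    """
    confirmed = [item for item in all_items if item in checked_on_confirm]
    added = [item for item in confirmed if item not in proposed]
    return confirmed, added


confirmed, added = confirm_selection(
    all_items=["X1", "X2", "X3", "X4"],
    proposed={"X1", "X4"},
    checked_on_confirm={"X1", "X3"},  # the person in charge added X3 and cleared X4
)
print(confirmed, added)  # ['X1', 'X3'] ['X3']
```

Only additions are forwarded, because the update unit registers functions for newly added items; cancelled selections do not change the priority item DB in this embodiment.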

Next, in step S18, if a shaping processing function that takes as an argument an item whose selection was added by the person in charge is not registered in the priority item DB 22, the update unit 16 adds that shaping processing function and its priority item target argument to the priority item DB 22. The priority item DB 22 is thereby updated, and the feature amount selection process ends.

As described above, the feature amount selection device according to the present embodiment executes the shaping process on a data set and stores the history of the shaping process. Based on the shaping process history and on the priority item DB, in which specific items passed as arguments to specific shaping processing functions are defined as priority items, the feature amount selection device extracts priority items from the items included in the shaped data set and selects them as feature amounts. As a result, feature amounts for effective machine learning can be selected from the items included in a data set without requiring correct answer data.

The feature amount selection device according to the present embodiment also selects, as feature amounts, other items that have a low correlation with the priority items. This makes it possible to add further feature amounts to the training data while avoiding multicollinearity. When the priority items alone provide enough feature amounts for the training data, the process of additionally selecting other items with low correlation to the priority items need not be performed.

The feature amount selection device according to the present embodiment also accepts corrections to the selected feature amounts before finalizing the selection. This allows the selected feature amounts to reflect the judgment of, for example, the person in charge of model design. Accepting corrections is not essential; the feature amounts selected by the selection unit may instead be output directly as the confirmed feature amount selection.

The feature amount selection device according to the present embodiment also adds to the priority item DB the shaping processing function that takes as an argument an item whose selection as a feature amount was added by the person in charge. This allows the priority item DB to be updated to reflect actual business practice. Updating the priority item DB is not essential. Moreover, instead of updating the priority item DB each time the person in charge adds a selection, the priority item DB may be updated once a selection has been added for the same item a predetermined number of times.
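The deferred-update variant can be sketched as a counter placed in front of the update logic. In this hypothetical Python sketch, `DeferredPriorityUpdate`, its `update_fn` callback (standing in for the update unit's registration logic), and the default `min_additions=3` are all illustrative assumptions; the description only says "a predetermined number of times".

```python
from collections import Counter


class DeferredPriorityUpdate:
    """Hypothetical wrapper: run the priority item DB update for an item only
    after that item has been manually added `min_additions` times."""

    def __init__(self, update_fn, min_additions=3):
        self.update_fn = update_fn          # e.g. the update unit's registration logic
        self.min_additions = min_additions
        self.counts = Counter()             # number of manual additions per item

    def record_addition(self, item):
        """Count one manual addition of `item`; run the update once the count
        reaches the threshold. Returns True if the update ran."""
        self.counts[item] += 1
        if self.counts[item] >= self.min_additions:
            self.update_fn(item)
            return True
        return False


calls = []
updater = DeferredPriorityUpdate(calls.append, min_additions=2)
print(updater.record_addition("X3"), updater.record_addition("X3"))  # False True
print(calls)  # ['X3']
```

Using `>=` rather than `==` means the update also runs on later additions of the same item; that is harmless if, as in step S18, the update skips functions that are already registered, and it lets newly seen shaping functions still be picked up.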

In the above embodiment, the feature amount selection device executes the shaping process on the data set, but the disclosed technique is not limited to this. For example, the shaping process may be executed by an external device, which also stores the shaping processing history data. In this case, the shaping unit may be omitted from the feature amount selection device, and the acquisition unit may read the shaped data set and the shaping processing history DB from the external device.

The shaping processing functions illustrated in the above embodiment are merely examples.

In the above embodiment, the feature amount selection program is stored (installed) in advance in the storage unit, but the disclosed technique is not limited to this. The program according to the disclosed technique may also be provided stored on a storage medium such as a CD-ROM, a DVD-ROM, or a USB memory.

The following supplementary notes are further disclosed with respect to the above embodiments.

(Appendix 1)
A feature amount selection program for causing a computer to execute a process comprising:
acquiring a data set that includes a plurality of feature amounts and on which a shaping process has been executed, a history of the shaping process, and specific information defining a specific feature amount that serves as an argument in a specific shaping process; and
selecting, based on the acquired history of the shaping process, a feature amount corresponding to the specific feature amount defined in the specific information from the plurality of feature amounts included in the data set.

(Appendix 2)
The feature amount selection program according to Appendix 1, wherein the process calculates an index indicating the correlation between a first feature amount, selected as the feature amount corresponding to the specific feature amount, and each of the second feature amounts, which are the feature amounts other than the first feature amount among the plurality of feature amounts included in the data set, and selects, together with the first feature amount, the second feature amounts for which the index is lower than a predetermined threshold.

(Appendix 3)
The feature amount selection program according to Appendix 1 or Appendix 2, wherein the process further comprises presenting each of the plurality of feature amounts to a user together with information indicating whether it has been selected, and accepting addition or cancellation of the selection of any of the feature amounts.

(Appendix 4)
The feature amount selection program according to Appendix 3, wherein the process further comprises adding, to the specific information, a shaping process that takes as an argument a feature amount whose selection was added by the user.

(Appendix 5)
The feature amount selection program according to any one of Appendices 1 to 4, wherein the process further comprises executing a shaping process on a data set before the shaping process and storing the history of the shaping process in a predetermined storage unit.

(Appendix 6)
A feature amount selection device comprising:
an acquisition unit that acquires a data set that includes a plurality of feature amounts and on which a shaping process has been executed, a history of the shaping process, and specific information defining a specific feature amount that serves as an argument in a specific shaping process; and
a selection unit that selects, based on the acquired history of the shaping process, a feature amount corresponding to the specific feature amount defined in the specific information from the plurality of feature amounts included in the data set.

(Appendix 7)
The feature amount selection device according to Appendix 6, wherein the selection unit calculates an index indicating the correlation between a first feature amount, selected as the feature amount corresponding to the specific feature amount, and each of the second feature amounts, which are the feature amounts other than the first feature amount among the plurality of feature amounts included in the data set, and selects, together with the first feature amount, the second feature amounts for which the index is lower than a predetermined threshold.

(Appendix 8)
The feature amount selection device according to Appendix 6 or Appendix 7, further comprising a reception unit that presents each of the plurality of feature amounts to a user together with information indicating whether it has been selected, and accepts addition or cancellation of the selection of any of the feature amounts.

(Appendix 9)
The feature amount selection device according to Appendix 8, further comprising an update unit that adds, to the specific information, a shaping process that takes as an argument a feature amount whose selection was added by the user.

(Appendix 10)
The feature amount selection device according to any one of Appendices 6 to 9, further comprising a shaping unit that executes a shaping process on a data set before the shaping process and stores the history of the shaping process in a predetermined storage unit.

(Appendix 11)
A feature amount selection method in which a computer executes a process comprising:
acquiring a data set that includes a plurality of feature amounts and on which a shaping process has been executed, a history of the shaping process, and specific information defining a specific feature amount that serves as an argument in a specific shaping process; and
selecting, based on the acquired history of the shaping process, a feature amount corresponding to the specific feature amount defined in the specific information from the plurality of feature amounts included in the data set.

(Appendix 12)
The feature amount selection method according to Appendix 11, wherein the process calculates an index indicating the correlation between a first feature amount, selected as the feature amount corresponding to the specific feature amount, and each of the second feature amounts, which are the feature amounts other than the first feature amount among the plurality of feature amounts included in the data set, and selects, together with the first feature amount, the second feature amounts for which the index is lower than a predetermined threshold.

(Appendix 13)
The feature amount selection method according to Appendix 11 or Appendix 12, wherein the process executed by the computer further comprises presenting each of the plurality of feature amounts to a user together with information indicating whether it has been selected, and accepting addition or cancellation of the selection of any of the feature amounts.

(Appendix 14)
The feature amount selection method according to Appendix 13, wherein the process executed by the computer further comprises adding, to the specific information, a shaping process that takes as an argument a feature amount whose selection was added by the user.

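(Appendix 15)
The feature amount selection method according to any one of Appendices 11 to 14, wherein the process executed by the computer further comprises executing a shaping process on a data set before the shaping process and storing the history of the shaping process in a predetermined storage unit.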

(Appendix 16)
A storage medium storing a feature amount selection program for causing a computer to execute a process comprising:
acquiring a data set that includes a plurality of feature amounts and on which a shaping process has been executed, a history of the shaping process, and specific information defining a specific feature amount that serves as an argument in a specific shaping process; and
selecting, based on the acquired history of the shaping process, a feature amount corresponding to the specific feature amount defined in the specific information from the plurality of feature amounts included in the data set.

10 Feature amount selection device
11 Shaping unit
12 Acquisition unit
13 Extraction unit
14 Selection unit
15 Reception unit
16 Update unit
21 Shaping processing history DB
22 Priority item DB
30 Shaping processing screen
35 Reception screen
40 Computer
41 CPU
42 Memory
43 Storage unit
49 Storage medium
50 Feature amount selection program

Claims

1. A feature amount selection program for causing a computer to execute a process comprising:
acquiring a data set that includes a plurality of feature amounts and on which a shaping process has been executed, a history of the shaping process, and specific information defining a specific feature amount that serves as an argument in a specific shaping process; and
selecting, based on the acquired history of the shaping process, a feature amount corresponding to the specific feature amount defined in the specific information from the plurality of feature amounts included in the data set.

2. The feature amount selection program according to claim 1, wherein the process calculates an index indicating the correlation between a first feature amount, selected as the feature amount corresponding to the specific feature amount, and each of the second feature amounts, which are the feature amounts other than the first feature amount among the plurality of feature amounts included in the data set, and selects, together with the first feature amount, the second feature amounts for which the index is lower than a predetermined threshold.

3. The feature amount selection program according to claim 1 or claim 2, wherein the process further comprises presenting each of the plurality of feature amounts to a user together with information indicating whether it has been selected, and accepting addition or cancellation of the selection of any of the feature amounts.

4. The feature amount selection program according to claim 3, wherein the process further comprises adding, to the specific information, a shaping process that takes as an argument a feature amount whose selection was added by the user.

5. The feature amount selection program according to any one of claims 1 to 4, wherein the process further comprises executing a shaping process on a data set before the shaping process and storing the history of the shaping process in a predetermined storage unit.

6. A feature amount selection device comprising:
an acquisition unit that acquires a data set that includes a plurality of feature amounts and on which a shaping process has been executed, a history of the shaping process, and specific information defining a specific feature amount that serves as an argument in a specific shaping process; and
a selection unit that selects, based on the acquired history of the shaping process, a feature amount corresponding to the specific feature amount defined in the specific information from the plurality of feature amounts included in the data set.

7. A feature amount selection method in which a computer executes a process comprising:
acquiring a data set that includes a plurality of feature amounts and on which a shaping process has been executed, a history of the shaping process, and specific information defining a specific feature amount that serves as an argument in a specific shaping process; and
selecting, based on the acquired history of the shaping process, a feature amount corresponding to the specific feature amount defined in the specific information from the plurality of feature amounts included in the data set.