CN117591851A - Training method, training device, training server and training medium for feature selection algorithm recommendation model - Google Patents

Training method, training device, training server and training medium for feature selection algorithm recommendation model

Info

Publication number
CN117591851A
CN117591851A
Authority
CN
China
Prior art keywords
feature
sample
training
selection algorithm
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311558497.4A
Other languages
Chinese (zh)
Inventor
王鑫
芮海东
谢鲤鸿
王飞跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Boc Financial Technology Suzhou Co ltd
Original Assignee
Boc Financial Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Boc Financial Technology Suzhou Co ltd filed Critical Boc Financial Technology Suzhou Co ltd
Priority to CN202311558497.4A
Publication of CN117591851A
Legal status: Pending


Classifications

    • G06F18/2115: Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
    • G06F18/2113: Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06Q40/03: Credit; Loans; Processing thereof
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a training method, a training device, a training server and a training medium for a feature selection algorithm recommendation model, which can be applied to the field of artificial intelligence or the field of finance. A plurality of different preset feature selection algorithms are used to obtain different high-quality feature sets for the same application scene. For the same application scene, different first machine learning models are trained using the different high-quality feature sets; the preset feature selection algorithm that screens out the optimal high-quality feature set is the optimal target preset feature selection algorithm for that application scene. For each application scene, a meta-feature set corresponding to the sample feature subset of the application scene is acquired. Taking the plurality of meta-feature sets as the input of a second machine learning model and the identifications of the target preset feature selection algorithms corresponding to the meta-feature sets as the training target, a feature selection algorithm recommendation model is obtained by training. The feature selection algorithm recommendation model can then be used to recommend the optimal preset feature selection algorithm, which is more accurate than manual selection.

Description

Training method, training device, training server and training medium for feature selection algorithm recommendation model
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a training method, apparatus, server, and medium for a feature selection algorithm recommendation model.
Background
Machine learning is one of the most popular topics in society today, and the research sub-topics derived from it are endless. Among these research sub-topics, how to train a machine learning model with excellent robustness is a hot topic. A given application scene corresponds to a plurality of attribute features. Some of these attribute features have a positive effect on training the machine learning model, for example making the model more accurate, while others have a negative effect, for example making the model less accurate. Therefore, if the attribute features having a positive effect on training are selected from the plurality of features and the machine learning model is trained using only those attribute features, a machine learning model with higher accuracy can be obtained.
Feature selection algorithms are therefore applied to screen, from a plurality of attribute features, those attribute features that have a positive effect on training a machine learning model. Many feature selection algorithms currently exist, and deciding by the manual judgment of technicians which feature selection algorithm should be used to screen the plurality of attribute features of a specific application scene is inaccurate.
Disclosure of Invention
In view of this, the present application provides a training method, device, server and medium for a feature selection algorithm recommendation model.
In order to achieve the above purpose, the present application provides the following technical solutions:
according to a first aspect of an embodiment of the present disclosure, there is provided a training method of a feature selection algorithm recommendation model, including:
acquiring sample feature sets corresponding to a plurality of application scenes respectively; the sample feature set corresponding to the application scene comprises sample feature subsets respectively corresponding to a plurality of labeling classification labels, wherein the sample feature subsets comprise attribute values of a plurality of attribute features;
for each application scene, acquiring a plurality of high-quality feature sets corresponding to the application scene, wherein each high-quality feature set comprises attribute features that are screened from the sample feature subset based on a preset feature selection algorithm and have a positive effect on training of a first machine learning model; different high-quality feature sets correspond to different preset feature selection algorithms;
for each application scene, acquiring accuracy degrees of the plurality of high-quality feature sets corresponding to the application scene, wherein the accuracy degree of each high-quality feature set refers to the accuracy degree of the first machine learning model obtained by training with the high-quality feature set assigned attribute values as input and the labeling classification label of the sample feature subset to which that high-quality feature set belongs as the training target;
determining, for each application scene, from the accuracy degrees of the plurality of high-quality feature sets corresponding to the application scene, the high-quality feature set corresponding to the maximum accuracy degree as the target high-quality feature set;
determining the identification of a target preset feature selection algorithm corresponding to the target high-quality feature set of the application scene aiming at each application scene;
for each application scene, acquiring a meta-feature set corresponding to the sample feature subset included in a sample feature set corresponding to the application scene, wherein the meta-feature set corresponding to the sample feature subset characterizes the relation among attribute features included in the sample feature subset;
and taking the meta feature sets respectively corresponding to the sample feature subsets as the input of a second machine learning model, taking the identification of the target preset feature selection algorithm corresponding to the application scene to which the sample feature subsets belong as a training target, and training to obtain a feature selection algorithm recommendation model.
According to a second aspect of the embodiments of the present disclosure, there is provided a training apparatus for a feature selection algorithm recommendation model, including:
the first acquisition module is used for acquiring sample feature sets corresponding to a plurality of application scenes respectively; the sample feature set corresponding to the application scene comprises sample feature subsets respectively corresponding to a plurality of labeling classification labels, wherein the sample feature subsets comprise attribute values of a plurality of attribute features;
the second acquisition module is used for acquiring, for each application scene, a plurality of high-quality feature sets corresponding to the application scene, wherein each high-quality feature set comprises attribute features that are screened from the sample feature subsets based on a preset feature selection algorithm and have a positive effect on training of a first machine learning model; different high-quality feature sets correspond to different preset feature selection algorithms;
the third acquisition module is used for acquiring, for each application scene, the accuracy degrees of the plurality of high-quality feature sets corresponding to the application scene, wherein the accuracy degree of each high-quality feature set refers to the accuracy degree of the first machine learning model obtained by training with the high-quality feature set assigned attribute values as input and the labeling classification label of the sample feature subset to which that high-quality feature set belongs as the training target;
the first determining module is used for determining, for each application scene, the accuracy degree of the plurality of high-quality feature sets corresponding to the application scene, wherein the high-quality feature set corresponding to the maximum accuracy degree is the target high-quality feature set;
the second determining module is used for determining the identification of a target preset feature selection algorithm corresponding to the target high-quality feature set of the application scene for each application scene;
the fourth acquisition module is used for acquiring, for each application scene, a meta-feature set corresponding to the sample feature subset included in the sample feature set corresponding to the application scene, wherein the meta-feature set corresponding to the sample feature subset characterizes the relationship among the attribute features included in the sample feature subset;
the training module is used for taking the meta feature sets corresponding to the sample feature subsets as input of a second machine learning model, taking the identification of the target preset feature selection algorithm corresponding to the application scene to which the sample feature subsets belong as a training target, and training to obtain a feature selection algorithm recommendation model.
According to a third aspect of embodiments of the present disclosure, there is provided a server comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement a training method of the feature selection algorithm recommendation model as described in the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions which, when executed by a processor of a server, cause the server to perform the training method of the feature selection algorithm recommendation model according to the first aspect.
According to the above technical scheme, the application provides a training method for a feature selection algorithm recommendation model. Each application scene has sample feature subsets corresponding to a plurality of labeling classification labels, and a plurality of different preset feature selection algorithms are used to obtain different high-quality feature sets for the same application scene. For the same application scene, different first machine learning models are trained using the different high-quality feature sets. It can be understood that the high-quality feature set yielding the first machine learning model with the highest accuracy is the optimal high-quality feature set, so the preset feature selection algorithm that screens out the optimal high-quality feature set is the optimal target preset feature selection algorithm for that application scene. Because different application scenes have different attribute features, in order to make the trained second machine learning model applicable to any application scene, a meta-feature set corresponding to the sample feature subsets is acquired for each application scene; the meta-feature sets corresponding to the sample feature subsets of different application scenes contain the same number of attribute information. The meta-feature sets respectively corresponding to the plurality of sample feature subsets are taken as the input of the second machine learning model, the identification of the target preset feature selection algorithm corresponding to the application scene to which the sample feature subsets belong is taken as the training target, and the feature selection algorithm recommendation model is obtained by training.
The feature selection algorithm recommendation model can subsequently be used to recommend the optimal preset feature selection algorithm without relying on manual experience, and is therefore more accurate.
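The end-to-end procedure summarized above can be sketched in miniature as follows. This is an illustrative sketch only: scikit-learn selectors stand in for the preset feature selection algorithms, synthetic data stands in for the application scenes, and a nearest-neighbour classifier stands in for the second machine learning model; none of these are the application's prescribed implementations, and the meta-features chosen are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical preset feature selection algorithms, keyed by identification.
selectors = {0: SelectKBest(f_classif, k=3),
             1: SelectKBest(mutual_info_classif, k=3)}

def meta_features(X):
    # Fixed-length meta-feature set, regardless of the scene's feature count.
    return [X.shape[1], X.mean(), X.std()]

meta, targets = [], []
for seed in range(6):  # six synthetic "application scenes"
    X, y = make_classification(n_samples=120, n_features=6 + seed % 3,
                               n_informative=3, random_state=seed)
    # Accuracy degree of the first machine learning model per selector.
    scores = {i: cross_val_score(LogisticRegression(max_iter=1000),
                                 s.fit_transform(X, y), y, cv=3).mean()
              for i, s in selectors.items()}
    targets.append(max(scores, key=scores.get))  # target algorithm identification
    meta.append(meta_features(X))

# Second machine learning model: meta-feature set -> algorithm identification.
recommender = KNeighborsClassifier(n_neighbors=1).fit(meta, targets)
recommended = recommender.predict([meta[0]])[0]
```

Given a new scene, only its meta-feature set needs to be computed and fed to the recommender; no per-scene trial of every selection algorithm is required at that point.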
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained from the provided drawings by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart illustrating a method of training a feature selection algorithm recommendation model, according to an exemplary embodiment;
FIG. 2 is a block diagram of a training apparatus for a feature selection algorithm recommendation model, according to an example embodiment;
fig. 3 is a block diagram illustrating an apparatus for a server according to an exemplary embodiment.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Fig. 1 is a flowchart illustrating a training method of a feature selection algorithm recommendation model according to an exemplary embodiment, and as shown in fig. 1, the training method of a feature selection algorithm recommendation model is used in a server, and includes the following steps S11 to S17.
Step S11: and acquiring sample feature sets corresponding to the application scenes respectively.
The sample feature set corresponding to the application scene comprises sample feature subsets respectively corresponding to a plurality of labeling classification labels, and the sample feature subsets comprise attribute values of a plurality of attribute features.
Illustratively, the training objectives of the machine learning model are different from application scenario to application scenario, and are illustrated below.
Assuming that the application scenario is to determine credit of the client, the training goal of training the machine learning model is to enable the machine learning model to output credit scores of the client.
Assuming that the application scenario is recommending financial products to the customer, the training goal of training the machine learning model is such that the machine learning model can output financial products of interest to the customer.
Assuming that the application scenario is to acquire the evaluation of the client for the service, the training goal of training the machine learning model is to enable the machine learning model to output the score of the client for the service.
In summary, different application scenarios have different training targets for the machine learning model, and the training targets are different, so that the training machine learning model has different attribute characteristics. If the training goal is such that the machine learning model can output a credit rating for the customer, then the attribute features include, but are not limited to: name, ID card number, age, loan information, monthly income, consumption times and tax payment information; if the training goal is such that the machine learning model can output a financial product of interest to the customer, then the attribute features include, but are not limited to: historical purchase financial product information, friend historical purchase financial product information, and historical browse financial product information; if the training goal is such that the machine learning model can output customer scores for services, then the attribute features include, but are not limited to: a face image of a customer, a sound of a customer, and a motion image of a customer.
It can be appreciated that in the process of training the machine learning model corresponding to each application scenario, a large amount of sample data, i.e., a sample feature subset, is required; the attribute features contained in different sample feature subsets under the same application scene are the same, but the attribute values of the same attribute feature in different sample feature subsets may be different. The following examples are illustrative.
Assuming that the application scenario is recommending financial products to the client, the attribute features contained in the different sample feature subsets are: historical purchased financial product information, friend historical purchased financial product information and historical browsed financial product information. The sample clients corresponding to different sample feature subsets are different, so the labeling classification labels corresponding to different sample feature subsets may be the same or may be different.
Assuming that the application scenario is recommending financial products to clients, assume there are 3 sample clients: sample client 1, sample client 2 and sample client 3. The sample feature subset of sample client 1 is {historical purchased financial product 1, friend historical purchased financial product 1, historical browsed financial product 1}, and the corresponding labeling classification label is financial product 1; the sample feature subset of sample client 2 is {historical purchased financial product 2, friend historical purchased financial product 2, historical browsed financial product 2}, and the corresponding labeling classification label is financial product 2; the sample feature subset of sample client 3 is {historical purchased financial product 1, friend historical purchased financial product 1, historical browsed financial product 1}, and the corresponding labeling classification label is financial product 1.
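The sample data layout described in this example can be sketched as follows. This is a hypothetical sketch; the dictionary keys such as `hist_purchased` are illustrative names introduced here, not terms from the application.

```python
# Each sample feature subset maps attribute features to attribute values,
# paired with its labeling classification label.
sample_feature_subsets = [
    ({"hist_purchased": "product_1", "friend_purchased": "product_1",
      "hist_browsed": "product_1"}, "product_1"),   # sample client 1
    ({"hist_purchased": "product_2", "friend_purchased": "product_2",
      "hist_browsed": "product_2"}, "product_2"),   # sample client 2
    ({"hist_purchased": "product_1", "friend_purchased": "product_1",
      "hist_browsed": "product_1"}, "product_1"),   # sample client 3
]

# All subsets within one application scene share the same attribute features,
# even though the attribute values and labels may differ.
feature_names = set(sample_feature_subsets[0][0])
assert all(set(features) == feature_names
           for features, _label in sample_feature_subsets)
```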
Step S12: and acquiring a plurality of high-quality feature sets corresponding to each application scene.
Each high-quality feature set comprises attribute features that are screened from the sample feature subset based on a preset feature selection algorithm and have a positive effect on the training of the first machine learning model; different high-quality feature sets correspond to different preset feature selection algorithms.
For example, the number of the plurality of high-quality feature sets corresponding to the application scene is the same as the number of the preset feature selection algorithms, for example, the number of the plurality of high-quality feature sets corresponding to the application scene is 5, and one high-quality feature set corresponds to one preset feature selection algorithm, so that 5 high-quality feature sets correspond to 5 preset feature selection algorithms.
For example, the 5 preset feature selection algorithms are respectively: a filtering type feature selection method, a wrapping type feature selection method, an embedded type feature selection method, a group structure feature selection algorithm and a tree structure feature selection algorithm.
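As an illustrative sketch (not the application's own implementations), the filter, wrapper and embedded families of feature selection can be exercised with scikit-learn, each producing its own candidate high-quality feature set from the same data; the data, the choice of `k`, and the estimators are all assumptions made here for demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for one application scene's sample feature subsets.
X, y = make_classification(n_samples=200, n_features=7, n_informative=4,
                           random_state=0)

k = 4  # number of attribute features to keep (illustrative)
selectors = {
    "filter":   SelectKBest(f_classif, k=k),                 # filtering type
    "wrapper":  RFE(LogisticRegression(max_iter=1000),       # wrapping type
                    n_features_to_select=k),
    "embedded": SelectFromModel(                             # embedded type
        LogisticRegression(penalty="l1", solver="liblinear"),
        max_features=k),
}

# Each selector yields its own candidate high-quality feature set
# (here represented by the indices of the retained columns).
high_quality_sets = {name: np.flatnonzero(sel.fit(X, y).get_support())
                     for name, sel in selectors.items()}
```

As the description notes below, the sets produced by different algorithms may overlap partially, completely, or not at all.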
For each application scenario, the attribute features with a positive effect on the training of the first machine learning model may, for example, be screened out by a preset feature selection algorithm from the plurality of sample feature subsets included in the sample feature set corresponding to the application scenario. The importance value corresponding to each attribute feature can be obtained through the preset feature selection algorithm.
For each application scene, attribute features with low importance values bring no benefit to the training of the first machine learning model: if the importance value of an attribute feature is smaller than or equal to a first threshold, that attribute feature has little or no influence on the training of the first machine learning model. If the first machine learning model is trained using a sample feature subset that includes attribute features whose importance values are smaller than or equal to the first threshold, not only is the accuracy of the prediction results of the first machine learning model reduced, but the amount of data processed during training is increased, so training the first machine learning model takes longer.
For each preset feature selection algorithm, the attribute features with importance values greater than a first threshold may be classified into a high-quality feature set; for example, for each preset feature selection algorithm, the top-ranked preset number of attribute features may be partitioned into a high-quality feature set.
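Both screening rules above, keeping features whose importance value exceeds the first threshold and keeping a preset number of top-ranked features, can be sketched as follows. A random forest's feature importances stand in for the importance values of one hypothetical preset feature selection algorithm; the threshold 0.05 and the preset number 4 are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=7, n_informative=3,
                           random_state=0)

# Importance value per attribute feature.
importances = RandomForestClassifier(random_state=0).fit(X, y).feature_importances_

first_threshold = 0.05            # illustrative first threshold
# Rule 1: keep features whose importance exceeds the first threshold.
by_threshold = np.flatnonzero(importances > first_threshold)

preset_count = 4                  # illustrative preset number
# Rule 2: keep the top-ranked preset number of features.
by_top_k = np.argsort(importances)[::-1][:preset_count]
```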
Assuming that the application scenario is determining the credit of a client, the attribute information includes: name, ID card number, age, loan information, monthly income, consumption times and tax payment information. Assume there are 3 preset feature selection algorithms, namely preset feature selection algorithm 1, preset feature selection algorithm 2 and preset feature selection algorithm 3. Assume the high-quality feature set obtained by preset feature selection algorithm 1 is {name, ID card number, age, loan information}, the high-quality feature set obtained by preset feature selection algorithm 2 is {name, monthly income, consumption times, age, loan information}, and the high-quality feature set obtained by preset feature selection algorithm 3 is {loan information, monthly income, consumption times, tax payment information}.
In summary, under the same application scenario, the attribute features in the high-quality feature set obtained by different preset feature selection algorithms may be partially different, may be completely different, and may be completely the same.
Step S13: and acquiring the accuracy degree of the plurality of high-quality feature sets corresponding to each application scene.
The accuracy degree of each high-quality feature set refers to the accuracy degree of the first machine learning model obtained by training with the high-quality feature set assigned attribute values as input and the labeling classification label of the sample feature subset to which that high-quality feature set belongs as the training target.
A first machine learning model is trained using each high-quality feature set; the first machine learning models trained with different high-quality feature sets are different.
It will be appreciated that the attribute features contained in different high-quality feature sets corresponding to the same application scenario may be different, and each application scenario has a plurality of sample feature subsets, so for each high-quality feature set, a plurality of high-quality feature sets with attribute values assigned thereto may be obtained. The following examples are illustrative.
Assuming that the application scenario is determining the credit of a client, the attribute information includes: name, ID card number, age, loan information, monthly income, consumption times and tax payment information. Assume the application scene has 3 sample feature subsets, namely sample feature subset 1, sample feature subset 2 and sample feature subset 3: sample feature subset 1 is {Zhang San, 1111, 30, none, 3000, 50, monthly tax payment of 0 yuan}, sample feature subset 2 is {Li Si, 1112, 26, 30W, 10000, 30, monthly tax payment of 1000 yuan}, and sample feature subset 3 is {Wang Wu, 1113, 35, 100W, 50000, 60, monthly tax payment of 5000 yuan}. Assuming that the high-quality feature set obtained by preset feature selection algorithm 1 is {name, ID card number, age, loan information}, the high-quality feature set assigned attribute values obtained from sample feature subset 1 is {Zhang San, 1111, 30, none}, and its labeling classification label is the labeling classification label of sample feature subset 1; the high-quality feature set assigned attribute values obtained from sample feature subset 2 is {Li Si, 1112, 26, 30W}, and its labeling classification label is the labeling classification label of sample feature subset 2; the high-quality feature set assigned attribute values obtained from sample feature subset 3 is {Wang Wu, 1113, 35, 100W}, and its labeling classification label is the labeling classification label of sample feature subset 3.
Each of the plurality of high-quality feature sets assigned attribute values corresponds to a labeling classification label. For example, the plurality of high-quality feature sets assigned attribute values may be divided in a set proportion to obtain a training set and a verification set.
The first machine learning model is trained using the high-quality feature sets assigned attribute values contained in the training set, and the trained first machine learning model is verified using the high-quality feature sets assigned attribute values contained in the verification set.
Illustratively, the trained first machine learning model is verified using the high-quality feature sets contained in the verification set to obtain the AUC (Area Under Curve) value, accuracy, precision and recall.
Illustratively, the accuracy degree is determined from the AUC value, accuracy, precision and recall.
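The accuracy-degree measurement described above can be sketched as follows with scikit-learn and synthetic data. Averaging the four metrics into a single accuracy degree is an assumption made for this sketch; the application does not specify how the metrics are combined.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the high-quality feature sets assigned attribute values.
X, y = make_classification(n_samples=400, n_features=4, random_state=0)

# Divide in a set proportion to obtain a training set and a verification set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_val)
score = model.predict_proba(X_val)[:, 1]

metrics = {
    "auc": roc_auc_score(y_val, score),
    "accuracy": accuracy_score(y_val, pred),
    "precision": precision_score(y_val, pred),
    "recall": recall_score(y_val, pred),
}
# One illustrative way to fold the four metrics into a single accuracy degree.
accuracy_degree = sum(metrics.values()) / len(metrics)
```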
The first machine learning model may be, for example, any one of a neural network model, a logistic regression model, a linear regression model, a support vector machine (SVM), AdaBoost, a boosting tree model and a Transformer-Encoder model. The neural network model may be, for example, any one of a recurrent-neural-network-based model, a convolutional-neural-network-based model and a Transformer-Encoder-based classification model.
Step S14: and determining the accuracy degree of the plurality of high-quality feature sets corresponding to each application scene, wherein the high-quality feature set corresponding to the maximum accuracy degree is the target high-quality feature set.
For example, if there is one high-quality feature set corresponding to the maximum accuracy, the high-quality feature set is the target high-quality feature set.
For example, if there are a plurality of high-quality feature sets corresponding to the maximum accuracy, one high-quality feature set may be randomly determined as the target high-quality feature set.
For example, if there are a plurality of high-quality feature sets corresponding to the maximum accuracy, it may be determined that the high-quality feature set with the shortest training time for training the first machine learning model is the target high-quality feature set.
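The selection rule of step S14, including the tie-breaking variant above, can be sketched as follows (the record fields `features`, `accuracy` and `train_time` are illustrative assumptions):

```python
# Illustrative sketch of step S14: choose the high-quality feature set with
# the maximum accuracy degree; if several tie, prefer the one whose first
# machine learning model trained fastest.
def pick_target_feature_set(candidates):
    """candidates: list of dicts with keys 'features', 'accuracy', 'train_time'."""
    best_acc = max(c["accuracy"] for c in candidates)
    tied = [c for c in candidates if c["accuracy"] == best_acc]
    return min(tied, key=lambda c: c["train_time"])["features"]
```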
Step S15: and determining the identification of a target preset feature selection algorithm corresponding to the target high-quality feature set of the application scene aiming at each application scene.
The target preset feature selection algorithm corresponding to the target high-quality feature set refers to a preset feature selection algorithm for screening the target high-quality feature set from the sample feature subset.
Step S16: and aiming at each application scene, acquiring a meta-feature set corresponding to the sample feature subset included in the sample feature set corresponding to the application scene.
And the meta-feature set corresponding to the sample feature subset characterizes the relation among the attribute features contained in the sample feature subset.
The meta-feature sets corresponding to the different sample feature subsets under the different application scenarios all need to be obtained.
Illustratively, the method of obtaining the meta-feature set for each sample feature subset is: the attribute values contained in the sample feature subset are converted into N features using N meta-feature calculation methods, and the N features form the meta-feature set, N being a positive integer greater than 1. The meta-feature sets of different sample feature subsets contain the same number of features.
Assuming that n=16, the method of acquiring the meta-feature set corresponding to any sample feature subset includes the following steps a11 to a28.
Step A11: a first number of attribute features included in the sample feature subset is determined.
Assuming that the application scenario is determining a client's credit rating, the attribute information contained in the sample feature subset includes: name, ID card number, age, loan information, monthly income, number of consumptions and tax payment information; the first number is 7.
Assuming that the application scenario is recommending financial products to a customer, the attribute information contained in the sample feature subset includes: historical financial-product purchase information, friends' historical financial-product purchase information and historical financial-product browsing information; the first number is 3.
Step A12: a first logarithm of the first number is determined.
Illustratively, the first logarithm = log (first number).
Step A13: and determining a second number of sample feature subsets contained in the application scene corresponding to the sample feature subsets.
Assuming that the application scenario is determining a client's credit rating, each sample client corresponds to one sample feature subset; if the application scenario corresponds to 1000 sample clients, the second number is 1000.
Assuming that the application scenario is recommending financial products to customers, each sample client corresponds to one sample feature subset; if the application scenario corresponds to 10000 sample clients, the second number is 10000.
Step A14: determining a second number of second logarithms.
Illustratively, the second logarithm = log (second number).
Step A15: a third number of attribute features having different attribute values is determined for the subset of sample features.
The sample feature subset contains the attribute values of the attribute features; identical attribute values are deduplicated, and the number of remaining distinct attribute values is the third number. The following is an example.
Assuming that the sample feature subset is {1,2,3,2,3}, the feature deduplication for the same attribute value is {1,2,3}, so the third number is 3.
Step A16: determining a ratio of the first number to the second number as a first ratio.
Step A17: a fourth number of attribute features belonging to a binary class is determined that are included in the sample feature subset.
Binary classification refers to the fact that the attribute value of an attribute feature has only two values, for example, the attribute feature "gender" has only two values "male" and "female".
Step A18: a fifth number of attribute features included in the sample feature subset that belong to the ternary and higher categories is determined.
The ternary-and-above classification means that the number of attribute values of an attribute feature is greater than or equal to 3; if the number of attribute values of an attribute feature is M, the attribute feature belongs to the M-ary classification, M being a positive integer greater than or equal to 3. The following is an example.
For example, the attribute value of the attribute feature "loan type" may take three values: no loan, housing provident fund loan, or commercial loan. Because the number of attribute values is 3, the attribute feature "loan type" belongs to the ternary classification; if there were 4 attribute values, the attribute feature would belong to the quaternary classification, and so on.
Step A19: and determining a sixth number of attribute features belonging to the digital type contained in the sample feature subset, wherein the attribute value of the attribute features belonging to the digital type is a number.
For example, the attribute values of age, height and weight are all digital, so the age, height and weight belong to the digital attribute features.
Step A20: determining a ratio of the fifth number to the sixth number as a second ratio.
If the sixth number is zero, the second ratio may be a set symbol, such as NULL.
Step A21: and determining the maximum value of the attribute values of the attribute features belonging to the digital type contained in the sample feature subset.
Step A22: and determining the minimum value of the attribute values of the attribute features belonging to the digital type contained in the sample feature subset.
Step A23: a variance of attribute values of attribute features belonging to a digital type contained in the sample feature subset is determined.
Step A24: and determining the average value of the attribute values of the attribute features belonging to the digital type contained in the sample feature subset.
Step A25: and determining standard deviation of attribute values of the attribute features belonging to the digital type contained in the sample feature subset.
Step A26: and sorting attribute values of the attribute features belonging to the digital type contained in the sample feature subset to obtain a sorting result.
Step A27: and determining the median of the sequencing result.
The median refers to the attribute value located in the middle of the sorting result; assuming that the sorting result contains 100 attribute values, the median is the average of the 50th and 51st attribute values.
Step A28: determining a set of meta-features of the sample feature subset includes the first number, the first logarithm, the second number, the second logarithm, the third number, the first ratio, the fourth number, the fifth number, the sixth number, the second ratio, the maximum, the minimum, the variance, the mean, the standard deviation, the median.
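Steps A11 to A28 can be sketched in a single function. The following is an illustrative, dependency-light sketch (the function name, the dict-based representation of a sample feature subset, and the `cardinality` argument used to count binary and ternary-and-above features are assumptions for the example):

```python
# Illustrative sketch of steps A11-A28: computing the 16 meta-features of
# one sample feature subset. Data representation is an assumption.
import math
import statistics

def meta_feature_set(subset, second_number, cardinality):
    """subset:        dict mapping attribute-feature name -> attribute value
    second_number: number of sample feature subsets in the application scenario
    cardinality:   dict mapping attribute-feature name -> number of distinct
                   values that feature takes across the scenario"""
    values = list(subset.values())
    first_number = len(subset)                                        # A11
    first_log = math.log(first_number)                                # A12
    second_log = math.log(second_number)                              # A14
    third_number = len(set(values))                                   # A15
    first_ratio = first_number / second_number                        # A16
    fourth_number = sum(1 for c in cardinality.values() if c == 2)    # A17
    fifth_number = sum(1 for c in cardinality.values() if c >= 3)     # A18
    numeric = [v for v in values if isinstance(v, (int, float))]      # A19
    sixth_number = len(numeric)
    second_ratio = fifth_number / sixth_number if sixth_number else None  # A20
    ordered = sorted(numeric)                                         # A26
    return [                                                          # A28
        first_number, first_log, second_number, second_log, third_number,
        first_ratio, fourth_number, fifth_number, sixth_number, second_ratio,
        max(numeric), min(numeric),                                   # A21, A22
        statistics.pvariance(numeric),                                # A23
        statistics.mean(numeric),                                     # A24
        statistics.pstdev(numeric),                                   # A25
        statistics.median(ordered),                                   # A27
    ]
```

With N = 16, every sample feature subset thus maps to a fixed-length vector regardless of how many attribute features its application scenario has.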
The meta-feature set corresponding to a sample feature subset characterizes the relations among the attribute features contained in the sample feature subset, and it has been verified that sample feature subsets of different application scenarios with similar meta-feature sets share the same target preset feature selection algorithm.
Step S17: and taking the meta feature sets respectively corresponding to the sample feature subsets as the input of a second machine learning model, taking the identification of the target preset feature selection algorithm corresponding to the application scene to which the sample feature subsets belong as a training target, and training to obtain a feature selection algorithm recommendation model.
In this way, the feature selection algorithm recommendation model can accurately predict which feature selection algorithm should be used to select the optimal high-quality feature set. Based on this, the embodiment of the present application further includes the following steps B11 to B14.
Step B11: the method comprises the steps of obtaining a feature set to be detected corresponding to an application scene to be detected, wherein the feature set to be detected corresponding to the application scene to be detected comprises feature subsets to be detected corresponding to a plurality of labeling classification labels respectively, and the feature subsets to be detected comprise a plurality of attribute features.
For example, the feature set of the known application scenario is referred to as a sample feature set, and the feature set of the application scenario to be tested is referred to as a feature set to be tested. The description of the feature set to be measured may be referred to the description of the sample feature set, which is not limited here. The description of the feature subset to be measured may refer to the description of the feature subset of the sample, and is not limited herein.
Step B12: and acquiring the meta-feature sets respectively corresponding to the feature subsets to be detected.
The process of obtaining the meta-feature set of the feature subset to be measured is the same as the process of obtaining the meta-feature set of the sample feature subset, and will not be described here again.
Step B13: inputting the meta feature sets respectively corresponding to the feature subsets to be detected to the feature selection algorithm recommendation model, and obtaining the target identification of the target feature selection algorithm through the feature selection algorithm recommendation model.
Step B14: acquiring, from the feature subset to be detected, the high-quality feature set corresponding to the feature subset to be detected through the target feature selection algorithm with the target identifier.
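The inference flow of steps B11 to B14 can be sketched as a routing function (all names below — `recommend_and_select`, `meta_feature_fn`, `recommender`, `algorithms` — are illustrative assumptions, not from the original text):

```python
# Illustrative sketch of steps B11-B14: compute the meta-feature set of a
# feature subset under test, ask the trained recommendation model for a
# target identifier, and run the corresponding feature selection algorithm.
def recommend_and_select(subset_under_test, meta_feature_fn, recommender, algorithms):
    """algorithms: dict mapping algorithm identifier -> feature-selection callable."""
    meta = meta_feature_fn(subset_under_test)        # B12: meta-feature set
    target_id = recommender(meta)                    # B13: target identifier
    return algorithms[target_id](subset_under_test)  # B14: high-quality feature set
```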
Illustratively, the first machine learning model is the same type as the second machine learning model; or, the first machine learning model is of a different type than the second machine learning model.
The embodiment of the present application provides a training method of a feature selection algorithm recommendation model. Each application scenario has a plurality of sample feature subsets respectively corresponding to labeling classification labels, and a plurality of different preset feature selection algorithms are used to obtain different high-quality feature sets for the same application scenario. For the same application scenario, different first machine learning models are trained with the different high-quality feature sets. It can be understood that the high-quality feature set yielding the first machine learning model with the highest accuracy degree is the optimal high-quality feature set, and the preset feature selection algorithm that screened out this optimal high-quality feature set is the optimal target preset feature selection algorithm for that application scenario. Because the attribute features of different application scenarios differ, in order to make the trained second machine learning model applicable to any application scenario, the meta-feature set corresponding to each sample feature subset of each application scenario is acquired; the meta-feature sets corresponding to the sample feature subsets of different application scenarios contain the same number of items. The meta-feature sets respectively corresponding to the plurality of sample feature subsets are then used as the input of the second machine learning model, the identifiers of the target preset feature selection algorithms corresponding to the application scenarios to which the sample feature subsets belong are used as training targets, and the feature selection algorithm recommendation model is obtained by training.
The feature selection algorithm recommendation model can subsequently be used to recommend the optimal preset feature selection algorithm, without requiring it to be chosen manually based on experience, and is therefore more accurate.
The method is described in detail in the embodiments disclosed in the present application, and can be implemented by devices of various forms; therefore, the present application also discloses a device, of which a specific embodiment is given in the following detailed description.
FIG. 2 is a block diagram of a training apparatus for a feature selection algorithm recommendation model, according to an example embodiment. Referring to fig. 2, the apparatus includes: a first acquisition module 21, a second acquisition module 22, a third acquisition module 23, a first determination module 24, a second determination module 25, a fourth acquisition module 26, and a training module 27, wherein:
a first obtaining module 21, configured to obtain sample feature sets corresponding to a plurality of application scenarios respectively; the sample feature set corresponding to the application scene comprises sample feature subsets respectively corresponding to a plurality of labeling classification labels, wherein the sample feature subsets comprise attribute values of a plurality of attribute features;
a second obtaining module 22, configured to obtain, for each application scenario, a plurality of high-quality feature sets corresponding to the application scenario, where each high-quality feature set includes attribute features that are screened from the sample feature subset based on a preset feature selection algorithm and that have a positive effect on training of the first machine learning model; the preset feature selection algorithms corresponding to different high-quality feature sets are different;
A third obtaining module 23, configured to obtain, for each application scenario, accuracy levels of the plurality of high-quality feature sets corresponding to the application scenario, where accuracy levels of each high-quality feature set refer to accuracy levels of the first machine learning model obtained by training with the high-quality feature set to which an attribute value is given as an input and a label classification tag of a sample feature subset to which the high-quality feature set to which the attribute value is given as a training target;
a first determining module 24, configured to determine, for each application scenario, a high-quality feature set corresponding to a maximum accuracy degree among the plurality of high-quality feature sets corresponding to the application scenario as a target high-quality feature set;
a second determining module 25, configured to determine, for each application scenario, an identifier of a target preset feature selection algorithm corresponding to the target high-quality feature set of the application scenario;
a fourth obtaining module 26, configured to obtain, for each application scenario, a meta-feature set corresponding to the sample feature subset included in a sample feature set corresponding to the application scenario, where the meta-feature set corresponding to the sample feature subset characterizes a relationship between attribute features included in the sample feature subset;
The training module 27 is configured to train to obtain a feature selection algorithm recommendation model by using the meta feature sets corresponding to the plurality of sample feature subsets as input of a second machine learning model, and using the identifiers of the target preset feature selection algorithm corresponding to the application scenario to which the plurality of sample feature subsets belong as training targets.
In an optional implementation manner, the fourth obtaining module includes:
a first determining unit configured to determine a first number of attribute features included in the sample feature subset;
a second determining unit configured to determine a first logarithm of the first number;
a third determining unit, configured to determine a second number of sample feature subsets included in the application scenario corresponding to the sample feature subset;
a fourth determining unit configured to determine a second number of second logarithms;
a fifth determining unit, configured to determine a third number of attribute features with different attribute values included in the sample feature subset;
a sixth determining unit, configured to determine a ratio of the first number to the second number as a first ratio;
a seventh determining unit, configured to determine a fourth number of attribute features belonging to a binary classification included in the sample feature subset;
An eighth determining unit configured to determine a fifth number of attribute features belonging to a ternary or higher classification included in the sample feature subset;
a ninth determining unit, configured to determine a sixth number of attribute features belonging to a digital type included in the sample feature subset, where an attribute value of the attribute feature belonging to the digital type is a number;
a tenth determining unit configured to determine a ratio of the fifth number to the sixth number as a second ratio;
an eleventh determining unit configured to determine a maximum value of attribute values of attribute features belonging to a digital type included in the sample feature subset;
a twelfth determining unit, configured to determine a minimum value of attribute values of attribute features belonging to a digital type included in the sample feature subset;
a thirteenth determining unit configured to determine a variance of attribute values of attribute features belonging to a digital type included in the sample feature subset;
a fourteenth determining unit, configured to determine a mean value of attribute values of attribute features belonging to a digital type included in the sample feature subset;
a fifteenth determination unit configured to determine a standard deviation of attribute values of attribute features belonging to a digital type included in the sample feature subset;
The sorting unit is used for sorting the attribute values of the attribute features belonging to the digital type contained in the sample feature subset to obtain a sorting result;
a sixteenth determining unit configured to determine a median of the sorting results;
a seventeenth determining unit is configured to determine that the meta feature set of the sample feature subset includes the first number, the first logarithm, the second number, the second logarithm, the third number, the first ratio, the fourth number, the fifth number, the sixth number, the second ratio, the maximum value, the minimum value, the variance, the mean, the standard deviation, and the median.
In an alternative implementation, the method further includes:
a fifth obtaining module, configured to obtain a feature set to be tested corresponding to an application scene to be tested, where the feature set to be tested corresponding to the application scene to be tested includes feature subsets to be tested corresponding to a plurality of labeling classification labels, and the feature subsets to be tested include a plurality of attribute features;
a sixth obtaining module, configured to obtain the meta-feature sets corresponding to the feature subsets to be tested respectively;
a seventh obtaining module, configured to input the meta feature sets corresponding to the feature subsets to be tested to the feature selection algorithm recommendation model, and obtain a target identifier of a target feature selection algorithm through the feature selection algorithm recommendation model;
And the third determining module is used for determining that the high-quality feature set corresponding to the feature subset to be detected is obtained from the feature subset to be detected through the target feature selection algorithm with the target identifier.
In an alternative implementation, the first machine learning model is the same type as the second machine learning model; or, the first machine learning model is of a different type than the second machine learning model.
In an optional implementation manner, the number of the plurality of high-quality feature sets corresponding to the application scenario is 5, and the 5 high-quality feature sets correspond to 5 preset feature selection algorithms, respectively: a filter feature selection method, a wrapper feature selection method, an embedded feature selection method, a group-structure feature selection algorithm and a tree-structure feature selection algorithm.
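As an illustration of the first of these families, a filter method scores each attribute feature independently of any model and keeps the highest-scoring ones. The following is a minimal sketch under assumed names (`filter_select`, Pearson correlation as the scoring function); the original text does not specify a particular scoring criterion:

```python
# Illustrative filter-style feature selection: rank attribute features by
# absolute Pearson correlation with the labeling classification labels and
# keep the top_k. Scoring choice is an assumption for the example.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def filter_select(columns, labels, top_k):
    """columns: dict mapping feature name -> list of numeric attribute values."""
    ranked = sorted(columns,
                    key=lambda name: abs(pearson(columns[name], labels)),
                    reverse=True)
    return ranked[:top_k]
```

Wrapper methods would instead retrain a model on candidate subsets, and embedded methods would read feature importance out of the trained model itself.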
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
Fig. 3 is a block diagram illustrating an apparatus for a server according to an exemplary embodiment.
Servers include, but are not limited to: a processor 31, a memory 32, a network interface 33, an I/O controller 34, and a communication bus 35.
It should be noted that, as will be understood by those skilled in the art, the structure of the server shown in Fig. 3 does not constitute a limitation on the server; the server may include more or fewer components than shown in Fig. 3, may combine some components, or may have a different arrangement of components.
The following describes the respective constituent elements of the server in detail with reference to fig. 3:
the processor 31 is a control center of the server, connects respective portions of the entire server using various interfaces and lines, and performs various functions of the server and processes data by running or executing software programs and/or modules stored in the memory 32, and calling data stored in the memory 32, thereby performing overall monitoring of the server. The processor 31 may include one or more processing units; by way of example, the processor 31 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 31.
The processor 31 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention;
The memory 32 may include a random-access memory (RAM) 321 and a read-only memory (ROM) 322, and may further include a mass storage device 323, such as at least one disk memory. Of course, the server may also include hardware required for other services.
The memory 32 is used for storing instructions executable by the processor 31. The processor 31 has a function of executing a training method of the feature selection algorithm recommendation model.
A wired or wireless network interface 33 is configured to connect the server to the network.
The processor 31, the memory 32, the network interface 33 and the I/O controller 34 may be interconnected by a communication bus 35, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, or an EISA (Extended Industry Standard Architecture) bus, among others. The buses may be classified as address buses, data buses, control buses, etc.
In an exemplary embodiment, the server may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the training method of the feature selection algorithm recommendation model described above.
In an exemplary embodiment, the disclosed embodiments provide a storage medium including instructions, such as a memory 32 including instructions, executable by a processor 31 of a server to perform the above-described method. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
In an exemplary embodiment, a computer readable storage medium is also provided, which can be directly loaded into an internal memory of a computer, such as the memory 32, and contains software code, and the computer program can implement the training method of the feature selection algorithm recommendation model after being loaded and executed by the computer.
In an exemplary embodiment, a computer program product is also provided, which can be directly loaded into an internal memory of a computer, for example, a memory contained in the server, and contains software codes, and the computer program can implement the training method of the feature selection algorithm recommendation model after being loaded and executed by the computer.
It should be noted that the training method, device, server and medium of the feature selection algorithm recommendation model provided by the invention can be used in the artificial intelligence field or the financial field. The foregoing is merely an example, and the application fields of the training method, the training device, the server and the medium of the feature selection algorithm recommendation model provided by the present invention are not limited.
The features described in the respective embodiments in the present specification may be replaced with each other or combined with each other. For device or system class embodiments, the description is relatively simple as it is substantially similar to method embodiments, with reference to the description of method embodiments in part.
It is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The training method of the feature selection algorithm recommendation model is characterized by comprising the following steps of:
acquiring sample feature sets corresponding to a plurality of application scenes respectively; the sample feature set corresponding to the application scene comprises sample feature subsets respectively corresponding to a plurality of labeling classification labels, wherein the sample feature subsets comprise attribute values of a plurality of attribute features;
for each application scenario, acquiring a plurality of high-quality feature sets corresponding to the application scenario, wherein each high-quality feature set comprises attribute features that are screened from the sample feature subset based on a preset feature selection algorithm and that have a positive effect on training of a first machine learning model; the preset feature selection algorithms corresponding to different high-quality feature sets are different;
for each application scene, acquiring accuracy of the plurality of high-quality feature sets corresponding to the application scene, wherein the accuracy of each high-quality feature set refers to the accuracy of the first machine learning model obtained by training by taking the high-quality feature set with the attribute value as input and taking the labeling classification label of the sample feature subset to which the high-quality feature set with the attribute value belongs as a training target;
determining the accuracy degree of the plurality of high-quality feature sets corresponding to each application scene, wherein the high-quality feature set corresponding to the maximum accuracy degree is a target high-quality feature set;
determining the identification of a target preset feature selection algorithm corresponding to the target high-quality feature set of the application scene aiming at each application scene;
For each application scene, acquiring a meta-feature set corresponding to the sample feature subset included in a sample feature set corresponding to the application scene, wherein the meta-feature set corresponding to the sample feature subset characterizes the relation among attribute features included in the sample feature subset;
and taking the meta feature sets respectively corresponding to the sample feature subsets as the input of a second machine learning model, taking the identification of the target preset feature selection algorithm corresponding to the application scene to which the sample feature subsets belong as a training target, and training to obtain a feature selection algorithm recommendation model.
2. The method for training a recommendation model of a feature selection algorithm according to claim 1, wherein the step of obtaining a meta feature set corresponding to the sample feature subset included in the sample feature set corresponding to the application scene includes:
determining a first number of attribute features contained by the sample feature subset;
determining a first logarithm of the first number;
determining a second number of sample feature subsets contained in the application scene corresponding to the sample feature subsets;
determining a second logarithm of the second number;
determining a third number of attribute features having different attribute values contained in the sample feature subset;
Determining a ratio of the first number to the second number as a first ratio;
determining a fourth number of attribute features belonging to a binary classification contained in the sample feature subset;
determining a fifth number of attribute features included in the sample feature subset that belong to the ternary and higher classifications;
determining a sixth number of attribute features belonging to the digital type contained in the sample feature subset, wherein attribute values of the attribute features belonging to the digital type are numbers;
determining a ratio of the fifth number to the sixth number as a second ratio;
determining the maximum value of attribute values of attribute features belonging to a digital type contained in the sample feature subset;
determining the minimum value of the attribute values of the attribute features belonging to the digital type contained in the sample feature subset;
determining the variance of attribute values of attribute features belonging to a digital type contained in the sample feature subset;
determining the average value of attribute values of the attribute features belonging to the digital type contained in the sample feature subset;
determining standard deviation of attribute values of attribute features belonging to a digital type contained in the sample feature subset;
sorting attribute values of the attribute features belonging to the digital type contained in the sample feature subset to obtain a sorting result;
determining a median of the sorting result;
determining that the meta-feature set of the sample feature subset includes the first number, the first logarithm, the second number, the second logarithm, the third number, the first ratio, the fourth number, the fifth number, the sixth number, the second ratio, the maximum value, the minimum value, the variance, the mean, the standard deviation, and the median.
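The sixteen meta-features enumerated in claim 2 are simple summary statistics and can be computed with the standard library alone. A hedged sketch follows; the representation of an attribute feature as a dict with keys "value" and "kind", and the use of population variance and standard deviation, are assumptions for illustration.

```python
import math
import statistics

def meta_feature_set(subset, num_subsets_in_scene):
    """Compute the sixteen meta-features for one sample feature subset.

    subset: one dict per attribute feature, with assumed keys
      "value" - the attribute value
      "kind"  - "binary", "multi" (ternary or higher) or "numeric"
    num_subsets_in_scene: number of sample feature subsets in the scene.
    """
    first_num = len(subset)                              # attribute feature count
    second_num = num_subsets_in_scene
    numeric = [f["value"] for f in subset if f["kind"] == "numeric"]
    return [
        first_num,
        math.log(first_num),                             # first logarithm
        second_num,
        math.log(second_num),                            # second logarithm
        len({f["value"] for f in subset}),               # distinct attribute values
        first_num / second_num,                          # first ratio
        sum(f["kind"] == "binary" for f in subset),      # binary features
        sum(f["kind"] == "multi" for f in subset),       # ternary-or-higher features
        len(numeric),                                    # numeric features
        sum(f["kind"] == "multi" for f in subset) / len(numeric),  # second ratio
        max(numeric),
        min(numeric),
        statistics.pvariance(numeric),                   # variance (population, assumed)
        statistics.mean(numeric),
        statistics.pstdev(numeric),                      # standard deviation (assumed)
        statistics.median(sorted(numeric)),              # sort, then take the median
    ]
```

The returned list preserves the order in which the claim enumerates the sixteen quantities.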
3. The training method of a feature selection algorithm recommendation model according to claim 1 or 2, further comprising:
acquiring a feature set to be tested corresponding to an application scene to be tested, wherein the feature set to be tested comprises feature subsets to be tested respectively corresponding to a plurality of labeling classification labels, and each feature subset to be tested comprises a plurality of attribute features;
acquiring the meta feature sets respectively corresponding to the feature subsets to be tested;
inputting the meta feature sets respectively corresponding to the feature subsets to be tested into the feature selection algorithm recommendation model, and obtaining a target identification of a target feature selection algorithm through the feature selection algorithm recommendation model;
and acquiring, from the feature subsets to be tested, a high-quality feature set corresponding to the feature subsets to be tested through the target feature selection algorithm indicated by the target identification.
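At prediction time the recommendation model emits one identification per feature subset under test, while claim 3 asks for a single target identification for the scene. One plausible aggregation, shown here purely as an assumption since the claim does not specify it, is a majority vote over the per-subset predictions:

```python
from collections import Counter

def recommend_algorithm(model, meta_feature_sets):
    """Aggregate per-subset predictions into one target identification.

    model: any classifier exposing predict(); meta_feature_sets: one
    meta-feature vector per feature subset under test.
    """
    predictions = model.predict(meta_feature_sets)   # one algorithm id per subset
    return Counter(predictions).most_common(1)[0][0] # majority vote (assumed)
```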
4. A method for training a recommendation model for a feature selection algorithm according to claim 3,
the first machine learning model is of the same type as the second machine learning model; or
the first machine learning model is of a different type than the second machine learning model.
5. The training method of the feature selection algorithm recommendation model according to any one of claims 1, 2 or 4, wherein the number of the plurality of high-quality feature sets corresponding to the application scene is 5, and the 5 high-quality feature sets correspond one-to-one to 5 preset feature selection algorithms; the 5 preset feature selection algorithms are respectively: a filter feature selection method, a wrapper feature selection method, an embedded feature selection method, a group-structure feature selection algorithm and a tree-structure feature selection algorithm.
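Three of the five preset families named in claim 5 have direct scikit-learn counterparts; group-structure and tree-structure selection do not, so they are only stubbed below. The mapping of identifiers to algorithms, and the choice of estimators inside each selector, are illustration assumptions, not part of the patent.

```python
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, mutual_info_classif
from sklearn.linear_model import Lasso, LogisticRegression

def make_selector(algorithm_id, k=2):
    """Map an (assumed) preset algorithm id to a feature selector."""
    if algorithm_id == 0:  # filter method: rank features by a per-feature score
        return SelectKBest(mutual_info_classif, k=k)
    if algorithm_id == 1:  # wrapper method: recursive feature elimination
        return RFE(LogisticRegression(max_iter=1000), n_features_to_select=k)
    if algorithm_id == 2:  # embedded method: selection via an L1-penalised model
        return SelectFromModel(Lasso(alpha=0.1), max_features=k)
    # group-structure / tree-structure selection (e.g. group lasso) have no
    # scikit-learn counterpart and would need a specialised library
    raise NotImplementedError(f"algorithm id {algorithm_id} not sketched")
```

Each selector exposes the same fit/transform interface, which is what makes it possible for a recommendation model to choose among them by identifier alone.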
6. A training device for a recommendation model of a feature selection algorithm, comprising:
the first acquisition module is used for acquiring sample feature sets corresponding to a plurality of application scenes respectively; the sample feature set corresponding to the application scene comprises sample feature subsets respectively corresponding to a plurality of labeling classification labels, wherein the sample feature subsets comprise attribute values of a plurality of attribute features;
the second acquisition module is used for acquiring a plurality of high-quality feature sets corresponding to each application scene, wherein each high-quality feature set comprises attribute features, screened out of the sample feature subsets based on a preset feature selection algorithm, that have a positive effect on the training of a first machine learning model; different high-quality feature sets correspond to different preset feature selection algorithms;
the third obtaining module is used for obtaining, for each application scene, the accuracy degrees of the plurality of high-quality feature sets corresponding to the application scene, wherein the accuracy degree of a high-quality feature set refers to the accuracy degree of the first machine learning model trained with the high-quality feature set to which attribute values are assigned as input and with the labeling classification label of the sample feature subset to which that high-quality feature set belongs as a training target;
the first determining module is used for determining, for each application scene, the accuracy degrees of the plurality of high-quality feature sets corresponding to the application scene, wherein the high-quality feature set with the highest accuracy degree is the target high-quality feature set;
the second determining module is used for determining the identification of a target preset feature selection algorithm corresponding to the target high-quality feature set of the application scene for each application scene;
a fourth obtaining module, configured to obtain, for each application scenario, a meta-feature set corresponding to the sample feature subset included in a sample feature set corresponding to the application scenario, where the meta-feature set corresponding to the sample feature subset characterizes a relationship between attribute features included in the sample feature subset;
the training module is used for taking the meta feature sets corresponding to the sample feature subsets as input of a second machine learning model, taking the identification of the target preset feature selection algorithm corresponding to the application scene to which the sample feature subsets belong as a training target, and training to obtain a feature selection algorithm recommendation model.
7. The training device of the feature selection algorithm recommendation model according to claim 6, wherein the fourth acquisition module comprises:
a first determining unit configured to determine a first number of attribute features included in the sample feature subset;
a second determining unit configured to determine a first logarithm of the first number;
a third determining unit, configured to determine a second number of sample feature subsets included in the sample feature set of the application scenario to which the sample feature subset belongs;
a fourth determining unit configured to determine a second logarithm of the second number;
a fifth determining unit, configured to determine a third number of attribute features with different attribute values included in the sample feature subset;
a sixth determining unit, configured to determine a ratio of the first number to the second number as a first ratio;
a seventh determining unit, configured to determine a fourth number of attribute features belonging to a binary classification included in the sample feature subset;
an eighth determining unit configured to determine a fifth number of attribute features belonging to a ternary or higher classification included in the sample feature subset;
a ninth determining unit, configured to determine a sixth number of attribute features belonging to a digital type included in the sample feature subset, where an attribute value of the attribute feature belonging to the digital type is a number;
a tenth determining unit configured to determine a ratio of the fifth number to the sixth number as a second ratio;
an eleventh determining unit configured to determine a maximum value of attribute values of attribute features belonging to a digital type included in the sample feature subset;
a twelfth determining unit, configured to determine a minimum value of attribute values of attribute features belonging to a digital type included in the sample feature subset;
a thirteenth determining unit configured to determine a variance of attribute values of attribute features belonging to a digital type included in the sample feature subset;
a fourteenth determining unit, configured to determine a mean value of attribute values of attribute features belonging to a digital type included in the sample feature subset;
a fifteenth determination unit configured to determine a standard deviation of attribute values of attribute features belonging to a digital type included in the sample feature subset;
the sorting unit is used for sorting the attribute values of the attribute features belonging to the digital type contained in the sample feature subset to obtain a sorting result;
a sixteenth determining unit configured to determine a median of the sorting results;
a seventeenth determining unit is configured to determine that the meta feature set of the sample feature subset includes the first number, the first logarithm, the second number, the second logarithm, the third number, the first ratio, the fourth number, the fifth number, the sixth number, the second ratio, the maximum value, the minimum value, the variance, the mean, the standard deviation, and the median.
8. The training device of the feature selection algorithm recommendation model of claim 6 or 7, further comprising:
a fifth obtaining module, configured to obtain a feature set to be tested corresponding to an application scene to be tested, where the feature set to be tested corresponding to the application scene to be tested includes feature subsets to be tested corresponding to a plurality of labeling classification labels, and the feature subsets to be tested include a plurality of attribute features;
a sixth obtaining module, configured to obtain the meta-feature sets corresponding to the feature subsets to be tested respectively;
a seventh obtaining module, configured to input the meta feature sets corresponding to the feature subsets to be tested to the feature selection algorithm recommendation model, and obtain a target identifier of a target feature selection algorithm through the feature selection algorithm recommendation model;
and the third determining module is used for acquiring, from the feature subsets to be tested, a high-quality feature set corresponding to the feature subsets to be tested through the target feature selection algorithm indicated by the target identifier.
9. A server, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement a training method of the feature selection algorithm recommendation model as claimed in any one of claims 1 to 5.
10. A computer readable storage medium storing instructions which, when executed by a processor of a server, enable the server to perform the training method of the feature selection algorithm recommendation model of any one of claims 1 to 5.
CN202311558497.4A 2023-11-21 2023-11-21 Training method, training device, training server and training medium for feature selection algorithm recommendation model Pending CN117591851A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311558497.4A CN117591851A (en) 2023-11-21 2023-11-21 Training method, training device, training server and training medium for feature selection algorithm recommendation model

Publications (1)

Publication Number Publication Date
CN117591851A (en) 2024-02-23

Family

ID=89914523

Country Status (1)

Country Link
CN (1) CN117591851A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination