CN111461344A - Method, system, device and medium for automatically generating high-order features - Google Patents


Info

Publication number: CN111461344A (application CN202010245363.7A)
Authority: CN (China)
Prior art keywords: features, input, order, operator, feature
Legal status: Granted (the listed status is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN111461344B (en)
Inventors: 王育添, 江文斌, 李健
Current and original assignee: Shanghai Ctrip International Travel Agency Co Ltd
Application filed by Shanghai Ctrip International Travel Agency Co Ltd
Priority to CN202010245363.7A
Publication of CN111461344A
Application granted; publication of CN111461344B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a system, a device and a medium for automatically generating high-order features. The method comprises the following steps: acquiring an input feature set, wherein the input feature set comprises a plurality of input features; generating high-order features by applying operator operations to the input features in the current input feature set; adding each generated high-order feature separately to the input feature set to form multiple candidate feature sets, and evaluating these candidate feature sets with a selected machine model; adding the high-order features of the candidate feature sets with the best evaluation results to the input feature set to obtain an updated input feature set; evaluating the input feature set with the machine model; and outputting the high-order features in the updated input feature set together with the specific meaning of each feature. The invention not only automatically generates effective high-order features, but also names the generated high-order features and explains their meanings.

Description

Method, system, device and medium for automatically generating high-order features
Technical Field
The invention relates to the field of artificial intelligence and machine learning, in particular to a method, a system, equipment and a medium for automatically generating high-order features.
Background
In recent years, more and more cases have shown that effective features can greatly improve every metric of a machine learning task. Features in machine learning are the manifestation of salient properties of things, and they are the key to distinguishing them. Many experts design useful features by combining their domain knowledge with specific business scenarios to promote business development. On the other hand, model interpretability is very important in some scenarios. For example, in the search ranking of travel products, good interpretability helps travel product suppliers understand the ranking results of their products.
In general, the design of a valid feature goes through multiple stages of guessing, statistical analysis, model verification and the like: first analyzing the business background in depth, then collecting data to extract features, performing statistical tests, adding the features to a model, and observing the effect. Producing one valid feature often requires multiple rounds of verification, which consumes substantial manpower and material resources, and it is difficult to achieve both accuracy and high coverage. On the other hand, deep learning has performed excellently in recent search and recommendation tasks, surpassing other forms of machine learning on several key metrics. Deep learning operates directly on the initial data and automatically learns higher-level representations of the relevant features of the initial data set. However, the new features generated by deep learning are not transparent to human beings: such high-order features bring better results but have poor interpretability.
Disclosure of Invention
The invention aims to overcome the defects in the prior art that new features generated by deep learning are not transparent to human beings and are poorly explained, and provides a method, a system, a device and a medium for automatically generating high-order features.
The invention solves the technical problems through the following technical scheme:
the invention provides an automatic generation method of high-order features, which comprises the following steps:
s1, acquiring an input feature set, wherein the input feature set comprises a plurality of input features;
s2, performing operator operation on the input features in the current input feature set to generate high-order features;
s3, adding each generated high-order feature into an input feature set respectively to form a plurality of groups of candidate feature sets, and evaluating the plurality of groups of candidate feature sets by using the selected machine model;
s4, adding high-order features in the candidate feature sets with the optimal evaluation results into the input feature set to obtain an updated input feature set;
s5, the machine model is used for evaluating the input feature set in the step S2 to obtain a first evaluation result, whether the evaluation results of the machine model on the candidate feature sets are all inferior to the first evaluation result is judged, if yes, the updated high-order features in the input feature set are output, and if not, the step S2 is returned.
Preferably, the input features in the input feature set in step S1 each have a corresponding name and meaning;
in step S5, when the higher-order feature in the updated input feature set is output, the name and meaning of the higher-order feature are also output.
Preferably, the step of obtaining the input feature set comprises:
acquiring original characteristics;
analyzing the original features, and deleting features whose missing-value rate is greater than a first threshold and features whose correlation is higher than a second threshold, so as to obtain first original features;
and screening out different types of features based on the first original features to obtain an input feature set.
Preferably, the step of generating the high-order features by performing operator operations on the input features in the current input feature set comprises:
carrying out unary operator operation on the input features in the current input feature set to obtain basic features;
and carrying out binary operator and/or multivariate operator operation on the basic features to generate the high-order features.
Preferably, when performing a multivariate operator operation on the base feature, the multivariate operator operation step comprises:
selecting one multivariate operator from a multivariate operator set according to the weight probability, wherein the multivariate operator set comprises the groupThenMin, groupThenMax and groupThenAvg operators;
randomly selecting a number m from the interval [2, L] and selecting m input features, wherein L is the maximum number of input features that may be selected when a multivariate operator is applied;
performing multivariate operator operation on the m input features;
repeatedly executing the above steps multiple times to generate a plurality of high-order features, and, according to the performance of the high-order features, updating the weights of the multivariate operators according to a formula and normalizing them; the formula is as follows:
P(Δ_k) = (1/C_k) · Σ_{i=1}^{C_k} Val_{ik}
where P(Δ_k) represents the weight of the multivariate operator Δ_k, C_k represents the number of times Δ_k is used in the current round, and Val_{ik} is the evaluation metric of Δ_k on the relevant machine learning model at its i-th use.
The invention also provides an automatic generation system of high-order features, which comprises:
the acquisition module, used for acquiring an input feature set, wherein the input feature set comprises a plurality of input features;
the operation module is used for generating high-order characteristics by carrying out operator operation on the input characteristics in the current input characteristic set;
the evaluation module is used for respectively adding each generated high-order feature into the input feature set to form a plurality of groups of candidate feature sets, and evaluating the plurality of groups of candidate feature sets by using the selected machine model;
the updating module is used for adding high-order features in the candidate feature sets with the optimal evaluation results into the input feature set to obtain an updated input feature set;
and the judging module is used for evaluating the input feature set currently operated by the operating module by using the machine model to obtain a first evaluation result, judging whether the evaluation results of the machine model on the plurality of groups of candidate feature sets are inferior to the first evaluation result, outputting the high-order features in the updated input feature set if the evaluation results are inferior to the first evaluation result, and calling the operating module again if the evaluation results are not inferior to the first evaluation result.
Preferably,
the input features in the input feature set acquired by the acquisition module all have corresponding names and meanings;
the judging module is further configured to output a name and a meaning of the high-order feature when the high-order feature in the updated input feature set is output.
Preferably, the obtaining module includes:
an acquisition unit configured to acquire an original feature;
the deleting unit is used for analyzing the original features, deleting the features of which the missing rate of the value is greater than a first threshold value and the features of which the correlation is greater than a second threshold value, so as to obtain first original features;
and the screening unit is used for screening out different types of features based on the first original features so as to obtain an input feature set.
Preferably, the operation module comprises:
the first operation unit is used for carrying out unary operator operation on the input features in the current input feature set to obtain basic features;
and the second operation unit is used for carrying out binary operator and/or multivariate operator operation on the basic features to generate the high-order features.
Preferably, when performing a multivariate operator operation on the base feature, the second operation unit includes:
the first selecting subunit, used for selecting one multivariate operator from a multivariate operator set according to the weight probability, wherein the multivariate operator set comprises the groupThenMin, groupThenMax and groupThenAvg operators;
the second selecting subunit, used for randomly selecting a number m from the interval [2, L] and selecting m input features, wherein L is the maximum number of input features that may be selected when a multivariate operator is applied;
a third operation subunit, configured to perform multivariate operator operation on the m input features;
the updating subunit, used for repeatedly calling the above subunits to generate a plurality of high-order features and, according to the performance of the high-order features, updating the weights of the multivariate operators according to a formula and normalizing them; the formula is as follows:
P(Δ_k) = (1/C_k) · Σ_{i=1}^{C_k} Val_{ik}
where P(Δ_k) represents the weight of the multivariate operator Δ_k, C_k represents the number of times Δ_k is used in the current round, and Val_{ik} is the evaluation metric of Δ_k on the relevant machine learning model at its i-th use.
The invention further provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the automatic generation method of the high-order features.
The present invention also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the aforementioned method for automatic generation of high-order features.
The positive effects of the invention are as follows. The invention provides a method, a system, a device and a medium for automatically generating high-order features, which select a suitable original feature set by preprocessing the original features; generate candidate feature sets according to several interpretable operators; evaluate the candidate feature sets quickly and effectively; and output the generated high-order features together with their specific meanings. Compared with the black-box features generated by deep learning in the prior art, the invention not only automatically generates effective high-order features, but also names the generated high-order features and explains their meanings.
Drawings
Fig. 1 is a flowchart of an automatic generation method of high-order features according to embodiment 1 of the present invention.
Fig. 2 is a flowchart of step S101 in embodiment 1 of the present invention.
Fig. 3 is a flowchart of step S102 in embodiment 1 of the present invention.
Fig. 4 is a flowchart of step S1022 in embodiment 1 of the present invention.
Fig. 5 is a schematic block diagram of an automatic generation system of high-order features according to embodiment 2 of the present invention.
Fig. 6 is a block diagram of an acquisition module according to embodiment 2 of the present invention.
Fig. 7 is a block diagram of an operation module according to embodiment 2 of the present invention.
Fig. 8 is a schematic block diagram of an operation unit when performing a multivariate operator operation on the basic features in embodiment 2 of the present invention.
FIG. 9 is a schematic structural diagram of an electronic device according to embodiment 3 of the present invention.
Detailed Description
The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.
Example 1
As shown in fig. 1, the present embodiment discloses an automatic generation method of high-order features, which includes the following steps:
s101, acquiring an input feature set, wherein the input feature set comprises a plurality of input features; the input features all have corresponding names and meanings;
step S102, performing operator operation on input features in the current input feature set to generate high-order features;
the operator in the embodiment comprises an operator name, an operator meaning and an operator execution mode, and cross feature candidates are generated according to the operator. The operators mainly comprise unary operators, binary operators and multivariate operators.
Step S103, adding each generated high-order feature into an input feature set respectively to form a plurality of groups of candidate feature sets, and evaluating the plurality of groups of candidate feature sets by using a selected machine model;
step S104, adding high-order features in a plurality of candidate feature sets with optimal evaluation results into an input feature set to obtain an updated input feature set;
step S105, evaluating the input feature set in the step S2 by using the machine model to obtain a first evaluation result, judging whether the evaluation results of the machine model on the plurality of sets of candidate feature sets are all inferior to the first evaluation result, if so, executing the step S106, and if not, returning to the step S102;
and S106, outputting the high-order features in the updated input feature set and the names and meanings of the high-order features.
In this embodiment, candidate features are evaluated with a model. Because each candidate feature must be evaluated independently, a simple and fast model greatly improves time efficiency; a logistic regression model is an ideal choice, and a more specific model can be selected according to the business scenario. At the same time, an overly large dataset also affects evaluation time. When the dataset is too large, if the sampled data adequately represents the distribution of the whole data, generating and evaluating high-order features on the data samples greatly reduces the overall running time.
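A minimal sketch of this fast evaluation step, assuming scikit-learn is available (the embodiment only calls for a simple, fast model such as logistic regression); the sample size of 10,000 is an illustrative assumption:

```python
# Sketch of fast candidate evaluation: sub-sample large data, fit a simple
# logistic regression, and score with AUC. scikit-learn is an assumed choice
# here; the embodiment only requires a simple, fast model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def evaluate_feature_set(X, y, sample_size=10_000, seed=0):
    """Estimate AUC of a candidate feature matrix X against binary labels y."""
    rng = np.random.default_rng(seed)
    if len(X) > sample_size:                      # sub-sample very large datasets
        idx = rng.choice(len(X), size=sample_size, replace=False)
        X, y = X[idx], y[idx]
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return roc_auc_score(y, model.predict_proba(X)[:, 1])
```

On perfectly separable data the estimated AUC approaches 1.0, matching the AUC interpretation below: the probability that a positive sample is ranked ahead of a negative one.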
For example, in search, in order to rank products highly relevant to the user's search terms near the top of the list page, the chosen evaluation metric should be related to ranking; for example, the AUC metric focuses on the global ranking and measures the probability that a positive sample is ranked ahead of a negative sample.
In this embodiment, a newly generated high-order feature name has the form [feature][operator], where [feature] is the original (parent) feature and [operator] is the operator acting on the parent feature. The output features are explained from the parent feature names and the crossover operator, in combination with the following table.
[Operator reference table: images not reproduced. The table maps each operator name (e.g. disc5, groupThenAvg) to its meaning, as illustrated by the two examples below.]
Two specific examples are further illustrated:
example 1: for example, the new feature name is [ age ] [ disc5 ]. Age is a parent feature that represents the Age of the user. The table lookup, disc5, is a unary operator, and for the meaning of the discretization of the feature, the new feature is represented as discretized 5 levels for age.
Example 2: the new feature name is [[age][disc5],gender,level][groupThenAvg]. The three parents are [age][disc5], gender and level; [age][disc5] is the new feature produced by the unary operator in Example 1, and it also serves as a parent in generating this feature. Gender represents the user's gender, and level represents the user's consumption level. Looking up the table, groupThenAvg is a multivariate operator whose meaning is grouping and then averaging: the new feature groups rows by the two features [age][disc5] and gender and then takes the in-group average of level. This feature describes the average consumption level of users of different ages and genders.
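The naming scheme can be illustrated with a short sketch. The operator-meaning table below is an assumed fragment reconstructed from the two examples, not the patent's full table:

```python
# Sketch of the [feature][operator] naming scheme: names are built from parent
# names plus the operator, and meanings are looked up in an operator table.
# This table is a hypothetical fragment for illustration only.
OPERATOR_MEANINGS = {
    "disc5": "discretized into 5 levels",
    "groupThenAvg": "grouped by the leading features, then averaged over the last",
}

def name_feature(parents, operator):
    """Build a name like [age][disc5] or [[age][disc5],gender,level][groupThenAvg]."""
    return f"[{','.join(parents)}][{operator}]"

def explain(parents, operator):
    """Return the name together with its looked-up meaning."""
    return f"{name_feature(parents, operator)}: {OPERATOR_MEANINGS[operator]}"
```

Nested names arise naturally because a generated feature's full name can itself appear as a parent.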
As shown in fig. 2, in this embodiment, step 101 includes the following steps:
step S1011, obtaining original characteristics;
step S1012, analyzing the original features, and deleting the features with the missing value rate greater than the first threshold and the features with the correlation greater than the second threshold to obtain first original features;
and S1013, screening out different types of features based on the first original features to obtain an input feature set.
In this embodiment, feature analysis mainly consists of setting rules based on existing experience. Features with a missing rate greater than 0.5 are generally considered unusable and are deleted. Features with too small a variance are also removed: such features show no difference between samples and have no discriminative power, so they are filtered out by setting a threshold. For example, suppose a collected feature A has a computed variance of 0.5; if past experience says that features with variance greater than 2 are usable, the threshold is set to 2 and feature A is filtered out, achieving the purpose of feature screening.
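A sketch of this rule-based screening, under the thresholds from the example (missing rate 0.5, variance 2); the dict-of-lists data representation is an illustrative assumption:

```python
# Sketch of rule-based screening: drop features whose missing rate exceeds 0.5
# or whose variance falls below a threshold (2 in the worked example above).
def screen(features, missing_threshold=0.5, var_threshold=2.0):
    """features: dict of name -> list of values, where None marks a missing value."""
    kept = {}
    for name, values in features.items():
        present = [v for v in values if v is not None]
        if not present:
            continue                                  # entirely missing
        missing_rate = 1 - len(present) / len(values)
        if missing_rate > missing_threshold:
            continue                                  # unusable: too sparse
        mean = sum(present) / len(present)
        variance = sum((v - mean) ** 2 for v in present) / len(present)
        if variance < var_threshold:
            continue                                  # no discriminative power
        kept[name] = values
    return kept
```

A low-variance feature (like feature A in the example) and a mostly-missing feature are both removed, while a well-populated, high-variance feature survives.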
Feature screening in this embodiment mainly considers two aspects: richness and effectiveness. Richness makes it easier to generate useful high-order features in the subsequent operations; effectiveness removes highly similar features and reduces the number of candidate features generated. Screening relies on rules and algorithms: richness depends on the rules, and effectiveness depends on the algorithm.
Rule: selecting input features is the first step of the whole high-order feature generation, and the feature screening of this step relies on manual rules. In a search recommendation scenario, features often fall into four major categories: user features, product features, context features (such as time features) and cross features (crosses of the former three). The input should cover these four categories as much as possible, so that rich and effective high-order feature crosses can easily be generated.
Algorithm: compute the Pearson correlation coefficient between the features. Two variables are considered highly correlated when their Pearson correlation coefficient is greater than 0.7; in that case one feature of the pair is randomly selected. After a round of iteration a new feature set is produced, and the process is repeated until the Pearson correlation coefficient between any two features is less than 0.7. This step screens out a portion of the highly correlated features.
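The correlation screening can be sketched as follows. One simplification: where the embodiment keeps a random feature of each correlated pair, this sketch deterministically keeps the first; constant (zero-variance) features are assumed to have been removed already by the rule-based screening above:

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)                 # assumes non-constant inputs

def drop_correlated(features, threshold=0.7):
    """features: dict of name -> list of values.

    Repeatedly drop one feature of any pair whose |Pearson correlation|
    exceeds the threshold, until no such pair remains.
    """
    kept = dict(features)
    changed = True
    while changed:
        changed = False
        names = list(kept)
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                if abs(pearson(kept[names[i]], kept[names[j]])) > threshold:
                    del kept[names[j]]     # keep the first of the pair
                    changed = True
                    break
            if changed:
                break
    return kept
```

A perfectly correlated pair collapses to one feature, while a weakly correlated feature survives.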
As shown in fig. 3, in this embodiment, step 102 includes the following steps:
s1021, carrying out unary operator operation on the input features in the current input feature set to obtain basic features;
the method comprises the steps of carrying out numerical value cleaning and numerical value conversion on some characteristics, mainly aiming at converting unstructured data into structured data and facilitating subsequent characteristic intersection, carrying out discretization, normalization and other numerical value conversion operations on the unary operators, wherein the discretization can convert continuous characteristics and date characteristics into discrete characteristics by discretizing the numerical characteristics.
And step S1022, performing binary operator and/or multivariate operator operation on the basic features to generate the high-order features.
In this embodiment, a binary operator acts on two features to create a new feature, for example the four arithmetic operations (addition, subtraction, multiplication and division) on two features, or the Cartesian product of two features. Operating on two features can generate rich features. For example, dividing the user's consumption level by the price of a travel product generates a feature measuring the user's preference for the product's price. Many useful features can be produced by binary operators.
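The four arithmetic binary operators can be sketched as elementwise operations on two feature columns; the zero-division guard returning NaN is an added assumption, and the column names are illustrative:

```python
# Sketch of arithmetic binary operators acting elementwise on two feature
# columns. The NaN guard for division by zero is an assumption.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if b else float("nan"),
}

def binary_op(name, xs, ys):
    """Apply the named binary operator elementwise to two feature columns."""
    op = OPS[name]
    return [op(x, y) for x, y in zip(xs, ys)]

# e.g. a price-preference feature: user consumption level divided by product price
level = [3, 5, 2]
price = [100, 200, 50]
price_pref = binary_op("div", level, price)
```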
A multivariate operator uses two or more features to generate a feature: for example, after grouping by two features, take the maximum, minimum or average of a third feature within each group. Given the three features gender, age and consumption level, the consumption level within each (gender, age) group can be averaged. Through the action of this multivariate operator, a feature describing the consumption levels of different genders and ages is generated.
In this embodiment, operator action means applying an operator to selected features to generate a new feature. For example, take the multivariate operator groupThenAvg and the selected features gender, age and consumption level, where gender has three categories (male, female, unknown), age has 5 levels [1,2,3,4,5] and consumption level has five categories [1,2,3,4,5]. groupThenAvg groups by the first two features, giving 15 groups such as [male,1], [male,2], [male,3], [female,1], [female,2], [female,3], [unknown,1], [unknown,2], [unknown,3] and so on, and takes the average consumption level within each group. This yields a new feature, [gender,age,level][groupThenAvg]. If one of its specific values is [female,2,5], it indicates that the average consumption level of females at age level 2 is 5; the newly created feature characterizes the average consumption level for different ages and genders.
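The groupThenAvg action walked through above can be sketched in plain Python (a pandas `groupby(...).mean()` would be equivalent); the dict-per-row representation is illustrative:

```python
# Sketch of the groupThenAvg multivariate operator: group rows by the leading
# features and average the last feature within each group.
from collections import defaultdict

def group_then_avg(rows, group_cols, value_col):
    """rows: list of dicts. Returns {group-key tuple: average of value_col}."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        groups[key].append(row[value_col])
    return {k: sum(v) / len(v) for k, v in groups.items()}

users = [
    {"gender": "female", "age": 2, "level": 5},
    {"gender": "female", "age": 2, "level": 3},
    {"gender": "male", "age": 1, "level": 4},
]
avg_level = group_then_avg(users, ["gender", "age"], "level")
```

Here the (female, age level 2) group averages to 4.0, the per-group value that becomes the new [gender,age,level][groupThenAvg] feature.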
As shown in fig. 4, in this embodiment, when performing a multivariate operator operation on the basic feature, step 1022 includes the following steps:
step S10221, selecting one multivariate operator from a multivariate operator set according to the weight probability, wherein the multivariate operator set comprises the groupThenMin, groupThenMax and groupThenAvg operators;
step S10222, randomly selecting a number m from the interval [2, L] and selecting m input features, wherein L is the maximum number of input features that may be selected when a multivariate operator is applied;
step S10223, carrying out multivariate operator operation on the m input features;
step S10224, repeatedly executing the above steps multiple times to generate a plurality of high-order features, and, according to the performance of the high-order features, updating the weights of the multivariate operators according to a formula and normalizing them; the formula is as follows:
P(Δ_k) = (1/C_k) · Σ_{i=1}^{C_k} Val_{ik}
where P(Δ_k) represents the weight of the multivariate operator Δ_k, C_k represents the number of times Δ_k is used in the current round, and Val_{ik} is the evaluation metric of Δ_k on the relevant machine learning model at its i-th use.
The specific application of the above formula is as follows:
the existing multivariate operators are assumed to be combined into delta { ggroup ThenMin, groupThenMax and groupThenAvg }, and the three multivariate operators are abbreviated as g1, g2 and g 3. The initial weight of each multivariate operator is equal, i.e. P (Δ) for g1, g2, g3k) Are all 1/3, wherein g1, g2 and g3 are P (. DELTA.k) Δ of (1)k. One of the multivariate operators is selected from the set of multivariate operators according to the weighted probability, which means that the probability of selecting to g1, g2, g3 is the same since the three operators are equally weighted. Suppose that in one round of generating higher-order features, we want to generate 6 higher-order features through multivariate operators. The method uses 6 multivariate operators, randomly selects g1, g2, g3, g3, g2 and g1, and due to selection according to weight probability, it is noted that the extreme case that 6 multivariate operators of g1, g1, g1, g1, g1 and g1 are randomly selected is possible, but analysis is not affected, the probability that the operator with high weight probability is selected is large in nature, and 6 candidate high-order features can be generated through the 6 high-order operators and the initial input features, so that 6 groups of results can be obtained by evaluation according to a machine learning model. Assuming that the 6 multivariate operators g1, g2, g3, g3, g2, g1, the corresponding 6 sets of AUC are 0.80, 0.40, 0.62, 0.60, 0.50, 0.90, respectively. Then according to the formula:
P(Δ_k) = (1/C_k) · Σ_{i=1}^{C_k} Val_{ik}
calculate:
P(Δ_k) for g1 = (0.8 + 0.9)/2 = 0.85
P(Δ_k) for g2 = (0.4 + 0.5)/2 = 0.45
P(Δ_k) for g3 = (0.62 + 0.60)/2 = 0.61
Normalizing the P(Δ_k) values corresponding to g1, g2 and g3:
g1 = 0.85/(0.85 + 0.45 + 0.61) ≈ 0.445
g2 ≈ 0.235
g3 ≈ 0.32
this gives new weights for g1, g2, g 3. Then at the next time of selecting the weight probability, g1 is more likely to be selected when using the multivariate operator in the next round of feature generation because the previous evaluation is good and the weight is larger.
In this embodiment, when a multivariate operator is applied, k (k ≥ 2) features are selected from the feature set F and the multivariate operator is applied to them. Since k can take many values and many multivariate operators can be chosen, a very large number of high-order features can be generated; left uncontrolled, this would greatly affect the final performance. In a specific business scenario the importance of each multivariate operator differs, so the invention constructs the multivariate-operator high-order features by dynamically selecting from and adjusting the operator set.
In the embodiment, the disclosed automatic generation method of the high-order features selects a proper original feature set by preprocessing the original features; generating a candidate feature set according to a plurality of interpretable operators; carrying out rapid and effective evaluation on the candidate characteristic set; and outputting the generated high-order features and the specific meanings corresponding to the features. The embodiment can not only automatically generate effective high-order features, but also name and explain meanings of the generated high-order features.
Example 2
As shown in fig. 5, the present embodiment discloses an automatic generation system of high-order features, which includes:
the acquisition module 1 is used for acquiring an input feature set, wherein the input feature set comprises a plurality of input features;
the operation module 2 is used for generating high-order characteristics by carrying out operator operation on the input characteristics in the current input characteristic set; the input features in the input feature set acquired by the acquisition module all have corresponding names and meanings;
the operator in the embodiment comprises an operator name, an operator meaning and an operator execution mode, and cross feature candidates are generated according to the operator. The operators mainly comprise unary operators, binary operators and multivariate operators.
The evaluation module 3 is used for respectively adding each generated high-order feature into the input feature set to form a plurality of groups of candidate feature sets, and evaluating the plurality of groups of candidate feature sets by using the selected machine model;
the updating module 4 is used for adding the high-order features in the candidate feature sets with the optimal evaluation results into the input feature set to obtain an updated input feature set;
the judging module 5 is configured to evaluate the input feature set currently operated by the operation module by using the machine model to obtain a first evaluation result, and judge whether the evaluation results of the machine model on the plurality of sets of candidate feature sets are all inferior to the first evaluation result, if yes, output the updated high-order features in the input feature set, and if not, recall the operation module.
The judging module 5 is further configured to, when outputting a higher-order feature in the updated input feature set, further output a name and a meaning of the higher-order feature.
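The generate-evaluate-update loop run by modules 2 to 5 can be sketched as follows (a minimal sketch; the function names and the toy callbacks are assumed, not from the patent):

```python
# Greedy loop: generate candidate high-order features, evaluate each
# candidate set, keep the best candidate only if it beats the current
# feature set's score; stop when no candidate is an improvement.

def auto_generate(features, generate_candidates, evaluate):
    """features: list of feature names.
    generate_candidates(features): returns new high-order features.
    evaluate(features): scores a feature set (e.g. AUC, higher is better)."""
    while True:
        baseline = evaluate(features)                  # first evaluation result
        candidates = generate_candidates(features)     # operation module
        scored = [(evaluate(features + [c]), c) for c in candidates]
        best_score, best_feat = max(scored)
        if best_score <= baseline:                     # all inferior: stop
            return features
        features = features + [best_feat]              # update module

# Toy run: only "b" improves the (set-based) score, so the loop adds it
# once and then terminates.
result = auto_generate(
    ["a"],
    lambda fs: ["b", "c"],
    lambda fs: len(set(fs) & {"a", "b"}),
)
# result == ["a", "b"]
```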
In this embodiment, the candidate features are evaluated with a model. Because each candidate feature must be evaluated independently, a simple and fast model greatly improves time efficiency; a logistic regression model is an ideal choice, and a more specific model can be chosen for the business scenario. At the same time, an overly large data set also lengthens evaluation. When the data set is too large, if the sampled data adequately reflects the distribution of the whole data, generating and evaluating high-order features on the sample can greatly reduce the overall running time.
For example, in search, in order to rank products highly relevant to the user's query terms as close to the top of the list page as possible, the chosen evaluation index should be related to ranking; for instance the index AUC focuses on the global ranking and measures the probability that a positive sample is ranked ahead of a negative sample.
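The pairwise interpretation of AUC described above can be computed directly (a sketch; the function name is assumed):

```python
# AUC as the probability that a positive sample is ranked ahead of a
# negative one: count pairwise wins (ties count as half).

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranking gives AUC 1.0; a fully reversed one gives 0.0.
perfect = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])   # 1.0
```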
In this embodiment, a newly generated high-order feature name has the form [feature][operator], where [feature] is the original feature, that is, the parent feature, and [operator] is the operator acting on the parent feature. Output features are explained from the parent feature name and the crossover operator together with the following table.
(Table: operator names and their meanings as used in the examples below — e.g. disc5, a unary operator that discretizes a feature into 5 levels; groupThenAvg, a multivariate operator that groups by the leading features and averages within each group.)
Two specific examples are further illustrated:
example 1: for example, the new feature name is [ age ] [ disc5 ]. Age is a parent feature that represents the Age of the user. The table lookup, disc5, is a unary operator, and for the meaning of the discretization of the feature, the new feature is represented as discretized 5 levels for age.
Example 2: the new characteristic name is [ [ age ] [ disc5], gene, level ] [ groupThenAvg ]. Three of [ age ] [ disc5], gender and level are parent features, and [ age ] [ disc5] is a new feature after the action of the unary operator in (1), and also serves as the parent feature in the generation process of the feature, wherein the gender represents the gender of the user, and the level represents the consumption level of the user. And looking up a table, wherein groupthEnavg is a multivariate operator, the meaning of the features is that the features are grouped and averaged, then the new features are represented as two features grouped according to [ age ] [ disc5], generator, and then the average of the levels is taken in the group, and the features describe the average consumption levels of the users at different ages and different sexes.
As shown in fig. 6, in this embodiment, the obtaining module 1 includes:
an obtaining unit 11, configured to obtain an original feature;
a deleting unit 12, configured to analyze the original features, delete a feature whose value missing rate is greater than a first threshold and a feature whose correlation is greater than a second threshold, so as to obtain a first original feature;
and the screening unit 13 is configured to screen out different types of features based on the first original feature to obtain an input feature set.
In this embodiment, feature analysis mainly sets rules based on existing experience. Features with a missing rate greater than 0.5 are generally considered unusable and need to be deleted. Features with too small a variance are also removed: such features barely differ between samples and have no discriminative power, and they are filtered out by setting a variance threshold. For example, a collected feature A has a computed variance of 0.5; by past experience a feature is usable when its variance is greater than 2, so the threshold is set to 2 and feature A is filtered out, achieving feature screening.
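The rule-based pre-screen described above can be sketched as follows (the function name and data layout are assumed; the 0.5 missing-rate and variance-of-2 thresholds come from the text's examples):

```python
# Drop features whose missing rate exceeds missing_max or whose variance
# falls below variance_min; None marks a missing value.

def rule_filter(columns, missing_max=0.5, variance_min=2.0):
    """columns: dict mapping feature name -> list of values."""
    kept = {}
    for name, vals in columns.items():
        missing_rate = sum(v is None for v in vals) / len(vals)
        present = [v for v in vals if v is not None]
        if not present:
            continue  # entirely missing: drop
        mean = sum(present) / len(present)
        variance = sum((v - mean) ** 2 for v in present) / len(present)
        if missing_rate <= missing_max and variance >= variance_min:
            kept[name] = vals
    return kept

# "a" has variance 0.25 (< 2) and is filtered out, like feature A above.
kept = rule_filter({"a": [1, 2, 1, 2], "b": [0, 4, 0, 4]})
# kept contains only "b"
```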
Feature screening in this embodiment mainly considers two aspects, richness and effectiveness. Richness makes it easier to generate useful high-order features in the subsequent operations; effectiveness removes some highly similar features and reduces the number of candidate features generated. Screening relies on rules and algorithms: richness depends on the rules, and effectiveness depends on the algorithms.
Rule: inputting features is the first step of the whole high-order feature generation, and feature screening at this step relies on manual rules. In a search-recommendation scenario, features often fall into four major categories: user features, product features, context features (such as time features), and cross features (crosses of some of the former three). The input should cover these four categories as much as possible so that rich and effective high-order feature crosses are easy to generate.
The algorithm is as follows: pearson's correlation coefficient between the respective features is calculated. A high correlation is considered when the pearson correlation coefficient for both variables is greater than 0.7. At this point, one feature is randomly selected. After a round of iteration, a new feature set is generated, and the process is repeated until the pearson correlation coefficient between any two features is less than 0.7, and a part of the highly correlated features is screened out through the step.
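The Pearson screening step can be sketched as follows (names assumed; for determinism this variant keeps the first feature of each correlated pair, whereas the text picks one at random):

```python
# Pairwise Pearson correlation and greedy de-correlation at |r| > 0.7.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.7):
    """features: dict name -> list of values; returns the kept names."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept

# "y" is perfectly correlated with "x" and is dropped; "z" survives.
survivors = drop_correlated({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [1, -1, 1, -1],
})
# survivors == ["x", "z"]
```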
As shown in fig. 7, in the present embodiment, the operation module 2 includes:
the first operation unit 21 is configured to perform unary operator operation on the input features in the current input feature set to obtain basic features;
The unary operator cleans and numerically converts individual features; it mainly aims at turning unstructured data into structured data to facilitate subsequent feature crossing, with operations such as discretization, normalization and other numerical conversions.
And the second operation unit 22 is used for performing binary operator and/or multivariate operator operation on the basic features to generate the high-order features.
In this embodiment, a binary operator acts on two features to create a new feature: for example, the four arithmetic operations of addition, subtraction, multiplication and division on two features, or a Cartesian product of two features. Operating on two features can generate rich features. For example, applying the binary division operator to the user's consumption level and the price of a travel product generates a feature measuring the user's preference for product price. Many useful features can be produced by binary operators.
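The division example can be sketched as follows (the values and names are assumed for illustration):

```python
# Binary operator: element-wise division of two features yields a new
# feature, here a rough price-preference signal (toy values).

def divide(a, b):
    return [x / y for x, y in zip(a, b)]

consumption_level = [3.0, 5.0, 1.0]
product_price = [300.0, 250.0, 400.0]
price_preference = divide(consumption_level, product_price)
# first value: 3.0 / 300.0 = 0.01
```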
A multivariate operator uses two or more features to generate one feature: for example, after grouping by some two features, take the maximum, minimum or average of a third feature within each group. With the three features gender, age and consumption level, for instance, the consumption level within each (gender, age) group can be averaged. Through the multivariate operator, a feature describing the consumption levels of different genders and ages can be generated.
In this embodiment, applying an operator means using an operator on selected features to generate a new feature. Take the existing multivariate operator groupThenAvg and the selected features gender, age and consumption level: gender has the three categories male, female and unknown, the age level has the 5 categories [1,2,3,4,5], and the consumption level has the five categories [1,2,3,4,5]. groupThenAvg groups by the first two features, giving 15 groups such as [male,1], [male,2], [male,3], [female,1], [female,2], [female,3], [unknown,1], [unknown,2], [unknown,3] and so on, and averages the consumption level within each group. This produces a new feature, [gender,age,level][groupThenAvg]. Assuming one of its specific values is [female,2,5], it indicates that the average consumption level of females of age level 2 is 5; the newly created feature characterizes the average consumption level of users of different genders and ages.
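The groupThenAvg example can be sketched as follows (toy rows; the function name and data layout are assumed):

```python
from collections import defaultdict

# groupThenAvg: group rows by the leading features, then average the
# value feature within each group.

def group_then_avg(rows, group_keys, value_key):
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_keys)].append(row[value_key])
    return {k: sum(v) / len(v) for k, v in groups.items()}

rows = [
    {"gender": "female", "age": 2, "level": 4},
    {"gender": "female", "age": 2, "level": 6},
    {"gender": "male",   "age": 1, "level": 3},
]
avg = group_then_avg(rows, ["gender", "age"], "level")
# avg[("female", 2)] == 5.0, matching the text's [female, 2, 5] example
```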
As shown in fig. 8, in this embodiment, when performing a multivariate operator operation on the basic feature, the second operation unit 22 includes:
a first selecting subunit 221, configured to select one of the multivariate operators from a multivariate operator set according to a weighted probability, where the multivariate operator set includes the groupThenMin, groupThenMax and groupThenAvg operators;
a second selecting subunit 222, configured to randomly select m input features, where m is drawn from the interval [2, L] and L is the maximum number of input features selectable when a multivariate operator is used;
a third operation subunit 223, configured to perform a multivariate operator operation on the m input features;
an updating subunit 224, configured to repeatedly invoke the above subunits to generate a plurality of the high-order features, and, according to the performance of the high-order features, update the weights of the multivariate operators according to the formula and normalize the weights; the formula is as follows:
$P(\Delta_k) = \frac{1}{C_k}\sum_{i=1}^{C_k} \mathrm{Val}_{ik}$
where P(Δk) represents the weight of the multivariate operator Δk, Ck represents the number of times Δk is used in the current round, and Val_ik is the evaluation index of Δk on the relevant machine learning model at its i-th use.
The specific application of the above formula is as follows:
Assume the existing multivariate operator set is Δ = {groupThenMin, groupThenMax, groupThenAvg}, with the three multivariate operators abbreviated g1, g2 and g3. The initial weight of each multivariate operator is equal, i.e. P(Δk) is 1/3 for each of g1, g2 and g3, where g1, g2 and g3 are the Δk in P(Δk). One multivariate operator is selected from the set according to the weight probability; since the three operators are equally weighted, g1, g2 and g3 are equally likely to be chosen. Suppose that in one round we want to generate 6 high-order features through multivariate operators. The multivariate operators are drawn 6 times, randomly yielding g1, g2, g3, g3, g2, g1. Because selection follows the weight probability, the extreme case of randomly drawing g1 all 6 times is possible, but this does not affect the analysis: in essence, operators with a high weight probability are simply more likely to be drawn. The 6 drawn operators, applied to the initial input features, generate 6 candidate high-order features, so evaluation with the machine learning model yields 6 sets of results. Assume the 6 multivariate operators g1, g2, g3, g3, g2, g1 obtain the 6 AUCs 0.80, 0.40, 0.62, 0.60, 0.50 and 0.90 respectively. Then according to the formula:
$P(\Delta_k) = \frac{1}{C_k}\sum_{i=1}^{C_k} \mathrm{Val}_{ik}$

the calculation gives:

P(Δk) for g1 = (0.8 + 0.9)/2 = 0.85
P(Δk) for g2 = (0.4 + 0.5)/2 = 0.45
P(Δk) for g3 = (0.62 + 0.60)/2 = 0.61

Normalizing the P(Δk) corresponding to g1, g2 and g3:

g1′ = 0.85/(0.85 + 0.45 + 0.61) ≈ 0.445
g2′ = 0.45/(0.85 + 0.45 + 0.61) ≈ 0.235
g3′ = 0.61/(0.85 + 0.45 + 0.61) ≈ 0.320
This gives the new weights of g1, g2 and g3. In the next round of feature generation, g1 is more likely to be selected when a multivariate operator is used, because its previous evaluations were good and its weight is now larger.
In this embodiment, when a multivariate operator is applied, k (k ≥ 2) features are selected from the feature set F and the multivariate operator is applied to them. Since k can take many values and there are several multivariate operators to choose from, a very large number of high-order features can be generated with multivariate operators; left uncontrolled, this would greatly harm final performance. In a specific business scenario the importance of each multivariate operator differs, so the invention constructs multivariate high-order features by dynamically selecting from and adjusting the operator set.
In the embodiment, the disclosed automatic generation system of the high-order features selects a proper original feature set by preprocessing the original features; generating a candidate feature set according to a plurality of interpretable operators; carrying out rapid and effective evaluation on the candidate characteristic set; and outputting the generated high-order features and the specific meanings corresponding to the features. The embodiment can not only automatically generate effective high-order features, but also name and explain meanings of the generated high-order features.
Example 3
Fig. 9 is a schematic structural diagram of an electronic device according to embodiment 3 of the present invention. The electronic device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the automatic generation method of the high-order features provided by the embodiment 1. The electronic device 30 shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 9, the electronic device 30 may be embodied in the form of a general purpose computing device, which may be, for example, a server device. The components of the electronic device 30 may include, but are not limited to: the at least one processor 31, the at least one memory 32, and a bus 33 connecting the various system components (including the memory 32 and the processor 31).
The bus 33 includes a data bus, an address bus, and a control bus.
The memory 32 may include volatile memory, such as Random Access Memory (RAM)321 and/or cache memory 322, and may further include Read Only Memory (ROM) 323.
Memory 32 may also include a program/utility 325 having a set (at least one) of program modules 324, such program modules 324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor 31 executes various functional applications and data processing, such as the automatic generation method of high-order features provided in embodiment 1 of the present invention, by running the computer program stored in the memory 32.
The electronic device 30 may also communicate with one or more external devices 34 (e.g., a keyboard, a pointing device, etc.); such communication may occur through input/output (I/O) interfaces 35. The electronic device 30 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 36. As shown, the network adapter 36 communicates with the other modules of the electronic device 30 through the bus 33. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used with the electronic device 30, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although in the above detailed description several units/modules or sub-units/modules of the electronic device are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module according to embodiments of the invention. Conversely, the features and functions of one unit/module described above may be further divided into embodiments by a plurality of units/modules.
Example 4
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the steps of the automatic generation method of high-order features provided in embodiment 1.
More specific examples (a non-exhaustive list) of the readable storage medium include: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation manner, the present invention can also be implemented in the form of a program product, which includes program code for causing a terminal device to execute the steps in the automatic generation method for implementing the high-order features provided in embodiment 1, when the program product runs on the terminal device.
Program code for carrying out the invention may be written in any combination of one or more programming languages, and the program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (12)

1. A method for automatically generating high-order features is characterized by comprising the following steps:
s1, acquiring an input feature set, wherein the input feature set comprises a plurality of input features;
s2, performing operator operation on the input features in the current input feature set to generate high-order features;
s3, adding each generated high-order feature into an input feature set respectively to form a plurality of groups of candidate feature sets, and evaluating the plurality of groups of candidate feature sets by using the selected machine model;
s4, adding high-order features in the candidate feature sets with the optimal evaluation results into the input feature set to obtain an updated input feature set;
s5, evaluating the input feature set of step S2 with the machine model to obtain a first evaluation result, and judging whether the evaluation results of the machine model on the plurality of sets of candidate feature sets are all inferior to the first evaluation result; if yes, outputting the high-order features in the updated input feature set, and if not, returning to step S2.
2. The method for automatic generation of higher order features of claim 1,
the input features in the input feature set in step S1 each have a corresponding name and meaning;
in step S5, when the higher-order feature in the updated input feature set is output, the name and meaning of the higher-order feature are also output.
3. The method for automatic generation of higher order features of claim 1, wherein the step of obtaining the set of input features comprises:
acquiring original characteristics;
analyzing the original features, and deleting the features with the deletion rate larger than a first threshold and the features with the correlation higher than a second threshold to obtain first original features;
and screening out different types of features based on the first original features to obtain an input feature set.
4. The method of claim 1, wherein the step of generating the high-order features by performing an operator operation on the input features in the current input feature set comprises:
carrying out unary operator operation on the input features in the current input feature set to obtain basic features;
and carrying out binary operator and/or multivariate operator operation on the basic features to generate the high-order features.
5. The method of automatic generation of higher order features of claim 4, wherein when performing a multivariate operator operation on the base feature, the multivariate operator operation step comprises:
selecting one multivariate operator from a multivariate operator set according to the weight probability, wherein the multivariate operator set comprises the groupThenMin, groupThenMax and groupThenAvg operators;
selecting m input features at random, wherein m is drawn from the interval [2, L] and L is the maximum number of input features selectable when a multivariate operator is adopted;
performing multivariate operator operation on the m input features;
repeatedly executing the above steps to generate a plurality of the high-order features, and, according to the performance of the high-order features, updating the weight of the multivariate operator according to the formula and normalizing the weight; the formula is as follows:
$P(\Delta_k) = \frac{1}{C_k}\sum_{i=1}^{C_k} \mathrm{Val}_{ik}$
where P(Δk) represents the weight of the multivariate operator Δk, Ck represents the number of times Δk is used in the current round, and Val_ik is the evaluation index of Δk on the relevant machine learning model at its i-th use.
6. An automatic generation system of high-order features, characterized in that the automatic generation system of high-order features comprises:
the acquisition module is used for acquiring an input feature set, wherein the input feature set comprises a plurality of input features;
the operation module is used for generating high-order characteristics by carrying out operator operation on the input characteristics in the current input characteristic set;
the evaluation module is used for respectively adding each generated high-order feature into the input feature set to form a plurality of groups of candidate feature sets, and evaluating the plurality of groups of candidate feature sets by using the selected machine model;
the updating module is used for adding high-order features in the candidate feature sets with the optimal evaluation results into the input feature set to obtain an updated input feature set;
and the judging module is used for evaluating the input feature set currently operated by the operating module by using the machine model to obtain a first evaluation result, judging whether the evaluation results of the machine model on the plurality of groups of candidate feature sets are inferior to the first evaluation result, outputting the high-order features in the updated input feature set if the evaluation results are inferior to the first evaluation result, and calling the operating module again if the evaluation results are not inferior to the first evaluation result.
7. The automatic generation system of higher order features of claim 6,
the input features in the input feature set acquired by the acquisition module all have corresponding names and meanings;
the judging module is further configured to output a name and a meaning of the high-order feature when the high-order feature in the updated input feature set is output.
8. The system for automatic generation of higher order features of claim 6, wherein the obtaining module comprises:
an acquisition unit configured to acquire an original feature;
the deleting unit is used for analyzing the original features, deleting the features of which the missing rate of the value is greater than a first threshold value and the features of which the correlation is greater than a second threshold value, so as to obtain first original features;
and the screening unit is used for screening out different types of features based on the first original features so as to obtain an input feature set.
9. The system for automatic generation of higher order features of claim 6, wherein the operation module comprises:
the first operation unit is used for carrying out unary operator operation on the input features in the current input feature set to obtain basic features;
and the second operation unit is used for carrying out binary operator and/or multivariate operator operation on the basic features to generate the high-order features.
10. The system for automatic generation of higher order features of claim 9, wherein when performing a multivariate operator operation on the base feature, the second operation unit comprises:
the first selecting subunit is used for selecting one multivariate operator from a multivariate operator set according to weight probability, wherein the multivariate operator set comprises the groupThenMin, groupThenMax and groupThenAvg operators;
the second selecting subunit is used for randomly selecting m input features, wherein m is drawn from the interval [2, L] and L is the maximum number of input features selectable when a multivariate operator is adopted;
a third operation subunit, configured to perform multivariate operator operation on the m input features;
the updating subunit is used for repeatedly calling the above subunits to generate a plurality of the high-order features, and, according to the performance of the high-order features, updating the weight of the multivariate operator according to the formula and normalizing the weight; the formula is as follows:
$P(\Delta_k) = \frac{1}{C_k}\sum_{i=1}^{C_k} \mathrm{Val}_{ik}$
where P(Δk) represents the weight of the multivariate operator Δk, Ck represents the number of times Δk is used in the current round, and Val_ik is the evaluation index of Δk on the relevant machine learning model at its i-th use.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for automatic generation of high order features according to any of claims 1 to 5 when executing the computer program.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for automatic generation of high-order features according to any one of claims 1 to 5.
CN202010245363.7A 2020-03-31 2020-03-31 Automatic generation method, system, equipment and medium for high-order features Active CN111461344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010245363.7A CN111461344B (en) 2020-03-31 2020-03-31 Automatic generation method, system, equipment and medium for high-order features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010245363.7A CN111461344B (en) 2020-03-31 2020-03-31 Automatic generation method, system, equipment and medium for high-order features

Publications (2)

Publication Number Publication Date
CN111461344A true CN111461344A (en) 2020-07-28
CN111461344B CN111461344B (en) 2023-04-25

Family

ID=71681583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010245363.7A Active CN111461344B (en) 2020-03-31 2020-03-31 Automatic generation method, system, equipment and medium for high-order features

Country Status (1)

Country Link
CN (1) CN111461344B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115131139A (en) * 2022-09-02 2022-09-30 创新奇智(南京)科技有限公司 Method, device and medium for obtaining target result based on structural data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170236069A1 (en) * 2016-02-11 2017-08-17 Nec Laboratories America, Inc. Scalable supervised high-order parametric embedding for big data visualization
CN110706015A (en) * 2019-08-21 2020-01-17 北京大学(天津滨海)新一代信息技术研究院 Advertisement click rate prediction oriented feature selection method
CN110781978A (en) * 2019-11-04 2020-02-11 支付宝(杭州)信息技术有限公司 Feature processing method and system for machine learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Xiaoqiang; Guan Lin: "Feature selection for transient stability assessment based on ant colony optimization algorithm and k-nearest neighbor method" *

Also Published As

Publication number Publication date
CN111461344B (en) 2023-04-25

Similar Documents

Publication Publication Date Title
CN111475637A (en) Data processing and training method and device for pushing knowledge points
CN116611546B (en) Knowledge-graph-based landslide prediction method and system for target research area
CN111199469A (en) User payment model generation method and device and electronic equipment
CN111738331A (en) User classification method and device, computer-readable storage medium and electronic device
Ledezma et al. GA-stacking: Evolutionary stacked generalization
CN110956277A (en) Interactive iterative modeling system and method
CN114841199A (en) Power distribution network fault diagnosis method, device, equipment and readable storage medium
CN115718846A (en) Big data mining method and system for intelligent interactive network
CN110472659B (en) Data processing method, device, computer readable storage medium and computer equipment
Liu et al. Investigating the effects of local weather, streamflow lag, and global climate information on 1-month-ahead streamflow forecasting by using XGBoost and SHAP: Two case studies involving the contiguous USA
CN111461344B (en) Automatic generation method, system, equipment and medium for high-order features
CN107463486B (en) System performance analysis method and device and server
CN110705279A (en) Vocabulary selection method and device and computer readable storage medium
Dutta et al. Big data architecture for environmental analytics
CN110348581B (en) User feature optimizing method, device, medium and electronic equipment in user feature group
Villacorta et al. Sensitivity analysis in the scenario method: A multi-objective approach
Eken et al. Predicting defects with latent and semantic features from commit logs in an industrial setting
CN116186603A (en) Abnormal user identification method and device, computer storage medium and electronic equipment
CN116257758A (en) Model training method, crowd expanding method, medium, device and computing equipment
Badami et al. Adaptive search query generation and refinement in systematic literature review
Bhawsar et al. Performance evaluation of link prediction techniques based on fuzzy soft set and markov model
Mora-Lopez et al. Modeling time series of climatic parameters with probabilistic finite automata
Kaedi et al. Holographic memory-based Bayesian optimization algorithm (HM-BOA) in dynamic environments
Mora-Lopez et al. Probabilistic finite automata and randomness in nature: a new approach in the modelling and prediction of climatic parameters
Bernard et al. Inferring Temporal Parametric L-systems Using Cartesian Genetic Programming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant