CN111435463A - Data processing method and related equipment and system


Info

Publication number
CN111435463A
Authority
CN
China
Prior art keywords
data set
data
feature
candidate
evaluation value
Legal status
Pending
Application number
CN201910028386.XA
Other languages
Chinese (zh)
Inventor
权涛
缪丹丹
孙伟健
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201910028386.XA
Publication of CN111435463A

Abstract

The embodiment of the application discloses a data processing method and related equipment and system. The method relates to the field of artificial intelligence, in particular to automatic feature engineering, and includes the following steps: the execution device performs multi-order feature transformation on a plurality of data features in an acquired first group of data sets, and selects an optimal data set from the data sets obtained by the multi-order feature transformation. When the nth-order feature transformation is performed, feature transformation is performed on each data set in the nth group of data sets to obtain a plurality of candidate data sets; a first evaluation value is calculated for each of the plurality of candidate data sets; further, the (n+1)th group of data sets that enters the next-order feature transformation is determined based on the first evaluation value of each candidate data set, the number of data sets in the (n+1)th group being smaller than the number of the plurality of candidate data sets.

Description

Data processing method and related equipment and system
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a data processing method, and related devices and systems.
Background
With the advent of Industry 4.0, traditional industries are gradually moving towards digital services. However, some of them lack technology accumulation in big data processing, cloud computing, and artificial intelligence (AI), and are not yet able to transform themselves with AI technology. Cloud computing is an important service platform of the digital economy, and automatic machine learning services provided on the cloud have become a core competitive strength of cloud platforms.
Feature engineering is an important link of automatic machine learning. In feature engineering, a plurality of candidate data sets are obtained by performing feature transformations on an original data set, and an optimal data set is obtained by evaluating the candidate data sets. The optimal data set comprises data features for machine learning that can describe the characteristics of the original data set comprehensively and from multiple angles, and a model built on these data features can show good performance.
At present, obtaining high-order features by iterative feature transformation is the main means by which automatic feature engineering obtains a plurality of candidate data sets. However, when there are many feature transformation operations, the number of candidate data sets obtained by transformation grows exponentially, and each transformed data set requires a performance evaluation, so determining the optimal data set takes a long time and the automation efficiency of feature engineering is low.
Disclosure of Invention
The embodiment of the application provides a data processing method, related equipment and a system, which address the defect in the prior art that the number of candidate data sets obtained by transformation grows exponentially when there are many feature transformation operations, and improve the automation efficiency of feature engineering.
In a first aspect, an embodiment of the present application provides a data processing method, applicable to an execution device, including: the execution device acquires a first group of data sets, the first group of data sets comprising a plurality of data features; performs multi-order feature transformation on the plurality of data features in the first group of data sets; and determines a target data set from a first set, where the first set comprises the data sets obtained by each order of feature transformation in the multi-order feature transformation process. The nth-order feature transformation in the multi-order feature transformation is specifically realized as follows: perform feature transformation on each data set in the nth group of data sets to obtain a plurality of candidate data sets, where the nth group of data sets is obtained by performing (n-1)-order feature transformation on the first group of data sets, and n is an integer greater than 1; calculate a first evaluation value of each of the plurality of candidate data sets, where the first evaluation value is used to evaluate the accuracy of a model trained from the candidate data set; and determine, based on the first evaluation value of each of the plurality of candidate data sets, an (n+1)th group of data sets that is subjected to the feature transformation of the next order, the number of data sets in the (n+1)th group being smaller than the number of the plurality of candidate data sets.
The first group of data sets may be a raw data set submitted or sent to the execution device by a user, or preprocessed data of the raw data set. The first group of data sets comprises a plurality of samples. The target data set is the optimal data set determined in feature engineering, and the model obtained by training on the optimal data set performs better.
The "multi-order feature transformation" refers to performing multiple feature transformations on a data set obtained by the current feature transformation as the basis of the next feature transformation.
It should be understood that, after obtaining the target data set, the execution device may further obtain a target feature transformation algorithm used to transform out the target data set; the execution device may also obtain a target machine learning model by training a newly built machine learning model on the target data set, and then send the target machine learning model and the target feature transformation algorithm to the device on the user side through the communication interface of the execution device.
It should also be understood that the execution device may be a terminal device, a server, or a device capable of implementing data computation, such as a virtual machine, and is not limited thereto.
According to the method, only part of the candidate data sets obtained by the nth-order feature transformation are selected as the (n+1)th group of data sets for the next-order feature transformation, which avoids exponential growth of the number of data sets, increases the data processing speed, and improves the automation efficiency of feature engineering.
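For intuition, the per-order screening amounts to a beam search over the tree of transformed data sets. The following Python sketch assumes a list of transformation callables and an evaluate function returning the first evaluation value; all names (transforms, evaluate, beam_width, max_order) are illustrative assumptions, not identifiers from the patent.

```python
# Minimal sketch of multi-order feature transformation with per-order screening.
def multi_order_search(first_group, transforms, evaluate, beam_width=4, max_order=3):
    current_group = list(first_group)      # the 1st group of data sets
    explored = list(first_group)           # the "first set": all data sets seen so far
    for order in range(1, max_order + 1):  # one iteration per order of transformation
        candidates = [t(ds) for ds in current_group for t in transforms]
        scored = sorted(candidates, key=evaluate, reverse=True)  # first evaluation values
        # (n+1)th group: strictly fewer data sets than candidates, so the number
        # of data sets no longer grows exponentially with the transformation order
        current_group = scored[:beam_width]
        explored.extend(current_group)
    return max(explored, key=evaluate)     # the target (optimal) data set
```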
As a possible embodiment, the first candidate data set is any one of a plurality of candidate data sets, and the calculation method of the first evaluation value of the first candidate data set may be: the execution device calculates a meta-feature of the first candidate data set according to the first candidate data set, wherein the meta-feature is used for representing the attribute of the first candidate data set; inputting the meta-features into a first machine learning model to predict a second evaluation value of the first candidate data set, the second evaluation value of the first candidate data set being used to evaluate the accuracy of a model trained from the first candidate data set; further, a first evaluation value of the first candidate data set is determined based on the second evaluation value of the first candidate data set.
It should be understood that the first machine learning model is trained using the meta-features of data sets as training data; because meta-features are attributes describing a data set and are independent of the physical meaning of the data features in the data set and of the values of those data features, the first machine learning model can be obtained by offline training and is suitable for the evaluation of all data sets.
In the prior art, evaluating the candidate data sets requires training and testing a model on each candidate data set, and this online training is time-consuming. In the present method, the first machine learning model is trained offline and can directly predict the evaluation value of a data set from its meta-features; the candidate data sets are then screened based on the first evaluation value, and only a small number of candidate data sets are retained for the next-order feature transformation, which accelerates the feature transformation process and allows the target data set to be obtained quickly.
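A minimal sketch of this surrogate evaluation, assuming a hypothetical compute_meta_features helper and an offline-trained scikit-learn regressor standing in for the first machine learning model (the patent does not fix a model family):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Offline: fit the first machine learning model on (meta-features, evaluation
# value) pairs collected from historical data sets, e.g.:
# meta_model = GradientBoostingRegressor().fit(X_meta, y_eval)

def predict_second_evaluation(candidate, meta_model, compute_meta_features):
    """Predict a candidate data set's second evaluation value from its meta-features."""
    meta = compute_meta_features(candidate)  # attribute vector describing the data set
    return float(meta_model.predict(np.asarray(meta, dtype=float).reshape(1, -1))[0])
```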
As a possible implementation, the first candidate data set includes a plurality of data features and a tag, and the meta-feature of the first candidate data set may be calculated by: the execution device calculates first information according to the first candidate data set, wherein the first information can comprise at least one of data similarity and distribution similarity of every two data features in the plurality of data features of the first candidate data set, data similarity and distribution similarity of each data feature in the plurality of data features of the first candidate data set and the label, data distribution information of each data feature in the plurality of data features of the first candidate data set, data distribution information of the label and the like; further, meta features of the first candidate data set are calculated from the first information.
Optionally, the meta-features of the first candidate data set may include: at least one of the basic feature of the first candidate data set, the feature of the continuous data feature in the plurality of data features of the first candidate data set, the feature of the discrete data feature in the plurality of data features of the first candidate data set, the feature of the label, the feature of the data similarity, the feature of the distribution information of the data feature, and the like.
Optionally, the first data feature and the second data feature are any two of the plurality of data features of the first candidate data set, and the data similarity between the first data feature and the second data feature may be calculated as follows: the execution device calculates the mutual information of the first data feature and the second data feature from the data of the two features in the first candidate data set, and then determines their data similarity from the mutual information. For example, the data similarity between the first data feature and the second data feature is their mutual information.
Mutual information (MI) is an information measure in information theory; it can be regarded as the amount of information one random variable contains about another, or as the reduction in uncertainty about one random variable given knowledge of another. Mutual information can therefore describe the data similarity between data features: when the correlation between two data features is strong, the corresponding mutual information value is large; otherwise it is small.
Further, the mutual information of the first data feature and the label can be calculated to obtain the data similarity between the first data feature and the label.
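A sketch of this mutual-information-based data similarity using scikit-learn; mutual_info_score expects discrete values, so continuous features are binned first (the binning scheme is an assumption of the sketch, not specified in the patent):

```python
import pandas as pd
from sklearn.metrics import mutual_info_score

def data_similarity(first_feature: pd.Series, second_feature: pd.Series, bins: int = 20):
    """Mutual information between two data features (or between a feature and the label)."""
    def discretize(s: pd.Series) -> pd.Series:
        # bin continuous (float) features; leave discrete features as-is
        return pd.cut(s, bins, labels=False) if s.dtype.kind == "f" else s
    return mutual_info_score(discretize(first_feature), discretize(second_feature))
```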
Optionally, the first data feature and the second data feature are any two of the plurality of data features of the first candidate data set, and the distribution similarity between them may be calculated as follows: from the data of the two features, the execution device calculates their chi-square value through a chi-square test, or calculates their t statistic through a t-test; the chi-square value or the t statistic is the distribution similarity of the first data feature and the second data feature.
Further, the chi-square value or t statistic of the first data feature and the label can be calculated to obtain the distribution similarity between the first data feature and the label.
Optionally, the first data feature is any one of a plurality of data features of the first candidate data set, and the calculation method of the distribution information of the first data feature may be: the execution device may calculate skewness and kurtosis of the first data feature from data of the first data feature, and the distribution information of the first data feature includes the skewness and the kurtosis.
Further, the skewness and kurtosis of the label can also be calculated. Skewness refers to the degree of asymmetry or skew of a data distribution and measures the direction and degree of the skew; kurtosis refers to the degree of concentration of the data and the steepness (or flatness) of the distribution curve.
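These distribution statistics map directly onto scipy.stats; the following sketch (histogram binning and the +1 smoothing are assumptions) computes the chi-square value, t statistic, skewness, and kurtosis named above:

```python
import numpy as np
from scipy import stats

def distribution_similarity(x: np.ndarray, y: np.ndarray, bins: int = 10):
    """Chi-square value and t statistic of two data features (or a feature and the label)."""
    edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=bins)
    table = np.vstack([np.histogram(x, edges)[0] + 1,   # +1 smoothing avoids
                       np.histogram(y, edges)[0] + 1])  # all-zero columns
    chi2_value = stats.chi2_contingency(table)[0]       # chi-square test
    t_statistic = stats.ttest_ind(x, y).statistic       # t-test
    return chi2_value, t_statistic

def distribution_info(x: np.ndarray):
    """Skewness and kurtosis of a single data feature (or of the label)."""
    return stats.skew(x), stats.kurtosis(x)
```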
As a possible implementation, in a first implementation in which the first evaluation value of the first candidate data set is determined from the second evaluation value of the first candidate data set: the first evaluation value of the first candidate data set is the second evaluation value of the first candidate data set.
As a possible implementation, the first candidate data set is obtained by applying a first feature transformation to a first data set, the first data set being one of the nth group of data sets. In a second implementation of determining the first evaluation value of the first candidate data set from its second evaluation value: the first evaluation value of the first candidate data set may be the sum of a first data item and a second data item, where the first data item is positively correlated with the second evaluation value of the first candidate data set, and the second data item is determined by the number of historical gains of the first feature transformation.
It is to be understood that each time, within the first n groups of data sets, the first evaluation value of a data set after the first feature transformation is larger than the first evaluation value of the data set before the transformation, the first feature transformation is counted as having produced one gain.
With the above method, instead of evaluating candidate data sets by the second evaluation value alone, the first evaluation value of the first candidate data set is jointly determined by the second evaluation value and the number of historical gains of the first feature transformation that produced the first candidate data set; taking the number of historical gains into account can prevent the transformation from falling into a local optimum.
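As a hedged illustration, the two data items might be combined as below; the logarithmic weighting of the gain count is an assumption, since the summary only requires positive correlation with the second evaluation value and a term determined by the historical gain count:

```python
import math

def first_evaluation(second_eval: float, gain_count: int, weight: float = 0.1) -> float:
    # first data item: positively correlated with the second evaluation value;
    # second data item: grows with the historical gain count of the transformation
    return second_eval + weight * math.log1p(gain_count)
```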
As one possible implementation, a first way of determining the (n+1)th group of data sets according to the first evaluation values of the plurality of candidate data sets may be: the execution device selects the candidate data sets whose first evaluation value is larger than a first threshold as the (n+1)th group of data sets.
As one possible implementation, a second way of determining the (n+1)th group of data sets according to the first evaluation values of the plurality of candidate data sets may be: the execution device selects the candidate data sets corresponding to the first m first evaluation values in the evaluation value ranking as the (n+1)th group of data sets, where the evaluation value ranking sorts the first evaluation values of the plurality of candidate data sets from largest to smallest, and m is a positive integer.
As one possible implementation, a third way of determining the (n+1)th group of data sets according to the first evaluation values of the plurality of candidate data sets may be: the execution device selects the candidate data sets whose first evaluation value satisfies a first condition; then trains and tests a model on each of the candidate data sets satisfying the first condition to obtain a third evaluation value for each of them; and finally selects, from the candidate data sets satisfying the first condition, those whose third evaluation value satisfies a second condition as the (n+1)th group of data sets.
In the third implementation, on top of the screening by the first evaluation value, the number of candidate data sets that must be trained and tested is reduced; the screened candidate data sets are evaluated more accurately, and are then screened further based on the accurate evaluation values, which further prunes branches, reduces the complexity of the feature transformation, and improves data processing efficiency.
Alternatively, the candidate data sets whose first evaluation value satisfies the first condition may be the candidate data sets whose first evaluation value is larger than a second threshold; or the candidate data sets corresponding to the first g first evaluation values in the evaluation value ranking, where the evaluation value ranking sorts the first evaluation values of the plurality of candidate data sets from largest to smallest, and g is a positive integer.
Optionally, the second candidate data set is any one of the candidate data sets satisfying the first condition and includes a training data set and a test data set, where any sample in the training data set or the test data set includes a plurality of data features and a label. The third evaluation value of the second candidate data set may be calculated as follows: the execution device trains a second machine learning model on the training data set; inputs the plurality of data features of each sample in the test data set into the second machine learning model to obtain a predicted label for each sample in the test data set; and then calculates the third evaluation value of the second candidate data set based on the label and the predicted label of each sample in the test data set.
It should be understood that the third evaluation value may be, without limitation, an F1 score, mean average precision (MAP), AUC (area under the ROC curve), mean squared error (MSE), root mean squared error (RMSE), recall, precision, and the like.
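A sketch of this train-and-test evaluation, with logistic regression standing in for the second machine learning model and the F1 score standing in for the third evaluation value (both are example choices among the options listed above):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def third_evaluation(train_X, train_y, test_X, test_y) -> float:
    model = LogisticRegression(max_iter=1000).fit(train_X, train_y)  # second ML model
    predicted = model.predict(test_X)          # predicted label for each test sample
    return f1_score(test_y, predicted, average="weighted")  # third evaluation value
```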
As a possible implementation, before the meta-features are input into the first machine learning model to predict the second evaluation value of the first candidate data set, the method may further include: the execution device acquires a plurality of first samples, where any one of the first samples comprises the meta-features of a third data set and the evaluation value of that third data set, and trains the first machine learning model on the plurality of first samples.
For the method for calculating the meta-feature, reference may be made to the related description in the first aspect, and details are not repeated in the embodiments of the present application.
It should be understood that the first machine learning model may be used to predict the evaluation value of the input data set, that is, the second evaluation value in the above-described first aspect, based on the meta-features of the data set.
The training of the first machine learning model may be executed by a training device, and the execution device may be the same device as the training device; this is not limited.
According to the method, the offline-trained first machine learning model is applicable to all data sets; the second evaluation value of a candidate data set can be predicted from its meta-features, the candidate data sets are then screened based on the second evaluation value, poor candidate data sets are discarded, the growth of the number of data sets is limited, and the data processing efficiency is improved.
As a possible implementation, before performing feature transformation on each data set in the nth group of data sets to obtain a plurality of candidate data sets, the execution device may first select the feature transformation algorithms applicable to each data set in the nth group. Specifically: the execution device may input the meta-features of a third data set into a third machine learning model to predict fourth evaluation values corresponding to B feature transformations, where the fourth evaluation value corresponding to a second feature transformation is used to evaluate the accuracy of a model trained on the candidate data set obtained by applying the second feature transformation to the third data set, the third data set is any one of the nth group of data sets, the second feature transformation is any one of the B feature transformations, and B is a positive integer; and then select A feature transformations, whose fourth evaluation values meet a fourth condition, from the B feature transformations, where A is a positive integer not greater than B. In this case, performing feature transformation on each data set in the nth group of data sets to obtain a plurality of candidate data sets may be implemented as: the execution device performs the A feature transformations on the third data set to obtain A candidate data sets.
It should be understood that the third machine learning model is trained using the meta-features of data sets as training data; because meta-features are attributes describing a data set and are independent of the physical meaning of the data features in the data set and of the values of those data features, the third machine learning model can be obtained by offline training and is suitable for the evaluation of all data sets.
In the method, before the data sets in the nth group are transformed, the offline-trained third machine learning model estimates the fourth evaluation value corresponding to each feature transformation; the feature transformations that can produce good data sets are screened out based on the fourth evaluation values, and only the screened feature transformations are applied to the data sets, which reduces the number of feature transformations and first-evaluation-value calculations and accelerates data processing through pre-pruning before transformation.
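A sketch of this pre-pruning step, assuming a multi-output gain_model as the third machine learning model that returns one fourth evaluation value per feature transformation (gain_model and transforms are illustrative names):

```python
import numpy as np

def select_transforms(dataset_meta, gain_model, transforms):
    """Keep only the A transformations whose predicted fourth evaluation value passes."""
    meta = np.asarray(dataset_meta, dtype=float).reshape(1, -1)
    predicted = gain_model.predict(meta)[0]   # B predicted fourth evaluation values
    # first training implementation: keep transformations with predicted gain > 0
    return [t for t, g in zip(transforms, predicted) if g > 0]
```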
As a possible implementation, before the meta-features of the third data set are input into the third machine learning model to predict the fourth evaluation values corresponding to the B feature transformations, the method further includes: the execution device trains the third machine learning model. The training may be realized in the following two ways:
The first implementation is as follows:
The execution device acquires a plurality of second samples, where any one of the second samples comprises the meta-features of a fourth data set and the difference between the evaluation value of the data set obtained after the fourth data set undergoes a second feature transformation and the evaluation value of the fourth data set, the second feature transformation being any one of the B feature transformations; and trains the third machine learning model on the plurality of second samples.
In this case, the A feature transformations may specifically be the feature transformations, among the B feature transformations, whose corresponding fourth evaluation values are greater than 0.
The second realization:
the execution device acquires a plurality of third samples, wherein any one of the third samples comprises the meta-feature of the fourth data set and a fourth evaluation value of the data set of the second data set after the second feature transformation; training the third machine learning model according to a plurality of third samples.
At this time, the a feature transformation may be specifically a transformation in which, of the B feature transformations, a feature transformation corresponding to a third evaluation value having a value greater than the first evaluation value of the data set is selectediAnd (5) carrying out feature transformation.
According to the method, the third machine learning model trained offline can be suitable for all data sets, and the advantages and disadvantages of the candidate data sets obtained through feature transformation can be predicted based on the meta features of the data sets, so that feature transformation on the inferior data sets is avoided, the increase of the number of the data sets is limited, and the data processing efficiency is improved.
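A sketch of the first training implementation, where each second sample pairs a data set's meta-features with its per-transformation gains; the multi-output random forest is an example model choice, not mandated by the patent:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

def train_gain_model(meta_features, eval_before, eval_after_per_transform):
    X = np.asarray(meta_features, dtype=float)  # (num_data_sets, num_meta_features)
    # targets: one gain column per feature transformation, shape (num_data_sets, B)
    y = np.asarray(eval_after_per_transform) - np.asarray(eval_before)[:, None]
    return MultiOutputRegressor(RandomForestRegressor()).fit(X, y)
```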
In a second aspect, an embodiment of the present application provides a data processing system, which may include:
a first acquisition unit for acquiring a first set of data sets, the first set of data sets comprising a plurality of data features;
a transformation unit, configured to perform multi-order feature transformation on a plurality of data features in the first set of data sets;
a first selection unit, configured to determine a target data set from a first set, where the first set includes the data sets obtained by each order of feature transformation in the multi-order feature transformation process;
wherein the transformation unit is specifically configured to: respectively perform feature transformation on each data set in an nth group of data sets to obtain a plurality of candidate data sets, where the nth group of data sets is obtained by performing (n-1)-order feature transformation on the first group of data sets, and n is an integer greater than 1;
the system further comprises:
a first evaluation unit configured to calculate a first evaluation value for each of the plurality of candidate data sets, the first evaluation value being used to evaluate an accuracy of a model trained by the candidate data set;
a first screening unit, configured to determine an (n+1)th group of data sets according to the first evaluation value of each of the plurality of candidate data sets, where the number of data sets in the (n+1)th group of data sets is smaller than the number of the plurality of candidate data sets.
It should be noted that the system may further include other functional units for implementing the data processing method according to the first aspect; reference may be made to the related description of the data processing method in the first aspect, which is not repeated here.
It should be understood that, in the above system, each functional unit may be disposed in one or more computing devices, such as an execution device, which may implement data computation, for example, the execution device may be one or more servers, one or more computers, and the like, and is not limited thereto.
In a third aspect, an embodiment of the present application further provides an execution device, which may include a processor and a memory, where the memory is used to store data and program code, and the processor is used to call the data and program code in the memory to execute:
obtaining a first set of data sets, the first set of data sets comprising a plurality of data features;
performing a multi-order feature transform on the plurality of data features in the first set of data sets;
determining a target data set from a first set, wherein the first set comprises the data sets obtained by each order of feature transformation in the multi-order feature transformation process;
wherein the performing a multi-order feature transformation on the plurality of data features in the first set of data sets comprises:
respectively performing feature transformation on the data features in each data set in an nth group of data sets to obtain a plurality of candidate data sets, wherein the nth group of data sets is obtained by performing (n-1)-order feature transformation on the first group of data sets, and n is an integer greater than 1;
calculating a first evaluation value for each of the plurality of candidate data sets; the first evaluation value is used for evaluating the accuracy of a model obtained by training the candidate data set;
determining an (n+1)th group of data sets according to the first evaluation value of each of the plurality of candidate data sets, the number of data sets in the (n+1)th group of data sets being smaller than the number of the plurality of candidate data sets.
It should be noted that the processor may further execute the data processing method according to the first aspect; reference may be made to the description of the data processing method in the first aspect, which is not repeated here.
In an implementation of the embodiment of the present application, the processor may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), an artificial intelligence processor, or one or more integrated circuits.
In another implementation of the embodiment of the present application, the execution device may further include an artificial intelligence processor, which may be any processor suitable for large-scale exclusive-or operation processing, such as a neural network processing unit (NPU), a tensor processing unit (TPU), or a graphics processing unit (GPU). The artificial intelligence processor may be mounted as a coprocessor to a main CPU (host CPU), which assigns tasks to it.
It should be understood that the computing device or execution device described above may be one or more servers, one or more computers, and so on, without limitation.
In a fourth aspect, the present application further provides a computer storage medium storing computer software instructions which, when executed by a computer, cause the computer to perform any one of the data processing methods according to the first aspect.
In a fifth aspect, the present application further provides a computer program comprising computer software instructions which, when executed by a computer, cause the computer to perform any one of the data processing methods according to the first aspect.
In a sixth aspect, an embodiment of the present application further provides a training method for a machine learning model, applicable to a training device. The method includes: the training device acquires a plurality of first samples, where any one of the first samples comprises the meta-features of a second data set and the evaluation value of the second data set; and trains a first machine learning model on the plurality of first samples.
Optionally, a method for calculating the meta-feature is the same as the method for calculating the meta-feature of the first candidate data set in the first aspect, and reference may be made to related description in the first aspect, which is not repeated herein.
The trained first machine learning model is used for processing meta-features of a data set input to the model to obtain a second evaluation value, and the second evaluation value is used for evaluating the accuracy of the model obtained by training the data set.
According to the method, the trained first machine learning model is applicable to all data sets; the evaluation value of a data set can be predicted from its meta-features and used to evaluate the data set, which avoids training and testing a model on each data set whose evaluation value needs to be predicted and improves the efficiency of data set evaluation.
In a seventh aspect, an embodiment of the present application further provides a training method for a machine learning model, applicable to a training device. The method includes: the training device acquires a plurality of second samples, where any one of the second samples comprises the meta-features of a fourth data set and the difference between the evaluation value of the data set obtained after the fourth data set undergoes a second feature transformation and the evaluation value of the fourth data set, the second feature transformation being any one of B feature transformations; and trains the third machine learning model on the plurality of second samples.
Optionally, a method for calculating the meta-feature of the fourth data set is the same as the method for calculating the meta-feature of the first candidate data set in the first aspect, and reference may be made to related description in the first aspect, and details are not repeated in this embodiment of the application.
It should be noted that the trained third machine learning model is used to process the meta-features of a data set input to the model and obtain the fourth evaluation values corresponding to the B feature transformations, where each fourth evaluation value is used to evaluate the accuracy of a model trained on the candidate data set obtained by the corresponding feature transformation.
According to the method, the trained third machine learning model is applicable to all data sets; whether a feature transformation would produce a gain in the evaluation value of the resulting candidate data set can be predicted from the meta-features of the data set, so the feature transformations suitable for the data set (namely, those whose candidate data sets gain in evaluation value) are predicted before transformation, unnecessary feature transformations are avoided, and the data processing efficiency is improved.
In an eighth aspect, an embodiment of the present application further provides a training method for a machine learning model, applicable to a training device. The method includes: the training device acquires a plurality of third samples, where any one of the third samples comprises the meta-features of a fourth data set and the fourth evaluation value of the data set obtained after the fourth data set undergoes the second feature transformation; and trains the third machine learning model on the plurality of third samples.
Optionally, a method for calculating the meta-feature of the fourth data set is the same as the method for calculating the meta-feature of the first candidate data set in the first aspect, and reference may be made to related description in the first aspect, and details are not repeated in this embodiment of the application.
According to the method, the trained third machine learning model is applicable to all data sets; the evaluation value of the candidate data set that a feature transformation would produce can be predicted from the meta-features of the data set, so the feature transformations suitable for the data set are predicted before transformation, unnecessary feature transformations are avoided, and the data processing efficiency is improved.
It should be noted that the training device according to the sixth aspect, the seventh aspect, or the eighth aspect may be one or more servers, one or more computers, and the like, which is not limited in this respect.
In a ninth aspect, embodiments of the present application further provide a training apparatus, which may include a processor and a memory, where the memory is used to store data and program codes, and the processor is used to call the data and program codes in the memory to execute the training method of the machine learning model according to the sixth aspect.
In a tenth aspect, an embodiment of the present application further provides a training device, which may include a processor and a memory, the memory being configured to store data and program code, and the processor being configured to call the data and program code in the memory to perform the training method of the machine learning model according to the seventh aspect or the eighth aspect.
The processor in the ninth aspect or the tenth aspect may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), an artificial intelligence processor, one or more integrated circuits, or the like.
In another implementation of the embodiment of the present application, the training device in the ninth aspect or the tenth aspect may further include an artificial intelligence processor, which may be any processor suitable for large-scale exclusive-or operation processing, such as a neural network processing unit (NPU), a tensor processing unit (TPU), or a graphics processing unit (GPU). The artificial intelligence processor may be mounted as a coprocessor to a main CPU (host CPU), which assigns tasks to it.
In an eleventh aspect, an embodiment of the present application further provides a computer storage medium storing computer software instructions which, when executed by a computer, cause the computer to perform the training method of the machine learning model according to the sixth aspect.
In a twelfth aspect, the present application further provides a computer program comprising computer software instructions which, when executed by a computer, cause the computer to perform the training method of the machine learning model according to the sixth aspect.
In a thirteenth aspect, an embodiment of the present application further provides a computer storage medium storing computer software instructions which, when executed by a computer, cause the computer to perform the training method of the machine learning model according to the seventh aspect or the eighth aspect.
In a fourteenth aspect, the present application further provides a computer program comprising computer software instructions which, when executed by a computer, cause the computer to perform the training method of the machine learning model according to the seventh aspect or the eighth aspect.
In a fifteenth aspect, an embodiment of the present application further provides a chip, where the chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to execute the data processing method in the first aspect.
Optionally, as an implementation manner, the chip may further include a memory, where instructions are stored in the memory, and the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the data processing method in the first aspect, or the method in any one of the training methods of the machine learning model in the sixth aspect, the seventh aspect, or the eighth aspect.
In a sixteenth aspect, an embodiment of the present application further provides an electronic device, where the electronic device includes the data processing system or the execution device in any one of the second aspect and the third aspect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
FIG. 1 is a schematic block diagram of a system in an embodiment of the present application;
FIG. 2 is a schematic block diagram of another system in an embodiment of the present application;
FIG. 3 is a schematic interface diagram of a graphical user interface according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for calculating meta-features of a data set according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating a data processing method according to an embodiment of the present application;
FIG. 6A is a schematic flow chart illustrating feature transformation and selection of an nth order in an embodiment of the present application;
FIG. 6B is a flowchart illustrating a data processing method according to an embodiment of the present disclosure;
FIG. 6C is a schematic illustration of feature transformation and screening in an embodiment of the present application;
FIG. 7 is a schematic block diagram of a data processing system in an embodiment of the present application;
FIG. 8 is a schematic block diagram of an execution device in an embodiment of the present application;
FIG. 9 is a schematic block diagram of a training apparatus in an embodiment of the present application;
FIG. 10 is a schematic block diagram of a training apparatus in an embodiment of the present application;
fig. 11 is a schematic block diagram of a chip in an embodiment of the present application.
Detailed Description
The concepts involved in this application are first introduced.
In the embodiments of the present application, a "machine learning model," also referred to as a "model," such as the "first machine learning model," "second machine learning model," or "third machine learning model," may receive input data and generate a prediction output based on the received input data and the current model parameters. The machine learning model may be a regression model, an artificial neural network (ANN), a deep neural network (DNN), a support vector machine (SVM), or another machine learning model.
In the embodiment of the present application, the "raw data set" is the original data set submitted or sent by a user to the cloud platform or the execution device. The raw data set is used to train the established machine learning model so as to obtain a machine learning model capable of realizing certain functions. The data in the raw data set may be structured data, for example represented as a table. The raw data set includes M samples, each of which may include a plurality of data features and a label.
In the embodiment of the present application, a first group of data sets is obtained by preprocessing the original data set; the first group of data sets may include M samples, any one of which includes N1 data features and a label. The preprocessing may include one or more of data cleaning, formatting, feature digitizing, and the like. For example, "male" and "female" in the data set need to be encoded, for example with a one-hot encoder or a mean encoder, so that data features are described by vectors. It should be appreciated that when a data set is feature-transformed, only the data features in the samples are transformed; the labels in the data set are not transformed, and the feature transformation may generate new data features. That is, the number of data features of each sample in the transformed data set or candidate data set, and the meaning those data features designate, may change.
Each group of processed "data sets" and the candidate data sets all include M samples, and the samples in different data sets or candidate data sets may include different data features, different numbers of data features, and so on. Note that the label corresponding to each sample does not change. That is, feature transformation of a data set yields a new data set in which the features of the samples are transformed and higher-order data features may appear, but the labels corresponding to the samples are unchanged.
In the embodiment of the present application, hierarchical relationships may exist between data sets; relationships between data sets may be described by "groups," each group may include one or more data sets, and the relationships between the groups may also be described by a tree structure (also referred to as a search tree). Multiple feature transformations are performed on the 1st group of data sets (the layer-1 node, also called the root node) to obtain multiple candidate data sets, and several candidate data sets with better evaluation values are selected from them as the 2nd group of data sets (the data sets corresponding to the layer-2 nodes); then, for each data set in the 2nd group, multiple feature transformations are performed to obtain multiple candidate data sets, and the part of them with better evaluation values is selected as the 3rd group of data sets (the data sets corresponding to the layer-3 nodes), and so on. It can be seen that the layer-2 nodes are child nodes of the layer-1 node, and likewise the layer-3 nodes are child nodes of the layer-2 nodes. In addition, the 1st group includes one data set, which may be obtained by subjecting the original data set to preliminary processing; the preliminary processing may include one or more of encoding (e.g., one-hot encoding, mean encoding, etc.) and normalization, which is not limited here.
In the embodiment of the application, "pruning" means that the number of data sets at each order of feature transformation is reduced by screening, so that some unnecessary feature transformations are avoided; vividly, some "branches" of the search tree are pruned.
The data set or candidate data set can be divided into a training data set and a test data set. The training data set is used to train a model; the trained model then predicts on the test data set, the predicted results are compared with the real results of the test data, and the comparison yields an evaluation value, also called the performance of the model on the data set. It should be understood that this evaluation of the data set is based on a model actually trained on the data set, so the reliability of the obtained evaluation value is high.
Data may be classified into continuous data and discrete data; by measurement scale, data can also be divided into interval data (scale data), ordinal data, nominal data, and the like. Data features can be divided into continuous data features and discrete data features according to the type of their data, and the execution device can screen out feature transformation algorithms suitable for a data feature according to its type.
For example, "cost" is a continuous data feature, and the transformations corresponding to the feature "cost" may include normalization, log, square, etc.; for another example, "gender" is a discrete data feature, and the transformations corresponding to "gender" may include encoding operations such as one-hot and mean encoding, and frequency operations (Freq).
In the embodiment of the present application, "feature transformation" refers to processing data of a feature by a feature transformation algorithm to obtain a new feature or a higher-order feature. The transformation operation may be performed on a single feature, or may be performed on a plurality of features, which is not limited in this respect.
Feature transformations may include transformations of one data feature (single-feature transformations), of two data features (binary transformations), and of two or more data features (multivariate transformations). For single-feature transformations, the feature transformation algorithms for continuous data features may include one or more of normalization, nonlinear operations, discretization, and the like. Normalization methods may include max-min normalization, 0-1 normalization, linear function normalization, dispersion normalization, etc.; nonlinear operations may include one or more of logarithm (log), square, square root (sqrt), sigmoid, and hyperbolic tangent (tanh); discretization operations may include one or more of equal-width or equal-frequency discretization, supervised discretization based on the minimum description length principle, and rounding operations (e.g., the round function). Feature transformation operations for discrete data features may include frequency (Frequency), i.e., counting the number of samples taking each specific value of the data feature. Binary or multivariate transformations may include one or more of basic mathematical operations (e.g., addition, subtraction, multiplication, division), aggregation operations (group by), temporal aggregation (group by time) operations, and the like, applied to multiple data features.
It should be noted that, the foregoing describes some feature transformations by way of example only, and the embodiments of the present application may also include other feature transformation methods, which are not limited in this respect.
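For concreteness, the transformation catalog above could be organized as in the following sketch; the dictionary layout and the shift applied before log/sqrt (to keep arguments non-negative) are assumptions of the sketch, not the patent's API:

```python
import numpy as np
import pandas as pd

CONTINUOUS_TRANSFORMS = {
    "minmax":      lambda s: (s - s.min()) / (s.max() - s.min() + 1e-12),  # normalization
    "log":         lambda s: np.log1p(s - s.min()),                        # nonlinear
    "sqrt":        lambda s: np.sqrt(s - s.min()),
    "sigmoid":     lambda s: 1.0 / (1.0 + np.exp(-s)),
    "tanh":        np.tanh,
    "equal_width": lambda s: pd.cut(s, bins=10, labels=False),             # discretization
    "round":       lambda s: s.round(),
}
DISCRETE_TRANSFORMS = {
    "frequency": lambda s: s.map(s.value_counts()),   # samples per feature value
    "one_hot":   lambda s: pd.get_dummies(s),         # encoding
}
BINARY_TRANSFORMS = {
    "add": lambda a, b: a + b, "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b, "div": lambda a, b: a / b.replace(0, np.nan),
}
```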
In the embodiment of the present application, "multi-level feature transformation" refers to performing multiple feature transformations on a data set obtained by this feature transformation as a basis for the next feature transformation. That is, the first group of data sets is subjected to first-order feature transformation to obtain a second group of data sets, the second group of data sets is subjected to second-order feature transformation to obtain a third group of data sets, and by analogy, when the condition of stopping transformation is met, feature transformation is not performed. It should be noted that the feature transformation algorithms used in the feature transformation of each order may be the same or different.
In the embodiment of the present application, "evaluation value" (first evaluation value, second evaluation value, third evaluation value, fourth evaluation value, etc.) is used to evaluate the merits of a data set or a candidate data set, and is generally used to describe the performance (accuracy, generalization ability, etc.) of a model obtained by training the data set.
In the embodiment of the present application, "data features" are used to describe the data set or the samples in the candidate data set, and "meta-features" (meta-features) are used to describe the data set or the candidate data set. Where a "meta-feature" describes a general property of a data set or candidate data set by a single feature, the complexity of the data set or candidate data set may be characterized.
For example, the data set includes a plurality of samples, each sample includes data features such as "age", "academic calendar", "university", "gender", "date of birth", "occupation", "working life", and the like, and the corresponding label of the sample is "salary". It can be seen that the user's goal is to have a machine learning model that can predict payroll through data set training. The meta-features of the data set may include the number of samples, the number of data features, the data similarity of each data feature to the tag, the valued distribution information of each data feature, the information entropy of the tag, and so on.
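A minimal sketch assembling such a meta-feature vector for the salary example; the patent's full meta-feature list is richer, and the components below (sample count, feature count, label entropy, feature-label mutual information) are only the ones named in this paragraph:

```python
import pandas as pd
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score

def meta_features(df: pd.DataFrame, label: str = "salary") -> list:
    y = df[label].astype(str)
    features = [c for c in df.columns if c != label]
    vec = [float(len(df)),                                         # number of samples
           float(len(features)),                                   # number of data features
           float(entropy(df[label].value_counts(normalize=True)))] # label entropy
    for c in features:  # data similarity of each data feature to the label
        vec.append(mutual_info_score(df[c].astype(str), y))
    return vec
```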
Referring to FIG. 1, the system architecture designed in the embodiment of the present application is described below. The system 10 may include a training device 110, an execution device 120, a client device 130, a terminal device 140, a data storage system 150, and so on. Wherein:
The data storage system 150 may store a plurality of sample data for training the first machine learning model and the third machine learning model; the training device 110 is configured to execute the program code of the model training method to train the machine learning models; and the execution device 120 is configured to execute the program code of the data processing method and to handle the data sets, the candidate data sets generated from them by feature transformation, the second machine learning models trained on the candidate data sets, and so on.
The training device 110 may obtain the sample data in the data storage system 150 to train the first machine learning model and the third machine learning model, and the specific training method may refer to the following description of the embodiment of the training method of the first machine learning model or the embodiment of the training method of the third machine learning model, which is not repeated in this embodiment of the present application. The training device 110 transmits the trained first machine learning model and third machine learning model to the execution device 120.
The trained first machine learning model is used for processing the meta-features of the data set input into the model to obtain a second evaluation value, and the second evaluation value is used for evaluating the accuracy of the model obtained by training the data set. And the trained third machine learning model is used for processing the meta-features of the data set input into the model to obtain fourth evaluation values in one-to-one correspondence with the B feature transformations, wherein the fourth evaluation values are used for evaluating the accuracy of the model obtained by training the candidate data set obtained by feature transformation corresponding to the fourth evaluation values.
The first machine learning model and the third machine learning model are obtained by taking the meta-features of the data set as training data, and the meta-features are attributes describing the data set and are irrelevant to the physical significance of the data features in the data set and the values of the data features, so that the first machine learning model and the third machine learning model can be suitable for evaluation of all the data sets.
In one case, the customer may specify data (e.g., raw data sets in embodiments of the present application) to be input into the execution device 120, for example, to operate in an interface provided by an I/O interface of the execution device 120. Alternatively, the client device 130 may automatically enter data into the I/O interface and obtain the results, and if the client device 130 automatically enters data to obtain authorization from the user, the client may set the corresponding permissions in the client device 130. The client device 130 requests the executing device 120 to use an automatic machine learning service for the raw data set to obtain the machine learning model (also referred to as the target machine learning model in the embodiments of the present application) desired by the user. The client can view the result output by the execution device 120 at the client device 130, and the specific presentation form can be a display, a sound, an action, and the like. A client may input data, such as a raw data set, to the execution device 120 through the client device 130. The client device 130 may also act as a data collection site to store the collected data set in the data storage system 150.
The execution device 120 is implemented by one or more servers, optionally in cooperation with other computing devices, such as data storage devices, routers, and load balancers; the execution devices 120 may be deployed at one physical site or distributed across multiple physical sites. The execution device 120 may use data in the data storage system 150 or call program codes in the data storage system 150 to implement the data processing method of the embodiment of the present application. Specifically, the execution device 120 performs data preprocessing on a received raw data set (for example, the raw data set sent by the client device 130) to obtain a first group of data sets, and then obtains an optimal data set (also referred to as a target data set in the embodiment of the present application) and the feature transformation algorithm corresponding to it (also referred to as the target feature transformation algorithm) through multi-order feature transformation and selection. Further, the execution device may train the established machine learning model on the optimal data set to obtain the target machine learning model. The offline-trained first machine learning model and third machine learning model may be used during the multi-order feature transformation to accelerate feature transformation and selection.
Before performing feature transformation on a data set, the execution device 120 may input the meta-features of the data set into the third machine learning model to obtain fourth evaluation values corresponding one-to-one to the B kinds of feature transformation, select from the B kinds of feature transformation those whose fourth evaluation values satisfy a condition, and perform only the selected feature transformations on the data set, thereby avoiding performing all B feature transformations on the data set.
After performing feature transformation on a data set to obtain a plurality of candidate data sets, the execution device 120 may input the meta-features of each candidate data set into the first machine learning model to obtain a second evaluation value for each of the plurality of candidate data sets, and filter the candidate data sets based on the second evaluation values to reduce their number. A candidate data set may be divided into a training data set and a testing data set; the execution device 120 may train the second machine learning model on the training data set and then test and evaluate it on the testing data set to obtain a third evaluation value for evaluating the accuracy of the second machine learning model trained on the candidate data set. Since the third evaluation value is obtained by evaluating a model actually trained on the training data set of the candidate data set, it evaluates the candidate data set more accurately. The execution device 120 may further screen the already-filtered candidate data sets based on the third evaluation values, retaining only a small number of candidate data sets for the next-order feature transformation, which greatly reduces the number of data sets and improves feature transformation efficiency. For a specific implementation, refer to the relevant descriptions in the embodiments of the data processing method in the embodiments of the present application; details are not described here.
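The two-stage screening described above can be sketched as follows. This is a minimal illustration, not the patented implementation: `meta_features` (a function computing a meta-feature vector), `first_model` (the trained first machine learning model), and the choice of a random-forest classifier as the second machine learning model are all assumptions, and candidates are assumed to be (X, y) pairs of feature matrix and labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def screen_candidates(candidates, first_model, meta_features,
                      keep_cheap=10, keep_final=3):
    """Two-stage screening: cheap meta-feature estimates first,
    real train/test evaluation only on the survivors."""
    # Stage 1: second evaluation values predicted from meta-features alone.
    second_evals = [first_model.predict([meta_features(X, y)])[0]
                    for X, y in candidates]
    survivors = [candidates[i] for i in np.argsort(second_evals)[-keep_cheap:]]

    # Stage 2: third evaluation values from actual training and testing.
    third_evals = []
    for X, y in survivors:
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)
        model = RandomForestClassifier(n_estimators=50).fit(X_tr, y_tr)
        third_evals.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

    keep = np.argsort(third_evals)[-keep_final:]
    return [survivors[i] for i in keep], [third_evals[i] for i in keep]
```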
Further, the execution device 120 obtains a plurality of data sets after the multi-order feature transformation, determines, according to the third evaluation values of the plurality of data sets, a target data set (also referred to as the optimal data set in this embodiment) among them and the target feature transformation algorithm by which the original data set is transformed into the target data set, and then performs model training on the target data set to obtain the target machine learning model required by the user.
Still further, the execution device 120 may also send the target feature transformation algorithm and the target machine learning model to the client device 130.
The user may operate the respective terminal device 140 to interact with the execution device 120 or the client device 130 via a communication network of any communication mechanism/standard, so as to use the target feature transformation algorithm and the target machine learning model for prediction services. The communication network may be a wide area network, a local area network, a point-to-point connection, or any combination thereof.
For example, the raw data set is as shown in table 1:
[Table 1 is rendered as an image in the original publication. It shows sample rows whose data features include, for example, gender, educational background, date of birth, major, and years of work, together with a salary label.]

TABLE 1
The trained target machine learning model has the ability to predict salary. The terminal device 140 sends a first request to the execution device 120, where the request carries the information of a first object, including gender, educational background, date of birth, major, and years of work. The execution device 120 performs feature transformation on the information of the first object through the target feature transformation algorithm and inputs the transformed data into the target machine learning model to obtain the predicted salary of the first object. The execution device 120 may send the predicted salary to the terminal device 140.
It should be noted that fig. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present invention, and the position relationship between the devices, modules, and the like shown in fig. 1 does not constitute any limitation, for example, in fig. 1, the data storage system 150 is an external memory with respect to the execution device 120, and in other cases, the data storage system 150 may also be disposed in the execution device 120.
It should be noted that, in the embodiment of the present application, the training device 110 and the execution device 120 may be the same device or different devices. The training device 110 and/or the execution device 120 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an AR/VR device, or a vehicle-mounted terminal; it may also be a server or a virtual machine, or a distributed computer system formed by one or more servers and/or computers, which is not limited in the embodiment of the present application. The client device 130 may be a server, a computer, a terminal device, or the like. The terminal device 140 may include a smart phone, a tablet computer, a personal computer, a desktop computer, an on-board unit (OBU), a virtual reality device, an artificial intelligence device (e.g., a robot), an intelligent wearable device, or the like, which is not limited in the embodiments of the present application.
An application scenario involved in the embodiment of the present application is described below with reference to the cloud system shown in fig. 2 and the graphical user interface shown in fig. 3. The cloud system may include a cloud platform 210 and a cloud host. The cloud platform may create a virtual machine on the cloud host; when the virtual machine runs, it occupies computing resources of the cloud host, which may include the central processing unit (CPU), neural-network processing unit (NPU), and/or memory of the cloud host.
In the embodiment of the present application, the cloud platform 210 may implement the functions of the execution device 120 and/or the training device 110 in fig. 1.
The cloud platform 210 provides an automatic machine learning service to users and provides a graphical user interface through which the client device 220 can exchange information with the cloud platform 210. Fig. 3 shows a graphical user interface provided by the cloud platform 210 to a user; the graphical user interface 300 may be displayed on the client device 220 to provide the automatic machine learning service. The graphical user interface may include at least one control, and in response to a detected user operation on a control, a user interface associated with that control is displayed.
In response to a user operation on a first control (e.g., the icon 301 labeled "read data" in fig. 3), the client device 220 may display an import-data window to read the original data set; the window may list files and/or folders stored in the client device 220. In response to a user operation on a first file (e.g., a file containing the original data set or a file containing the first group of data sets), the client device 220 uploads the first file to the cloud platform 210.
In response to a user operation on a second control (e.g., the icon 302 labeled "modify original data" in fig. 3), the client device 220 displays a data modification window for modifying the imported original data set. The data modification window may include a plurality of modification operations for each data feature and/or label in the original data set. It should be understood that the cloud platform 210 may modify the original data set automatically, or the user may modify it with preprocessing of his own choosing to obtain a data set meeting the requirements. The modification operations may include conversion of value types, specifying label data, and the like, which is not limited in the embodiment of the present application.
In response to a user operation input by the user with respect to a third control (for example, the icon 303 labeled "automatic modeling" in fig. 3), the client device 220 sends a modeling instruction to the cloud platform 210. The instruction instructs the cloud platform 210 to process the modified data of the original data set by using the data processing method provided in the embodiment of the present application, that is, to perform feature preprocessing, multi-order feature transformation, and selection, so as to obtain an optimal data set (also referred to as a target data set in the embodiment of the present application). Specifically, the cloud platform 210 may include a receiving module 211, a preprocessing module 212, a feature transformation module 213, a data set determination module 214, a training module 215, a sending module 216, and the like. The cloud platform 210 may receive the original data set sent by the client device 220 through the receiving module 211; perform data preprocessing on the original data set through the preprocessing module 212 to obtain a first group of data sets (i.e., the root-node data set); perform multi-order feature transformation on the first group of data sets through the feature transformation module 213 to obtain a plurality of data sets and an evaluation value for each data set (e.g., the first, second, and/or third evaluation values described in this embodiment of the present application); and find, through the data set determination module 214 and according to the evaluation values of the plurality of data sets, the optimal data set and the target feature transformation algorithm that produces it. The training module 215 determines the hyper-parameters of the machine learning model based on the optimal data set, establishes the machine learning model, and then trains the established machine learning model on the optimal data set to obtain the target machine learning model with the function the user requires.
In response to a user operation input with respect to a fourth control (e.g., the icon 304 labeled "save model" in fig. 3), the client device 220 sends an instruction to the cloud platform 210 instructing the target feature transformation algorithm and the target machine learning model to be saved. In response to the instruction, the cloud platform 210 may store the target machine learning model and the target feature transformation algorithm, and may also transmit them to the client device 220 via the sending module 216.
The graphical user interface may further include a fifth control. In response to a detected user operation input with respect to the fifth control (for example, the icon 305 labeled "split data" in fig. 3), the data uploaded by the user is divided into training data and test data, where the training data is the original data set or the first group of data sets, and the test data is used to evaluate the target machine learning model.
In response to a user operation input with respect to a sixth control (e.g., the icon 306 labeled "model application" in fig. 3), the client device 220 sends an instruction to the cloud platform 210 indicating that prediction is to be performed with the model. After receiving the instruction, the cloud platform 210 performs feature transformation on the test data through the target feature transformation algorithm and inputs the transformed data into the target machine learning model to obtain a prediction result for each test sample; optionally, the cloud platform 210 may send the prediction results to the client device 220.
In response to a user operation input by the user with respect to a seventh control (e.g., the icon 307 labeled "model evaluation" in fig. 3), the client device 220 sends an instruction to the cloud platform 210 indicating that the model is to be evaluated. After receiving the instruction, the cloud platform 210 compares the prediction results with the real results (i.e., the labels in the test samples) to obtain an evaluation value for evaluating the prediction accuracy of the target machine learning model. The cloud platform 210 may also transmit the evaluation value to the client device 220, and the client device 220 may display it.
In response to a user operation input with respect to an eighth control (e.g., the icon 308 labeled "save data to dataset" in fig. 3), the client device 220 sends an instruction to the cloud platform 210 instructing the prediction results to be saved. In another implementation, the cloud platform 210 may also store other data, for example, the meta-features of the data sets or candidate data sets obtained in the multi-order feature transformation process and the evaluation values (e.g., third evaluation values) corresponding to those meta-features. For the meta-features and the third evaluation values, reference may be made to the following descriptions of the meta-feature calculation method and data processing method embodiments, which are not repeated here.
It should be noted that fig. 3 is only an exemplary illustration of the human-computer interaction process; in practical applications, other forms of graphical user interfaces and other implementations of the human-computer interaction process are possible, which is not limited here. It should also be noted that the client device 220 may be the client device 130 in fig. 1 described above, and the cloud platform 210 may be the execution device 120 in fig. 1 described above.
It should be appreciated that the data preprocessing may convert the received raw data set into the format required by the automatic machine learning service, for example, into wide-table data, i.e., data in which each row represents a sample and each column represents a data feature, including a label column.
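As an illustration of the wide-table format (not part of the patented method), the following sketch pivots toy records into one-row-per-sample, one-column-per-feature form; the column names and data are hypothetical:

```python
import pandas as pd

# Toy raw records (hypothetical); real preprocessing would also clean,
# sample, format, and digitize the data.
raw = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "field":   ["age", "package", "age", "package"],
    "value":   [30, "A", 41, "B"],
})

# Pivot to wide-table form: one row per sample, one column per data feature.
wide = raw.pivot(index="user_id", columns="field", values="value")
# A label column would then be joined on, e.g. wide["label"] = ...
print(wide)
```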
It should also be appreciated that the performance (e.g., prediction accuracy, generalization ability, etc.) of the trained machine learning model depends on the optimal data set, the algorithms, and the like used to train the machine learning model. The method aims to obtain, from the original data set sent by a user, a target data set that can be used to determine hyper-parameters or to train the machine learning model required by the user.
Specific application scenarios are exemplarily described below.
A first application scenario:
Mobile communication operators wish to convert more prepaid subscribers into postpaid subscribers, which requires identifying potential postpaid subscribers from among prepaid subscribers. In this case, the mobile communication operator may upload sample data (an original data set) to the cloud platform 210 based on the automatic machine learning service. The original data set includes the information of a plurality of prepaid subscribers, where the information of one subscriber constitutes one sample and may include data features such as the subscriber's age, package, monthly average telephone charge, monthly average data traffic, and SIM card usage duration; the subscriber type after a preset duration (prepaid or postpaid) is designated as the label.
The cloud platform 210 may process the data features in the original data set, which includes the information of a large number of prepaid subscribers, by using the data processing method provided in the embodiment of the present application, to obtain the target data set of this application scenario and the target feature transformation algorithm corresponding to it; determine the hyper-parameters from the target data set and establish a machine learning model; and train the established model by taking the data features of the target data set as input and supervising on the subscriber types, finally obtaining the target machine learning model. With the target feature transformation algorithm and the target machine learning model, whether a subscriber is a potential postpaid subscriber can be predicted from the data features of a prepaid subscriber.
A second application scenario:
The communication operator hopes to predict the package a user will use after L months. In this case, the package use information of the user at a first time can be used as features and the package used by the user at a second time as the label, and wide-table data is constructed as training data for a machine learning model that enables an operation and maintenance center (SOC) to recommend packages.
The training data (i.e. the original data set in the embodiment of the present application) includes a plurality of data features, which may be: the method comprises the following steps of user Identification (ID), whether a user uses a fixed-mobile integrated package at the first time, the online time of the user up to the first time, the total charge amount of the month of the first time, the accumulated flow of the month of the first time, the over-package identification for four continuous months, the contract time, the local voice calling time of the month of the first time and the like.
The target data set can be used to train a machine learning model to obtain the target machine learning model. With the target feature transformation algorithm adopted by the target data set and the trained target machine learning model, a future package of the user can be predicted based on the user's current package usage, and the predicted package can then be recommended to the user to realize network optimization.
A third application scenario:
Communication operators hope to identify the OTT (over-the-top) service category of network behavior. The characteristics of data flows can be used as features and the OTT service category as the label to construct wide-table data, which serves as training data for a machine learning model that realizes OTT service identification. The OTT service categories may include video services, web browsing services, voice call services, video call services, music download services, and the like.
The training data (i.e., the original data set in the embodiment of the present application) includes a plurality of data features, which may be at least one of: the number distribution of flow packets, the size distribution of flow packets, the interval distribution of flow packets, the number distribution of upstream flow packets, the size distribution of upstream flow packets, the interval distribution of upstream flow packets, the number distribution of downstream flow packets, the size distribution of downstream flow packets, the interval distribution of downstream flow packets, and the like. It should be understood that the data within a first duration (for example, 20 seconds, 30 seconds, etc.) constitutes one data flow; in this embodiment, the data flow is the unit of analysis, one data flow includes a plurality of data packets, and these packets may be divided into uplink data packets and downlink data packets. It should also be understood that the number distribution of flow packets may be the average, standard deviation, variance, etc. of the number of packets within each time period of a data flow (the time period is shorter than the first duration and may be 1 second, 0.5 second, 0.01 second, etc.). Similarly, the size distribution of flow packets may be the average, standard deviation, variance, etc. of the packet sizes within a time period of one data flow; the interval distribution of flow packets may be the average, standard deviation, variance, etc. of the intervals between consecutive adjacent packets.
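A minimal sketch of how such per-flow distribution features might be computed; the slice length, the statistics chosen, and the function name are assumptions for illustration only:

```python
import numpy as np

def flow_packet_features(packet_times, packet_sizes, slice_len=1.0):
    """Per-flow distribution features: mean/std/var of packet counts per
    time slice, of packet sizes, and of inter-packet intervals."""
    t = np.asarray(packet_times, float)
    sizes = np.asarray(packet_sizes, float)
    # Packet counts in consecutive time slices of the flow.
    edges = np.arange(t.min(), t.max() + slice_len, slice_len)
    counts = np.histogram(t, bins=edges)[0]
    intervals = np.diff(np.sort(t))
    stats = lambda v: [v.mean(), v.std(), v.var()] if len(v) else [0.0, 0.0, 0.0]
    return stats(counts) + stats(sizes) + stats(intervals)

print(flow_packet_features([0.0, 0.4, 1.1, 2.5, 2.6], [100, 1200, 400, 800, 60]))
```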
The original data set can obtain a target data set and a target feature transformation algorithm through the data processing method provided by the embodiment of the application, the target data set can be used for training a machine learning model to obtain the target machine learning model, and the OTT service category of the data flow can be predicted based on the features of the current data flow of the user through the target feature transformation algorithm adopted by the target data set and the target machine learning model obtained through training.
A fourth application scenario:
communication operators desire to predict cell traffic for network planning and optimization. At this time, for each cell, the traffic of the base station may be counted, the traffic in a plurality of consecutive time periods of the cell is used as a data feature, and one time period after the plurality of consecutive time periods is used as a label to construct wide table data, so that the wide table data is used as training data of a machine learning model capable of realizing cell traffic prediction in a future time period.
The training data (i.e. the original data set in the embodiment of the present application) includes a plurality of data features, which may be: traffic of the first cell in the first time period, traffic of the first cell in the second time period, …, and traffic of the first cell in the nth time period are labeled as traffic of the first cell in the (N + K) th time period, where N, K is a positive integer. The plurality of time periods have equal durations, such as days or months. For example, the training data (i.e., the raw data set in the embodiments of the present application) may include a plurality of data features: the flow of the first cell in the first month, the flow of the first cell in the second month, the flow of the first cell in the third month, the flow of the first cell in the fourth month, the flow of the first cell in the fifth month, the flow of the first cell in the sixth month, and the label is the flow of the first cell in the seventh month. That is, the machine learning model derived from the training data can predict the traffic of the next month from the traffic of the first six months of the first cell.
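A minimal sketch of building such sliding-window training rows; the function and column names are hypothetical, and N = 6, K = 1 follow the example above:

```python
import pandas as pd

def make_traffic_samples(monthly_traffic, n=6, k=1):
    """Build wide-table rows: N consecutive periods as features,
    the (N+K)-th period as the label."""
    rows = []
    for start in range(len(monthly_traffic) - n - k + 1):
        feats = monthly_traffic[start:start + n]
        label = monthly_traffic[start + n + k - 1]
        rows.append(list(feats) + [label])
    cols = [f"month_{j + 1}" for j in range(n)] + ["label"]
    return pd.DataFrame(rows, columns=cols)

print(make_traffic_samples([12.0, 13.5, 11.8, 14.2, 15.0, 14.8, 16.1, 15.7]))
```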
The original data set can obtain a target data set and a target feature transformation algorithm through the data processing method provided by the embodiment of the application, the target data set can be used for training a machine learning model to obtain the target machine learning model, the target feature transformation algorithm adopted by the target data set and the target machine learning model obtained through training can predict the flow of the first cell in a future time period based on the flow of the first cell in a plurality of continuous time periods, and further, a communication operator can obtain the flow planning and network optimization of the first cell in the future time period according to prediction in advance.
It should be understood that, in the embodiment of the present application, only the first cell is taken as an example for description, the first cell may be any one of cells that need to perform traffic prediction, and it should be understood that different cells correspond to different target feature transformation algorithms and target machine learning models.
A fifth application scenario:
The communication operator hopes to predict whether a user will go off-network in the future (i.e., no longer use the operator's communication network services). In this case, the network usage information of the user at a first time can be used as features and whether the user is off-network at a second time as the label, and wide-table data is constructed as training data for a machine learning model that enables an operation and maintenance center (SOC) to identify potential off-network users. The second time may be L months after the first time, the first time and the second time may be in units of months, and L is a positive integer.
The training data (i.e., the original data set in the embodiment of the present application) includes a plurality of data features, which may include: user identification (ID), whether the user uses a fixed-mobile integrated package at the first time, the user's online duration up to the first time, the total charge for the month of the first time, the accumulated traffic for the month of the first time, the over-package identifier for four consecutive months, the contract duration, the local outgoing voice-call duration for the month of the first time, and the like.
The target data set can be used for training a machine learning model to obtain a target machine learning model, and the target feature transformation algorithm adopted by the target data set and the target machine learning model obtained through training can predict whether the user will leave the network in the future based on the current network use condition of the user.
Because the data sets received by the cloud platform from users are various, and the plurality of candidate data sets obtained by transforming the same data set differ from each other, the prior art requires, when evaluating candidate data sets, online training and testing of a model for each candidate data set, which is time-consuming and makes automatic feature engineering inefficient.
To avoid or reduce online training and testing on candidate data sets, embodiments of the present application provide a method for evaluating a candidate data set through the meta-features of the data set. The meta-features are unrelated to the specific data of the candidate data set; they describe attributes of the data set or candidate data set, can represent its complexity, and are one of the main factors in accelerating feature transformation and selection and improving feature transformation efficiency. The following describes, with reference to fig. 4 and taking one data set as an example, the method for calculating the meta-features of the various data sets involved in the embodiments of the present application. The meta-feature calculation method may be performed by the execution device and may include some or all of the following steps:
s42: calculating first information according to a data set, wherein the data set comprises M samples, each sample in the M samples comprises N data characteristics and a label, the first information comprises at least one of data similarity and distribution similarity of every two data characteristics in the N data characteristics, data similarity and distribution similarity of each data characteristic of the N data characteristics and the label, data distribution information of each data characteristic in the N data characteristics and data distribution information of the label, and the like, and M, N is a positive integer.
It should be understood that N has different values in different data sets or candidate data sets, for example, when calculating the meta-feature of the first candidate data set, N is N2; for another example, when the data set of the root node is calculated, N is N1.
Each item of data included in the first information is described below.
(I) Data similarity:
The first data feature and the second data feature are any two of the N data features in the data set; they are used as an example to illustrate how the data similarity of any two of the N data features is calculated. The data similarity is obtained based on the set of values of the first data feature and the set of values of the second data feature in the data set. In one implementation of the present application, the data similarity of the first data feature and the second data feature may be represented by their mutual information (MI).
Mutual information is a measure of information in information theory. It can be seen as the amount of information one random variable contains about another random variable, or the reduction in uncertainty about one random variable given knowledge of another. Mutual information can therefore describe the data similarity between data features: when the correlation between data features is strong, the corresponding mutual information value is large; otherwise, it is small. It can be seen that the data similarity of two data features reflects the redundancy between them, and the data similarity between a data feature and the label reflects how much information the feature provides about the label.
Wherein the mutual information I (X; Y) of the first data characteristic and the second data characteristic can be expressed as:
$$I(X;Y)=\sum_{x\in X}\sum_{y\in Y}p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}\tag{1}$$
In formula (1), X is the set of values of the first data feature in the data set; Y is the set of values of the second data feature in the data set; p(x) is the probability that the first data feature takes the value x, i.e., the ratio of the number of samples in which the first data feature equals x to the total number of samples M; p(y) is the probability that the second data feature takes the value y, i.e., the ratio of the number of samples in which the second data feature equals y to M; p(x, y) is the probability that the first data feature takes the value x and the second data feature takes the value y, i.e., the ratio of the number of such samples to M. Mathematically, p(x, y) is the joint probability distribution function of X and Y, and p(x) and p(y) are their marginal probability distribution functions.
Similarly, the second data feature is replaced by a label, and then Y is a set of values of the label in the data set, and the mutual information between the data feature and the label can be calculated.
Therefore, mutual information of any two data characteristics in the N data characteristics and mutual information of the N data characteristics and the label can be calculated.
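For illustration, a minimal sketch of formula (1) on two discrete feature columns, with the probabilities taken as sample proportions; using the natural-log base is an assumption, since formula (1) leaves the base unspecified:

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information of two discrete feature columns,
    following formula (1): probabilities are sample proportions."""
    m = len(x)
    p_x, p_y = Counter(x), Counter(y)
    p_xy = Counter(zip(x, y))
    return sum((c / m) * np.log((c / m) / ((p_x[a] / m) * (p_y[b] / m)))
               for (a, b), c in p_xy.items())

# Identical columns: MI equals the entropy of the column.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # ~0.693 (ln 2)
```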
It should be understood that the data similarity may also have other implementations, such as the Pearson product-moment correlation coefficient, the maximum information coefficient (MIC), the Spearman rank correlation coefficient, canonical correlation analysis (CCA), rank correlation coefficients, and the like, which are not limited here.
(II) Distribution similarity:
Taking the first data feature and the second data feature as an example, the method for calculating the distribution similarity of any two of the N data features is as follows: the distribution similarity of the first data feature and the second data feature is obtained based on the set of values of the first data feature and the set of values of the second data feature in the data set. In one implementation of the present application, the distribution similarity of the first data feature and the second data feature may be represented by their chi-squared value and/or t-statistic. For convenience of description, in the embodiments of the present application, the distribution similarity obtained by the chi-squared test is referred to as the first distribution similarity, and the distribution similarity obtained by the T-test is referred to as the second distribution similarity. The first information may include the first distribution similarity (also called the chi-squared value) and/or the second distribution similarity (also called the t-statistic) of the first data feature and the second data feature.
It should be noted that the T-test is performed only between the continuous data features, or between the continuous data features and the label representing the regression problem. The chi-square test can be performed between two discrete data features, or between a discrete data feature and a label representing a classification problem, or can be performed after discretization of a continuous data feature and/or a label representing a regression problem.
The chi-square test or T-test analyzes the degree of deviation between the sample statistics of the first data feature and those of the second data feature. This degree of deviation determines the chi-squared value: the larger the chi-squared value, the less consistent the data distributions of the first and second data features; conversely, the smaller the chi-squared value, the smaller the deviation and the more consistent the two distributions. Comparing the distributions of two features by the chi-square test therefore allows data redundancy to be judged. Similarly, the more similar a feature's distribution is to the target's distribution, the better the feature can distinguish the target.
The first distribution similarity $\chi^2$ of the first data feature and the second data feature is:

$$\chi^2=\sum_{k=1}^{K}\frac{(X_k-Y_k)^2}{Y_k}\tag{2}$$
In formula (2), $X_k$ is the frequency (also called probability) with which the first data feature in the data set takes a value at level k, and $Y_k$ is the frequency with which the second data feature takes a value at level k; K is the number of levels into which the values are divided, k and K are positive integers, and 1 ≤ k ≤ K.
For example, if the difference between the maximum and the minimum of the set of values of the first data feature in the data set is U, then level k corresponds to the interval $[X_{min}+(k-1)\cdot U/K,\;X_{min}+k\cdot U/K]$, and $X_k$ is the ratio of the number of samples whose first data feature falls in this interval to the total number of samples M, where $X_{min}$ is the minimum value in the set of values of the first data feature in the data set.
Similarly, the second data feature is replaced by a label, and a chi-square value (first distribution similarity) of the data feature and the label can be calculated.
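A minimal sketch of formula (2), assuming equal-width binning of the first feature's range into K levels as in the example above; the number of levels and the zero-frequency handling are assumptions:

```python
import numpy as np

def chi_square_similarity(x, y, n_levels=10):
    """First distribution similarity per formula (2): bin both columns
    into K equal-width levels of x's range, then sum (X_k - Y_k)^2 / Y_k."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    edges = np.linspace(x.min(), x.max(), n_levels + 1)
    x_freq = np.histogram(x, bins=edges)[0] / len(x)
    y_freq = np.histogram(y, bins=edges)[0] / len(y)
    mask = y_freq > 0                      # avoid division by zero
    return float(np.sum((x_freq[mask] - y_freq[mask]) ** 2 / y_freq[mask]))

rng = np.random.default_rng(0)
a, b = rng.normal(size=1000), rng.normal(size=1000)
print(chi_square_similarity(a, b))        # small value: similar distributions
```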
The second distribution similarity t of the first data feature and the second data feature is:

$$t=\frac{\bar{d}-\mu_0}{s_d/\sqrt{M}}\tag{3}$$

In formula (3), $d_i=|x_i-y_i|$, $\bar{d}=\frac{1}{M}\sum_{i=1}^{M}d_i$, and $s_d=\sqrt{\frac{1}{M-1}\sum_{i=1}^{M}(d_i-\bar{d})^2}$; $\mu_0$ is the T-test parameter; i is the index of a sample in the data set, with 1 ≤ i ≤ M; $x_i$ is the value of the first data feature for sample i, and $y_i$ is the value of the second data feature for sample i; M is the number of samples in the data set, and M is a positive integer.
Similarly, the second data feature is replaced by a label, and a t statistic (second distribution similarity) of the data feature and the label can be calculated.
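A minimal sketch of formula (3); taking the test parameter μ0 = 0 is an assumption:

```python
import numpy as np

def t_statistic(x, y, mu0=0.0):
    """Second distribution similarity per formula (3): a paired t-statistic
    on d_i = |x_i - y_i| with test parameter mu0."""
    d = np.abs(np.asarray(x, float) - np.asarray(y, float))
    m = len(d)
    d_bar = d.mean()
    s_d = d.std(ddof=1)                    # sample standard deviation
    return (d_bar - mu0) / (s_d / np.sqrt(m))

print(t_statistic([1.0, 2.0, 3.0, 4.0], [1.1, 2.2, 2.9, 4.3]))
```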
Therefore, the chi-square value of any two discrete data features in the N data features and the chi-square value of the discrete data features and the chi-square value of the label can be calculated, and the t statistic of any two continuous data features in the N data features and the t statistic of the continuous data features and the t statistic of the label can be calculated.
It should be understood that the distribution similarity in the present application may also have other implementations, such as KL divergence (Kullback-Leibler divergence, KLD), Bregman divergence, maximum mean discrepancy (MMD), Copula-based tail dependency, and the like, which are not limited here.
(III) Distribution information:
In a classification or regression problem, the more concentrated the distribution of a data feature, the weaker its discriminative power; conversely, the flatter the distribution of a data feature, the more likely it is to distinguish different classes. The distribution information of a data feature can be represented by two indexes: skewness and kurtosis.
Skewness refers to the degree of asymmetry or skew of a data distribution and is a measure of the direction and degree of skew of a statistical data distribution; skewed distributions are divided into two types, left-skewed (negative skew) and right-skewed (positive skew). Typically, skewness is defined as the third standardized moment of the sample. By this definition, a normal distribution has skewness 0, a right-skewed (positive-skew) distribution has skewness > 0, and a left-skewed (negative-skew) distribution has skewness < 0.
Kurtosis refers to the degree of concentration of the data and the steepness (or flatness) of the distribution curve. Kurtosis is usually measured against the normal distribution curve, and distributions are generally classified as normal, peaked, or flat-topped. The steepness of the distribution curve is directly related to the even-order central moments; to eliminate the influence of dimension, kurtosis is measured as the fourth central moment divided by the fourth power of the standard deviation. For the normal distribution curve, the ratio of the fourth central moment m4 to the fourth power of the standard deviation equals 3.
Taking the first data feature as an example, the method for calculating the distribution information of any one of the N data features is described. The skewness $\gamma_1$ of the first data feature in the data set is:

$$\gamma_1=\frac{\frac{1}{M}\sum_{i=1}^{M}(x_i-\mu)^3}{\sigma^3}$$

The kurtosis $\gamma_2$ of the first data feature in the data set is:

$$\gamma_2=\frac{\frac{1}{M}\sum_{i=1}^{M}(x_i-\mu)^4}{\sigma^4}$$

where

$$\sigma=\sqrt{\frac{1}{M}\sum_{i=1}^{M}(x_i-\mu)^2}$$

i is the index of a sample in the data set, 1 ≤ i ≤ M, and M is the number of samples; $\mu$ is the mean of the values of the first data feature in the data set, and $x_i$ is the value of the first data feature for sample i.
It should be understood that the distribution information in the present application may also be represented by one or more of the mean, variance, coefficient of variation (CV), change-point position, information entropy, Gini coefficient, and the like, which is not limited here.
Similarly, by replacing the first data feature with the label, the skewness and kurtosis of the label can be calculated.
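A minimal sketch of the skewness and kurtosis formulas above; using the population standard deviation (ddof = 0) matches the $\sigma$ definition given:

```python
import numpy as np

def skewness(x):
    """Third standardized moment, matching the gamma_1 formula above."""
    x = np.asarray(x, float)
    mu, sigma = x.mean(), x.std()          # population std, ddof=0
    return np.mean((x - mu) ** 3) / sigma ** 3

def kurtosis(x):
    """Fourth standardized moment; equals 3 for a normal distribution."""
    x = np.asarray(x, float)
    mu, sigma = x.mean(), x.std()
    return np.mean((x - mu) ** 4) / sigma ** 4

rng = np.random.default_rng(0)
sample = rng.normal(size=100_000)
print(skewness(sample), kurtosis(sample))  # ~0 and ~3
```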
In summary, the first information (also referred to as the data entropy matrix in this embodiment of the present application) is formed by: the mutual information of every two of the N data features and the mutual information of each of the N data features with the label; the chi-squared value of every two discrete data features among the N data features and the chi-squared value of each discrete data feature with the label; the t-statistic of every two continuous data features among the N data features and the t-statistic of each continuous data feature with the label; the distribution information of each of the N data features; the distribution information of the label; and the like.
S44: meta-features of the data set are calculated from the first information.
The meta-features of the data set may include at least one of the basic features of the data set, features of the continuous data features, features of the discrete data features, features of the label, features of the data similarity, features of the distribution similarity, features of the distribution information of the data features, and the like. Based on the obtained first information, the data entropy matrix is further processed by statistics, correlation analysis, data-complexity calculation, and the like, finally forming the characterization features of the data set, i.e., the meta-features.
The basic features of the data set describe its basic conditions and may include at least one of the total number of samples, the total number of data features, the total number of label categories, the ratio of the total number of data features to the total number of samples, and the like. The features of the continuous data features are extracted from the data of the continuous data features and describe attributes of the set of continuous data features; they may include at least one of the total number of continuous data features, the ratio of the total number of continuous data features to the total number of data features, and the like. The features of the discrete data features are extracted from the data of the discrete data features and describe attributes of the set of discrete data features; they may include at least one of the total number of discrete data features, the ratio of the total number of discrete data features to the total number of data features, and the like. The features of the label are extracted from the data of the label and describe its attributes; they may include at least one of the information entropy of the label, the Gini coefficient of the label, the average sample proportion per label category, the kurtosis of the label, the skewness of the label, and the like. The features of the data similarity are extracted from the data similarities between data features and/or between data features and the label, and describe attributes of the set of data similarities; they may include at least one of the maximum, mean, and standard deviation of the data similarities between the label and the data features, the maximum, mean, and standard deviation of the data similarities between pairs of data features, and the like. The features of the distribution similarity are extracted from the distribution similarities between data features and/or between data features and the label, and describe attributes of the set of distribution similarities; they may include at least one of the maximum, mean, and standard deviation of the distribution similarities between the label and the data features, the maximum, mean, and standard deviation of the distribution similarities between pairs of data features, and the like. The features of the distribution information are extracted from the distribution information (such as kurtosis and skewness) of the data features and express attributes of the set of distribution information; they may include at least one of the maximum kurtosis, minimum kurtosis, average kurtosis, maximum skewness, minimum skewness, average skewness, and the like.
Wherein the information entropy of the tag represents the average amount of information in the tag.
For example, the information entropy of the tag is calculated as follows:
$$H=-\sum_{i}P(z_i)\log_b P(z_i)$$
where i is the index of the label category, $P(z_i)$ is the proportion of samples of category $z_i$ among the M samples, and b is the base of the logarithm, typically 10 or the natural constant e.
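A minimal sketch of the label-entropy formula; the base is a parameter, as in the formula:

```python
import numpy as np
from collections import Counter

def label_entropy(labels, base=np.e):
    """Information entropy of the label column: -sum P(z_i) log_b P(z_i)."""
    m = len(labels)
    probs = np.array([c / m for c in Counter(labels).values()])
    return float(-(probs * (np.log(probs) / np.log(base))).sum())

print(label_entropy(["male", "female", "male", "female"], base=2))  # 1.0
```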
As another example, the average sample proportion of label categories: the labels may be divided into a plurality of categories; for instance, when the label is gender, the problem to be solved by the model is predicting male versus female, the label comprises two categories (male and female), and the average sample proportion per label category is 0.5.
For another example, the mean of the data similarities of pairs of data features is calculated by summing the data similarity of every two of the N data features and dividing by the number of such similarities.
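A sketch of how such statistics might be aggregated into a fixed-length meta-feature vector; the particular aggregates and their order are illustrative, not the patented meta-feature set:

```python
import numpy as np
from itertools import combinations

def meta_feature_vector(X, y, similarity):
    """Aggregate basic counts and pairwise similarities into a fixed-length
    vector; `similarity` is any pairwise statistic, e.g. mutual information.
    Assumes X has at least two feature columns."""
    n_samples, n_feats = X.shape
    pair_sims = [similarity(X[:, i], X[:, j])
                 for i, j in combinations(range(n_feats), 2)]
    label_sims = [similarity(X[:, i], y) for i in range(n_feats)]
    return np.array([
        n_samples, n_feats, n_feats / n_samples,          # basic features
        np.max(pair_sims), np.mean(pair_sims), np.std(pair_sims),
        np.max(label_sims), np.mean(label_sims), np.std(label_sims),
    ])
```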
It should be understood that the meta-feature may also include other data items that describe properties of the data set, and the embodiments of the present application are not limited thereto.
In this embodiment, the meta-features corresponding to any data set in any group of data sets, the meta-features corresponding to the candidate data sets obtained by feature transformation of a data set, the meta-features included in the first samples used for training the first machine learning model, and the meta-features included in the second samples used for training the third machine learning model may all be calculated by the above meta-feature calculation method.
A method for training the first machine learning model, which is used for predicting the second evaluation value of a candidate data set, is described below. It should be understood that the first machine learning model may be trained offline. The training method may specifically include: the training device acquires a plurality of first samples, where any first sample includes the meta-features of a second data set and an evaluation value of the second data set (referred to as a third evaluation value in this embodiment of the application; it may also be called a true or credible evaluation value, since its reliability is high); further, the first machine learning model is trained by taking the meta-features of the first samples as input and supervising on the evaluation values.
The second data set is a public data set including a plurality of data features and labels. It should be understood that the second data sets corresponding to the meta-features in different first samples are different; specifically, the number and meaning of their data features and their labels differ.
The meta-features of the second data set can be calculated by the meta-feature calculation method described above; the evaluation value (e.g., the AUC, area under the ROC curve) is calculated from a machine learning model trained on the second data set. The meta-features of the plurality of second data sets and the evaluation value corresponding to each set of meta-features form the plurality of first samples for training the first machine learning model.
As can be seen, the trained first machine learning model can predict an evaluation value of a data set based on the meta-features of the data set (referred to as a second evaluation value in this embodiment of the application; it may also be called an estimated evaluation value, since it is an estimate of lower accuracy).
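A minimal sketch of this offline training step under stated assumptions: the meta-feature vectors and AUC values below are random stand-ins for real public-data-set statistics, and the gradient-boosting regressor is one arbitrary choice of model:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Stand-ins for the offline training corpus: one meta-feature vector per
# public data set, paired with the AUC its trained model actually achieved.
meta_vectors = rng.random((200, 9))
auc_values = rng.uniform(0.5, 1.0, 200)

# The "first machine learning model": a regressor from meta-features to AUC.
first_model = GradientBoostingRegressor().fit(meta_vectors, auc_values)

# At run time, a candidate's second evaluation value comes from its
# meta-features alone -- no model is trained on the candidate itself.
second_eval = first_model.predict(meta_vectors[:1])[0]
print(second_eval)
```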
It should be understood that the meta-features are independent of the specific data of a data set and represent its attributes, so the first machine learning model trained on meta-features can be applied to all data sets, i.e., the data sets sent by any user, the data sets generated at each order of transformation, and the candidate data sets. Once the meta-features of a data set are calculated, its second evaluation value can be estimated by the first machine learning model. The second evaluation value can reflect the accuracy, generalization ability, etc. of a model trained on the data set.
A method for training the third machine learning model involved in the embodiment of the present application is described below. It should be understood that the third machine learning model is used for predicting the fourth evaluation values of the data sets obtained by feature transformation of a data set, and that it may be trained offline. The training method may specifically include the following two implementations:
implementation (1):
The training device can acquire a plurality of second samples, where any second sample includes the meta-features of a fourth data set and the difference between the evaluation value of the data set obtained by applying a second feature transformation to the fourth data set and the evaluation value of the fourth data set itself, the second feature transformation being any one of the B kinds of feature transformation. The third machine learning model is then trained by taking the meta-features of the second samples as input and supervising on the difference values.
It can be seen that the third machine learning model obtained in implementation (1) can predict, based on the meta-features of a data set, the gain in evaluation value that each of the B kinds of feature transformation would produce, and can thus predict, before performing a feature transformation, whether it will improve the evaluation value of the data set.
Implementation (2):
The training device can obtain a plurality of third samples, where any third sample includes the meta-features of a fourth data set and the evaluation value of the data set obtained by applying a second feature transformation to the fourth data set, the second feature transformation being any one of the B kinds of feature transformation. The third machine learning model is then trained by taking the meta-features of the third samples as input and supervising on the evaluation values.
It can be seen that the third machine learning model obtained in implementation (2) can predict, based on the meta-features, the evaluation value of the data set after each of the B kinds of feature transformation (also referred to as a fourth evaluation value in the embodiment of the present invention), and can thus predict, before performing a feature transformation, the evaluation value of the transformed data set.
Wherein the fourth data set is a public data set comprising a plurality of data features and tags. It should be understood that the fourth data sets corresponding to the meta-features in different second samples are different, and the specific expression is that the number, meaning and labels of the data features are different.
Through the above meta-feature calculation method, the meta-features of a fourth data set can be calculated; the evaluation value AUC1 is calculated from a machine learning model trained on the fourth data set, and the evaluation value AUC2 is calculated from a machine learning model trained on the candidate data set obtained by feature transformation of the fourth data set. The meta-features of the plurality of fourth data sets and the differences AUC2 − AUC1 corresponding to each set of meta-features constitute the plurality of second samples for the third machine learning model of implementation (1). The meta-features of the plurality of fourth data sets and the evaluation values AUC2 corresponding to each set of meta-features constitute the plurality of third samples for the third machine learning model of implementation (2).
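A sketch of building the implementation (1) training samples; `datasets`, `transforms`, `meta_features`, and `evaluate` are hypothetical helpers (`evaluate` returning an AUC), and modelling the third machine learning model as one regressor per transformation is a simplifying assumption:

```python
import numpy as np

def build_gain_samples(datasets, transforms, meta_features, evaluate):
    """Implementation (1) style samples: one row per (data set,
    transformation), labelled with the AUC gain the transformation gave."""
    rows, gains, t_ids = [], [], []
    for X, y in datasets:
        auc1 = evaluate(X, y)                 # AUC1: untransformed data set
        mf = meta_features(X, y)
        for t_id, transform in enumerate(transforms):
            auc2 = evaluate(*transform(X, y)) # AUC2: transformed data set
            rows.append(mf)
            gains.append(auc2 - auc1)
            t_ids.append(t_id)
    return np.asarray(rows), np.asarray(gains), np.asarray(t_ids)

# One simple realisation of the third machine learning model is a regressor
# per transformation (e.g. sklearn's GradientBoostingRegressor):
#   rows, gains, t_ids = build_gain_samples(...)
#   third_models = [regressor().fit(rows[t_ids == t], gains[t_ids == t])
#                   for t in range(len(transforms))]
```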
It should be understood that the third machine learning model may also include other training methods, and the embodiments of the present application are not described in detail.
It should be further understood that the meta-features are independent of the specific data of a data set and represent its attributes, so the third machine learning model trained on meta-features can be applied to all data sets, i.e., the data sets sent by any user and the data sets generated at each order of transformation. Once the meta-features of a data set are calculated, the third machine learning model can estimate, for each of the B kinds of feature transformation, the fourth evaluation value of the corresponding candidate data set, yielding B fourth evaluation values. A fourth evaluation value may indicate the accuracy or generalization ability of a model trained on the corresponding candidate data set, or whether these show a gain.
The data processing method provided in the embodiment of the present application is described below with reference to the flowchart of the data processing method shown in fig. 5. The method may be executed by the execution device 120 in fig. 1 or the cloud platform 210 in fig. 2, and may also be executed by a processor in the execution device. The method may include some or all of the following steps.
S52: a first set of data sets is acquired.
The first group of data sets includes one data set, which corresponds to the root node of the tree structure. The first group of data sets may include M samples, any one of which includes N1 data features and a label, where M and N1 are positive integers.
The first group of data sets may be obtained by data preprocessing of the raw data set sent by the user device to the execution device (cloud platform). The preprocessing of the raw data set may include one or more of data cleansing, sampling, formatting, feature digitization, and the like.
S54: the N1 data features in the first set of data sets are subjected to a multi-level feature transform.
It should be understood that when transforming features of a data set, only the features of the data in the data set are transformed, and the labels of the data are not transformed. The nth order feature transformation may refer to the nth order feature transformation described in fig. 6A, 6B, and 6C below, and is not described herein again.
The execution device may set a stop condition for the feature transformation; when the stop condition is satisfied, the execution device stops performing feature transformation, and step S56 is executed. In one implementation, the execution device may set the order of the feature transformation, for example, 8 orders, and stop after the 8th-order feature transformation is performed. In another implementation, the execution device may determine whether the feature transformation of the current order produces a gain. For example, the execution device determines whether the average calculated from the first evaluation values of the data sets obtained by the current feature transformation is larger than the average calculated from the first evaluation values of the data sets obtained by the previous feature transformation; if so, the current order produces a gain and the next-order feature transformation may be performed; otherwise, the execution device may stop the feature transformation.
It should be understood that the embodiments of the present application may also include other stop conditions, which are not limited in this respect.
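As an illustrative sketch only, the two stop conditions described above (maximum order and average-gain check) could be driven by a loop like the one below. The function names and the `evaluate` callback are assumptions, not the patent's implementation:

```python
def run_feature_search(first_group, transform_one_order, evaluate, max_order=8):
    """Hypothetical driver loop: performs feature transformations order by order
    until a stop condition is met (max order reached, or no average gain)."""
    groups = [first_group]          # groups[n-1] is the nth group of data sets
    prev_avg = sum(evaluate(ds) for ds in first_group) / len(first_group)
    for order in range(1, max_order + 1):
        next_group = transform_one_order(groups[-1])  # one order of transformation
        if not next_group:
            break
        avg = sum(evaluate(ds) for ds in next_group) / len(next_group)
        if avg <= prev_avg:         # current order yields no average gain: stop
            break
        groups.append(next_group)
        prev_avg = avg
    return groups
```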
S56: a target data set is determined from a first set, where the first set includes the data sets obtained by each order of feature transformation in the multi-order feature transformation process.
In one implementation of the embodiment of the application, the data set corresponding to the largest first evaluation value in the first set may be determined as the target data set.
In another implementation of the embodiment of the present application, the data set corresponding to the largest third evaluation value in the first set may be determined as the target data set.
The target data set is the optimal data set determined by the execution device from the original data sent by the user, that is, the result of the feature engineering's selection of transformed data features. The target data set can be used for model building and training to obtain the model required by the user.
The model building and training can adopt a model building and training method in the prior art, and the embodiment of the application is not limited.
In the embodiment of the present application, the nth-order feature transformation is taken as an example to describe the process of each order of feature transformation, where n is a positive integer. A specific implementation of the nth-order feature transformation is described with reference to the flow diagram shown in fig. 6A, the schematic illustration of the nth-order feature transformation process shown in fig. 6B, and the tree structure shown in fig. 6C, and includes the following steps:
S541: for each data set Di in the nth group of data sets, input the meta-features of Di into the third machine learning model to predict the fourth evaluation values corresponding to the B feature transformations, and select, from the B feature transformations, the Ai feature transformations whose fourth evaluation values satisfy the fourth condition.
The fourth evaluation value corresponding to a first feature transformation is used to evaluate the accuracy of a model trained from the candidate data set obtained by applying that first feature transformation to data set Di, where the first feature transformation is any one of the B feature transformations and B is a positive integer. The nth group of data sets is the group obtained by the (n-1)th-order feature transformation; i is the index of the data set within the nth group, and i is a positive integer. It will be appreciated that the meta-features of data set Di may be calculated by the meta-feature calculation method described above; for a specific implementation, reference may be made to the related description in the embodiment of the meta-feature calculation method. The third machine learning model is the model obtained by the third machine learning model training method; for a specific implementation, reference may be made to the related description in the embodiment of the third machine learning model training method, and details are not repeated here.
It should be noted that fig. 6C illustrates the case in which the nth group of data sets includes two data sets (i.e., D1 and D2).
Corresponding to the third machine learning model obtained in embodiment (1), one specific implementation of S541 may be: the execution device selects, from the B feature transformations, the Ai feature transformations whose fourth evaluation values are greater than 0, that is, it keeps the feature transformations whose evaluation values indicate a gain and discards the feature transformations whose evaluation values indicate no gain.
Corresponding to the third machine learning model obtained in embodiment (2), one specific implementation of S541 may be: the execution device selects, from the B feature transformations, the Ai feature transformations whose fourth evaluation values are greater than the first evaluation value of the data set. Another specific implementation of S541 may be: selecting the Ai feature transformations whose fourth evaluation values are greater than a preset threshold, or selecting the feature transformations corresponding to the top-ranked Ai fourth evaluation values, where the ranking is arranged from large to small by fourth evaluation value.
It should be understood that S541 may also have other implementations, which are not described in detail here. It should also be understood that the types and number of feature transformations selected for different data sets in the nth group may differ.
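A minimal sketch of the pre-pruning selection in S541 is shown below, following embodiment (1), where a predicted fourth evaluation value greater than 0 indicates an expected gain. It assumes (as an illustration only) that `third_model` is a multi-output regressor trained offline that maps a data set's meta-feature vector to one predicted value per transformation:

```python
def select_transformations(meta_features, third_model, transformations):
    """Predict fourth evaluation values for the B feature transformations and keep
    those expected to produce a gain (predicted value > 0, as in embodiment (1))."""
    # Assumption: third_model.predict returns a length-B vector for one data set.
    fourth_values = third_model.predict([meta_features])[0]
    return [t for t, v in zip(transformations, fourth_values) if v > 0]
```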
Through the above step S541, before feature transformation is performed on a data set, the fourth evaluation value of the data set that each feature transformation would generate is estimated by the offline-trained third machine learning model, the feature transformations are screened based on these fourth evaluation values, and only the screened feature transformations are applied to the data set. That is, through pre-pruning, the number of feature transformations and first evaluation value computations is reduced and the evaluation efficiency is improved; this corresponds to the branches removed at ① in fig. 6C.
S542: for each data set Di in the nth group of data sets, perform the Ai feature transformations respectively to obtain a plurality of candidate data sets.
It should be understood that, for the feature transformation algorithms, reference may be made to the related description in the above embodiments, which is not repeated here. It should also be appreciated that the execution device may identify the types of the data features in a data set and, based on those types, determine which feature transformations can be performed on the data features.
It should also be understood that step S541 is not a necessary step in the embodiments of the present application. In other embodiments, Ai may be a fixed value; for example, Ai may equal B, i.e., no pre-pruning is performed before the feature transformation, and all of the B provided feature transformations that apply to data set Di are performed on Di.
In the embodiment of the present application, candidate data set Di,j is the candidate data set obtained by applying feature transformation Tj to data set Di in the nth group of data sets, where j is the index of the feature transformation among the Ai feature transformations, 1 ≤ j ≤ Ai, and j is a positive integer.
As shown in fig. 6C, data set D1 undergoes A1 feature transformations (A1 = 5 in fig. 6C) to obtain candidate data sets D1,1, D1,2, D1,3, D1,4, and D1,5; data set D2 undergoes A2 feature transformations (A2 = 5 in fig. 6C) to obtain candidate data sets D2,1, D2,2, D2,3, D2,4, and D2,5.
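To make step S542 concrete, the sketch below generates one candidate data set per transformation. The example unary transformations (log, square, sqrt) are illustrative assumptions; the patent does not fix the transformation set here:

```python
import numpy as np
import pandas as pd

# Illustrative unary feature transformations Tj (assumed for this sketch).
TRANSFORMS = {
    "log":    lambda s: np.log1p(s.clip(lower=0)),
    "square": lambda s: s ** 2,
    "sqrt":   lambda s: np.sqrt(s.clip(lower=0)),
}

def apply_transformation(dataset: pd.DataFrame, name: str, label_col: str) -> pd.DataFrame:
    """Build candidate data set Di,j: transform every numeric feature column,
    leaving the label column untouched (labels are never transformed)."""
    fn = TRANSFORMS[name]
    candidate = dataset.copy()
    for col in dataset.columns:
        if col != label_col and pd.api.types.is_numeric_dtype(dataset[col]):
            candidate[f"{name}({col})"] = fn(dataset[col])
    return candidate

# Usage: candidates = [apply_transformation(D_i, t, "label") for t in selected]
```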
S543: a first evaluation value is calculated for each of a plurality of candidate data sets.
Wherein the first evaluation value of the data set is used to evaluate the accuracy of the trained model of the data set.
Taking candidate data set Di,j (corresponding to the first candidate data set in the embodiment of the present application) as an example, the method for calculating the first evaluation value of each of the plurality of candidate data sets is described as follows:
The first calculation method of the first evaluation value is as follows.
Step S5431: compute the meta-features of candidate data set Di,j from Di,j, where the meta-features are used to represent the attributes of Di,j.
For the method of calculating the meta-features of candidate data set Di,j, reference may be made to the related description in the above embodiment of the meta-feature calculation method, which is not repeated here. The number and meaning of the data features of the candidate data set Di,j obtained by the nth-order feature transformation may differ from those of the first group of data sets; however, the labels are never transformed throughout the n orders of feature transformation, so each data set in the (n+1)th group and the candidate data sets obtained by transforming it contain the same label data.
Step S5432: input the meta-features into the first machine learning model to predict the second evaluation value of candidate data set Di,j.
The first machine learning model is an offline-trained machine learning model whose input is the meta-features and whose output is the second evaluation value. The second evaluation value of candidate data set Di,j indicates the performance, such as accuracy, of a model trained from Di,j, where the accuracy is the degree to which the model trained from Di,j correctly predicts input data.
It should be understood that the second evaluation value is an estimate, and its accuracy is lower than that of an evaluation value obtained by actually training and testing a model on the candidate data set (the third evaluation value).
Step S5433: determine the first evaluation value of candidate data set Di,j from its second evaluation value.
In one specific implementation of the embodiment of the present application, the first evaluation value used for candidate data set screening may be the second evaluation value itself, that is, the execution device may screen directly by the second evaluation values of the respective candidate data sets.
In another implementation of the embodiment of the present application, the first evaluation value used for candidate data set screening may be calculated from the second evaluation value. Optionally, the first evaluation value of candidate data set Di,j may be obtained by an operation on a first data item and a second data item, for example their sum, where the first data item is positively correlated with the second evaluation value of Di,j and the second data item is determined by the number of historical gains of feature transformation Tj.
Here, the number of historical gains is the number of first data sets in the previous n groups of data sets, where a first data set is a data set obtained by applying feature transformation Tj to a second data set, the second data set is a data set in the previous n groups, and the second evaluation value of the second data set is smaller than the second evaluation value of the first data set.
It should be understood that, over the previous n groups of data sets and the current candidate data sets, if the second evaluation value of a data set obtained by feature transformation Tj is greater than the second evaluation value of its parent-node data set, feature transformation Tj is considered to have produced a gain.
For example, the first evaluation value of candidate data set Di,j is given by formula (6) (shown as an image in the original publication). In formula (6), P'(Di,j) is the first evaluation value of candidate data set Di,j and P(Di,j) is its second evaluation value; N(Tj) is the number of data sets in the previous n groups for which applying feature transformation Tj produced a gain in the second evaluation value; N'(Tj) is the corresponding number over the previous n groups of data sets together with the candidate data sets of the nth group.
Compared with screening candidate data sets by the second evaluation value alone, the above calculation method adjusts the second evaluation value of a candidate data set by the number of historical gains of the feature transformation that generated it; taking the historical gain count into account when computing the first evaluation value helps prevent the feature transformation from falling into a local optimum.
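Since formula (6) appears only as an image in the published text, the sketch below is a hedged illustration of the idea rather than the patent's exact formula: predict the second evaluation value from meta-features, then adjust it by the historical gain statistics of Tj. The additive form and the weight are assumptions:

```python
def predict_second_evaluation(meta_features, first_model) -> float:
    """S5432: the offline-trained first machine learning model maps a data set's
    meta-feature vector to a predicted second evaluation value (e.g. accuracy)."""
    return float(first_model.predict([meta_features])[0])

def first_evaluation(second_value: float, n_gain: int, n_gain_with_current: int,
                     weight: float = 0.1) -> float:
    """Hypothetical instance of formula (6): combine the second evaluation value
    (first data item) with the historical gain counts of transformation Tj
    (second data item). The functional form is assumed, not taken from the patent."""
    return second_value + weight * n_gain_with_current / (n_gain + 1)
```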
S544: the (n+1)th group of data sets is determined according to the first evaluation value of each of the plurality of candidate data sets, where the number of data sets in the (n+1)th group is smaller than the number of the plurality of candidate data sets.
Screening data sets out of the plurality of candidate data sets as the (n+1)th group of data sets (which may also be referred to as the (n+1)th-layer nodes) may be done in the following three implementations:
The first implementation is as follows:
The execution device may select, as the (n+1)th group of data sets, the candidate data sets whose first evaluation values are larger than a first threshold. The first threshold may be a fixed value, or it may be obtained by statistically analyzing the first evaluation values of the plurality of candidate data sets, yielding a threshold suited to the nth-order feature transformation; for example, the first threshold may be the average of the first evaluation values of the plurality of candidate data sets.
The second implementation mode comprises the following steps:
The execution device may select, as the (n+1)th group of data sets, the candidate data sets corresponding to the top m first evaluation values, where the evaluation values are ranked from large to small and m is a positive integer.
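A brief sketch of these first two screening implementations (threshold-based and top-m selection); the function and variable names are assumptions for illustration:

```python
def screen_by_threshold(candidates, first_values, threshold=None):
    """Implementation 1: keep candidates whose first evaluation value exceeds the
    threshold; if none is given, use the mean of all first evaluation values."""
    if threshold is None:
        threshold = sum(first_values) / len(first_values)
    return [c for c, v in zip(candidates, first_values) if v > threshold]

def screen_top_m(candidates, first_values, m):
    """Implementation 2: keep the m candidates with the largest first evaluation values."""
    ranked = sorted(zip(candidates, first_values), key=lambda cv: cv[1], reverse=True)
    return [c for c, _ in ranked[:m]]
```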
The third implementation mode comprises the following steps:
S5441: the execution device selects the candidate data sets whose first evaluation values satisfy the first condition among the plurality of candidate data sets.
Screening out the candidate data sets satisfying the first condition (i.e., the first screening process based on the first evaluation value) may be implemented as follows: the execution device selects the candidate data sets whose first evaluation values are larger than a second threshold; alternatively, the execution device selects the candidate data sets corresponding to the top g first evaluation values, where the evaluation values are ranked from large to small. Similar to the first threshold in the first implementation, the second threshold may be a fixed value or may be obtained by statistically analyzing the first evaluation values of the plurality of candidate data sets; g is a positive integer.
In fig. 6B, it is assumed that the candidate data sets satisfying the first condition are F candidate data sets (candidate data set 1, candidate data set 2, …, candidate data set f, …, candidate data set F), where f is the index of a candidate data set among those satisfying the first condition, f ≤ F, and f and F are positive integers.
The above pruning process corresponds to the branch removed at ② in fig. 6C.
S5442: train and test a model on each candidate data set satisfying the first condition, to obtain a third evaluation value corresponding to each such candidate data set.
Because the third evaluation value is obtained by actually training and testing a model on the candidate data set, its reliability is higher.
Take the second candidate data set as an example, where the second candidate data set is any one of the candidate data sets satisfying the first condition. The second candidate data set includes a training data set and a test data set; any sample in the training data set and the test data set includes N3 data features and a label (also called the true label), where N3 is a positive integer.
A specific implementation of calculating the third evaluation value of the second candidate data set may be: the execution device trains a second machine learning model on the training data set; inputs the N3 data features of each sample in the test data set into the second machine learning model to obtain a predicted label for each sample in the test data set; and calculates the third evaluation value of the second candidate data set based on the true label and the predicted label of each sample.
That is, the third evaluation value of the second candidate data set is obtained by statistically analyzing the differences between the true labels and the predicted labels of the samples in the test data set.
The third evaluation value may be represented by one or more indexes, such as the F1 score, mean average precision (MAP), AUC (area under the ROC curve), mean squared error (MSE), root mean squared error (RMSE), recall, and precision, which are not limited here.
In another implementation of the embodiments of the present application, the second candidate data set may be divided into a plurality of shares (e.g., 4 shares), of which three are used as training data sets and one as the test data set. Three machine learning models are obtained by training on the three training data sets respectively, each model is tested on the test data to obtain an evaluation value, and the third evaluation value of the second candidate data set is then determined as the average of the evaluation values of the 3 machine learning models.
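A hedged sketch of step S5442 is shown below, using a standard K-fold variant of the share-splitting just described. The model type (random forest) and the F1 metric are assumptions, not the patent's choices; `X` and `y` are numpy arrays of features and labels:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold

def third_evaluation(X, y, n_splits=4, seed=0):
    """S5442 sketch: train a second machine learning model on the training shares,
    test it on the held-out share, and average the per-fold scores to obtain
    the third evaluation value of one candidate data set."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True,
                                     random_state=seed).split(X):
        model = RandomForestClassifier(random_state=seed)
        model.fit(X[train_idx], y[train_idx])   # train on the training data set
        pred = model.predict(X[test_idx])       # predicted labels on the test set
        scores.append(f1_score(y[test_idx], pred, average="macro"))
    return sum(scores) / len(scores)            # third evaluation value
```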
It should be understood that different candidate data sets yield different second machine learning models when trained, and therefore also different third evaluation values when tested. In fig. 6B, the second machine learning model trained from the training data set in candidate data set f is denoted second machine learning model f.
S5443: from the candidate data sets satisfying the first condition, the candidate data sets whose third evaluation values satisfy the second condition are selected as the (n+1)th group of data sets.
Selecting the candidate data sets satisfying the second condition from those satisfying the first condition may be implemented as follows: the execution device selects the candidate data sets whose third evaluation values are larger than a third threshold; alternatively, the execution device selects the candidate data sets corresponding to the top h third evaluation values, where the evaluation values are ranked from large to small. Similar to the first threshold in the first implementation, the third threshold may be a fixed value or may be obtained by statistically analyzing the third evaluation values of the candidate data sets; h is a positive integer and h < g.
It is to be understood that the above second screening process based on the third evaluation value corresponds to the branch removed at ③ in fig. 6C.
In another implementation of the embodiment of the present application, step S5441 may be omitted; in step S5442, the third evaluation value may be calculated for all of the plurality of candidate data sets, and in step S5443 the (n+1)th group of data sets (which may also be referred to as the (n+1)th-layer nodes of the tree structure) is then screened out.
It should be understood that when a candidate data set satisfies the second condition and the third condition, the candidate data set satisfies the first condition.
Referring to fig. 7, a data processing system may be implemented on an execution device, which may comprise one or more servers, computers, and the like. The system 700 may comprise the following units:
a first obtaining unit 701, configured to obtain a first group of data sets, where the first group of data sets includes a plurality of data features;
a transformation unit 702, configured to perform multi-order feature transformation on a plurality of data features in the first set of data sets;
a first selecting unit 704, configured to determine a target data set from a first set, where the first set includes the data sets obtained by each order of feature transformation in the multi-order feature transformation process;
wherein the transformation unit 702 is specifically configured to: respectively performing feature transformation on each data set in the nth data set to obtain a plurality of candidate data sets, wherein the nth data set is a data set obtained by n-1 order feature transformation, and n is an integer greater than 1;
the system 700 further comprises:
a first evaluation unit 703 for: respectively calculating a first evaluation value of each candidate data set in the plurality of candidate data sets, wherein the first evaluation value is used for evaluating the accuracy of a model obtained by training the candidate data sets;
a first screening unit 705, configured to determine an n +1 th group of data sets according to the first evaluation value of each of the plurality of candidate data sets, where the number of data sets in the n +1 th group of data sets is smaller than the number of the plurality of candidate data sets.
As a possible implementation, the first candidate data set is any one of the plurality of candidate data sets;
the system further comprises a meta-feature calculation unit 706 for: computing a meta-feature of the first candidate data set from the first candidate data set, the meta-feature representing an attribute of the first candidate data set;
the first evaluation unit 703 is specifically configured to: inputting the meta-features into a first machine learning model to predict a second evaluation value of the first candidate data set, the second evaluation value of the first candidate data set being used to evaluate an accuracy of a model trained from the first candidate data set; and determining a first evaluation value of the first candidate data set from the second evaluation value of the first candidate data set.
As a possible implementation manner, the first candidate data set includes a plurality of data features and a tag, and the meta-feature calculating unit 706 is specifically configured to:
calculating first information according to the first candidate data set, wherein the first information comprises data similarity and distribution similarity of every two data features in the plurality of data features of the first candidate data set, the data similarity and distribution similarity of each data feature in the plurality of data features of the first candidate data set and a label, and at least one of data distribution information of each data feature in the plurality of data features of the first candidate data set and data distribution information of the label;
meta features of the first candidate data set are calculated from the first information.
As a possible implementation, the meta-features of the first candidate data set include: at least one of a base feature of the first candidate data set, a feature of a continuous data feature of the plurality of data features of the first candidate data set, a feature of a discrete data feature of the plurality of data features of the first candidate data set, a feature of the tag, a feature of data similarity, a feature of distribution similarity, and a feature of distribution information of the data feature.
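To make the meta-feature calculation unit 706 concrete, the following is a small hedged sketch that computes a few of the meta-feature families listed above (basic attributes, statistics of continuous features, feature-label similarity). The specific statistics chosen, and the assumption of a numeric label, are illustrative only:

```python
import numpy as np
import pandas as pd

def compute_meta_features(df: pd.DataFrame, label_col: str) -> np.ndarray:
    """Illustrative meta-features: basic data set attributes plus aggregate
    statistics of continuous features and feature-label correlations."""
    feats = df.drop(columns=[label_col])
    numeric = feats.select_dtypes(include="number")
    corr = numeric.corrwith(df[label_col]).abs()   # feature-label similarity
    return np.array([
        len(df),                                   # basic: number of samples
        feats.shape[1],                            # basic: number of features
        numeric.shape[1] / max(feats.shape[1], 1), # share of continuous features
        float(numeric.mean().mean()) if not numeric.empty else 0.0,
        float(numeric.std().mean()) if not numeric.empty else 0.0,
        float(corr.mean()) if len(corr) else 0.0,  # mean |corr(feature, label)|
    ])
```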
As a possible implementation, the first candidate data set is obtained by performing a first feature transformation on a first data set, where the first data set is one of the nth group of data sets, and the first evaluation value of the first candidate data set is the sum of a first data item and a second data item; the first data item is positively correlated with the second evaluation value of the first candidate data set, and the second data item is determined by the number of historical gains of the first feature transformation.
As a possible implementation manner, the first filtering unit 705 is further configured to: selecting a candidate data set of which first evaluation value satisfies a first condition among the plurality of candidate data sets;
the system further comprises a second evaluation unit 707 for: respectively training and testing a model of each candidate data set in the candidate data sets meeting the first condition to obtain a third evaluation value corresponding to each candidate data set in the candidate data sets meeting the first condition;
the first filtering unit 705 is further configured to: selecting a candidate data set of which third evaluation value satisfies a second condition among the candidate data sets satisfying the first condition as the n +1 th group of data sets.
As a possible implementation, the second candidate data set is any one of candidate data sets that satisfy the first condition, the second candidate data set includes a training data set and a testing data set, and any one sample in the training data set and the testing data set includes a plurality of data features and a label; the second evaluation unit 707 is specifically configured to:
training a second machine learning model according to the training data set;
inputting a plurality of data features of each sample in the test data set into the second machine learning model to obtain a prediction label of each sample in the test data set;
a third evaluation value for the second candidate data set is calculated based on the label and the predicted label for each sample in the test data set.
As a possible implementation, the system 700 further includes:
a second obtaining unit 708, configured to obtain a plurality of first samples, where any one of the plurality of first samples includes a meta-feature of a second data set and an evaluation value of the second data set;
a first training unit 709 for training the first machine learning model according to the plurality of first samples.
As a possible implementation, the system further comprises:
a third evaluation unit 710 for: before a transformation evaluation module carries out feature transformation on each data set in an nth data set respectively to obtain a plurality of candidate data sets, inputting meta features of a third data set into a third machine learning model, and predicting to obtain a fourth evaluation value, wherein the fourth evaluation value is used for evaluating the accuracy of the model obtained by training the candidate data sets obtained by carrying out second feature transformation on the third data set, the third data set is any one of the nth data set, the second feature transformation is any one of B feature transformations, and B is a positive integer;
a second filtering unit 711, configured to select, from the B feature transformations, the A feature transformations corresponding to fourth evaluation values that satisfy the fourth condition, where A is a positive integer not greater than B;
the transformation unit 702 is specifically configured to: and performing A kinds of feature transformation on the third data set to obtain A candidate data sets.
As a possible implementation, the system 700 further includes:
a second obtaining unit 712, configured to obtain a plurality of second samples, where any one of the plurality of second samples includes a meta-feature of a fourth data set and a difference between an evaluation value of a data set of the fourth data set after a second feature transformation and an evaluation value of the fourth data set, and the second feature transformation is any one of the B feature transformations;
a second training unit 713, configured to train the third machine learning model according to the plurality of second samples.
It should be noted that the first acquiring unit 701, the transforming unit 702, the first evaluating unit 703, the first selecting unit 704, the first filtering unit 705, the meta-feature calculating unit 706, the second evaluating unit 707, the third evaluating unit 710, and the second filtering unit 711 may be disposed on the executing apparatus side. The second acquisition unit 708, the first training unit 709, the second acquisition unit 712, and the second training unit 713 may be provided on the training apparatus side.
It should also be noted that each apparatus in the system may further include other units, and specific implementations of each device and unit may refer to relevant descriptions in the foregoing method embodiments, and are not described herein again.
As shown in fig. 8, the execution device 800 may include a processor 801, a memory 802, a communication bus 803, and a communication interface 804, where the processor 801 connects to the memory 802 and the communication interface 804 through the communication bus 803.
The processor 801 may be a central processing unit (CPU); the processor 801 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor 801 may be any conventional processor or the like.
The processor 801 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the data processing method of the present application may be completed by integrated logic circuits of hardware in the processor 801 or by instructions in the form of software. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. The steps of the methods disclosed in connection with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable or electrically erasable programmable memory, or a register. The storage medium is located in the memory 802; the processor 801 reads the information in the memory 802 and, in combination with its hardware, completes the functions required of the preprocessing module 212, the feature transformation module 213, the data set determination module 214, and the training module 215 included in the cloud platform 210 of the embodiment of the present application, or executes the data processing method of the method embodiment of the present application.
The memory 802 may be a read-only memory (ROM), a random access memory (RAM), or another memory. In the embodiment of the present application, the memory 802 is used to store data and various software programs, for example, the original data set and the groups of data sets in the embodiment of the present application, and programs for implementing the data processing method of the embodiment of the present application.
The communication interface 804 enables communication between the execution device 800 and other devices or communication networks using transceiver apparatus such as, but not limited to, a transceiver. For example, the raw data set, the first group of data sets, and the like may be obtained through the communication interface 804 to enable information interaction with a training device, a client device, a user device, or a terminal device.
Optionally, the execution device may further include an artificial intelligence processor 805, which may be any processor suitable for large-scale exclusive-OR operation processing, such as a neural network processing unit (NPU), a tensor processing unit (TPU), or a graphics processing unit (GPU). The artificial intelligence processor 805 may be mounted as a coprocessor to a main CPU (host CPU), which assigns tasks to it. The artificial intelligence processor 805 may implement one or more of the operations involved in the data processing method described above. Taking an NPU as an example, the core portion of the NPU is an arithmetic circuit, and a controller controls the arithmetic circuit to extract matrix data from the memory 802 and perform multiply-add operations.
The processor 801 is configured to call the data and the program codes in the memory, and perform:
obtaining a first set of data sets, the first set of data sets comprising a plurality of data features;
performing a multi-order feature transform on the plurality of data features in the first set of data sets;
determining a target data set from a first set, where the first set includes the data sets obtained by each order of feature transformation in the multi-order feature transformation process;
wherein the performing a multi-order feature transformation on the plurality of data features in the first set of data sets comprises:
respectively performing feature transformation on data features in each data set in an nth data set to obtain a plurality of candidate data sets, wherein the nth data set is a data set obtained by performing n-1 order feature transformation on the first data set, and n is an integer greater than 1;
calculating a first evaluation value for each of the plurality of candidate data sets; the first evaluation value is used for evaluating the accuracy of a model obtained by training the candidate data set;
determining an n +1 th group of data sets according to the first evaluation value of each of the plurality of candidate data sets, the number of data sets in the n +1 th group of data sets being smaller than the number of the plurality of candidate data sets.
To acquire the first group of data sets, the execution device may receive an original data set sent by the client device through the communication interface 804 and then preprocess the original data set to obtain the first group of data sets.
After the execution device 800 obtains the target data set, it may obtain the target feature transformation algorithm used to transform the first group of data sets into the target data set, and may train a newly created machine learning model on the target data set to obtain the target machine learning model. Further, the execution device 800 sends the target feature transformation algorithm and the target machine learning model to the client device via the communication interface 804.
As a possible implementation manner, the first candidate data set is any one of the plurality of candidate data sets, and the processor 801 performs the calculating the first evaluation value of each of the plurality of candidate data sets, including performing:
computing a meta-feature of the first candidate data set from the first candidate data set, the meta-feature representing an attribute of the first candidate data set;
inputting the meta-features into a first machine learning model to predict a second evaluation value of the first candidate data set, the second evaluation value of the first candidate data set being used to evaluate the accuracy of a model trained from the first candidate data set;
determining a first evaluation value of the first candidate data set from the second evaluation value of the first candidate data set.
As a possible implementation, the calculating the meta feature of the first candidate data set according to the first candidate data set includes:
calculating first information according to the first candidate data set, wherein the first information comprises data similarity and distribution similarity of every two data features in the plurality of data features of the first candidate data set, the data similarity and distribution similarity of each data feature in the plurality of data features of the first candidate data set and a label, and at least one of data distribution information of each data feature in the plurality of data features of the first candidate data set and data distribution information of the label;
meta features of the first candidate data set are calculated from the first information.
As a possible implementation, the meta-features of the first candidate data set include: at least one of a base feature of the first candidate data set, a feature of a continuous data feature of the plurality of data features of the first candidate data set, a feature of a discrete data feature of the plurality of data features of the first candidate data set, a feature of the tag, a feature of data similarity, a feature of distribution similarity, and a feature of distribution information of the data feature.
As a possible implementation, the first candidate data set is obtained by a first feature transformation, and the processor 801 performs the determining of the first evaluation value of the first candidate data set according to the second evaluation value of the first candidate data set, including:
the first evaluation value of the first candidate data set is a sum of the first data item and the second data item; wherein the first data item positively correlates with a second evaluation value of the first candidate data set, the second data item being determined by a number of historical gains of the first feature transform.
As a possible implementation, the processor 801 performs the determining of the (n+1)th group of data sets according to the first evaluation value of each of the plurality of candidate data sets, specifically including performing:
selecting a candidate data set of which first evaluation value satisfies a first condition among the plurality of candidate data sets;
respectively training and testing a model of each candidate data set in the candidate data sets meeting the first condition to obtain a third evaluation value corresponding to each candidate data set in the candidate data sets meeting the first condition;
selecting a candidate data set of which third evaluation value satisfies a second condition among the candidate data sets satisfying the first condition as the n +1 th group of data sets.
As a possible implementation, the second candidate data set is any one of candidate data sets that satisfy the first condition, the second candidate data set includes a training data set and a testing data set, and any one sample in the training data set and the testing data set includes a plurality of data features and a label; the processor 801 performs the training and testing of the model for each candidate data set in the candidate data sets that satisfy the first condition, to obtain a third evaluation value corresponding to each candidate data set in the candidate data sets that satisfy the first condition, including performing:
training a second machine learning model according to the training data set;
inputting a plurality of data features of each sample in the test data set into the second machine learning model to obtain a prediction label of each sample in the test data set;
a third evaluation value for the second candidate data set is calculated based on the label and the predicted label for each sample in the test data set.
As a possible implementation, the processor 801 is further configured to execute the following steps:
obtaining a plurality of first samples, wherein any one of the plurality of first samples comprises a meta-feature of a second data set and an evaluation value of the second data set;
training the first machine learning model from the plurality of first samples.
As a possible implementation manner, before the processor 801 performs feature transformation on each data set in the nth data set to obtain a plurality of candidate data sets, the processor 801 is further configured to:
inputting meta features of a third data set into a third machine learning model, and predicting to obtain a fourth evaluation value, wherein the fourth evaluation value is used for evaluating the accuracy of a model obtained by training a candidate data set obtained by the third data set through the second feature transformation, the third data set is any one of the nth data set, the second feature transformation is any one of B feature transformations, and B is a positive integer;
selecting, from the B feature transformations, the A feature transformations corresponding to fourth evaluation values that satisfy the fourth condition, where A is a positive integer not greater than B;
the performing feature transformation on each data set in the nth data set to obtain a plurality of candidate data sets respectively includes: and performing A kinds of feature transformation on the third data set to obtain A candidate data sets.
As a possible implementation, the processor 801 is further configured to:
obtaining a plurality of second samples, wherein any one of the second samples comprises meta-features of a fourth data set and a difference value between an evaluation value of the data set of the fourth data set after a second feature transformation and an evaluation value of the fourth data set, and the second feature transformation is any one of the B feature transformations;
training the third machine learning model according to the plurality of second samples.
It should be understood that, implementation of each device may also correspondingly refer to corresponding description in the foregoing method embodiments, and details are not described in this embodiment of the present application.
It is to be understood that the various units of the data processing system 700 may correspond to the processor 801.
As shown in fig. 9, a training device provided in an embodiment of the present application may include a processor 901, a memory 902, a communication bus 903, and a communication interface 904, where the processor 901 connects to the memory 902 and the communication interface 904 through the communication bus 903.
The processor 901 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), a neural network processing unit (NPU), or one or more integrated circuits, and is configured to execute related programs to perform the training method of the first machine learning model of the embodiment of the present application.
The processor 901 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the training method of the first machine learning model of the present application may be completed by integrated logic circuits of hardware in the processor 901 or by instructions in the form of software. The processor 901 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable or electrically erasable programmable memory, or a register. The storage medium is located in the memory 902; the processor 901 reads the information in the memory 902 and, in combination with its hardware, executes the training method of the first machine learning model of the embodiment of the present application.
The memory 902 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 902 may store programs and data, for example, the plurality of first samples in the embodiment of the present application and a program for implementing the training method of the first machine learning model in the embodiment of the present application. When the program stored in the memory 902 is executed by the processor 901, the processor 901 and the communication interface 904 are used to perform the steps of the training method of the first machine learning model of the embodiment of the present application.
The communication interface 904 enables communication between the training device 900 and other devices or communication networks using transceiver apparatus such as, but not limited to, a transceiver. For example, the plurality of first samples may be obtained through the communication interface 904 to enable information interaction with an execution device, a client device, a user device, or a terminal device.
Optionally, the training device may further include an artificial intelligence processor 905, which may be any processor suitable for large-scale exclusive-OR operation processing, such as a neural network processing unit (NPU), a tensor processing unit (TPU), or a graphics processing unit (GPU). The artificial intelligence processor 905 may be mounted as a coprocessor to a main CPU (host CPU), which assigns tasks to it. The artificial intelligence processor 905 may implement one or more of the operations involved in the training method of the first machine learning model described above. Taking an NPU as an example, the core portion of the NPU is an arithmetic circuit, and a controller controls the arithmetic circuit to extract matrix data from the memory 902 and perform multiply-add operations.
The processor 901 is configured to call the data and the program code in the memory, and perform:
obtaining a plurality of first samples, wherein any one of the plurality of first samples comprises a meta-feature of a second data set and an evaluation value of the second data set;
training the first machine learning model from the plurality of first samples.
Optionally, the method for calculating the meta-features is the same as the method for calculating the meta-features of the first candidate data set in the first aspect; reference may be made to the related description in the first aspect, which is not repeated here.
It should be understood that the implementation of each device may also correspond to the corresponding description in the above embodiment of the training method for the first machine learning model, and the description of the embodiment of the present application is omitted.
As shown in fig. 10, the training device provided in the embodiment of the present application may include a processor 1001, a memory 1002, a communication bus 1003, and a communication interface 1004, where the processor 1001 connects to the memory 1002 and the communication interface 1004 through the communication bus 1003.
The processor 1001 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the functions required of the units in the training apparatus of the third machine learning model of the embodiment of the present application, or to execute the training method of the third machine learning model of the embodiment of the present application.
The processor 1001 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the training method of the third machine learning model of the present application may be completed by integrated logic circuits of hardware in the processor 1001 or by instructions in the form of software. The processor 1001 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable or electrically erasable programmable memory, or a register. The storage medium is located in the memory 1002; the processor 1001 reads the information in the memory 1002 and, in combination with its hardware, executes the training method of the third machine learning model of the embodiment of the present application.
The memory 1002 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1002 may store programs and data, for example, the plurality of second samples or third samples in the embodiment of the present application and a program for implementing the training method of the third machine learning model in the embodiment of the present application. When the program stored in the memory 1002 is executed by the processor 1001, the processor 1001 and the communication interface 1004 are used to perform the steps of the training method of the third machine learning model of the embodiments of the present application.
The plurality of second samples or third samples may be received via the communication interface 1004 to enable information interaction with an execution device, a client device, a user device, a terminal device, or the like.
Optionally, the training device may further include an artificial intelligence processor 1005, which may be any processor suitable for large-scale exclusive-OR operation processing, such as a neural network processing unit (NPU), a tensor processing unit (TPU), or a graphics processing unit (GPU). The artificial intelligence processor 1005 may be mounted as a coprocessor to a main CPU (host CPU), which assigns tasks to it. The artificial intelligence processor 1005 may implement one or more of the operations involved in the training method of the third machine learning model described above. Taking an NPU as an example, the core portion of the NPU is an arithmetic circuit, and a controller controls the arithmetic circuit to extract matrix data from the memory 1002 and perform multiply-add operations.
The processor 1001 is configured to call the data and the program codes in the memory, and perform:
obtaining a plurality of second samples, wherein any one of the second samples comprises meta-features of a fourth data set and a difference value between an evaluation value of the data set of the fourth data set after a second feature transformation and an evaluation value of the fourth data set, and the second feature transformation is any one of the B feature transformations; training the third machine learning model according to the plurality of second samples.
Or performing:
acquiring a plurality of third samples, where any one of the third samples includes the meta-features of a fourth data set and a fourth evaluation value of the data set obtained by applying the second feature transformation to the fourth data set; and training the third machine learning model according to the plurality of third samples.
Optionally, the method for calculating the meta-features of the fourth data set is the same as the method for calculating the meta-features of the first candidate data set in the first aspect; reference may be made to the related description in the first aspect, which is not repeated in this embodiment of the application.
It should be understood that the implementation of each device may also correspond to the corresponding description in the training method embodiment of the third machine learning model, and the description of the embodiment of the present application is not repeated.
A hardware structure of a chip provided in an embodiment of the present application is described below.
Fig. 11 shows a hardware structure of a chip according to an embodiment of the present invention, where the chip includes an artificial intelligence processor 110. The chip may be provided in the execution device 120 shown in fig. 1 or the execution device 800 shown in fig. 8 to perform part or all of the data processing work of the execution device. The chip may also be disposed in the training device 110 shown in fig. 1, the execution device 800 shown in fig. 8, or the training devices 900 and 1000 shown in figs. 9 and 10 to complete the training work of the training device and output the first machine learning model or the third machine learning model.
The artificial intelligence processor 110 may be any processor suitable for large-scale exclusive-OR operation processing, such as an NPU, TPU, or GPU. Taking the NPU as an example: the NPU may be mounted as a coprocessor to a main CPU (host CPU), which assigns tasks to it. The core portion of the NPU is the arithmetic circuit 1103, and the controller 1104 controls the arithmetic circuit 1103 to extract matrix data from memory and perform multiply-add operations.
In some implementations, the arithmetic circuit 1103 includes a plurality of processing units (PEs) inside. In some implementations, the arithmetic circuitry 1103 is a two-dimensional systolic array. The arithmetic circuit 1103 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 1103 is a general-purpose matrix processor.
For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 1103 fetches the weight data of matrix B from the weight memory 1102 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit 1103 takes the input data of matrix A from the input memory 1101, performs matrix operations on it with the weight data of matrix B, and stores the partial or final result of the matrix in the accumulator 1108.
The unified memory 1106 is used to store input data and output data. The weight data is transferred directly to the weight memory 1102 through a direct memory access controller (DMAC) 1105. The input data is also carried into the unified memory 1106 through the DMAC.
A bus interface unit (BIU) 1110 is used for interaction between the DMAC and the instruction fetch buffer 1109; the bus interface unit 1110 is also used for the instruction fetch buffer 1109 to fetch instructions from the external memory; and the bus interface unit 1110 is also used for the direct memory access controller 1105 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory (for example, a DDR memory) to the unified memory 1106, to transfer the weight data to the weight memory 1102, or to transfer input data to the input memory 1101.
The vector calculation unit 1107 includes a plurality of arithmetic processing units, and further processes the output of the arithmetic circuit 1103 as necessary, for example through vector multiplication, vector addition, exponential operation, logarithmic operation, or magnitude comparison. The vector calculation unit 1107 is mainly used for the calculation of non-convolutional layers or fully connected (FC) layers in a neural network, and may specifically process pooling, normalization, and the like. For example, the vector calculation unit 1107 may apply a non-linear function to the output of the arithmetic circuit 1103, such as to a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 1107 generates normalized values, combined values, or both.
In some implementations, the vector calculation unit 1107 stores the processed vector in the unified memory 1106. In some implementations, the vectors processed by the vector calculation unit 1107 can be used as activation inputs to the arithmetic circuit 1103, for example for use in subsequent layers of a neural network.
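As a toy sketch of this post-processing step, the following assumes ReLU as the non-linear function; the embodiment does not fix a particular function, and the function and variable names here are illustrative.

```python
import numpy as np

def vector_unit_postprocess(accumulated: np.ndarray) -> np.ndarray:
    """Apply an elementwise non-linear function to the accumulated
    output of the arithmetic circuit to produce activation values.
    ReLU is used here purely as an example."""
    return np.maximum(accumulated, 0.0)

# the resulting activations could be stored in the unified memory and
# reused as activation inputs of the arithmetic circuit for a later layer
accumulated = np.array([[-1.2, 0.5], [2.0, -0.3]])
activations = vector_unit_postprocess(accumulated)
```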
The instruction fetch buffer 1109 is connected to the controller 1104 and is configured to store instructions used by the controller 1104.
The unified memory 1106, the input memory 1101, the weight memory 1102 and the instruction fetch buffer 1109 are all on-chip memories. The external memory is independent of the NPU hardware architecture.
When the first machine learning model, the second machine learning model, or the third machine learning model is a neural network, the operation of each layer in the neural network may be performed by the arithmetic circuit 1103 or the vector calculation unit 1107.
It should be noted that although the execution device 800 and the training devices 900 and 1000 shown in figs. 8, 9 and 10 show only a memory, a processor, and a communication interface, in a specific implementation those skilled in the art will appreciate that these devices also include the other components necessary for proper operation. Depending on particular needs, those skilled in the art will also appreciate that the execution device 800 and the training devices 900 and 1000 may include hardware components that implement other additional functions. Furthermore, those skilled in the art will appreciate that these devices may include only the components necessary to implement an embodiment of the present application, and need not include all of the components shown in figs. 8, 9, or 10. For example, a communication interface and a communication bus are not necessary components, and the devices shown in figs. 8, 9, or 10 may not include a communication interface and/or a communication bus.
Those of ordinary skill in the art will recognize that the units and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Those of skill would appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps disclosed herein may be implemented as hardware, software, firmware, or any combination thereof. If implemented in software, the functions described in the various illustrative logical blocks, modules, and steps may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium, or any communication medium including a medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described herein. A computer program product may include a computer-readable medium.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solution of the present application that substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (22)

1. A data processing method, comprising:
an execution device acquires a first set of data sets, the first set of data sets comprising a plurality of data features;
the execution device performs multi-order feature transformation on the plurality of data features in the first set of data sets;
the execution device determines a target data set from a first set, wherein the first set comprises the data sets obtained by each order of feature transformation in the multi-order feature transformation process;
wherein the performing a multi-order feature transformation on the plurality of data features in the first set of data sets comprises:
the execution device respectively performs feature transformation on the data features in each data set in an nth group of data sets to obtain a plurality of candidate data sets, wherein the nth group of data sets is obtained by performing n-1 orders of feature transformation on the first set of data sets, and n is an integer greater than 1;
the execution device calculates a first evaluation value for each of the plurality of candidate data sets; the first evaluation value is used for evaluating the accuracy of a model obtained by training the candidate data set;
the execution device determines an n +1 th group of data sets according to the first evaluation value of each of the plurality of candidate data sets, the number of data sets in the n +1 th group of data sets being smaller than the number of the plurality of candidate data sets.
2. The method of claim 1, wherein a first candidate data set is any one of the plurality of candidate data sets, and wherein the calculating the first evaluation value for each of the plurality of candidate data sets comprises:
the execution device calculates a meta-feature of the first candidate data set, the meta-feature representing an attribute of the first candidate data set;
the execution device inputs the meta-feature into a first machine learning model to predict a second evaluation value of the first candidate data set, wherein the second evaluation value of the first candidate data set is used for evaluating the accuracy of a model obtained by training the first candidate data set;
the execution device determines a first evaluation value of the first candidate data set from a second evaluation value of the first candidate data set.
3. The method of claim 2, wherein the first candidate data set comprises a plurality of data features and a label, and wherein the calculating the meta-feature of the first candidate data set comprises:
the execution device calculates first information according to the first candidate data set, wherein the first information comprises at least one of: the data similarity and the distribution similarity of every two data features in the plurality of data features of the first candidate data set, the data similarity and the distribution similarity between each data feature in the plurality of data features of the first candidate data set and the label, the data distribution information of each data feature in the plurality of data features of the first candidate data set, and the data distribution information of the label;
the execution device calculates the meta-feature of the first candidate data set according to the first information.
4. The method of claim 3, wherein the meta-feature of the first candidate data set comprises at least one of: a base feature of the first candidate data set, a feature of the continuous data features of the plurality of data features of the first candidate data set, a feature of the discrete data features of the plurality of data features of the first candidate data set, a feature of the label, a feature of the data similarity, a feature of the distribution similarity, and a feature of the data distribution information of the data features.
5. The method according to any one of claims 2 to 4, wherein the first candidate data set is a data set obtained by performing a first feature transformation on a first data set, the first data set is one of the data sets in the nth group of data sets, and the determining the first evaluation value of the first candidate data set according to the second evaluation value of the first candidate data set specifically comprises:
the first evaluation value of the first candidate data set is a sum of a first data item and a second data item; wherein the first data item is positively correlated with the second evaluation value of the first candidate data set, and the second data item is determined by the number of historical gains of the first feature transformation.
6. The method according to any one of claims 1 to 4, wherein the determining the n +1 th group of data sets according to the first evaluation value of each of the plurality of candidate data sets specifically comprises:
the execution device selects, from the plurality of candidate data sets, candidate data sets whose first evaluation values satisfy a first condition;
the execution device respectively trains and tests a model on each candidate data set in the candidate data sets satisfying the first condition to obtain a third evaluation value corresponding to each candidate data set in the candidate data sets satisfying the first condition;
the execution device selects, as the n +1 th group of data sets, candidate data sets whose third evaluation values satisfy a second condition from among the candidate data sets satisfying the first condition.
7. The method of claim 6, wherein a second candidate data set is any one of the candidate data sets satisfying the first condition, the second candidate data set comprises a training data set and a test data set, and each sample in the training data set and the test data set comprises a plurality of data features and a label; the respectively training and testing a model on each candidate data set in the candidate data sets satisfying the first condition to obtain a third evaluation value corresponding to each candidate data set in the candidate data sets satisfying the first condition comprises:
the execution device trains a second machine learning model according to the training data set;
the execution device inputs the plurality of data features of each sample in the test data set into the second machine learning model to obtain a prediction label of each sample in the test data set;
the execution device calculates a third evaluation value of the second candidate data set based on the label and the prediction label of each sample in the test data set.
8. The method according to any one of claims 2-7, further comprising:
the execution device acquires a plurality of first samples, wherein any one of the plurality of first samples comprises a meta-feature of a second data set and an evaluation value of the second data set;
the execution device trains the first machine learning model according to the plurality of first samples.
9. The method according to any one of claims 1-8, wherein before the respectively performing feature transformation on each data set in the nth group of data sets to obtain the plurality of candidate data sets, the method further comprises:
the execution device inputs a meta-feature of a third data set into a third machine learning model, and predicts a fourth evaluation value, wherein the fourth evaluation value is used for evaluating the accuracy of a model obtained by training a candidate data set obtained by performing a second feature transformation on the third data set, the third data set is any one of the data sets in the nth group of data sets, the second feature transformation is any one of B feature transformations, and B is a positive integer;
the execution device selects, from the B feature transformations, A feature transformations corresponding to fourth evaluation values satisfying a fourth condition, wherein A is a positive integer not greater than B;
the respectively performing feature transformation on each data set in the nth group of data sets to obtain the plurality of candidate data sets comprises: performing the A feature transformations on the third data set to obtain A candidate data sets.
10. The method of claim 9, further comprising:
the execution device acquires a plurality of second samples, wherein any one of the plurality of second samples comprises a meta-feature of a fourth data set and a difference between an evaluation value of a data set obtained by performing the second feature transformation on the fourth data set and an evaluation value of the fourth data set, and the second feature transformation is any one of the B feature transformations;
the execution device trains the third machine learning model according to the plurality of second samples.
11. A data processing system, comprising:
a first acquisition unit for acquiring a first set of data sets, the first set of data sets comprising a plurality of data features;
a transformation unit, configured to perform multi-order feature transformation on a plurality of data features in the first set of data sets;
a first selection unit, configured to determine a target data set from a first set, wherein the first set comprises the data sets obtained by each order of feature transformation in the multi-order feature transformation process;
wherein the transformation unit is specifically configured to: respectively perform feature transformation on each data set in an nth group of data sets to obtain a plurality of candidate data sets, wherein the nth group of data sets is obtained by performing n-1 orders of feature transformation on the first set of data sets, and n is an integer greater than 1;
the system further comprises:
a first evaluation unit configured to calculate a first evaluation value for each of the plurality of candidate data sets; the first evaluation value is used for evaluating the accuracy of a model obtained by training the candidate data set;
a first screening unit, configured to determine an n +1 th group of data sets according to the first evaluation value of each of the plurality of candidate data sets, where the number of data sets in the n +1 th group of data sets is smaller than the number of the plurality of candidate data sets.
12. The system of claim 11, wherein a first candidate data set is any one of the plurality of candidate data sets;
the system further comprises a meta-feature calculation unit, configured to: calculate a meta-feature of the first candidate data set according to the first candidate data set, the meta-feature representing an attribute of the first candidate data set;
the first evaluation unit is specifically configured to: input the meta-feature into a first machine learning model to predict a second evaluation value of the first candidate data set, the second evaluation value of the first candidate data set being used to evaluate the accuracy of a model obtained by training the first candidate data set; and determine a first evaluation value of the first candidate data set according to the second evaluation value of the first candidate data set.
13. The system according to claim 12, wherein the first candidate data set comprises a plurality of data features and a label, and the meta-feature calculation unit is specifically configured to:
calculate first information according to the first candidate data set, wherein the first information comprises at least one of: the data similarity and the distribution similarity of every two data features in the plurality of data features of the first candidate data set, the data similarity and the distribution similarity between each data feature in the plurality of data features of the first candidate data set and the label, the data distribution information of each data feature in the plurality of data features of the first candidate data set, and the data distribution information of the label; and
calculate the meta-feature of the first candidate data set according to the first information.
14. The system of claim 13, wherein the meta-feature of the first candidate data set comprises at least one of: a base feature of the first candidate data set, a feature of the continuous data features of the plurality of data features of the first candidate data set, a feature of the discrete data features of the plurality of data features of the first candidate data set, a feature of the label, a feature of the data similarity, a feature of the distribution similarity, and a feature of the data distribution information of the data features.
15. The system according to any one of claims 12 to 14, wherein the first candidate data set is a data set obtained by performing a first feature transformation on a first data set, the first data set is one of the data sets in the nth group of data sets, and the first evaluation value of the first candidate data set is a sum of a first data item and a second data item; wherein the first data item is positively correlated with the second evaluation value of the first candidate data set, and the second data item is determined by the number of historical gains of the first feature transformation.
16. The system according to any one of claims 11-14,
the first screening unit is further configured to: select, from the plurality of candidate data sets, candidate data sets whose first evaluation values satisfy a first condition;
the system further comprises a second evaluation unit, configured to: respectively train and test a model on each candidate data set in the candidate data sets satisfying the first condition to obtain a third evaluation value corresponding to each candidate data set in the candidate data sets satisfying the first condition;
the first screening unit is further configured to: select, as the n +1 th group of data sets, candidate data sets whose third evaluation values satisfy a second condition from among the candidate data sets satisfying the first condition.
17. The system of claim 16, wherein a second candidate data set is any one of the candidate data sets satisfying the first condition, the second candidate data set comprises a training data set and a test data set, and each sample in the training data set and the test data set comprises a plurality of data features and a label; the second evaluation unit is specifically configured to:
train a second machine learning model according to the training data set;
input the plurality of data features of each sample in the test data set into the second machine learning model to obtain a prediction label of each sample in the test data set; and
calculate a third evaluation value of the second candidate data set based on the label and the prediction label of each sample in the test data set.
18. The system according to any one of claims 12-17, further comprising:
a second acquisition unit configured to acquire a plurality of first samples, any one of the plurality of first samples including a meta-feature of a second data set and an evaluation value of the second data set;
a first training unit, configured to train the first machine learning model according to the plurality of first samples.
19. The system according to any one of claims 11-18, further comprising:
a third evaluation unit, configured to: before the transformation unit respectively performs feature transformation on each data set in the nth group of data sets to obtain the plurality of candidate data sets, input a meta-feature of a third data set into a third machine learning model and predict a fourth evaluation value, wherein the fourth evaluation value is used for evaluating the accuracy of a model obtained by training a candidate data set obtained by performing a second feature transformation on the third data set, the third data set is any one of the data sets in the nth group of data sets, the second feature transformation is any one of B feature transformations, and B is a positive integer;
a second screening unit, configured to select, from the B feature transformations, A feature transformations corresponding to fourth evaluation values satisfying a fourth condition, wherein A is a positive integer not greater than B;
the transformation unit is specifically configured to: perform the A feature transformations on the third data set to obtain A candidate data sets.
20. The system of claim 19, further comprising:
a third obtaining unit, configured to obtain a plurality of second samples, wherein any one of the plurality of second samples comprises a meta-feature of a fourth data set and a difference between an evaluation value of a data set obtained by performing a second feature transformation on the fourth data set and an evaluation value of the fourth data set, and the second feature transformation is any one of the B feature transformations;
a second training unit, configured to train the third machine learning model according to the plurality of second samples.
21. An execution device, comprising: a processor and a memory, wherein the memory is configured to store computer program code, and the processor is configured to invoke the computer program code to perform the data processing method according to any one of claims 1-10.
22. A computer-readable storage medium, wherein the computer-readable storage medium stores computer program code which, when run on a processor, causes the processor to perform the data processing method according to any one of claims 1-10.
CN201910028386.XA 2019-01-11 2019-01-11 Data processing method and related equipment and system Pending CN111435463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910028386.XA CN111435463A (en) 2019-01-11 2019-01-11 Data processing method and related equipment and system


Publications (1)

Publication Number Publication Date
CN111435463A true CN111435463A (en) 2020-07-21

Family

ID=71580423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910028386.XA Pending CN111435463A (en) 2019-01-11 2019-01-11 Data processing method and related equipment and system

Country Status (1)

Country Link
CN (1) CN111435463A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738356A (en) * 2020-07-23 2020-10-02 平安国际智慧城市科技股份有限公司 Object feature generation method, device, equipment and storage medium for specific data
CN112053558A (en) * 2020-08-25 2020-12-08 青岛海信网络科技股份有限公司 Traffic jam state identification method, device and equipment
CN114338416A (en) * 2020-09-29 2022-04-12 中国移动通信有限公司研究院 Space-time multi-index prediction method and device and storage medium
CN114338416B (en) * 2020-09-29 2023-04-07 中国移动通信有限公司研究院 Space-time multi-index prediction method and device and storage medium
US11367019B1 (en) 2020-11-30 2022-06-21 Shanghai Icekredit, Inc. Data processing method and apparatus, and computer device
CN112200667A (en) * 2020-11-30 2021-01-08 上海冰鉴信息科技有限公司 Data processing method and device and computer equipment
CN112668723A (en) * 2020-12-29 2021-04-16 杭州海康威视数字技术股份有限公司 Machine learning method and system
CN112668723B (en) * 2020-12-29 2024-01-02 杭州海康威视数字技术股份有限公司 Machine learning method and system
CN113449958B (en) * 2021-05-09 2022-05-10 武汉兴得科技有限公司 Intelligent epidemic prevention operation and maintenance management method and system
CN113449958A (en) * 2021-05-09 2021-09-28 武汉兴得科技有限公司 Intelligent epidemic prevention operation and maintenance management method and system
WO2023029704A1 (en) * 2021-08-31 2023-03-09 华为技术有限公司 Data processing method, apparatus and system
CN114490697A (en) * 2022-03-28 2022-05-13 山东国赢大数据产业有限公司 Data cooperative processing method and device based on block chain
CN114818516A (en) * 2022-06-27 2022-07-29 中国石油大学(华东) Intelligent prediction method for corrosion form profile of shaft and program product
CN114818516B (en) * 2022-06-27 2022-09-20 中国石油大学(华东) Intelligent prediction method for corrosion form profile of shaft

Similar Documents

Publication Publication Date Title
CN111435463A (en) Data processing method and related equipment and system
CN108197652B (en) Method and apparatus for generating information
CN111898578B (en) Crowd density acquisition method and device and electronic equipment
CN112990486A (en) Method and system for generating combined features of machine learning samples
CN110852881B (en) Risk account identification method and device, electronic equipment and medium
CN110995459B (en) Abnormal object identification method, device, medium and electronic equipment
CN110806954A (en) Method, device and equipment for evaluating cloud host resources and storage medium
CN111796957B (en) Transaction abnormal root cause analysis method and system based on application log
CN112561082A (en) Method, device, equipment and storage medium for generating model
CN116760772B (en) Control system and method for converging flow divider
WO2021035412A1 (en) Automatic machine learning (automl) system, method and device
CN113610240A (en) Method and system for performing predictions using nested machine learning models
CN114066073A (en) Power grid load prediction method
CN112766402A (en) Algorithm selection method and device and electronic equipment
CN114625477A (en) Service node capacity adjusting method, equipment and computer readable storage medium
CN117041017B (en) Intelligent operation and maintenance management method and system for data center
CN113435632A (en) Information generation method and device, electronic equipment and computer readable medium
Almomani et al. Selecting a good stochastic system for the large number of alternatives
CN115983497A (en) Time sequence data prediction method and device, computer equipment and storage medium
CN114896024B (en) Method and device for detecting running state of virtual machine based on kernel density estimation
CN115730152A (en) Big data processing method and big data processing system based on user portrait analysis
CN115562940A (en) Load energy consumption monitoring method and device, medium and electronic equipment
CN111654853B (en) Data analysis method based on user information
CN114565196A (en) Multi-event trend prejudging method, device, equipment and medium based on government affair hotline
CN114970357A (en) Energy-saving effect evaluation method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination