CN110956272A - Method and system for realizing data processing - Google Patents


Info

Publication number
CN110956272A
Authority
CN
China
Prior art keywords
feature
model
machine learning
hyper
competition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911061020.9A
Other languages
Chinese (zh)
Other versions
CN110956272B (English)
Inventor
王昱森
罗伟锋
方荣
罗远飞
周柯吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4Paradigm Beijing Technology Co Ltd
Original Assignee
4Paradigm Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4Paradigm Beijing Technology Co Ltd filed Critical 4Paradigm Beijing Technology Co Ltd
Priority to CN201911061020.9A
Publication of CN110956272A
Application granted
Publication of CN110956272B
Legal status: Active

Classifications

    • G — PHYSICS
        • G06 — COMPUTING; CALCULATING OR COUNTING
            • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 20/00 — Machine learning
            • G06F — ELECTRIC DIGITAL DATA PROCESSING
                • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
                    • G06F 16/90 — Details of database functions independent of the retrieved data types
                        • G06F 16/901 — Indexing; Data structures therefor; Storage structures
                            • G06F 16/9027 — Trees
                        • G06F 16/903 — Querying
                • G06F 9/00 — Arrangements for program control, e.g. control units
                    • G06F 9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
                        • G06F 9/44 — Arrangements for executing specific programs
                            • G06F 9/451 — Execution arrangements for user interfaces

Abstract

The invention discloses a method and a system for realizing data processing. The method comprises the following steps: in response to a user's operation of generating a directed acyclic graph, generating the corresponding directed acyclic graph; and, in response to an operation of running the directed acyclic graph, executing the data processing flow that the graph represents. The scheme of the invention thus provides a method and a system for data processing that can be applied widely across different scenarios and meets users' increasingly complex machine learning needs.

Description

Method and system for realizing data processing
Technical Field
The present invention relates to the field of computer data processing, and more particularly, to a method and system for implementing data processing.
Background
With the advent of massive data, artificial intelligence technology has developed rapidly. Machine learning, an inevitable product of this development, is dedicated to mining valuable latent information from massive data by computational means.
In the field of machine learning, models are typically trained by providing empirical data to machine learning algorithms, and a trained model can then provide corresponding prediction results for new data. However, many of the tasks in the machine learning process (e.g., feature preprocessing and selection, model algorithm selection, hyper-parameter tuning) require both computer expertise (especially in machine learning) and business experience specific to the prediction scenario, and therefore carry a significant labor cost. To lower the threshold for using machine learning technology, many machine learning systems (e.g., machine learning platforms) have appeared. However, existing machine learning platforms are limited to training corresponding models (or managing them) from accumulated data, which makes the functions they can support limited and fixed. In addition, existing platforms usually target a single, narrow machine learning scenario and struggle to meet users' increasingly complex machine learning needs.
Disclosure of Invention
The invention aims to provide a method and a system for realizing data processing that can be applied widely across different scenarios.
One aspect of the present invention provides a method for implementing data processing, including:
in response to a user's operation of generating a directed acyclic graph, generating the corresponding directed acyclic graph;
and, in response to an operation of running the directed acyclic graph, executing the data processing flow corresponding to the directed acyclic graph.
Another aspect of the invention provides a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method as described above.
Yet another aspect of the present invention provides a system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the method as described above.
In yet another aspect of the present invention, a data processing system is provided, wherein the system comprises:
an operation unit adapted to generate a corresponding directed acyclic graph in response to a user's operation of generating one;
and a running unit adapted to execute the data processing flow corresponding to the directed acyclic graph in response to an operation of running it.
The scheme of the invention thus provides a method and a system applicable across different scenarios, meeting users' increasingly complex machine learning needs.
Drawings
The above and other objects and features of the present invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings, in which:
FIG. 1 shows a flow diagram of a method of implementing data processing in accordance with an embodiment of the invention;
FIG. 2 shows a schematic view of a first graphical user interface according to an embodiment of the invention;
FIG. 3 illustrates an example of a search tree for generating combined features according to an exemplary embodiment of the present invention;
FIG. 4 shows a schematic diagram of a system implementing data processing according to an embodiment of the invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 shows a flow diagram of a method of implementing data processing according to an embodiment of the invention. As shown in fig. 1, the method includes:
and step S110, responding to the operation of generating the directed acyclic graph of the user, and generating a corresponding directed acyclic graph.
In this step, the user may generate a Directed Acyclic Graph (DAG) in various ways: for example, by entering a script that describes the graph, or by dragging and dropping on a provided graphical user interface. The generated directed acyclic graph represents a data processing flow: its nodes are data or the computing logic that processes data, and the connecting lines between nodes represent the logical relationships and ordering of the processing steps.
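As a minimal sketch (an assumed structure, not the patent's actual implementation), such a DAG can be held as a set of typed nodes plus directed edges whose direction gives the processing order:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str           # e.g. "data split", "feature extraction"
    kind: str           # "data", "sample", "model", or "operator"
    config: dict = field(default_factory=dict)

class DAG:
    def __init__(self):
        self.nodes = {}     # name -> Node
        self.edges = []     # (upstream name, downstream name)

    def add_node(self, node):
        self.nodes[node.name] = node

    def connect(self, upstream, downstream):
        # A connecting line between nodes: upstream output feeds downstream input.
        self.edges.append((upstream, downstream))

dag = DAG()
dag.add_node(Node("raw table", "data"))
dag.add_node(Node("data split", "operator", {"method": "ratio", "ratio": 0.5}))
dag.connect("raw table", "data split")
```

Both script-based and drag-and-drop construction can target the same underlying structure; only the front end differs.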
Step S120: in response to an operation of running the directed acyclic graph, execute the data processing flow corresponding to the directed acyclic graph.
In this step, running the directed acyclic graph may be triggered immediately by the user, or triggered on a schedule at a time the user sets. The data processing flow corresponding to the directed acyclic graph is one related to machine learning.
In the method shown in FIG. 1, generating a DAG and then executing its corresponding data processing flow makes large data processing pipelines easier to construct and edit, lowering the threshold for using machine learning technology.
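Executing a flow "according to each node and the connections between them" amounts to running nodes in topological order — a node starts only after all its upstream nodes finish. A hedged sketch (node behavior stubbed, since the patent does not specify a scheduler):

```python
from collections import deque

def run_dag(nodes, edges):
    """nodes: iterable of node names; edges: (upstream, downstream) pairs."""
    indegree = {n: 0 for n in nodes}
    downstream = {n: [] for n in nodes}
    for u, v in edges:
        indegree[v] += 1
        downstream[u].append(v)
    ready = deque(n for n, d in indegree.items() if d == 0)
    executed = []
    while ready:
        n = ready.popleft()
        executed.append(n)              # placeholder for the node's real work
        for v in downstream[n]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(executed) != len(indegree):
        raise ValueError("graph contains a cycle; not a valid DAG")
    return executed

order = run_dag(["data", "split", "train", "evaluate"],
                [("data", "split"), ("split", "train"), ("train", "evaluate")])
```

The cycle check doubles as validation before execution, which is why the platform can safely let users wire nodes freely in the canvas.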
In an embodiment of the present invention, step S110 of the method in FIG. 1 — generating a corresponding directed acyclic graph in response to the user's operation — includes: displaying a first graphical user interface comprising a node display area and a canvas area, where the node types in the node display area include data, samples, models, and operators; in response to an operation of selecting a node in the node display area, displaying the corresponding node in the canvas area; and, in response to an operation of connecting nodes, generating a connecting line between the corresponding nodes in the canvas area, thereby generating the directed acyclic graph.
In one embodiment of the invention, the node display area comprises an element list and an operator list, where the element list contains data, samples, and models, and the operator list contains various machine-learning-related data processing operators. The node display area further comprises a file list containing directed acyclic graphs; a directed acyclic graph may thus itself serve as a node in the node display area.
FIG. 2 shows a schematic view of a first graphical user interface according to an embodiment of the invention. Referring to FIG. 2, the left side of the first graphical user interface shows the node display area, including a file list, an element list, and an operator list. Selecting a list expands the nodes it contains; for example, the operator list is selected in FIG. 2, so the operators within it are shown. The middle portion of the first graphical user interface is the canvas area, in which a constructed DAG is illustrated.
In one embodiment of the invention, the method of fig. 1 further comprises at least one of the following:
1) in response to an operation of selecting a directed acyclic graph in the node display area, displaying the selected graph in the canvas area for direct running or for editing; for example, selecting the file list in the interface of FIG. 2 exposes the files it contains, including DAGs saved as files, and further selecting one of them displays it in the canvas area;
2) in response to an operation of saving the directed acyclic graph in the canvas area, saving the graph and adding it to the node display area; for example, in the interface of FIG. 2, the saved DAG is added to the file list;
3) in response to an operation of exporting the directed acyclic graph, outputting it to a specified export location; for example, in the interface of FIG. 2, the user may right-click a DAG in the file list to display a download control, and selecting that control downloads the DAG to a specified location.
In one embodiment of the invention, the method of FIG. 1 further comprises at least one of the following (where an element is data, a sample, or a model):
1) in response to an operation of importing elements from outside, saving the corresponding elements and adding them to the node display area; for example, referring to the interface of FIG. 2, imported elements are added to the element list.
2) Saving elements generated while executing the data processing flow corresponding to the directed acyclic graph, and adding them to the node display area; for example, referring to the interface of FIG. 2, the intermediate data, samples, and models generated by running the DAG are added to the element list.
3) Providing a management page for the elements generated while executing the data processing flow corresponding to the directed acyclic graph, so that the user can view and delete these intermediate elements; intermediate data, samples, and models generated by running the DAG are managed separately for easy viewing and deletion.
4) In response to an operation of exporting an element, outputting it to a specified export location; for example, in the interface of FIG. 2, after selecting an element in the element list, the user may right-click to display a download control, and selecting that control downloads the element to a specified location.
In one embodiment of the invention, the method of fig. 1 further comprises at least one of the following:
1) in response to an operation of importing operators from outside, storing the code of each operator and adding the operator to the node display area; for example, referring to FIG. 2, imported operators are added to the operator list.
2) Providing an operator code editing interface, obtaining and storing the code entered there, and adding the corresponding operator to the node display area; for example, referring to FIG. 2, the new operator is added to the operator list.
In one embodiment of the present invention, the method illustrated in fig. 1 further comprises:
in response to an operation of selecting a node in the canvas area, displaying that node's configuration interface and completing the node's configuration according to the operations performed there; for example, referring to FIG. 2, when the "data split" operator in the canvas area is selected, its configuration interface is displayed on the rightmost side of the first graphical user interface, where the operator can be configured;
when a node has not received its required configuration, or its configured parameters do not meet preset requirements, displaying a prompt mark at the node in the canvas area; for example, referring to FIG. 2, an exclamation point may be displayed on an unconfigured or misconfigured node to prompt the user.
In one embodiment of the invention, the method of fig. 1 further comprises at least one of the following:
1) displaying, in the first graphical user interface, a graphical control for running the directed acyclic graph, and, in response to the control being triggered, executing the data processing flow according to the graph's nodes and the connections between them; for example, a "run" control may be displayed in place, or, as shown in FIG. 2, a "play" button may be displayed in the upper-left corner of the canvas area to start running the DAG.
2) Displaying, on the first graphical user interface, a timer that tracks in real time the time spent executing the data processing flow corresponding to the directed acyclic graph; for example, the timer can be displayed in the canvas area so the user can monitor the DAG's execution.
In one embodiment of the invention, the method of fig. 1 further comprises at least one of the following:
1) while the data processing flow corresponding to the directed acyclic graph is executing, displaying on each node of the graph, in the first graphical user interface, information representing that node's execution progress; for example, progress may be shown on the node as a progress bar, or as a percentage updated in real time as the node executes.
2) While the data processing flow is executing, displaying an "in progress" mark on each node of the directed acyclic graph in the first graphical user interface, and replacing it with a "completed" mark once the node's portion of the flow has finished; for example, while the DAG runs, a funnel is displayed on each node as the in-progress mark, and when a node's processing completes, the funnel is replaced by a green check mark.
3) In response to an operation of viewing the run result of a node in the directed acyclic graph, obtaining and displaying the run result data for that node; for example, the user may right-click a node and select the status-viewing control that appears; or, in response to the user clicking a node, an icon corresponding to the type of the run result is displayed near the node, and clicking that icon displays the node's run result data.
In one embodiment of the invention, the method of FIG. 1 comprises one or more of the following:
1) the data, samples, and models in the canvas area each support one or more of the following operations: copy, delete, and preview;
2) operators in the canvas area support one or more of the following operations: copy, delete, rename, preview, run the current task, run from the current task, run up to the current task, view the log, and view task details. For example, right-clicking a selected operator shows the operations it supports. Copy duplicates an operator, with or without its configured parameters, so it can be pasted onto any canvas for reuse; delete removes an operator that was dragged into the canvas; rename relabels an operator for easier labeling and identification; run the current task runs the selected task alone, provided the data from upstream is available; run from the current task runs from the selected operator to the end; run up to the current task runs from the start unit through the selected operator; preview shows the operator's run result; view the log helps locate detailed error information when an operator run fails, and is available after the operator finishes; task details jumps to the Yarn cluster task link / PWS Console page.
3) For a directed acyclic graph whose run has finished in the canvas area, in response to clicking an operator, displaying product type marks corresponding to the types of products the operator outputs; and, in response to clicking a product type mark, displaying a product information interface comprising: a control for previewing the product, a control for exporting the product, a control for importing the product into the element list, basic information about the product, and the path where the product is stored. The product types an operator may output include: data, samples, models, and reports.
In an embodiment of the present invention, in the method illustrated in fig. 1, the node display area includes the following operators:
1) data splitting operator: data splitting is a computation unit that splits one data set into two. Typically, one part is used to train the model and the other is used for model evaluation, i.e., to validate the model.
The data splitting operator is configured through its configuration interface. It has one input connection point and two output connection points; the left output is labeled "data set 1" and the right output "data set 2". Each of the two resulting data tables can feed multiple downstream operations, and a table can be split again repeatedly to obtain more tables. The splitting methods provided in the configuration interface include one or more of: splitting by ratio, splitting by rule, and sorting before splitting.
When splitting by ratio is selected, the user can further choose proportional sequential splitting, proportional random splitting, or proportional stratified splitting. When random splitting is selected, the configuration interface additionally provides an input area for a random seed parameter, an integer between 0 and 999999999; the system generates random numbers from the given seed, so that the result of a random split can be reproduced when necessary. When stratified splitting is selected, the configuration interface additionally provides an input area for the field on which to stratify: stratified splitting means partitioning the data set into strata according to the specified field and then splitting within each stratum. Whether stratification is enabled is user-configurable, and a stratification field must be specified when it is selected. Stratified splitting is used because some data sets are unevenly distributed with respect to a certain field, and an uneven split would harm the modeling result. For instance, to make a voting prediction for a presidential election, where the data set is highly correlated with the country's population distribution, it may be desirable to stratify by population to match the actual problem. Example: stratifying a data table by a gender field that takes only the values 0 and 1, with a split ratio of 0.5, samples 50% of the rows with value 0 and 50% of the rows with value 1; these two parts together form data set 1, and the remainder forms data set 2.
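The gender example above can be sketched as follows (an illustrative implementation, not the patent's; field names are from the example, and the seed makes the split reproducible as described):

```python
import random
from collections import defaultdict

def stratified_split(rows, field, ratio, seed):
    """Split rows into (data set 1, data set 2), sampling `ratio` per stratum."""
    rng = random.Random(seed)       # random seed: reproduces the split on demand
    strata = defaultdict(list)
    for row in rows:
        strata[row[field]].append(row)
    set1, set2 = [], []
    for group in strata.values():
        rng.shuffle(group)
        k = int(len(group) * ratio)
        set1.extend(group[:k])      # ratio share of each stratum -> data set 1
        set2.extend(group[k:])      # remainder -> data set 2
    return set1, set2

rows = [{"gender": g, "id": i} for i, g in enumerate([0, 0, 0, 0, 1, 1, 1, 1])]
d1, d2 = stratified_split(rows, "gender", 0.5, seed=42)
```

With ratio 0.5 and four rows per gender value, data set 1 receives exactly two rows of each gender, matching the example in the text.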
When splitting by rule is selected, an input area for the split rule is provided. The rule expression may be written in SQL syntax; data matching the rule is output to data set 1, while data not matching the rule, and data for which the rule returns NULL, are output to data set 2.
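The NULL-handling convention described above — match goes to set 1, non-match and NULL both go to set 2 — can be sketched with a predicate standing in for the SQL expression (an assumption of this sketch; the operator itself parses SQL):

```python
def rule_split(rows, predicate):
    """predicate returns True, False, or None (standing in for SQL NULL)."""
    set1, set2 = [], []
    for row in rows:
        if predicate(row) is True:
            set1.append(row)
        else:                       # False and NULL-like (None) both go to set 2
            set2.append(row)
    return set1, set2

rows = [{"age": 30}, {"age": 15}, {"age": None}]
adults, rest = rule_split(
    rows, lambda r: None if r["age"] is None else r["age"] >= 18
)
```

This mirrors SQL three-valued logic: a WHERE clause keeps only rows where the condition is strictly true.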
When sorting before splitting is selected, the configuration interface additionally provides a split ratio selector, an input area for the sort field, and a sort direction selector. Sorting before splitting means first ordering the data table by a chosen field and then splitting it by ratio; the sort direction may be ascending or descending. In general, when a modeling scenario must predict future data from historical data, samples should be sorted by time before splitting, to avoid "time crossing" (leakage of future information into the training data).
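A sketch of sort-then-split under the semantics above (field name and ratio are illustrative): order rows by the time field, then cut by ratio, so the first part strictly precedes the second in time.

```python
def sort_then_split(rows, field, ratio, descending=False):
    """Sort rows by `field`, then cut at len * ratio."""
    ordered = sorted(rows, key=lambda r: r[field], reverse=descending)
    cut = int(len(ordered) * ratio)
    return ordered[:cut], ordered[cut:]

rows = [{"t": 3}, {"t": 1}, {"t": 4}, {"t": 2}]
train, test = sort_then_split(rows, "t", 0.5)   # train precedes test in time
```

Because every timestamp in the first part is no later than any in the second, a model trained on the first part never sees "future" data — the time-crossing problem the text warns about.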
2) data cleaning operator: data cleaning processes the data column by column; defining a configured function for a column in the data table cleans that column's data. The input of data cleaning is a data table, and the output, after processing by the configured functions, is still a data table, with each cleaned column replaced by the function's output.
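The operator's table-in, table-out contract can be sketched directly (the column functions here are illustrative assumptions, not part of the patent):

```python
def clean_table(table, column_funcs):
    """table: dict of column name -> list of values; returns a new table with
    each configured column replaced by its function's output."""
    cleaned = dict(table)
    for col, func in column_funcs.items():
        cleaned[col] = [func(v) for v in table[col]]
    return cleaned

table = {"name": ["  Ann ", "Bob"], "age": [30, None]}
out = clean_table(table, {
    "name": str.strip,                          # trim stray whitespace
    "age": lambda v: 0 if v is None else v,     # fill missing values
})
```

Columns without a configured function pass through unchanged, which is what lets cleaning compose with downstream operators in the DAG.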
3) data table statistics operator: this operator computes statistics for the field information in a data table, including the NULL count, mean, variance, median, mode, and so on. Viewing the data distribution and statistics of a table facilitates subsequent feature processing and modeling work.
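The per-field statistics named above can be sketched with the standard library (computed over non-NULL values, an assumption of this sketch):

```python
import statistics

def column_stats(values):
    """Report the statistics the operator describes for one column."""
    present = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(present),
        "mean": statistics.mean(present),
        "variance": statistics.pvariance(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
    }

stats = column_stats([1, 2, 2, 3, None])
```

A platform would run this per field and render the results as the table overview the text mentions.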
4) feature extraction operator: the configuration interface of the feature extraction operator provides an interface for adding an input source and a script editing entry, and also provides at least one of: a sample random ordering option, a feature exact statistics option, an option for whether the output samples are compressed, an output plaintext option, a target (label) type option, and an output storage type option. The feature extraction script can be entered through the script editing entry; alternatively, the data source can be displayed automatically and, in response to the "generate configuration" button being triggered, a quick-configuration script frame is shown in which the user only needs to enter the target value to generate the script configuration. In further response to the validation button being triggered, the script content in the editing frame is validated and errors are reported. Not every algorithm requires the feature order to be shuffled after extraction, and shuffling itself consumes resources, so a "sample random ordering" switch controls whether a random shuffle is performed after each feature extraction; it is on by default and can be turned off. During feature extraction, the system automatically counts the feature dimensions and then presents an overview of the features in the output sample data. However, because exact dimension counting requires deduplicating all features, it carries a large performance cost in large-scale data scenarios, so the platform approximates the feature dimensions by default while guaranteeing a relative error below 0.01. If exact, strict statistics are needed, the feature exact statistics switch can be turned on.
Sample files occupy considerable space among the platform's files, so compressing the output text files is recommended to improve space utilization. During feature extraction, the processed features are encoded, at which point the original text corresponding to each feature is lost; therefore, to facilitate later in-depth work on the model (such as model debugging), an output-plaintext switch controls whether the related plaintext information is emitted alongside the samples. When switched on, the plaintext is stored under a system path, at some storage cost. Feature extraction supports target value labeling for three scenarios — binary classification, multi-class classification, and regression — corresponding to the labeling methods binary_label, multiclass_label, and regression_label; the appropriate one should be selected before editing the script.
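The two switches described above can be sketched as follows. This is an assumed toy encoding, not the platform's script engine: "sample random ordering" shuffles the extracted samples (seeded here so the sketch is deterministic), and the dimension statistic is computed by exact deduplication, with the ~1%-error approximation the text describes left as a stub.

```python
import random

def extract_features(samples, shuffle=True, exact_stats=False, seed=0):
    # Toy "extraction": prefix each attribute name (stands in for real encoding).
    extracted = [{f"f_{k}": v for k, v in s.items()} for s in samples]
    if shuffle:
        # The "sample random ordering" switch, on by default per the text.
        random.Random(seed).shuffle(extracted)
    # Dimension statistics: exact mode deduplicates every feature name; an
    # approximate mode (not implemented in this sketch) would trade ~1%
    # relative error for speed on large-scale data.
    names = set()
    for s in extracted:
        names.update(s)
    return extracted, len(names)

samples = [{"age": 30, "city": "a"}, {"age": 25, "city": "b"}]
out, n_dims = extract_features(samples, shuffle=False, exact_stats=True)
```

The `exact_stats` flag is kept only to mirror the configuration option; in this sketch both settings count exactly.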
5) feature importance analysis operator: its input is a sample table with target values and its output is a feature importance evaluation report containing the importance coefficient of each feature; the report further comprises one or more of: the number of features, the number of samples, and basic statistics of each feature. Feature importance analysis measures the importance relationship (independent of any model) between each feature in the sample table and the target value; this relationship, expressed as an importance coefficient, helps diagnose feature anomalies and adjust and optimize the model's features.
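One plausible model-independent coefficient — an assumption of this sketch, since the patent does not fix the formula — is the absolute Pearson correlation between each feature column and the target, packaged into the report structure the text describes:

```python
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def importance_report(features, target):
    """features: dict of name -> column values; target: target value column."""
    report = {"n_samples": len(target), "n_features": len(features), "coefs": {}}
    for name, col in features.items():
        report["coefs"][name] = abs(pearson(col, target))   # importance coefficient
    return report

report = importance_report(
    {"x1": [1, 2, 3, 4], "x2": [5, 5, 6, 5]},
    [2, 4, 6, 8],
)
```

Being computed directly from the data, the coefficient is independent of any trained model, as the operator requires.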
6) automatic feature combination operator: its configuration interface provides at least one of: a feature selection item, a score metric selection item, a learning rate setting, and a termination condition selection item, where the feature selection item determines the features used for combination, and the termination conditions include the maximum size of the running feature pool and the maximum number of output features. Feature combination is a method of strengthening the descriptive power of features and improving the personalized prediction effect. The operator performs various feature combination analyses on a sample table, and is mainly used to generate combination-based features and evaluate their importance.
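A minimal sketch of the idea under stated assumptions: cross pairs of base features, score each combination with a pluggable scoring function (the score here is purely illustrative), and respect the maximum-output-features termination condition. The search-tree strategy of FIG. 3 is abstracted away.

```python
from itertools import combinations

def combine_features(base_features, score, max_output):
    """Generate pairwise crossed features, keep the best `max_output` by score."""
    scored = []
    for a, b in combinations(base_features, 2):
        scored.append((score(a, b), f"{a}*{b}"))    # "*" marks a crossed feature
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_output]]

# Toy score preferring combinations that involve "city" (illustrative only).
best = combine_features(
    ["age", "city", "income"],
    score=lambda a, b: 1.0 if "city" in (a, b) else 0.1,
    max_output=2,
)
```

In a real run, the score would come from the selected score metric (e.g., a model's evaluation gain), and the feature-pool cap would bound the candidates kept between rounds.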
7) automatic parameter tuning operator: this operator searches a given parameter range for suitable parameters according to a tuning algorithm, trains a model with the parameters found, and evaluates the model. Its configuration interface provides at least one of: a feature selection setting (all features or a custom selection), a tuning method option (random search or grid search), and a tuning time setting.
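The random-search option can be sketched as below; the objective function stands in for "train a model and evaluate it", and the trial budget stands in for the tuning-time setting (both assumptions of this sketch):

```python
import random

def random_search(param_ranges, objective, n_trials, seed=0):
    """Sample settings from the given ranges; keep the best-scoring one."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi)
                  for name, (lo, hi) in param_ranges.items()}
        score = objective(params)       # in practice: train model + evaluate
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Toy objective peaking at learning_rate = 0.1 (an assumption for the demo).
params, score = random_search(
    {"learning_rate": (0.001, 1.0)},
    objective=lambda p: -abs(p["learning_rate"] - 0.1),
    n_trials=50,
)
```

Grid search would replace the sampling line with an exhaustive sweep over a fixed lattice of values; random search is usually preferred when only a few parameters matter.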
8) TensorFlow operator: this operator runs user-written TensorFlow code; its configuration interface provides an input source setting and a setting for the path of the TensorFlow code file. The operator gives the user full flexibility, which in turn requires the user to be familiar with TensorFlow in order to write the code. To use distributed TensorFlow, the user must write the distributed TensorFlow code as well.
9) custom script operator: this operator lets the user write a custom operator in a specific scripting language; its configuration provides an input source setting and a script editing entry. Multiple custom script operators can be provided, each corresponding to a different scripting language, such as SQL or other scripting languages commonly used in the industry.
In an embodiment of the present invention, the aforementioned feature importance analysis operator determines the importance of a feature in at least one of the following three ways:
The first way of determining feature importance: train at least one feature pool model based on the sample set, where a feature pool model is a machine learning model that provides a prediction result for the machine learning problem based on at least a part of the features contained in the samples; obtain the effect of the at least one feature pool model; and determine the importance of each feature according to the obtained effect of the at least one feature pool model. The feature pool model is trained after a discretization operation is performed on at least one continuous feature among the at least a part of the features.
The method specifically comprises the following steps: (A) obtaining historical data records, where a historical data record includes a label for the machine learning problem and attribute information for generating the features of machine learning samples; (B) training at least one feature pool model using the obtained historical data records, where a feature pool model is a machine learning model that provides a prediction result for the machine learning problem based on at least a part of the features; and (C) obtaining the effect of the at least one feature pool model and determining the importance of each feature according to the obtained effect; where, in step (B), the feature pool model is trained after a discretization operation is performed on at least one continuous feature among the at least a part of the features.
Here, it is assumed that a historical data record has attribute information {p1, p2, …, pm}. Based on this attribute information and the labels, machine learning samples corresponding to the machine learning problem can be generated and applied to model training and/or testing for that problem. The feature part of a machine learning sample can be expressed as {f1, f2, …, fn}, where n is a positive integer, and exemplary embodiments of the present invention aim to determine the importance of each feature in {f1, f2, …, fn}. From {f1, f2, …, fn}, at least a part of the features is selected as the features of the training samples of a feature pool model, and the labels of the corresponding historical data records are used as the labels of the training samples. Some or all of the continuous features among the selected features are discretized. One or more feature pool models can be trained. On one hand, the importance of target features can be derived from the difference in prediction effect of the same feature pool model (which may be based on all or part of the features of the machine learning sample) on the original test data set and on a transformed test data set, where the transformed test data set is obtained by transforming the values of certain target features in the original test data set, so that the difference in prediction effect reflects the importance of those target features. Alternatively, the importance of features can be derived from the difference in prediction effect of different feature pool models on the same test data set (i.e., the original test data set), where the different feature pool models are designed based on different feature combinations, so that the difference in prediction effect reflects the importance of the differing features. In particular, a single-feature model can be trained for each feature of the machine learning sample, and the prediction effect of the single-feature model then represents the importance of the feature on which it is based. It should be noted that the above two ways of measuring feature importance can be used alone or in combination.
Wherein, in step (C), the importance of each feature on which the feature pool model is based is determined from the difference between the effect of the feature pool model on the original test data set and on a transformed test data set; the transformed test data set is obtained by replacing, in the original test data set, the values of the target feature whose importance is to be determined with one of the following: zero values, random values, or values obtained by shuffling the order of the original values of the target feature. The at least one feature pool model may include an all-features model, i.e., a machine learning model that provides a prediction result for the machine learning problem based on all of the features.
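The transformed-test-set comparison described above can be sketched as follows. This is an illustrative toy: the "trained" feature pool model is a fixed scoring function rather than an actually fitted model, and the zero/random/shuffle replacements are the three options listed above; the AUC drop after transforming a column serves as that feature's importance.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def transform(rows, j, mode, seed=0):
    """Build a transformed test set: replace column j with zeros, random
    values, or a shuffled copy of its own values."""
    rng = random.Random(seed)
    col = [r[j] for r in rows]
    if mode == "shuffle":
        rng.shuffle(col)
    out = []
    for i, r in enumerate(rows):
        r = list(r)
        if mode == "zero":
            r[j] = 0.0
        elif mode == "random":
            r[j] = rng.random()
        else:  # "shuffle"
            r[j] = col[i]
        out.append(r)
    return out

# Hypothetical "trained" feature pool model: feature 0 carries all the
# signal, feature 1 is ignored (a real model would be fit to data).
def model(row):
    return 2.0 * row[0]

rng = random.Random(42)
test_rows = [[rng.random(), rng.random()] for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in test_rows]

base_auc = auc(labels, [model(r) for r in test_rows])
importance = {}
for j in range(2):
    shuffled = transform(test_rows, j, "shuffle")
    importance[j] = base_auc - auc(labels, [model(r) for r in shuffled])
```

Shuffling the informative column collapses the AUC toward 0.5, giving it a large importance, while shuffling the ignored column changes nothing, giving it an importance of exactly zero.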
Wherein the at least one feature pool model comprises a plurality of machine learning models that provide prediction results for the machine learning problem based on different feature sets; in step (C), the importance of the individual features is determined from the differences between the effects of these machine learning models on the original test data set.
Wherein the at least one feature pool model comprises one or more main feature pool models and, for each main feature pool model, at least one corresponding sub-feature pool model, where a sub-feature pool model is a machine learning model that provides a prediction result for the machine learning problem based on the features of the corresponding main feature pool model minus the target feature whose importance is to be determined; in step (C), the importance of each target feature is determined from the difference between the effect of the main feature pool model and that of its corresponding sub-feature pool model on the original test data set.
Wherein the at least one feature pool model comprises a plurality of single-feature models, where a single-feature model is a machine learning model that provides a prediction result for the machine learning problem based on a single target feature whose importance is to be determined; in step (C), the importance of each target feature is determined from the differences between the effects of the single-feature models on the original test data set.
For example, assume that a certain feature pool model is based on three features {f1, f3, f5} from the feature part {f1, f2, …, fn} of the machine learning samples, and that the continuous feature f1 among them is discretized in the training samples of the feature pool model. The AUC of this feature pool model on the test data set then reflects the predictive power of the feature combination {f1, f3, f5}. In addition, assume that another feature pool model is based on the two features {f1, f3}, with the continuous feature f1 likewise discretized; the AUC of this feature pool model on the test data set reflects the predictive power of the feature combination {f1, f3}. On this basis, the difference between the two AUCs can be used to reflect the importance of the feature f5.
As another example, assume again that a certain feature pool model is based on the three features {f1, f3, f5} from the feature part {f1, f2, …, fn} of the machine learning samples, with the continuous feature f1 discretized in its training samples; the AUC of this feature pool model on the original test data set reflects the predictive power of the feature combination {f1, f3, f5}. Here, to determine the importance of the target feature f5, the values of the feature f5 in each test sample of the original test data set are processed to obtain a transformed test data set, and the AUC of the feature pool model on the transformed test data set is then obtained. The difference between the above two AUCs can be used to reflect the importance of the target feature f5. As an example, in the transformation, the value of the feature f5 in each original test sample may be replaced by a zero value, by a random value, or by a value obtained by shuffling the order of the original values of the feature f5.
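The {f1, f3, f5} versus {f1, f3} comparison above can be condensed into a runnable toy sketch. The two "trained" feature pool models are written directly as scoring functions rather than actually fitted (an assumption made purely for illustration; a real system would fit them, e.g., by logistic regression on the discretized features):

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(7)
n = 400
# f1, f3, f5 stand in for three features of the sample; the label depends
# on f1 and f5, while f3 is pure noise.
f1 = [rng.random() for _ in range(n)]
f3 = [rng.random() for _ in range(n)]
f5 = [rng.random() for _ in range(n)]
labels = [1 if a + b > 1.0 else 0 for a, b in zip(f1, f5)]

# Hypothetical "trained" feature pool models as scoring functions; in both,
# the noise feature f3 ends up with zero weight.
score_f1_f3_f5 = [a + b for a, b in zip(f1, f5)]  # model on {f1, f3, f5}
score_f1_f3 = list(f1)                            # model on {f1, f3}

auc_full = auc(labels, score_f1_f3_f5)
auc_without_f5 = auc(labels, score_f1_f3)
importance_f5 = auc_full - auc_without_f5
```

Dropping f5 from the feature set visibly lowers the AUC, and the gap between the two AUCs plays the role of the importance of f5.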
Wherein the discretization operation comprises a basic binning operation and at least one additional operation. The at least one additional operation includes at least one of the following classes of operations: a logarithm operation, an exponential operation, an absolute value operation, and a Gaussian transformation operation. The at least one additional operation may include an additional binning operation in the same binning mode as the basic binning operation but with different binning parameters; alternatively, it may include an additional binning operation in a different binning mode from the basic binning operation. The basic binning operation and the additional binning operations may correspond to equal-width binning operations of different widths or equal-depth binning operations of different depths, where the different widths or depths numerically form a geometric series or an arithmetic series. The step of performing the basic binning operation and/or an additional binning operation may include additionally providing an outlier bin, such that continuous features with abnormal values are sorted into the outlier bin.
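A minimal sketch of such a discretization operation, under illustrative operation parameters of our choosing: equal-width binnings whose widths 10, 20, 40 form a geometric series with ratio 2, plus logarithm, absolute-value, and Gaussian-style additional operations.

```python
import math

def equal_width_bins(value, widths):
    """Basic binning plus additional binnings: equal-width bin indices for
    a non-negative value, with the widths forming a geometric series."""
    return {"bin_w%d" % w: int(value // w) for w in widths}

def additional_ops(value):
    """Non-binning additional operations applied to the same continuous value."""
    return {
        "log": math.log(value + 1.0),           # logarithm operation (shifted so 0 is valid)
        "abs": abs(value),                      # absolute value operation
        "exp": math.exp(-value * value / 2.0),  # a Gaussian-style transformation
    }

def discretize(value, widths=(10, 20, 40)):
    """One continuous feature expands into several derived features."""
    feats = equal_width_bins(value, widths)
    feats.update(additional_ops(value))
    return feats

f = discretize(61.5)
```

The point of the geometric series of widths is that the same value is described at several granularities at once: 61.5 lands in bin 6 at width 10, bin 3 at width 20, and bin 1 at width 40.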
Wherein, in step (B), the feature pool model is trained based on a logistic regression (log-odds regression) algorithm.
Wherein the effect of a feature pool model includes the AUC of the feature pool model. The original test data set is composed of the obtained historical data records, where, in step (B), the obtained historical data records are divided into a plurality of groups of historical data records to train each feature pool model step by step, and step (B) further includes: using the feature pool model trained on the current group of historical data records to make predictions for the next group of historical data records, obtaining a grouped AUC corresponding to the next group, and synthesizing the grouped AUCs to obtain the AUC of the feature pool model; and, after the grouped AUC corresponding to the next group of historical data records is obtained, continuing to train the feature pool model, already trained on the current group, with the next group of historical data records.
Wherein, in step (B), when prediction is performed for the next group of historical data records using the feature pool model trained on the current group, and the next group includes missing historical data records that lack attribute information for at least a part of the features on which the feature pool model is based, the grouped AUC corresponding to the next group is obtained in one of the following ways: calculating the grouped AUC using only the prediction results of the historical data records in the next group other than the missing records; calculating the grouped AUC using the prediction results of all historical data records in the next group, where the prediction result of each missing record is set to a default value determined based on the value range of the prediction result or based on the label distribution of the obtained historical data records; or multiplying the AUC calculated from the prediction results of the non-missing records in the next group by the proportion of those records in the next group to obtain the grouped AUC.
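The step-by-step (grouped AUC) evaluation described above can be sketched as follows. The incrementally trainable model here is a toy stand-in of our own, and the missing-record variants are omitted for brevity: each group is scored by the model trained on the preceding groups, and only then used for further training.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return 0.5
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

class MeanOfPositivesModel:
    """Toy stand-in for the feature pool model: scores a record by closeness
    to the running mean of positive examples, and supports incremental
    training so it can keep learning group by group."""
    def __init__(self):
        self.sum_pos, self.n_pos = 0.0, 0

    def fit_increment(self, xs, ys):
        for x, y in zip(xs, ys):
            if y == 1:
                self.sum_pos += x
                self.n_pos += 1

    def predict(self, x):
        if self.n_pos == 0:
            return 0.0
        return -abs(x - self.sum_pos / self.n_pos)

def progressive_auc(groups, model):
    """Score each group with the model trained on all previous groups
    (grouped AUC), keep training on the group just scored, and synthesize
    the grouped AUCs into one overall AUC."""
    xs0, ys0 = groups[0]
    model.fit_increment(xs0, ys0)    # the first group only trains
    group_aucs = []
    for xs, ys in groups[1:]:
        group_aucs.append(auc(ys, [model.predict(x) for x in xs]))
        model.fit_increment(xs, ys)  # continue training after scoring
    return sum(group_aucs) / len(group_aucs), group_aucs

rng = random.Random(1)
def make_group(n=100):
    xs = [rng.random() for _ in range(n)]
    return xs, [1 if x > 0.6 else 0 for x in xs]

overall, per_group = progressive_auc([make_group() for _ in range(4)],
                                     MeanOfPositivesModel())
```

Because every grouped AUC is computed on records the model has not yet trained on, the synthesized AUC is an out-of-sample estimate even though all groups eventually contribute to training.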
Wherein, in step (B), when the feature pool model is trained based on a logistic regression algorithm, the regularization term set for continuous features is different from the regularization term set for non-continuous features.
Wherein step (B) further comprises: providing an interface for the user to configure at least one of the following items of the feature pool model: the at least a part of features on which the feature pool model is based, the algorithm type of the feature pool model, the algorithm parameters of the feature pool model, the operation type of the discretization operation, and the operation parameters of the discretization operation; in step (B), the feature pool model is trained according to the items configured by the user through the interface. In step (B), the interface is provided to the user in response to the user's instruction to determine feature importance.
The method may further comprise: (D) graphically presenting the determined importance of each feature to the user. In step (D), the features are presented in order of importance, and/or a part of the features is highlighted, where the highlighted part includes important features corresponding to high importance, unimportant features corresponding to low importance, and/or abnormal features corresponding to abnormal importance.
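The ordering-and-highlighting part of step (D) (though not the graphics themselves) can be sketched as follows; the thresholds and the rule "negative or NaN importance counts as abnormal" are illustrative assumptions of ours, not the patent's definitions.

```python
def present_importance(importance, hi_thresh, lo_thresh):
    """Sort features by importance and tag the ones to highlight:
    important (high), unimportant (low), and abnormal features."""
    ordered = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    tags = {}
    for name, score in ordered:
        if score != score or score < 0:   # NaN or negative: abnormal
            tags[name] = "abnormal"
        elif score >= hi_thresh:
            tags[name] = "important"
        elif score <= lo_thresh:
            tags[name] = "unimportant"
        else:
            tags[name] = ""
    return ordered, tags

ordered, tags = present_importance(
    {"f1": 0.30, "f2": 0.01, "f3": -0.05, "f4": 0.12},
    hi_thresh=0.2, lo_thresh=0.02)
```

A front end would render `ordered` as a sorted bar chart and use `tags` to color the highlighted bars.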
Second way of determining feature importance: determine a basic feature subset of the sample and a plurality of target feature subsets whose importance is to be determined; for each of the plurality of target feature subsets, obtain a corresponding composite machine learning model, where the composite machine learning model includes a basic submodel and an additional submodel trained according to a lifting (i.e., boosting) framework, the basic submodel being trained based on the basic feature subset and the additional submodel being trained based on the respective target feature subset; and determine the importance of the plurality of target feature subsets according to the effect of the composite machine learning models.
The method specifically comprises the following steps: (A1) determining a basic feature subset of the machine learning samples, the basic feature subset including at least one basic feature; (B1) determining a plurality of target feature subsets of the machine learning samples whose importance is to be determined, each target feature subset including at least one target feature; (C1) for each of the plurality of target feature subsets, obtaining a corresponding composite machine learning model, where the composite machine learning model includes a basic submodel and an additional submodel trained according to a lifting (boosting) framework, the basic submodel being trained based on the basic feature subset and the additional submodel being trained based on the respective target feature subset; and (D1) determining the importance of the plurality of target feature subsets according to the effect of the composite machine learning models.
Wherein, in step (D1), the importance of the plurality of target feature subsets is determined from the differences between the effects of the composite machine learning models on the same data set.
Wherein the effect of the composite machine learning model includes the AUC of the composite machine learning model.
Wherein the target feature is generated based on the base feature.
Wherein the target feature is a combined feature obtained by combining at least one basic feature.
Wherein in step (C1), a composite machine learning model corresponding to each subset of target features is obtained by training a plurality of composite machine learning models in parallel.
Wherein the target feature subset includes one combined feature obtained by combining at least one basic feature, and the method further includes: (E1) the determined importance of each combined feature is graphically presented to the user.
Wherein, in step (C1), the corresponding composite machine learning model is obtained by training the additional sub-models with the already trained basic sub-models fixed.
Wherein the basic submodel and the additional submodel are of the same type.
According to an exemplary embodiment of the present invention, a corresponding composite machine learning model is obtained for each target feature subset. Here, the composite machine learning model may be trained locally, or an already trained composite machine learning model may be obtained from the outside. The composite machine learning model includes a basic submodel and an additional submodel trained according to a lifting framework (e.g., a gradient boosting framework), where the basic submodel and the additional submodel may be models of the same type, for example both linear models (e.g., logistic regression models); alternatively, the basic submodel and the additional submodel may be of different types. The lifting framework of each composite machine learning model may be the same, i.e., each composite machine learning model has the same type of basic submodel and the same type of additional submodel, differing only in the target feature subset on which the additional submodel is based.
Assume that a single composite machine learning model is denoted as F, composed of a basic submodel f_base and an additional submodel f_add. Assume the input training data record is denoted x and that, after feature processing according to the determined basic feature subset and target feature subset, the sample part corresponding to the basic submodel f_base is characterized by x_b and the sample part corresponding to the additional submodel f_add is characterized by x_a. Accordingly, the composite machine learning model F can be constructed according to the following equation:

F(x) = f_base(x_b) + f_add(x_a)
it should be noted, however, that the basic submodel and the additional submodel may be trained based on different sets of training data records, in addition to being trained based on the same set of training data records. For example, both of the above-described submodels may be trained based on the entire training data records, or may be trained based on a part of the training data records sampled from the entire training data records. As an example, the basic submodel and the additional submodel may be assigned respective training data records according to a preset sampling strategy, e.g. more training data records may be assigned to the basic submodel and less training data records may be assigned to the additional submodel, where training data records assigned by different submodels may have a certain proportion of intersection or no intersection at all. By determining the training data records used by each sub-model according to the sampling strategy, the effect of the whole machine learning model can be further improved.
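The construction F(x) = f_base(x_b) + f_add(x_a), with the additional submodel trained on the residuals of the fixed basic submodel (the lifting step), can be illustrated with simple one-feature least-squares submodels; the data and the fitting routine are toy stand-ins of ours, not the patent's algorithm.

```python
def fit_linear_1d(xs, ys):
    """Least-squares fit y ≈ a*x + b for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Toy training data: the label depends on a basic feature xb and a target
# feature xa (all values are illustrative).
xb = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
xa = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
y = [2.0 * b + 3.0 * a for b, a in zip(xb, xa)]

# Step 1: train the basic submodel on the basic feature subset only.
a1, b1 = fit_linear_1d(xb, y)
f_base = lambda x: a1 * x + b1

# Step 2 (the lifting step): with the basic submodel held fixed, train the
# additional submodel on the residuals, using the target feature subset.
residuals = [yi - f_base(b) for yi, b in zip(y, xb)]
a2, b2 = fit_linear_1d(xa, residuals)
f_add = lambda x: a2 * x + b2

# Composite model F(x) = f_base(x_b) + f_add(x_a).
def F(b, a):
    return f_base(b) + f_add(a)
```

Because the basic submodel is frozen before the additional submodel is fitted, the improvement of the composite model over the basic submodel alone is attributable to the target feature subset, which is exactly what the importance measurement exploits.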
A third way to determine feature importance: pre-sorting at least one candidate feature in a sample according to importance, and screening a part of candidate features from the at least one candidate feature according to a pre-sorting result to form a candidate feature pool; and reordering the importance of each candidate feature in the candidate feature pool, and selecting at least one candidate feature with higher importance from the candidate feature pool as an important feature according to the reordering result.
The method specifically comprises the following steps: (A2) acquiring a historical data record, wherein the historical data record comprises a plurality of attribute information; (B2) generating at least one candidate feature based on the plurality of attribute information; (C2) pre-sorting the importance of the at least one candidate feature, and screening a part of candidate features from the at least one candidate feature according to a pre-sorting result to form a candidate feature pool; and (D2) re-ranking the importance of each candidate feature in the candidate feature pool, and selecting at least one candidate feature with higher importance from the candidate feature pool as an important feature according to the re-ranking result.
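Steps (C2) and (D2) above can be sketched as a two-stage screen. In this toy version, |Pearson correlation| stands in for the per-feature model effect (AUC, MAE, etc.), and the feature names, sizes, and thresholds are illustrative assumptions:

```python
import random

def abs_corr(xs, ys):
    """|Pearson correlation|, a cheap stand-in for model effect (a real
    system would train a model per candidate feature and use AUC/MAE)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (vx * vy))

def screen(features, label, n_pre, pool_size, n_final):
    """Pre-rank on the first n_pre records, keep the top pool_size
    candidates as the candidate feature pool, then re-rank the pool on
    all records and keep the top n_final as important features."""
    pre = sorted(features,
                 key=lambda f: abs_corr(features[f][:n_pre], label[:n_pre]),
                 reverse=True)
    pool = pre[:pool_size]
    rerank = sorted(pool, key=lambda f: abs_corr(features[f], label),
                    reverse=True)
    return rerank[:n_final]

rng = random.Random(0)
n = 500
label = [rng.random() for _ in range(n)]
features = {
    "signal": [v + 0.1 * rng.gauss(0, 1) for v in label],  # strongly related
    "weak":   [v + 1.0 * rng.gauss(0, 1) for v in label],  # weakly related
    "noise1": [rng.random() for _ in range(n)],
    "noise2": [rng.random() for _ in range(n)],
}
important = screen(features, label, n_pre=100, pool_size=2, n_final=1)
```

The cheap pre-ranking pass on fewer records prunes the candidate set, so the more expensive re-ranking pass on the full data only has to score the surviving pool.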
Wherein, in step (C2), pre-ordering is performed based on the first number of history data records; in step (D2), the reordering is performed based on a second number of the history data records, and the second number is not less than the first number. The second number of history data records includes the first number of history data records.
Wherein, in step (C2), candidate features with higher importance are screened from the at least one candidate feature according to the pre-sorting result to form a candidate feature pool.
Wherein, in step (C2), the pre-ranking is performed as follows: for each candidate feature, a pre-ranking single-feature machine learning model is obtained, and the importance of each candidate feature is determined based on the effect of the corresponding pre-ranking single-feature machine learning model, where each candidate feature corresponds to one pre-ranking single-feature machine learning model. As an example, assume there are N (N an integer greater than 1) candidate features f_n, where n ∈ [1, N]. Accordingly, the pre-ranking apparatus 300 may use at least a part of the historical data records to build N pre-ranking single-feature machine learning models (where each pre-ranking single-feature machine learning model predicts for the machine learning problem based on a respective single candidate feature f_n), then measure the effects of the N pre-ranking single-feature machine learning models on the same test data set (e.g., AUC (Area Under the ROC (Receiver Operating Characteristic) curve), MAE (Mean Absolute Error), etc.), and determine the importance ranking of the candidate features based on the ranking of these effects.
Wherein, in step (C2), the pre-ranking is performed as follows: for each candidate feature, a pre-ranking overall machine learning model is obtained, and the importance of each candidate feature is determined based on the effect of the corresponding pre-ranking overall machine learning model, where each pre-ranking overall machine learning model corresponds to the pre-ranking basic feature subset plus one candidate feature. As an example, the pre-ranking overall machine learning model here may be a logistic regression (LR) model; accordingly, the sample of the pre-ranking overall machine learning model consists of the pre-ranking basic feature subset and the respective candidate feature. Assume there are N candidate features f_n; accordingly, at least a part of the historical data records may be used to build N pre-ranking overall machine learning models (where the sample features of each pre-ranking overall machine learning model include the fixed pre-ranking basic feature subset and one corresponding candidate feature f_n), then the effects (e.g., AUC, MAE, etc.) of the N pre-ranking overall machine learning models on the same test data set are measured, and the importance ranking of the candidate features is determined based on the ranking of these effects.
Wherein, in step (C2), the pre-ranking is performed as follows: for each candidate feature, a pre-ranking composite machine learning model is obtained, and the importance of each candidate feature is determined based on the effect of the corresponding pre-ranking composite machine learning model, where the pre-ranking composite machine learning model includes a pre-ranking basic submodel and a pre-ranking additional submodel based on a lifting (boosting) framework, the pre-ranking basic submodel corresponding to the pre-ranking basic feature subset and the pre-ranking additional submodel corresponding to one candidate feature. As an example, assume there are N candidate features f_n; accordingly, at least a part of the historical data records may be used to build N pre-ranking composite machine learning models (where each pre-ranking composite machine learning model predicts for the machine learning problem according to the lifting framework, based on the fixed pre-ranking basic feature subset and one corresponding candidate feature f_n), then the effects (e.g., AUC, MAE, etc.) of the N pre-ranking composite machine learning models on the same test data set are measured, and the importance ranking of the candidate features is determined based on the ranking of these effects. Preferably, to further improve operating efficiency and reduce resource consumption, the pre-ranking apparatus 300 may build each pre-ranking composite machine learning model by training only the pre-ranking additional submodel for each candidate feature f_n while keeping the pre-ranking basic submodel fixed.
Wherein the pre-ordered basic feature subset includes unit features individually represented by at least one of the plurality of attribute information itself, and the candidate features include combined features combined from the unit features.
Wherein, in step (D2), the reordering is performed by: and aiming at each candidate feature in the candidate feature pool, obtaining a re-ordering single-feature machine learning model, and determining the importance of each candidate feature based on the effect of each re-ordering single-feature machine learning model, wherein each candidate feature corresponds to the re-ordering single-feature machine learning model.
Wherein, in step (D2), the re-ranking is performed as follows: for each candidate feature in the candidate feature pool, a re-ranking overall machine learning model is obtained, and the importance of each candidate feature is determined based on the effect of the corresponding re-ranking overall machine learning model, where each re-ranking overall machine learning model corresponds to the re-ranking basic feature subset plus one candidate feature.
Wherein, in step (D2), the reordering is performed by: and aiming at each candidate feature in the candidate feature pool, obtaining a re-ordering composite machine learning model, and determining the importance of each candidate feature based on the effect of each re-ordering composite machine learning model, wherein the re-ordering composite machine learning model comprises a re-ordering basic sub-model and a re-ordering additional sub-model based on a lifting frame, the re-ordering basic sub-model corresponds to a re-ordering basic feature subset, and the re-ordering additional sub-model corresponds to each candidate feature.
Wherein the re-ordered basic feature subset includes unit features individually represented by at least one of the plurality of attribute information itself, and the candidate features include combined features combined from the unit features.
The method further comprises the following steps: (E2) checking whether the important features are suitable as features of a machine learning sample.
Wherein, in step (E2), whether the important feature is suitable as a feature of the machine learning sample is verified using the change in effect, after the important feature is introduced, of a machine learning model based on the unit features individually represented by at least one piece of the attribute information itself.
In the case that the verification result indicates the important feature is not suitable as a feature of the machine learning sample, an additional part of candidate features is screened from the at least one candidate feature according to the pre-ranking result to form a new candidate feature pool, and steps (D2) and (E2) are re-executed.
In an exemplary embodiment of the present invention, step (B2) of generating at least one candidate feature based on the plurality of attribute information is specifically as follows: for at least a part of the attribute information of the historical data records, corresponding continuous features can be generated, where a continuous feature is the opposite of a discrete feature (e.g., a category feature) and its value can be a numerical value with a certain continuity, such as a distance, an age, or an amount. In contrast, as an example, the values of discrete features have no continuity; they can be, for example, unordered category features such as "from Beijing", "from Shanghai", "from Tianjin", "gender is male", or "gender is female". Some discrete-valued attribute information in the historical data records can be used directly as corresponding discrete features, or some attribute information (e.g., continuous-valued and/or discrete-valued attribute information) in the historical data records can be processed to obtain corresponding discrete features.
For example, some continuous-valued attribute information in the historical data records can be used directly as corresponding continuous features; attribute information such as distance, age, and amount, for instance, can serve directly as continuous features. Alternatively, certain attribute information (e.g., continuous-valued and/or discrete-valued attribute information) in the historical data records can be processed to obtain corresponding continuous features, for example, taking the ratio of height to weight as a continuous feature. Continuous-valued attribute information can be discretized and/or discrete-valued attribute information converted into continuous values as needed, and the original or processed attribute information can be further operated on or combined. Furthermore, any combination or operation between features may be performed; for example, Cartesian product combinations between discrete features may be formed.
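The two kinds of processing mentioned here, a Cartesian-product combination of discrete features and a derived continuous feature such as the height/weight ratio, can be sketched as follows (all attribute names and values are illustrative):

```python
def cartesian_combine(feat_a, feat_b):
    """Cartesian-product combination of two discrete features: each pair
    of values becomes one value of the combined feature."""
    return [f"{a}&{b}" for a, b in zip(feat_a, feat_b)]

# Illustrative discrete features obtained from attribute information.
city = ["beijing", "shanghai", "beijing"]
gender = ["male", "male", "female"]
combined = cartesian_combine(city, gender)

# Illustrative processed continuous feature: the height/weight ratio.
height = [1.80, 1.65, 1.72]
weight = [80.0, 55.0, 64.0]
ratio = [h / w for h, w in zip(height, weight)]
```

Each combined value such as "beijing&male" behaves as one category of a new discrete feature, which is what later feature combination steps operate on.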
According to one embodiment of the invention, the aforementioned automatic feature combination operator performs feature combination by at least one of the following ways:
the first way to do automatic feature combining: executing at least one binning operation aiming at each continuous feature in the sample to obtain binning group features consisting of at least one binning feature, wherein each binning operation corresponds to one binning feature; and generating combined features of the machine-learned samples by combining features between the binned features and/or other discrete features in the samples;
specifically, the method comprises the following steps: (A3) obtaining a data record, wherein the data record comprises a plurality of attribute information; (B3) executing at least one binning operation for each continuous feature generated based on the plurality of attribute information to obtain a binning group feature consisting of at least one binning feature, wherein each binning operation corresponds to one binning feature; and (C3) generating a combined feature of the machine-learned sample by feature combining between the binned features and/or other discrete features produced based on the plurality of attribute information.
Here, at least one binning operation may be performed, so that multiple discrete features characterizing certain attributes of the original data records from different angles, scales, or layers can be obtained simultaneously. The binning operation is a specific method of discretizing a continuous feature, namely dividing the value range of the continuous feature into a plurality of sections (i.e., a plurality of bins) and determining a corresponding bin feature value based on the divided bins. Binning operations can be broadly divided into supervised binning and unsupervised binning, with each of these two types including specific binning modes; for example, supervised binning includes minimum entropy binning, minimum description length binning, etc., and unsupervised binning includes equal-width binning, equal-depth binning, k-means-clustering-based binning, etc. In each binning mode, corresponding binning parameters, such as width and depth, may be set. It should be noted that, according to the exemplary embodiment of the present invention, the binning operation performed by the binning group feature generating apparatus 200 is limited neither in the kind of binning mode nor in the parameters of the binning operation, and the specific representation of the resulting binning features is also not limited. Taking unsupervised equal-width binning as an example, assume the value interval of the continuous feature is [0, 100] and the corresponding binning parameter (i.e., the width) is 50; then 2 bins are obtained, in which case a continuous feature with a value of 61.5 falls into the 2nd bin, and if the two bins are numbered 0 and 1, the bin corresponding to the continuous feature is numbered 1.
Alternatively, assuming a bin width of 10, 10 bins can be formed; in this case a continuous feature with a value of 61.5 falls into the 7th bin, and if the ten bins are numbered 0 to 9, the continuous feature corresponds to the bin numbered 6. Alternatively, assuming a bin width of 2, 50 bins can be formed; in this case a continuous feature with a value of 61.5 falls into the 31st bin, and if the fifty bins are numbered 0 to 49, the continuous feature corresponds to the bin numbered 30.
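The bin-index arithmetic of the worked example above can be sketched as follows (a minimal illustration; the function name is ours, not part of the described apparatus):

```python
def equal_width_bin(value, low, width):
    """Return the 0-indexed bin number of `value` under equal-width
    binning of an interval starting at `low`."""
    return int((value - low) // width)

# Worked example from the text: value interval [0, 100], value 61.5.
# Performing several binning operations at once yields a binning group feature.
widths = (50, 10, 2)
bin_group = tuple(equal_width_bin(61.5, 0, w) for w in widths)
print(bin_group)  # (1, 6, 30), matching the bins numbered 1, 6 and 30 above
```

Each component of the tuple is one binning feature; together they describe the same continuous value at three different granularities.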
Before step (B3), the method further includes: (D3) selecting the at least one binning operation from a predetermined number of binning operations such that the importance of the binning features corresponding to the selected binning operations is not lower than the importance of the binning features corresponding to the unselected binning operations.
In step (D3), for each binning feature among the binning features corresponding to the predetermined number of binning operations, a single-feature machine learning model is constructed, the importance of each binning feature is determined based on the effect of the corresponding single-feature machine learning model, and the at least one binning operation is selected based on the importance of each binning feature, wherein each single-feature machine learning model corresponds to one binning feature.
Alternatively, in step (D3), for each binning feature among the binning features corresponding to the predetermined number of binning operations, a composite machine learning model is constructed, the importance of each binning feature is determined based on the effect of the corresponding composite machine learning model, and the at least one binning operation is selected based on the importance of each binning feature, wherein the composite machine learning model includes a basic sub-model and an additional sub-model based on a boosting framework, the basic sub-model corresponding to a basic feature subset and the additional sub-model corresponding to said each binning feature. The combined features of the machine learning samples are generated in an iterative manner according to a search strategy for the combined features. Step (D3) is performed for each iteration round to update the at least one binning operation, and the combined features generated in each iteration round are added as new discrete features to the basic feature subset. Each composite machine learning model is constructed by separately training the additional sub-model with the basic sub-model fixed.
In step (C3), feature combination according to the Cartesian product is performed between the binning features and/or the other discrete features.
Wherein the at least one binning operation corresponds to equal-width binning operations of different widths or equal-depth binning operations of different depths, respectively, and the different widths or depths numerically constitute a geometric or arithmetic series.
The binning feature indicates into which bin the continuous feature is sorted by the corresponding binning operation.
Each continuous feature is formed from continuous-value attribute information itself among the plurality of attribute information, or is formed by continuously transforming discrete-value attribute information among the plurality of attribute information, where the continuous transformation indicates counting the values of the discrete-value attribute information.
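One common continuous transformation of this kind is count encoding; the sketch below (our illustration, not prescribed by the text) replaces each discrete value by its occurrence count:

```python
from collections import Counter

def count_transform(values):
    """Count encoding: replace each discrete value by its occurrence
    count, yielding a continuous (numeric) feature."""
    counts = Counter(values)
    return [counts[v] for v in values]

cities = ["beijing", "shanghai", "beijing", "shenzhen", "beijing"]
print(count_transform(cities))  # [3, 1, 3, 1, 3]
```

The resulting numeric feature can then be binned like any other continuous feature.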
In embodiments of the present invention, a single discrete feature may be regarded as a first-order feature, and according to exemplary embodiments of the present invention, higher-order feature combination, such as second-order, third-order, etc., may be performed until a predetermined cutoff condition is satisfied. As an example, the combined features of the machine learning samples may be generated in an iterative manner according to a search strategy for the combined features.
Fig. 3 illustrates an example of a search tree for generating combined features according to an exemplary embodiment of the present invention. According to an exemplary embodiment of the invention, the search tree may be based on a heuristic search strategy such as beam search, where one layer of the search tree may correspond to a particular order of feature combination.
Referring to fig. 3, it is assumed that the discrete features that can be combined include a feature A, a feature B, a feature C, a feature D, and a feature E; as an example, feature A, feature B, and feature C may be discrete features formed from the discrete-value attribute information of the data records themselves, while feature D and feature E may be binning group features converted from continuous features.
According to the search strategy, in the first iteration, two nodes, namely feature B and feature E, are selected among the first-order features; here the nodes may be ranked using feature importance or the like as the index, after which a part of the nodes is selected to continue expanding at the next layer.
In the next iteration, the second-order combined features BA, BC, BD, BE, EA, EB, EC and ED are generated based on feature B and feature E, and feature BC and feature EA continue to be selected based on the ranking index. As an example, feature BE and feature EB can be considered the same combined feature.
The iteration continues in the manner described above until a certain cutoff condition, e.g., an order limit, is met. Here, the nodes (shown in solid lines) selected in each layer may be used as combined features for subsequent processing, e.g., as final adopted features or for further importance evaluation, while the remaining features (shown in dashed lines) are pruned.
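The layered search described above can be sketched as a beam search over feature sets; the additive importance function below is a hypothetical stand-in for the model-based importance evaluation in the text, so the kept nodes need not match Fig. 3 exactly:

```python
def beam_search_combinations(unit_features, importance, beam_width=2, max_order=3):
    """Layered beam search: keep the `beam_width` best nodes per layer and
    expand each kept node with every unit feature; representing a
    combination as a frozenset makes BE and EB the same combined feature."""
    beam = sorted((frozenset(f) for f in unit_features),
                  key=importance, reverse=True)[:beam_width]
    layers = [beam]
    for order in range(2, max_order + 1):
        # Expand kept nodes; the set comprehension deduplicates BE/EB-style pairs.
        candidates = {node | frozenset(u) for node in beam for u in unit_features}
        candidates = [c for c in candidates if len(c) == order]
        beam = sorted(candidates, key=importance, reverse=True)[:beam_width]
        if not beam:
            break
        layers.append(beam)
    return layers

# Hypothetical per-feature scores, summed as a toy importance index.
scores = {"A": 1, "B": 5, "C": 3, "D": 2, "E": 4}
importance = lambda combo: sum(scores[f] for f in combo)
layers = beam_search_combinations("ABCDE", importance)
print(layers[0])  # the two best first-order features survive the first layer
```

Unselected candidates at each layer are pruned, mirroring the dashed nodes in Fig. 3.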
The second way of performing automatic feature combination: according to a heuristic search strategy, feature combination between at least one feature of the sample is performed stage by stage to generate candidate combined features, wherein, for each stage, a target combined feature is selected from the candidate combined feature set to serve as a combined feature of the machine learning sample.
Specifically, a method for generating combined features of machine learning samples in the present invention includes: (A4) acquiring a historical data record, wherein the historical data record comprises a plurality of attribute information; and (B4) performing feature combination between at least one feature generated based on the plurality of attribute information in accordance with a heuristic search strategy on a stage-by-stage basis to generate candidate combined features; wherein, for each stage, a target combined feature is selected from the candidate combined feature set as the combined feature of the machine learning sample. The heuristic search strategy herein is described with reference to FIG. 3 and will not be repeated herein.
Wherein the at least one feature is at least one discrete feature generated by processing at least one continuous value attribute information and/or discrete value attribute information among the plurality of attribute information; or, the at least one feature is at least one continuous feature generated by processing at least one continuous value attribute information and/or discrete value attribute information among the plurality of attribute information.
Under the heuristic search strategy, candidate combined features of the next stage are generated by combining the target combined features selected in the current stage with the at least one feature.
Alternatively, under the heuristic search strategy, candidate combined features of the next stage are generated by pairwise combination between the target combined features selected in the current stage and those selected in the previous stage.
Wherein the candidate combined feature set comprises candidate combined features generated in the current stage.
Wherein the candidate combined feature set comprises the candidate combined features generated in the current stage and all candidate combined features generated in the previous stage which are not selected as the target combined features.
Wherein the candidate combined feature set comprises the candidate combined features generated in the current stage and a part of the candidate combined features generated in the previous stage which were not selected as target combined features, this part being those of higher importance among the unselected candidate combined features of the previous stage.
The target combined features are the candidate combined features of higher importance in the candidate combined feature set.
The third way to perform automatic feature combination: acquiring the unit features that can be combined in a sample; providing to a user a graphical interface for setting feature combination configuration items that define how feature combination is to be performed between unit features; receiving an input operation performed by the user on the graphical interface for setting the feature combination configuration items, and acquiring the feature combination configuration items set by the user according to the input operation; and combining the features to be combined among the unit features based on the acquired feature combination configuration items to generate combined features of the machine learning samples.
Specifically, a method for generating combined features of machine learning samples in the present invention includes: (A5) acquiring the unit features that can be combined; (B5) providing to a user a graphical interface for setting feature combination configuration items that define how feature combination is to be performed between unit features; (C5) receiving an input operation performed by the user on the graphical interface for setting the feature combination configuration items, and acquiring the feature combination configuration items set by the user according to the input operation; and (D5) combining the features to be combined among the unit features based on the acquired feature combination configuration items to generate combined features of the machine learning samples. Here, a unit feature is the smallest unit that can participate in combination.
Wherein the feature combination configuration items comprise at least one of: a feature configuration item for specifying the features to be combined among the unit features, so that the specified features to be combined are combined in step (D5); an evaluation index configuration item for specifying the evaluation index of the combined features, so that the effects of the machine learning models corresponding to the various combined features are measured according to the specified evaluation index in step (D5) to determine the combination manner of the features to be combined; and a training parameter configuration item for specifying the training parameters of the machine learning model, so that the combination manner of the features to be combined is determined in step (D5) by measuring the effects of the machine learning models corresponding to the various combined features obtained under the specified training parameters. The feature combination configuration items may further include: a bucketing operation configuration item for specifying one or more bucketing operations to be performed respectively on at least one continuous feature among the features to be combined, so that the specified one or more bucketing operations are performed respectively on the at least one continuous feature to obtain one or more corresponding bucket features, and the obtained bucket features participate as a whole in the combination with the other features to be combined in step (D5). The bucketing operation configuration item may specify one or more bucketing operations separately for each continuous feature, or may specify one or more bucketing operations uniformly for all continuous features.
As an example, a machine learning model corresponding to a particular combined feature may indicate that the samples of the machine learning model include the particular combined feature. According to an exemplary embodiment of the present invention, when combining unit features, whether to adopt a combined feature may be determined by measuring the effect of the machine learning model corresponding to that combined feature. Here, the set evaluation index may be used to measure the effects of the machine learning models corresponding to the various combined features; the higher the evaluation index of a machine learning model, the more readily the combined feature corresponding to that model is determined as a combined feature of the machine learning sample. As an example, the training parameter configuration item may include configuration items for one or more different training parameters; for example, the training parameter configuration items may comprise a learning rate configuration item and/or a tuning-round-number configuration item, etc. For each continuous feature, each bucketing operation performed on it yields one bucket feature; accordingly, a feature composed of all the resulting bucket features may participate in the automatic combination between the features to be combined in place of the original continuous feature. As an example, the bucketing operation configuration item may further include a bucketing mode configuration item and/or a bucketing parameter configuration item, where the bucketing mode configuration item specifies the bucketing mode used by the bucketing operation, and the bucketing parameter configuration item specifies the parameters of that bucketing mode.
For example, the equal-width or equal-depth bucketing mode can be specified via the bucketing mode configuration item, and the number of buckets, the bucket width, the bucket depth, and the like can be specified via the bucketing parameter configuration item. Here, the user may manually input or select the values of the bucketing parameter configuration items; in particular, the user may be prompted to set the respective widths/depths of the equal-width/equal-depth buckets in a geometric or arithmetic progression.
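Gathered into one structure, the configuration items above might look like the following sketch; every key name here is hypothetical, chosen only for illustration, and does not reflect the actual identifiers used by the system:

```python
# Hypothetical feature-combination configuration; all key names are illustrative.
feature_combination_config = {
    "features_to_combine": ["city", "age", "income"],   # feature configuration item
    "evaluation_metric": "auc",                         # evaluation index configuration item
    "training_params": {                                # training parameter configuration items
        "learning_rate": 0.1,
        "tuning_rounds": 5,
    },
    "bucketing": {                                      # bucketing operation configuration items
        # Equal-width buckets whose widths form a geometric progression,
        # as the text suggests prompting the user to set.
        "age":    [{"mode": "equal_width", "width": w} for w in (2, 10, 50)],
        "income": [{"mode": "equal_depth", "depth": 100}],
    },
}
print(sorted(feature_combination_config))
```

A per-feature mapping like `"bucketing"` corresponds to specifying operations separately for each continuous feature; a single shared list would correspond to specifying them uniformly.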
After step (D5), the method further includes: (E5) displaying the generated combined features to the user. In step (E5), the evaluation value of each combined feature with respect to the evaluation index may also be displayed to the user.
After the step (D5), the method further includes: (F5) the generated combined features are directly applied to subsequent machine learning steps.
After step (E5), the method further comprises: (G5) the combined features selected by the user from the displayed combined features are applied to a subsequent machine learning step.
After step (D5), the method further includes: (H5) saving the combination mode of the combined features generated in step (D5) in the form of a configuration file.
After step (G5), the method further includes: (I5) saving the combination mode of the combined features selected by the user in step (G5) in the form of a configuration file.
In step (a5), the unit feature is obtained by performing feature processing on the attribute information of the data record.
According to an exemplary embodiment of the invention, a machine learning process may be performed in the form of a directed acyclic graph (DAG), which may encompass all or part of the steps for performing machine learning model training, testing, or prediction. For example, for automatic feature combination, a DAG including a historical data import step, a data splitting step, a feature extraction step, and an automatic feature combination step may be built. That is, each of the steps described above may be executed as a node in the DAG.
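The step dependencies described above can be sketched with the standard library's topological sorter; the step names are our own shorthand for the steps named in the text:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each node is one machine learning step; each value is its set of predecessors.
dag = {
    "data_split":         {"data_import"},
    "feature_extraction": {"data_split"},
    "auto_combination":   {"feature_extraction"},
}
order = list(TopologicalSorter(dag).static_order())
print(order)  # import first, automatic combination last
```

Executing nodes in topological order guarantees every step sees the outputs of its predecessors.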
A fourth way of performing automatic feature combination: according to a search strategy, iteratively performing feature combination between at least one discrete feature of a sample to generate candidate combined features, and selecting target combined features from the generated candidate combined features as combined features. Specifically, for each iteration, the candidate combined features in the candidate combined feature set are pre-ranked by importance, a part of the candidate combined features is screened out of the candidate combined feature set according to the pre-ranking result to form a candidate combined feature pool, the candidate combined features in the pool are re-ranked by importance, and, according to the re-ranking result, at least one candidate combined feature of high importance is selected from the pool as a target combined feature.
Specifically, a method of generating combined features of machine learning samples comprises: (A6) acquiring historical data records, wherein the historical data records comprise a plurality of attribute information; and (B6) according to a search strategy, iteratively performing feature combination between at least one discrete feature generated based on the plurality of attribute information to generate candidate combined features, and selecting target combined features from the generated candidate combined features as combined features of the machine learning sample. For each iteration, the method comprises: pre-ranking each candidate combined feature in the candidate combined feature set by importance; screening a part of the candidate combined features out of the candidate combined feature set according to the pre-ranking result to form a candidate combined feature pool; re-ranking each candidate combined feature in the candidate combined feature pool by importance; and selecting, according to the re-ranking result, at least one candidate combined feature of higher importance from the pool as a target combined feature. The search strategy here is described with reference to fig. 3 and is not repeated.
The pre-ranking is performed based on a first number of historical data records, and the re-ranking is performed based on a second number of historical data records, the second number being not less than the first number.
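The two-stage screen above can be sketched as follows; the cheap and precise scores are hypothetical stand-ins for model effects measured on the smaller and larger record samples, respectively:

```python
def two_stage_select(candidates, cheap_score, precise_score, pool_size, n_select):
    """Pre-rank all candidates with a cheap score, keep a pool of the best,
    then re-rank only the pool with a more precise (more expensive) score
    and select the final target combined features."""
    pool = sorted(candidates, key=cheap_score, reverse=True)[:pool_size]
    return sorted(pool, key=precise_score, reverse=True)[:n_select]

# Hypothetical scores for five candidate combined features.
cheap   = {"AB": 0.9, "AC": 0.8, "BC": 0.7, "BD": 0.6, "CD": 0.5}
precise = {"AB": 0.6, "AC": 0.9, "BC": 0.8, "BD": 0.7, "CD": 0.95}
picked = two_stage_select(list(cheap), cheap.get, precise.get, pool_size=3, n_select=1)
print(picked)  # ['AC'] — CD scores best precisely but never reached the pool
```

The example also shows the trade-off of screening: a candidate that would win under the precise score can be eliminated by the cheap pre-ranking, which is why the pool size matters.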
Wherein the candidate combined features of higher importance are screened out of the candidate combined feature set according to the pre-ranking result to form the candidate combined feature pool.
Wherein the candidate combined feature set comprises candidate combined features generated in the current iteration; alternatively, the candidate combined feature set includes candidate combined features generated in the current iteration and candidate combined features generated in the previous iteration that are not selected as target combined features.
Wherein candidate combined features of the next iteration are generated by combining the target combined features selected in the current iteration with the at least one discrete feature; alternatively, candidate combined features of the next iteration are generated by pairwise combination between the target combined features selected in the current iteration and those selected in the previous iteration.
Wherein the at least one discrete feature comprises a discrete feature converted from a continuous feature generated based on the plurality of attribute information, by: for each continuous feature, performing at least one binning operation to generate a discrete feature composed of at least one binning feature, where each binning operation corresponds to one binning feature. The at least one binning operation is selected from a predetermined number of binning operations, either for each iteration round or once for all iteration rounds, such that the importance of the binning features corresponding to the selected binning operations is not lower than the importance of the binning features corresponding to the unselected binning operations.
In particular, the at least one binning operation may be selected by: obtaining a binning single-feature machine learning model for each binning feature among the binning features corresponding to the predetermined number of binning operations, determining the importance of each binning feature based on the effect of the corresponding binning single-feature machine learning model, and selecting the at least one binning operation based on the importance of each binning feature, wherein each binning single-feature machine learning model corresponds to one binning feature. Alternatively, the at least one binning operation may be selected by: obtaining a binning overall machine learning model for each binning feature among the binning features corresponding to the predetermined number of binning operations, determining the importance of each binning feature based on the effect of the corresponding binning overall machine learning model, and selecting the at least one binning operation based on the importance of each binning feature, wherein the binning overall machine learning model corresponds to a binning basic feature subset together with said each binning feature. Still alternatively, the at least one binning operation may be selected by: obtaining a binning composite machine learning model for each binning feature among the binning features corresponding to the predetermined number of binning operations, determining the importance of each binning feature based on the effect of the corresponding binning composite machine learning model, and selecting the at least one binning operation based on the importance of each binning feature, wherein the binning composite machine learning model includes a binning basic sub-model and a binning additional sub-model based on a boosting framework, the binning basic sub-model corresponding to a binning basic feature subset and the binning additional sub-model corresponding to said each binning feature.
Wherein the binned basic feature subset comprises the target combined features selected prior to the current round of iteration.
Wherein the pre-ranking is performed by: obtaining a pre-ranking single-feature machine learning model for each candidate combined feature in the candidate combined feature set, and determining the importance of each candidate combined feature based on the effect of the corresponding pre-ranking single-feature machine learning model, wherein each candidate combined feature corresponds to one pre-ranking single-feature machine learning model.
Alternatively, the pre-ranking is performed by: obtaining a pre-ranking overall machine learning model for each candidate combined feature in the candidate combined feature set, and determining the importance of each candidate combined feature based on the effect of the corresponding pre-ranking overall machine learning model, wherein the pre-ranking overall machine learning model corresponds to a pre-ranking basic feature subset together with said each candidate combined feature.
Still alternatively, the pre-ranking is performed by: obtaining a pre-ranking composite machine learning model for each candidate combined feature in the candidate combined feature set, and determining the importance of each candidate combined feature based on the effect of the corresponding pre-ranking composite machine learning model, wherein the pre-ranking composite machine learning model includes a pre-ranking basic sub-model and a pre-ranking additional sub-model based on a boosting framework, the pre-ranking basic sub-model corresponding to a pre-ranking basic feature subset and the pre-ranking additional sub-model corresponding to said each candidate combined feature.
Wherein the pre-ranking basic feature subset includes the target combined features selected prior to the current iteration.
Wherein the re-ranking is performed by: obtaining a re-ranking single-feature machine learning model for each candidate combined feature in the candidate combined feature pool, and determining the importance of each candidate combined feature based on the effect of the corresponding re-ranking single-feature machine learning model, wherein each candidate combined feature corresponds to one re-ranking single-feature machine learning model.
Alternatively, the re-ranking is performed by: obtaining a re-ranking overall machine learning model for each candidate combined feature in the candidate combined feature pool, and determining the importance of each candidate combined feature based on the effect of the corresponding re-ranking overall machine learning model, wherein the re-ranking overall machine learning model corresponds to a re-ranking basic feature subset together with said each candidate combined feature.
Still alternatively, the re-ranking is performed by: obtaining a re-ranking composite machine learning model for each candidate combined feature in the candidate combined feature pool, and determining the importance of each candidate combined feature based on the effect of the corresponding re-ranking composite machine learning model, wherein the re-ranking composite machine learning model includes a re-ranking basic sub-model and a re-ranking additional sub-model based on a boosting framework, the re-ranking basic sub-model corresponding to a re-ranking basic feature subset and the re-ranking additional sub-model corresponding to said each candidate combined feature.
Wherein the re-ranking basic feature subset includes the target combined features selected prior to the current iteration.
Wherein step (B6) further includes: for each iteration, checking whether the selected target combined feature is suitable as a combined feature of the machine learning sample. In step (B6), this check uses the change in effect, after the selected target combined feature is introduced, of a machine learning model based on the target combined features that have already passed the check. If the selected target combined feature is suitable as a combined feature of the machine learning sample, it is taken as such and the next iteration is executed; if it is not suitable, another part of the candidate combined features is screened out of the candidate combined feature set according to the pre-ranking result to form a new candidate combined feature pool.
Here, the importance of the binning feature may be automatically determined in any suitable manner.
For example, a binning single-feature machine learning model may be obtained for each of the binning features corresponding to the predetermined number of binning operations, the importance of each binning feature may be determined based on the effect of the corresponding binning single-feature machine learning model, and the at least one binning operation may be selected based on the importance of each binning feature, wherein each binning single-feature machine learning model corresponds to one binning feature.
As an example, assume that for a continuous feature F, there are a predetermined number M (M is an integer greater than 1) of binning operations, corresponding to M binning features f_m, where m ∈ [1, M]. Accordingly, at least a portion of the historical data records may be utilized to construct M binning single-feature machine learning models (wherein each binning single-feature machine learning model predicts for the machine learning problem based on the corresponding single binning feature f_m), then the effects of the M binning single-feature machine learning models on the same test data set are measured (e.g., AUC (Area Under the ROC (Receiver Operating Characteristic) Curve), MAE (Mean Absolute Error), etc.), and the at least one binning operation to be finally performed is determined based on the ranking of the effects.
For another example, a binning overall machine learning model may be obtained for each of the binning features corresponding to the predetermined number of binning operations, the importance of each binning feature may be determined based on the effect of the corresponding binning overall machine learning model, and the at least one binning operation may be selected based on the importance of each binning feature, wherein the binning overall machine learning model corresponds to a binning basic feature subset together with each binning feature. As an example, the binning overall machine learning model here may be a logistic regression (LR) model; accordingly, the samples of the binning overall machine learning model are composed of the binning basic feature subset and said each binning feature.
As an example, assume that for a continuous feature F, there are a predetermined number M of binning operations, corresponding to M binning features f_m. Accordingly, at least a portion of the historical data records may be utilized to construct M binning overall machine learning models (where the sample features of each binning overall machine learning model include a fixed binning basic feature subset and the corresponding binning feature f_m), then the effects (e.g., AUC, MAE, etc.) of the M binning overall machine learning models on the same test data set are measured, and the at least one binning operation to be finally performed is determined based on the ranking of the effects.
For another example, a binning composite machine learning model may be obtained for each of the binning features corresponding to the predetermined number of binning operations, wherein the binning composite machine learning model includes a binning basic sub-model and a binning additional sub-model based on a boosting framework (e.g., a gradient boosting framework), the binning basic sub-model corresponding to a binning basic feature subset and the binning additional sub-model corresponding to each binning feature; the importance of each binning feature is determined based on the effect of the corresponding binning composite machine learning model, and the at least one binning operation is selected based on the importance of each binning feature.
As an example, assume that for a continuous feature F, there are a predetermined number M of binning operations, corresponding to M binning features f_m. Accordingly, at least a portion of the historical data records may be utilized to construct M binning composite machine learning models (where each binning composite machine learning model predicts for the machine learning problem, according to the boosting framework, based on a fixed binning basic feature subset and the corresponding binning feature f_m), then the effects (e.g., AUC, MAE, etc.) of the M binning composite machine learning models on the same test data set are measured, and the at least one binning operation to be finally performed is determined based on the ranking of the effects. Preferably, in order to further improve operational efficiency and reduce resource consumption, each binning composite machine learning model is constructed by fixing the basic sub-model and separately training the binning additional sub-model for each binning feature f_m.
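A deliberately simplified sketch of constructing one such composite model with the basic sub-model fixed: here the additional sub-model merely learns the mean residual per bin, a minimal stand-in for a real boosting-framework learner, and squared error stands in for the AUC/MAE effect measure:

```python
def fit_additional_submodel(base_pred, y, bin_feature):
    """With the basic sub-model's predictions fixed, train only the
    additional sub-model: the per-bin mean of the residuals the basic
    sub-model leaves behind (a minimal boosting-style correction)."""
    per_bin = {}
    for b, yi, pi in zip(bin_feature, y, base_pred):
        per_bin.setdefault(b, []).append(yi - pi)
    return {b: sum(r) / len(r) for b, r in per_bin.items()}

def composite_predict(base_pred, bin_feature, submodel):
    """Composite prediction = fixed base prediction + additional correction."""
    return [p + submodel.get(b, 0.0) for p, b in zip(base_pred, bin_feature)]

# Toy data: a constant basic sub-model and one hypothetical binning feature f_m.
y, base = [1.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]
bins = [1, 1, 0, 0]
sub = fit_additional_submodel(base, y, bins)
pred = composite_predict(base, bins, sub)
# The drop in error after adding f_m is one proxy for its importance.
err = sum((a - b) ** 2 for a, b in zip(pred, y))
```

Because only the small additional sub-model is trained per binning feature, the M composite models share all the work of the fixed basic sub-model, which is the efficiency gain the text describes.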
According to an exemplary embodiment of the present invention, the binning basic feature subset may be fixedly applied to the binning basic sub-models in all relevant binning overall machine learning models or binning composite machine learning models, where the binning basic feature subset may be empty for the first iteration; alternatively, any feature generated based on the attribute information of the historical data records may be used as a binning basic feature; for example, part or all of the attribute information of the historical data records may be directly used as binning basic features. Further, as an example, the actual machine learning problem may be considered, and relatively important or basic features may be determined as binning basic features based on evaluation or as specified by business personnel.
Here, any means of determining the importance of the features may be utilized to measure the importance of each candidate combined feature in the pool of candidate combined features.
For example, a re-ranked single-feature machine learning model may be derived for each candidate combined feature in the pool of candidate combined features, where each candidate combined feature corresponds to its own re-ranked single-feature machine learning model, and the importance of each candidate combined feature may be determined based on the effect of the corresponding model.
As an example, assume that the pool of candidate combined features includes 10 candidate combined features. Accordingly, at least a portion of the historical data records may be utilized to construct 10 re-ranked single-feature machine learning models (where each re-ranked single-feature machine learning model predicts a machine learning problem based on a respective single candidate combined feature), then measure the effect (e.g., AUC, MAE, etc.) of the 10 re-ranked single-feature machine learning models on the same test data set, and determine an order of importance for each candidate combined feature in the pool of candidate combined features based on the ordering of the effect.
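The per-feature ranking described above can be sketched in a few lines of Python. As a simplifying assumption (not part of the original method), the "single-feature machine learning model" is reduced to scoring records by the feature value itself, which preserves the AUC ordering of any monotone single-feature model; `rank_features_by_auc` and the toy data are illustrative names only.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_features_by_auc(feature_columns, labels):
    """Return feature names sorted by descending single-feature AUC,
    i.e., the order of importance described in the text."""
    effects = {name: auc(col, labels) for name, col in feature_columns.items()}
    return sorted(effects, key=effects.get, reverse=True)
```

In a full implementation each feature would instead be fed to a trained single-feature model, but the measure-on-one-test-set-and-sort loop is the same.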
For another example, a re-ranked overall machine learning model may be obtained for each candidate combined feature in the pool of candidate combined features, and the importance of each candidate combined feature may be determined based on the effect of each re-ranked overall machine learning model, where the re-ranked overall machine learning model corresponds to the re-ranked basic feature subset together with each candidate combined feature. As an example, the re-ranked overall machine learning model here may be an LR model; accordingly, the sample features of the re-ranked overall machine learning model consist of the re-ranked basic feature subset and said each candidate combined feature.
By way of example, assuming that the pool of candidate combined features includes 10 candidate combined features, accordingly, at least a portion of the historical data records may be utilized to construct 10 re-ranked overall machine learning models (where the sample features of each re-ranked overall machine learning model include a fixed re-ranked base feature subset and a corresponding candidate combined feature), then measure the effect (e.g., AUC, MAE, etc.) of the 10 re-ranked overall machine learning models on the same test data set, and determine an order of importance for each candidate combined feature among the pool of candidate combined features based on the ranking of the effect.
For another example, a re-ranked composite machine learning model may be obtained for each candidate combined feature in the pool of candidate combined features, and the importance of each candidate combined feature may be determined based on the effect of each re-ranked composite machine learning model, where the re-ranked composite machine learning model includes a re-ranked basic sub-model and a re-ranked additional sub-model under a boosting framework (e.g., a gradient boosting framework), the re-ranked basic sub-model corresponds to the re-ranked basic feature subset, and the re-ranked additional sub-model corresponds to each candidate combined feature.
As an example, assuming that the pool of candidate combined features includes 10 candidate combined features, at least a portion of the historical data records may accordingly be utilized to construct 10 re-ranked composite machine learning models (where each re-ranked composite machine learning model predicts for the machine learning problem according to the boosting framework, based on a fixed re-ranked basic feature subset and the corresponding candidate combined feature), the effects (e.g., AUC, MAE, etc.) of the 10 re-ranked composite machine learning models may then be measured on the same test data set, and an order of importance of each candidate combined feature in the pool may be determined based on the ordering of the effects. Preferably, to further improve operation efficiency and reduce resource consumption, each re-ranked composite machine learning model is constructed by fixing the re-ranked basic sub-model and training only the re-ranked additional sub-model for each candidate combined feature.
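The efficiency trick in the last sentence, fixing the basic sub-model and fitting only a small additional sub-model per candidate feature, amounts to one boosting step on the basic model's residuals. The 1-D least-squares "additional sub-model" and all names below are illustrative assumptions, not the patented form:

```python
def fit_residual_submodel(feature, residuals):
    """Least-squares slope/intercept of residuals on a single feature:
    the toy stand-in for the additional sub-model."""
    n = len(feature)
    mx = sum(feature) / n
    my = sum(residuals) / n
    cov = sum((x - mx) * (r - my) for x, r in zip(feature, residuals))
    var = sum((x - mx) ** 2 for x in feature) or 1.0  # guard constant feature
    slope = cov / var
    return slope, my - slope * mx

def score_candidates(base_pred, candidates, labels):
    """MAE of (fixed base model + per-candidate additional sub-model);
    lower MAE means a more important candidate feature."""
    residuals = [y - p for y, p in zip(labels, base_pred)]
    scores = {}
    for name, feat in candidates.items():
        a, b = fit_residual_submodel(feat, residuals)
        preds = [p + a * x + b for p, x in zip(base_pred, feat)]
        scores[name] = sum(abs(y - q) for y, q in zip(labels, preds)) / len(labels)
    return scores
```

Only the tiny residual fit is repeated per candidate; the base model is trained once, which is the resource saving the text describes.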
Fifth way of performing automatic feature combination: screening a plurality of key unit features from the features of the sample; obtaining at least one combined feature from the plurality of key unit features by using an automatic feature combination algorithm, wherein each combined feature is formed by combining a corresponding part of the key unit features; and taking the obtained at least one combined feature as the automatically generated combined feature.
Specifically, a method for automatically generating combined features comprises the following steps: configuring a feature extraction step, which performs feature extraction processing on the attribute fields of each data record in the input data set to obtain a plurality of unit features; configuring an automatic feature combination step, which obtains at least one combined feature by utilizing an automatic feature combination algorithm based on the feature extraction processing result; and running the configured feature extraction step and automatic feature combination step, and taking the obtained at least one combined feature as the automatically generated combined feature.
In the embodiment of the present invention, in addition to the above-described manner of obtaining unit features through feature processing on the attribute fields of a data record, an attribute field may be used directly as a unit feature. In this method for automatically generating combined features, the combined features are obtained by running the pre-configured feature extraction step and automatic feature combination step, so that features can be combined automatically even if a technician has neither a deep understanding of the business scenario nor rich practical industry experience, which lowers the threshold for using feature engineering and improves its usability.
Wherein the automatic feature combination step is configured to include: screening a plurality of key unit features from the feature extraction processing result; and obtaining at least one combined feature from the plurality of key unit features by using an automatic feature combination algorithm, wherein each combined feature is formed by combining a corresponding part of the key unit features. The plurality of key unit features may be screened from the feature extraction processing result according to feature importance, feature relevance and/or feature fill rate.
As an example, feature importance may be determined based on the effect of a machine learning model. For example, a machine learning model corresponding to each of the plurality of unit features obtained after the feature extraction processing may be established (for example, a sample of the machine learning model includes a fixed feature portion and an additional feature portion, where the additional feature portion is the unit feature in question), a plurality of unit features of high feature importance may be determined based on the effect of these machine learning models (for example, all unit features are sorted in descending order of feature importance and a predetermined number of top-ranked unit features are taken), and the unit features of high feature importance may be taken as the plurality of key unit features. In addition, feature relevance and feature fill rate may be determined based on various statistical methods or on data characteristics of the features themselves.
The automatic feature combination algorithm generates various candidate combined features in a traversal manner, measures the importance of each candidate combined feature based on the effect of a machine learning model, and determines at least one candidate combined feature of high importance as a combined feature. For example, each candidate combined feature may be used in turn as an input of the machine learning model, the importance of each candidate combined feature determined based on the model's effect, the candidate combined features sorted in descending order of importance, and the top predetermined number of candidate combined features determined as combined features.
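A minimal sketch of the traversal-and-rank loop, assuming second-order (pairwise) combinations and a caller-supplied effect function standing in for the machine learning model's measured effect:

```python
import itertools

def generate_candidates(key_features):
    """Traversal step: all second-order combinations of the key unit
    features, each represented as an unordered set."""
    return [frozenset(pair) for pair in itertools.combinations(key_features, 2)]

def select_top_combinations(key_features, effect_fn, k):
    """Score every candidate with effect_fn (a stand-in for the measured
    model effect) and keep the top-k by descending importance."""
    cands = generate_candidates(key_features)
    ranked = sorted(cands, key=effect_fn, reverse=True)
    return ranked[:k]
```

Higher-order combinations would extend `generate_candidates` with larger combination sizes; the selection loop is unchanged.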
Wherein the automatic feature combining step is configured to include: and executing a plurality of processing flows corresponding to the automatic feature combination algorithm in parallel to obtain the at least one combined feature based on the feature extraction processing result.
Wherein the automatic feature combination step is configured to include: executing in parallel a plurality of processing flows corresponding to the automatic feature combination algorithm, based on the feature extraction processing result corresponding to each subset of the data set, to obtain the combined features corresponding to each subset. When the plurality of processing flows are executed in parallel, the combined features obtained may contain duplicates; for this case the automatic feature combination step is configured to further include: performing de-duplication processing on the combined features corresponding to all the subsets, and taking the combined features obtained after the de-duplication processing as the at least one combined feature. The feature extraction step corresponds to a feature extraction node in a directed acyclic graph representing the machine learning process, and the automatic feature combination step corresponds to an automatic feature combination node in the directed acyclic graph. The automatic feature combination step is configured through the configuration items of the automatic feature combination node.
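The parallel flows plus de-duplication can be sketched as follows; `flow` is a toy stand-in for the automatic feature combination algorithm run on one subset, and combined features are compared as unordered sets of unit features:

```python
from concurrent.futures import ThreadPoolExecutor

def run_flows_with_dedup(subsets, flow):
    """Run one combination flow per data subset in parallel, then
    de-duplicate the union of the combined features each flow reports."""
    with ThreadPoolExecutor(max_workers=len(subsets)) as pool:
        per_subset = list(pool.map(flow, subsets))
    seen, merged = set(), []
    for combos in per_subset:
        for combo in combos:
            key = frozenset(combo)   # ("a","b") and ("b","a") are the same feature
            if key not in seen:
                seen.add(key)
                merged.append(combo)
    return merged
```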
Wherein the configuration item of the automatic feature combination node includes an option switch as to whether to turn on a key feature filtering function, wherein in a case where the option switch is turned on by a user, the automatic feature combination step is configured to include: screening a plurality of key unit characteristics from the characteristic extraction processing result; and obtaining at least one combined feature from the plurality of key unit features by using an automatic feature combination algorithm, wherein each combined feature is formed by combining corresponding partial key unit features in the plurality of key unit features.
Wherein the configuration items of the automatic feature combination node comprise parallel-operation configuration items related to executing, in parallel, a plurality of processing flows corresponding to the automatic feature combination algorithm, wherein the parallel-operation configuration items relate to at least one of: the number of processing flows executed in parallel, and the hyper-parameters used when training the machine learning model in the automatic feature combination algorithm corresponding to each processing flow. The parallel-operation configuration items may further relate to at least one of: the number of subsets of the data set, and the data record extraction rule corresponding to each subset. A parallel-operation configuration item has a default configuration value and/or a manual configuration value. The default configuration values of the parallel-operation configuration items related to the hyper-parameters cause the machine learning models trained in the automatic feature combination algorithms corresponding to different processing flows to differ substantially; in particular, they cause the hyper-parameters used to train the machine learning model to differ across processing flows. The hyper-parameters include the learning rate, and the default configuration values of the parallel-operation configuration items related to the learning rate cause the learning rates used across the different processing flows to present a step-wise increasing trend.
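A sketch of how such default parallel-run configuration values might look, under the assumption that the "step-wise increasing" learning rates mean each flow doubles the previous flow's rate (the step factor and field names are illustrative choices, not the patented defaults):

```python
def default_parallel_config(n_flows, base_lr=0.01, lr_step=2.0):
    """One configuration entry per parallel flow, with learning rates
    increasing step-wise so the flows train substantially different models."""
    return [{"flow": i, "learning_rate": base_lr * lr_step ** i}
            for i in range(n_flows)]
```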
Wherein the configuration item of the automatic feature combination node includes an option switch as to whether to turn on a deduplication function, wherein in a case where the option switch is turned on by a user, the automatic feature combination step is configured to further include: and performing deduplication processing on the combined features corresponding to all the subsets, and taking the combined features obtained after the deduplication processing as the at least one combined feature.
According to an embodiment of the present invention, the aforementioned automatic parameter tuning operator performs automatic hyper-parameter tuning in any one of the following manners:
The first automatic parameter tuning manner comprises performing the following steps in each iteration: determining currently available resources; scoring a plurality of hyper-parameter tuning strategies respectively, and allocating the currently available resources to the hyper-parameter tuning strategies according to the scoring results, wherein each hyper-parameter tuning strategy selects hyper-parameter combinations for the machine learning model based on a corresponding hyper-parameter selection strategy; and acquiring the one or more hyper-parameter combinations generated, based on the allocated resources, by each hyper-parameter tuning strategy to which resources were allocated.
Specifically, the method integrates a plurality of hyper-parameter tuning strategies, each hyper-parameter tuning strategy is used for selecting a hyper-parameter combination for the machine learning model based on a corresponding hyper-parameter selection strategy, and each iteration process of the method comprises the following steps: determining currently available resources; allocating currently available resources for the plurality of hyper-parameter tuning strategies; and acquiring one or more hyper-parameter combinations generated by each hyper-parameter tuning strategy allocated to the resources based on the allocated resources.
The currently available resources are the resources to be allocated in the current round. The resources mentioned in the invention can be of various types, such as computing resources (e.g., the number of CPUs or the number of CPU cores), time resources representing working duration, and task resources representing the number of tasks. In the invention, the number of tasks is the number of hyper-parameter combinations that need to be generated, and each hyper-parameter combination to be generated can be regarded as one task. As one example, the currently available resources may include the computing resources available for the current round, where computing resources may include both computing-power resources and time resources. For example, "10 CPU cores" may be used as the currently available resource, as may "10 CPU cores operating for 2 hours". As another example, the currently available resources may include the number of hyper-parameter combinations that need to be generated in the current round. This number may be determined according to the currently available computing resources: when more computing resources are available, a larger number of hyper-parameter combinations to generate may be set, and when fewer are available, a smaller number. Therefore, allocating the number of hyper-parameter combinations to be generated in the current round is, in essence, also an allocation of computing resources.
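The conversion between computing resources and the number of hyper-parameter combinations to generate might be sketched as below; the per-combination cost parameter is an assumption for illustration only:

```python
def combos_for_round(cpu_cores, hours, cost_per_combo_core_hours=1.0):
    """Translate the round's computing resources (core-hours) into the
    number of hyper-parameter combinations to generate this round."""
    return int(cpu_cores * hours / cost_per_combo_core_hours)
```

With the text's example of 10 CPU cores working 2 hours, a cheaper per-combination cost yields proportionally more combinations to evaluate.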
Each hyper-parameter tuning strategy is used to select hyper-parameter combinations for the machine learning model based on a corresponding hyper-parameter selection strategy. A hyper-parameter tuning strategy may be an existing hyper-parameter tuning scheme, such as random search, grid search, an evolutionary algorithm, or Bayesian optimization. The plurality of hyper-parameter tuning strategies may comprise one or more non-model-oriented search strategies and/or one or more model-oriented strategies. A non-model-oriented search strategy selects hyper-parameter combinations from a hyper-parameter search space based on a predetermined search pattern (such as random search, grid search, or an evolutionary algorithm), where the hyper-parameter search space is the space of possible values of all the hyper-parameters. A model-oriented strategy selects hyper-parameter combinations based on a prediction model, where the prediction model can be obtained by training on at least part of the hyper-parameter combinations generated in the iterative process. Alternatively, the model-oriented strategy may be a hyper-parameter optimization algorithm such as Bayesian optimization or the Tree-structured Parzen Estimator (TPE).
The plurality of hyper-parameter tuning strategies comprises: one or more non-model-directed search strategies for selecting a hyper-parametric combination from within a hyper-parametric search space based on a predetermined search pattern; and/or one or more model-oriented strategies for selecting a hyper-parametric combination based on a predictive model, wherein the predictive model is trained based on at least part of the hyper-parametric combinations generated in an iterative process.
Wherein the allocating of currently available resources for the plurality of hyper-parameter tuning strategies comprises: distributing the currently available resources evenly among the plurality of hyper-parameter tuning strategies; or allocating the currently available resources to the plurality of hyper-parameter tuning strategies according to a preset proportion.
Alternatively, the allocating of currently available resources for the plurality of hyper-parameter tuning strategies comprises: scoring the plurality of hyper-parameter tuning strategies respectively; and allocating the currently available resources to the plurality of hyper-parameter tuning strategies according to the scoring results. When the plurality of hyper-parameter tuning strategies comprises one or more model-oriented strategies, during each iteration the method further comprises: obtaining the evaluation indexes corresponding to the one or more hyper-parameter combinations generated in the iteration, and adding those hyper-parameter combinations and their evaluation indexes to the hyper-parameter combination sample set of the machine learning model. The method further comprises: for each model-oriented strategy allocated resources in the round, performing model training using at least part of the hyper-parameter combinations in the current hyper-parameter combination sample set of the machine learning model as training samples, so as to obtain the prediction model.
Wherein scoring the plurality of hyper-parameter tuning strategies respectively comprises at least one of the following: scoring the strategies according to the availability of each hyper-parameter tuning strategy; scoring the strategies according to the confidence of each hyper-parameter tuning strategy; and scoring the strategies according to the evaluation indexes of the hyper-parameter combinations generated by each hyper-parameter tuning strategy in one or more previous iterations.
Wherein scoring the plurality of hyper-parameter tuning strategies according to the availability of each strategy comprises: the availability of a non-model-oriented search strategy is a fixed constant; the availability of a model-oriented strategy is zero when the number of hyper-parameter combinations generated in the iterative process is less than or equal to a preset threshold, and is proportional to that number when it is greater than the preset threshold.
Wherein scoring the plurality of hyper-parameter tuning strategies according to the confidence of each strategy comprises: the confidence of a non-model-oriented search strategy is a fixed constant; for each model-oriented strategy, the hyper-parameter combinations generated in the iterative process are divided into at least one pair of training and test sets, the score of the strategy is calculated under each pair, and the scores are averaged and then normalized to obtain the confidence of that model-oriented strategy.
Wherein scoring the plurality of hyper-parameter tuning strategies according to the evaluation indexes of the hyper-parameter combinations generated by each strategy in one or more previous rounds comprises: scoring each hyper-parameter tuning strategy according to the average ranking, among all the generated hyper-parameter combinations, of the evaluation indexes of the hyper-parameter combinations it generated in one or more previous iterations, wherein the scoring result is proportional to the average ranking.
Wherein allocating currently available resources to the plurality of hyper-parameter tuning strategies according to the scoring results comprises: determining a probability value for each hyper-parameter tuning strategy according to its scoring result, wherein the probability value is proportional to the scoring result; dividing the currently available resources into a plurality of shares; and sampling the plurality of hyper-parameter tuning strategies multiple times based on the probability values to determine the hyper-parameter tuning strategy to which each share of resources belongs.
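A sketch of the score-proportional allocation: scores become sampling probabilities, the available resource is divided into shares, and each share is assigned by sampling. The fixed-seed RNG is only for reproducibility of the sketch:

```python
import random

def allocate_shares(scores, n_shares, rng=random.Random(0)):
    """Assign n_shares resource shares to strategies by sampling with
    probability proportional to each strategy's score."""
    total = sum(scores.values())
    names = list(scores)
    probs = [scores[n] / total for n in names]   # probability ∝ score
    counts = {n: 0 for n in names}
    for _ in range(n_shares):
        r, acc = rng.random(), 0.0
        for name, p in zip(names, probs):
            acc += p
            if r < acc:                          # inverse-CDF sampling
                counts[name] += 1
                break
    return counts
```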
Wherein the currently available resources include: the number of the hyper-parameter combinations required to be generated in the current round; or, the computing resources available for the current round.
The method further comprises: when the iteration termination condition is met, selecting the hyper-parameter combination with the best evaluation index, from at least part of the hyper-parameter combinations generated in the iterative process, as the final hyper-parameter combination of the machine learning model. For example, the iterative process may be terminated when the improvement of the evaluation index of the hyper-parameter combinations generated in a predetermined number of consecutive rounds is smaller than a predetermined threshold; when the evaluation index of a generated hyper-parameter combination reaches a preset target; or when the consumed resources exceed a predetermined resource threshold.
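The three termination conditions can be sketched as a single predicate over the history of evaluation indexes (higher is assumed better); all parameter names here are illustrative:

```python
def should_stop(history, patience, min_improve,
                target=None, used=0.0, budget=float("inf")):
    """True if any termination condition from the text holds:
    a preset target is reached, the resource budget is exhausted, or the
    best index improved by less than min_improve over the last
    `patience` rounds."""
    if history and target is not None and max(history) >= target:
        return True                      # evaluation index reached target
    if used > budget:
        return True                      # consumed resources exceed budget
    if len(history) > patience:
        recent = history[-patience:]
        best_before = max(history[:-patience])
        if max(recent) - best_before < min_improve:
            return True                  # stagnation over `patience` rounds
    return False
```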
Specifically, in this embodiment, when the plurality of hyper-parameter tuning strategies are scored respectively, the scoring is performed mainly according to each strategy's state in the current round and its historical performance. By way of example, in scoring a hyper-parameter tuning strategy, reference may be made to any one or more of the following three dimensions.
Dimension 1, availability of hyper-parameter tuning strategy
The availability of a hyper-parameter tuning strategy characterizes whether the strategy is able to select a hyper-parameter combination for the machine learning model. Taking the division of hyper-parameter tuning strategies into non-model-oriented search strategies and model-oriented strategies as an example:

A non-model-oriented search strategy is always available when selecting hyper-parameter combinations for the machine learning model, and does not depend on the hyper-parameter combinations generated in the iterative process. Thus, the availability of a non-model-oriented search strategy may be a fixed constant, such as 1.

A model-oriented strategy selects hyper-parameter combinations for the machine learning model based on a prediction model, and the generation of the prediction model relies on the hyper-parameter combinations generated in the iterative process. Early in the iterative process, the number of generated hyper-parameter combinations is small; if it is smaller than the minimum number required to train the prediction model, the prediction model cannot be trained, and the model-oriented strategy is unavailable. Once the number of generated hyper-parameter combinations exceeds that minimum, the model-oriented strategy becomes available, and the more combinations there are, the better the effect of the trained prediction model and the stronger the availability of the model-oriented strategy.
Thus, the availability of a model-oriented strategy is related to the number of hyper-parameter combinations generated in the iterative process. Specifically, when the number of hyper-parameter combinations generated in the iterative process is less than or equal to a preset threshold, the availability of the model-oriented strategy is 0. When that number is greater than the preset threshold, the availability of the model-oriented strategy is greater than zero and is proportional to the number of hyper-parameter combinations generated. The preset threshold may be the minimum number of hyper-parameter combinations required to train the prediction model of the model-oriented strategy. For example, the model-oriented strategy TPE (Tree-structured Parzen Estimator) needs at least 10 groups of evaluated hyper-parameter combinations before it can start constructing a model, so the preset threshold corresponding to TPE may be set to 10.
When the plurality of hyper-parameter tuning strategies are scored based on availability, the higher the availability of a strategy, the higher its score. For example, the availability of the hyper-parameter tuning strategy may be taken as its score in this dimension.
As an example, when a hyper-parameter tuning strategy i is scored based on availability:

If the hyper-parameter tuning strategy i is a non-model-oriented search strategy, the score may be recorded as a fixed constant 1:

    s_i^1 = 1

If the hyper-parameter tuning strategy i is a model-oriented strategy, the score may be written as:

    s_i^1 = -infinity,     if |D| < M_i
    s_i^1 = f_i^1(|D|),    if |D| >= M_i

where s_i^1 denotes the score of the hyper-parameter tuning strategy i under dimension 1, D is the hyper-parameter sample set, |D| is the number of hyper-parameter combinations in the hyper-parameter sample set, and f_i^1 is a monotonically increasing function of |D|. The expression means that if the model-oriented strategy requires at least M_i groups of hyper-parameters before its prediction model can be constructed, then when |D| < M_i the score is negative infinity, which is equivalent to the strategy finally receiving a probability value of 0; and when |D| >= M_i, the availability of the model-oriented strategy increases with the size of the hyper-parameter sample set, as determined by the monotonically increasing function f_i^1(|D|). The specific form of f_i^1(|D|) can be set according to actual conditions, for example f_i^1(|D|) = |D|^d, where d is a number greater than 0, such as 0.5.
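A direct transcription of the dimension-1 availability score, assuming the example form f_i^1(|D|) = |D|^d given in the text:

```python
def availability_score(is_model_oriented, n_samples, min_required, d=0.5):
    """Dimension-1 score: a constant 1 for non-model-oriented search
    strategies; for model-oriented strategies, -inf below the sample
    threshold M_i and |D|**d at or above it."""
    if not is_model_oriented:
        return 1.0
    if n_samples < min_required:
        return float("-inf")      # strategy unavailable: probability 0
    return n_samples ** d         # monotonically increasing in |D|
```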
Dimension 2, confidence of the hyper-parameter tuning strategy
The confidence of a hyper-parameter tuning strategy is used to characterize the credibility of the hyper-parameter combinations the strategy selects for the machine learning model, that is, the strategy's effect. Taking the division of hyper-parameter tuning strategies into non-model-oriented search strategies and model-oriented strategies as an example: the confidence of a non-model-oriented search strategy may be considered a fixed constant, such as 1; a model-oriented strategy selects hyper-parameter combinations for the machine learning model based on the prediction model, so its confidence depends on the model effect of the prediction model. Thus, the confidence of a model-oriented strategy can be determined by evaluating the model effect of its prediction model.
As an example, the hyper-parameter combinations generated in the iterative process may be divided into at least one pair of training and test sets; for example, they may be divided in a cross-validation manner to obtain multiple pairs of training and test sets. For convenience of description, take the case where 10 groups of hyper-parameter combinations have been generated in the iterative process: groups 1-9 may be used as a training set and group 10 as a test set; groups 1-8 and 10 as a training set and group 9 as a test set; groups 1-7 and 9-10 as a training set and group 8 as a test set; and so on, yielding 10 pairs of training and test sets. The score of each model-oriented strategy (i.e., of the prediction model of the model-oriented strategy) can then be calculated under each pair of training and test sets. Here, the prediction model may be trained on the training set and then validated against the test set to derive the score of the model-oriented strategy under that pair. Finally, the scores are averaged and then normalized to the range [0, 1] to obtain the confidence of each model-oriented strategy.
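The pairing of training and test sets described above is a leave-one-out split over the evaluated hyper-parameter combinations; a sketch:

```python
def leave_one_out_splits(samples):
    """Each evaluated hyper-parameter combination serves once as the
    test set, with all others as the training set, as in the text's
    10-group example."""
    return [
        (samples[:i] + samples[i + 1:], [samples[i]])
        for i in range(len(samples))
    ]
```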
When the plurality of hyper-parameter tuning strategies are scored based on confidence, the higher the confidence of a strategy, the higher its score. For example, the confidence of the hyper-parameter tuning strategy may be taken as its score in this dimension.
As an example, when scoring the hyper-parameter tuning strategy i based on confidence, if strategy i is a non-model-oriented search strategy, it may be scored as a fixed constant, e.g. s_i^(2) = 1; if strategy i is a model-oriented strategy, the confidence calculated in the manner above can be used as its score, i.e. s_i^(2) = c_i, where c_i denotes the confidence of strategy i. Here s_i^(2) represents the score of the hyper-parameter tuning strategy i in dimension 2.
Dimension 3, evaluation index of hyper-parameter combination generated by each hyper-parameter tuning strategy in previous iteration or iterations
When facing different machine learning models, the effect of each hyper-parameter tuning strategy differs to some extent. To improve the accuracy and robustness of the scoring results for the hyper-parameter tuning strategies, the evaluation indexes of the hyper-parameter combinations generated by each strategy in one or more previous iterations can be monitored in real time, and the strategies scored according to these evaluation indexes.
As an example, the plurality of hyper-parameter tuning strategies may be scored according to the average ranking, among all generated hyper-parameter combinations, of the evaluation indexes of the combinations generated by each strategy in one or more previous iterations, wherein a better average ranking yields a higher score.
For example, the ranking of each hyper-parameter combination generated by the hyper-parameter tuning strategy i among all generated combinations can be calculated according to its evaluation index, a quantile computed from each ranking, and the average of these quantiles taken as the score of strategy i. Alternatively, the average ranking of the combinations generated by strategy i among all generated combinations may be calculated first, and the quantile of that average ranking used as the score. In both cases the quantile rises as the ranking improves: the earlier the ranking, the larger the quantile. The score based on dimension 3 may be recorded as s_i^(3).
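The quantile-based scoring of dimension 3 can be sketched as follows; the linear quantile `1 - rank/total` is one assumption consistent with "the earlier the ranking, the higher the quantile", and all names are illustrative:

```python
def dimension3_score(strategy_metrics, all_metrics):
    """Mean quantile of a strategy's recent combinations among all
    generated combinations.  Higher metric = better, rank 0 = best, and
    an earlier rank yields a larger quantile.  Names are illustrative."""
    ranked = sorted(all_metrics, reverse=True)
    total = len(ranked)
    quantiles = []
    for metric in strategy_metrics:
        rank = ranked.index(metric)           # position among all combinations
        quantiles.append(1.0 - rank / total)  # earlier rank -> larger quantile
    return sum(quantiles) / len(quantiles)
```

A strategy whose combinations all sit at the top of the ranking scores close to 1, one whose combinations sit at the bottom scores close to 0.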
In summary, for the hyper-parameter tuning strategy i, its scores in the one or more dimensions may be calculated and then combined into a final score. The final score may be calculated in various ways, such as summation, multiplication, or weighted summation. As an example, when scoring strategy i based on the above three dimensions with summation, the final score may be recorded as s_i = s_i^(1) + s_i^(2) + s_i^(3), where s_i^(1), s_i^(2) and s_i^(3) are the scores of strategy i in dimensions 1 to 3.
The scoring results characterize how good each hyper-parameter tuning strategy currently is, so currently available resources can be allocated to the strategies according to these results. As an example, the scoring result may be used to characterize the probability of allocating resources to a strategy: the higher a strategy's score, the higher the probability that resources are allocated to it. For example, a probability value can be determined for each strategy from its scoring result, the currently available resources divided into multiple parts, and the strategies sampled multiple times based on the probability values to determine which strategy each part of the resources belongs to. Suppose there are four hyper-parameter tuning strategies whose probability values are 0.2, 0.8, 0.6 and 0.5 respectively; then for each part of the resources, one strategy is sampled from the four according to these probability values, and that part of the resources is allocated to the sampled strategy.
The probability value of a hyper-parameter tuning strategy is proportional to its scoring result. Denoting the final score of the hyper-parameter tuning strategy i as s_i, the corresponding probability value may be expressed as:

q_i = s_i / (s_1 + s_2 + … + s_N)

wherein q_i denotes the probability value of the hyper-parameter tuning strategy i, and N represents the number of hyper-parameter tuning strategies.
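The score-proportional sampling of a strategy for each resource share might look like this minimal sketch (the function name and the use of `random.choices` are assumptions; the patent does not fix the sampling mechanism):

```python
import random


def allocate_resources(scores, n_shares, rng=None):
    """Turn per-strategy final scores into sampling probabilities
    (proportional to score, as in the formula above) and draw one
    strategy per resource share.  Returns the index of the chosen
    strategy for each share.  Names are illustrative."""
    rng = rng or random.Random()
    total = sum(scores)
    probs = [s / total for s in scores]  # q_i = s_i / sum_j s_j
    # one independent draw per share of the currently available resources
    return [rng.choices(range(len(scores)), weights=probs)[0]
            for _ in range(n_shares)]
```

With 24 one-hour shares and four strategies, each call returns the owning strategy index for every share; higher-scoring strategies are drawn more often on average.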
When the currently available resources are divided into multiple parts, the division may be performed according to various criteria; the specific manner of division can be set according to the actual situation, and the following are merely examples.
For example, the resource may be divided into a plurality of parts according to the operating time length, and if the currently available resource is "10 CPU cores operate for 1 day", the resource may be divided into 24 parts, and each part of the resource is "10 CPU cores operate for 1 hour". For another example, the resource may be divided according to physical resources such as the number of CPU cores, and if the currently available resource is "10 CPU cores work for 1 day", the resource may be divided into 10 resources, and each resource is "1 CPU core works for 1 day".
For example, when the currently available resource is "3 hyper-parameter combinations need to be generated", the resource may be divided into 3 parts, each being "1 hyper-parameter combination needs to be generated", where each part can be regarded as a task. The number of hyper-parameter combinations that currently need to be generated can be determined according to the currently available computing resources: a greater number when more computing resources are available, a smaller number when fewer are available. Therefore, allocating the number of hyper-parameter combinations to be generated in the current round is essentially also an allocation of computing resources.
After the currently available resources are allocated to the plurality of hyper-parameter tuning strategies according to the scoring results, each hyper-parameter tuning strategy allocated to the resources may generate one or more hyper-parameter combinations based on the allocated resources, respectively. The hyper-parameter tuning strategy allocated to the resources can select a hyper-parameter combination for the machine learning model based on the corresponding hyper-parameter selection strategy to generate one or more hyper-parameter combinations. The generation process of the hyper-parameter combination is not described in detail.
It should be noted that the machine learning model hyper-parameter optimization process supports parallel computing. For example, during the optimization process, the hyper-parameter tuning strategies assigned to the resources may be run in parallel to provide multiple sets of hyper-parameter combinations simultaneously. Thereby, the optimization rate can be greatly improved.
Considering that using a single strategy for hyper-parameter optimization of a machine learning model inevitably risks poor results in some scenarios or convergence to a local optimum, this embodiment proposes using multiple hyper-parameter tuning strategies simultaneously during hyper-parameter optimization. In addition, since the resources usable during hyper-parameter optimization are limited, this embodiment further provides a resource scheduling scheme in which, in each iteration, currently available resources are allocated to the multiple tuning strategies according to their current states and historical performance. This survival-of-the-fittest style of resource scheduling ensures that the best-performing combination of tuning strategies is used throughout the optimization process, so that convergence can be effectively accelerated under limited resources and the optimization effect improved.
The second automatic parameter adjusting mode comprises: in a competition stage, training corresponding competition models according to a machine learning algorithm under a plurality of competition hyper-parameter combinations to obtain the competition model with the best effect, and taking the obtained competition model and its corresponding competition hyper-parameter combination as the win model and the win hyper-parameter combination to enter a growth stage; in the growth stage, continuing to train the win model obtained in the competition stage of the current round under the win hyper-parameter combination obtained in that stage, and obtaining the effect of the win model; if the effect of the win model indicates that the model effect has stopped growing, restarting the competition stage to train updated competition models under a plurality of updated competition hyper-parameter combinations according to the machine learning algorithm, otherwise continuing to train the win model; this process is repeated until a preset termination condition is met. The plurality of updated competition hyper-parameter combinations are obtained based on the win hyper-parameter combination of the previous growth stage, and the updated competition models are all the win model obtained in the previous growth stage.
In the competition stage, the training of the corresponding competition models according to the machine learning algorithm under the combination of the plurality of competition hyper-parameters to obtain the competition model with the best effect includes: when each competition training step in a plurality of competition training steps in the competition phase is finished, respectively obtaining the effect of a competition model trained under each competition hyper-parameter combination; adjusting the competition hyper-parameter combination entering the next competition training step and the corresponding competition model based on the obtained effect of each competition model, and obtaining the competition model with the best effect when the last competition training step is finished; wherein at least one gradient update of the competition model is performed based on a first predetermined number of training samples at said each competition training step.
Wherein, the adjusting the competition hyper-parameter combination entering the next competition training step and the corresponding competition model based on the obtained effect of each competition model, and obtaining the competition model with the best effect when the last competition training step is finished comprises: and when the obtained effect of each competition model indicates that the competition model is not in a stop state, if the current competition training step is not the last competition training step, obtaining a competition hyperparameter combination entering the next competition training step and a corresponding competition model thereof, and if the current competition training step is the last competition training step, obtaining the competition model with the best effect.
Wherein, the obtaining of the competition hyper-parameter combination entering the next competition training step and the corresponding competition model thereof comprises: removing a second preset number of competition models with the worst effect to obtain a competition hyper-parameter combination entering the next competition training step and a corresponding competition model thereof; or
Replacing the second preset number of competition models with the worst effect by the third preset number of competition models with the best effect, and carrying out fine adjustment on the respective competition hyper-parameter combinations of the replaced third preset number of competition models with the best effect so as to obtain the competition hyper-parameter combination entering the next competition training step and the corresponding competition models thereof; or
Randomly removing a second preset number of competition models to obtain a competition hyper-parameter combination entering the next competition training step and a corresponding competition model thereof; or
Replacing a randomly selected second preset number of competition models with the third preset number of competition models with the best effect, and fine-tuning the respective hyper-parameter combinations of the third preset number of best-effect competition models, so as to obtain the competition hyper-parameter combination entering the next competition training step and the corresponding competition model thereof; or
Removing the second preset number of competition models with the longest existence time, thereby obtaining a competition hyper-parameter combination entering the next competition training step and a corresponding competition model thereof; or
Replacing the second preset number of competition models with the longest existence time with the third preset number of competition models with the best effect, and fine-tuning the hyper-parameter combinations of the third preset number of best-effect competition models, so as to obtain the competition hyper-parameter combination entering the next competition training step and the corresponding competition model thereof; or
And directly taking the competition hyperparameter combination of the current competition training step and the corresponding competition model thereof as the competition hyperparameter combination and the corresponding competition model of the next competition training step.
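One of the replacement variants above (replace the worst models with fine-tuned copies of the best) can be sketched as follows; the tuple layout and the `perturb` hook are assumptions in the spirit of population-based training, not the patent's exact procedure:

```python
def next_round_population(population, worst_k, best_k, perturb):
    """Drop the worst_k competition models and refill their slots with
    copies of the top best_k models whose hyper-parameter combinations
    are fine-tuned by `perturb`.  `population` is a list of
    (hyperparams, model_state, effect) tuples; all names are
    illustrative."""
    ordered = sorted(population, key=lambda p: p[2], reverse=True)  # best first
    survivors = ordered[:-worst_k] if worst_k else ordered
    refills = []
    for j in range(worst_k):
        hyperparams, state, effect = ordered[j % best_k]  # cycle over leaders
        refills.append((perturb(hyperparams), state, effect))
    return survivors + refills
```

Cycling over the top `best_k` leaders keeps the population size constant even when more models are removed than copied, matching the constraint below that the second predetermined number is greater than or equal to the third.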
Wherein the second predetermined number is greater than or equal to a third predetermined number; and/or the second predetermined number is set to a fixed value or a regularly varying value for each competitive training step.
Wherein the method further comprises obtaining the number of the competitive training steps, including: and acquiring the number of the competitive training steps according to the number of the competitive models, the total number of the training samples, the maximum iteration number of the training samples and the number of the training samples which can be trained in each competitive training step.
Wherein, in the growth stage, under the surpassing parameter combination of excelling that this round of competition phase obtained, continue training the surpassing model that this round of competition phase obtained to obtain the effect of the surpassing model, include: in the growth stage, under the condition of the win-win super-parameter combination obtained in the competition stage of the round, continuing to train the win model obtained in the competition stage of the round according to the growth training steps, and obtaining the effect of the win model obtained in each growth training step; wherein at least one gradient update of the winning model is performed based on a fourth predetermined number of training samples at said each growing training step.
The method further comprises: determining, based on the effects of the win model obtained in a fifth predetermined number of consecutive growth training steps, whether the effect of the win model indicates that the model effect has stopped growing. This determination comprises: when the effects obtained by the win model in the fifth predetermined number of growth training steps show a downward trend, determining that the effect of the win model indicates that the model effect has stopped growing, wherein whether a downward trend occurs is determined based on the degree of decline and/or the degree of oscillation of the effects obtained in those steps; or, when the effects obtained by the win model in the fifth predetermined number of growth training steps meet a preset attenuation condition, determining that the effect of the win model indicates that the model effect has stopped growing, wherein the attenuation condition is that a sixth predetermined number of consecutive effects are all lower than the average value of a seventh predetermined number of effects immediately preceding them.
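The attenuation condition can be sketched as a small predicate; treating "a sixth predetermined number of consecutive effects below the average of the preceding seventh predetermined number" literally is one reading, and all names are illustrative:

```python
def effect_stopped_growing(effects, recent_n, baseline_n):
    """Return True when the last `recent_n` effects are all below the
    mean of the `baseline_n` effects immediately preceding them.
    `effects` is the per-growth-step effect history (higher is better);
    `recent_n` and `baseline_n` play the roles of the sixth and seventh
    predetermined numbers.  Names are illustrative."""
    if len(effects) < recent_n + baseline_n:
        return False  # not enough history to judge
    recent = effects[-recent_n:]
    baseline = effects[-(recent_n + baseline_n):-recent_n]
    avg = sum(baseline) / baseline_n
    return all(e < avg for e in recent)
```

When this predicate fires, the growth stage ends and the competition stage is restarted with updated hyper-parameter combinations.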
The method further comprises the following steps: obtaining, for each growth training step, an effect of the winning model obtained in a fifth predetermined number of consecutive growth training steps, and the continuing training of the winning model includes: if the effect of the win model indicates that the effect of the model does not appear and stops growing, continuing the next growth training step; alternatively, the method further comprises: obtaining the effect of the winning model obtained in the fifth predetermined number of growing training steps for every fifth predetermined number of growing training steps, and the continuing training the winning model comprises: if the effect of the winning model indicates that no model effect has occurred to stop growing, then the next fifth predetermined number of growth training steps are continued.
Wherein the preset termination condition indicates any one or more of: the effect of each competition model obtained at the end of each competition training step indicates that the competition model is in a stop state; the training time reaches the time limit; and the effect of the win model reaches the expected value.
The method further comprises the following steps: when the effect of the win model indicates that the model effect stops growing, determining whether the training time reaches the time limit; and/or, when the effect of the win model indicates that the model effect stops growing, determining whether the effect of the win model reaches an expected value.
Wherein the method further comprises: and obtaining a trained machine learning model based on at least one model with the best effect obtained when a preset termination condition is met.
Wherein the competition hyper-parameter combination comprises at least one model hyper-parameter and at least one training hyper-parameter; or, the competition hyperparameter combination comprises at least one training hyperparameter.
For each competition super-parameter combination of the first round competition stage, at least one training super-parameter is obtained in the following mode: dividing the linear hyper-parameter space of each training hyper-parameter into a plurality of parts, and taking a plurality of points in the middle as a value candidate set of each training hyper-parameter; and for each competition model, selecting a numerical value from the value candidate set of each training hyper-parameter respectively to form a group of configured training hyper-parameter combinations.
Wherein for each updated contention hyperparameter combination of the resumed contention phase, at least one of the training hyperparameters is obtained by: obtaining the value of each training hyper-parameter from the win hyper-parameter combination in the previous growth stage; randomly setting the numerical value of each training hyper-parameter as the upper boundary or the lower boundary of the linear hyper-parameter space of each training hyper-parameter to obtain a new linear hyper-parameter space; dividing a new linear hyper-parameter space into a plurality of shares, and taking a plurality of points in the middle as a value candidate set of each training hyper-parameter; and for each updated competition model, respectively selecting a numerical value from the value candidate set of each training hyper-parameter to form a group of configured training hyper-parameter combinations.
Wherein the training hyper-parameters comprise a learning rate, and for each updated competition hyper-parameter combination of the restarted competition phase, the learning rate is obtained by the following method: acquiring a value of the learning rate from a win-win super parameter combination in a previous growth stage; setting the value of the learning rate as the upper boundary of the learning rate hyper-parameter space to obtain an updated learning rate hyper-parameter space; acquiring an updated intermediate value of the learning rate hyper-parameter space as an alternative value, and storing the alternative value into an alternative value set; and randomly allocating a corresponding alternative value to each updated competition model as a corresponding learning rate when the number of the alternative values included in the alternative value set is greater than or equal to the number of the updated competition models.
Wherein the method further comprises: when the number of the candidate values in the candidate value set is smaller than the number of the updated competition models, taking the currently obtained candidate values as the lower boundary of the learning rate hyper-parameter space to obtain an updated learning rate hyper-parameter space; and re-executing the step of acquiring the updated intermediate value of the learning rate hyper-parameter space as a candidate value and storing the candidate value into the candidate value set.
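The learning-rate candidate loop of the restarted competition phase might be sketched as follows, assuming the "intermediate value" is the arithmetic midpoint (the patent could equally intend a log-space midpoint):

```python
def learning_rate_candidates(win_lr, space_lower, n_models):
    """Generate learning-rate candidates for the restarted competition
    phase: the winning learning rate becomes the upper boundary of the
    space, the midpoint is taken as a candidate, and while there are
    still fewer candidates than updated competition models, the latest
    candidate becomes the new lower boundary.  Names are illustrative."""
    lower, upper = space_lower, win_lr
    candidates = []
    while len(candidates) < n_models:
        mid = (lower + upper) / 2.0  # intermediate value of current space
        candidates.append(mid)
        lower = mid                  # shrink the space toward the winner
    return candidates
```

The candidates form an increasing sequence approaching the winning learning rate from below, and one is then assigned at random to each updated competition model.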
Wherein the preset termination condition indicates that: the effect of each competition model obtained at the end of each competition training step indicates that the competition model is in a stop state; and/or the training time reaches a time limit, and the method further comprises: determining whether a machine learning model based on at least one best-effect model obtained when a preset termination condition is met can obtain a desired effect; in the case of a machine learning model that can achieve a desired effect, outputting the machine learning model of the desired effect; and in the case of a machine learning model which cannot obtain the expected effect, resetting the model hyper-parameters and entering the competition phase again.
The training sample is composed of data, and the machine learning model is used for processing the data, wherein the data comprises at least image data, text data or voice data. In one example, the training sample is composed of image data, the machine learning model is a neural network model, the machine learning algorithm is a neural network algorithm, and the neural network model is used for processing images.
In this embodiment, the training process of the model is divided into a plurality of competition stages and growth stages. In a competition stage, a plurality of competition hyper-parameter combinations are used to train corresponding competition models simultaneously, the combinations are continuously selected, eliminated and evolved, and the competition model with the best effect and its corresponding hyper-parameter combination are chosen to enter the growth stage; in the growth stage, only the winning competition hyper-parameter combination is used to continue training the corresponding competition model.
In one example, the combination of competition hyper-parameters may include at least one model hyper-parameter and at least one training hyper-parameter.
The above model hyper-parameters are hyper-parameters for defining a model, such as but not limited to activation functions (e.g., identity function, sigmoid function, and truncated ramp function), number of hidden layer nodes, number of convolutional layer channels, and number of fully-connected layer nodes.
The above training hyper-parameters are hyper-parameters for defining the model training process, such as, but not limited to, learning rate, batch size, and number of iterations.
In another example, at least one training hyper-parameter may be included in the combination of competition hyper-parameters.
In this embodiment, the model training starts to enter a competition phase, that is, the first competition phase currently, and here, for each competition hyperparameter combination in the first competition phase, at least one training hyperparameter in the competition phase may be obtained through the following steps S1010 to S1020:
and step S1010, dividing the linear hyper-parameter space of each training hyper-parameter into a plurality of parts, and taking a plurality of points in the middle as a value candidate set of each training hyper-parameter.
In step S1010, if the competition phase is currently entered for initialization (the first round of competition phase), for example, the linear hyperparameter space of each training hyperparameter may be divided into N +2 parts or N +4 parts, and the middle N values are taken as the candidate value set corresponding to the training hyperparameter. Wherein N is the number of the competition models, and the size of N may be adaptively adjusted, for example, the size of N may be set according to hardware resources (such as the number of CPU cores, the number of GPUs, the number of clusters, and the like), or the size of N may be set according to the current remaining training time, or the size of N may be calculated according to the following formula (1):
N=2*n+1 (1)
the n may be the total number of the training hyper-parameters and the model hyper-parameters in the competition hyper-parameter combination, or may be only the number of the training hyper-parameters in the competition hyper-parameter combination, which is not limited herein.
Step S1020, for each competition model, selecting a value from the candidate value set of each training hyper-parameter to form a set of configured training hyper-parameter combinations.
In step S1020, for each competition model, a value may be selected from the value candidate set of each training hyper-parameter either uniformly at random or according to a preset probability distribution (for example, but not limited to, a Gaussian distribution or a Poisson distribution) as the hyper-parameter value; taking one value for each of the different training hyper-parameters forms a set of configured training hyper-parameter combinations.
Here, it suffices to ensure that the training hyper-parameter combinations configured for the competition models are different from each other as a whole; for example, but not limited to, the values configured for the same training hyper-parameter may differ between competition models.
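Forming a distinct configured combination per competition model might be sketched as follows; uniform selection with re-drawing on duplicates is one simple realization (a Gaussian or Poisson choice could be substituted), and it assumes the candidate sets admit at least `n_models` distinct combinations:

```python
import random


def configure_combinations(candidate_sets, n_models, rng=None):
    """Draw one value per training hyper-parameter from its candidate
    set for each competition model, re-drawing until every model's
    combination is distinct as a whole.  `candidate_sets` maps each
    hyper-parameter name to its value candidate list; names are
    illustrative.  Assumes the sets admit at least n_models distinct
    combinations, otherwise the loop cannot terminate."""
    rng = rng or random.Random()
    combos = set()
    while len(combos) < n_models:
        combo = tuple(rng.choice(values) for values in candidate_sets.values())
        combos.add(combo)  # a duplicate is simply re-drawn
    return [dict(zip(candidate_sets, c)) for c in combos]
```

Distinctness "as a whole" means two models may share a value for one hyper-parameter as long as their full combinations differ, which is exactly what the set of tuples enforces.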
In an embodiment of the present invention, in the competition stage in step S2100, training the corresponding competition models according to the machine learning algorithm under a plurality of competition hyper-parameter combinations to obtain the competition model with the best effect, respectively, may further include the following steps S2110 to S2120:
step S2110, when each of the plurality of competition training steps in the competition phase is finished, obtaining an effect of the competition model trained under each competition hyper-parameter combination.
In step S2110, for example, K competition training steps may be set in a competition stage, and for each competition model entering the current competition training step, the training of the current training step is performed on the corresponding competition model through a respective set of competition hyper-parameters, so as to obtain the effect of the trained competition model. Wherein K is the number of the competitive training steps in the competitive stage, and the size of K can be adaptively adjusted, for example, the size of K can be manually specified, or the size of K can be calculated according to the number of competitive models, the total number of training samples, the maximum iteration number of the training samples, and the number of training samples that can be trained in each competitive training step:
K = ⌈(R × I) / (N × i)⌉

wherein N is the number of competition models, R is the maximum iteration number of the training samples, I is the total number of the training samples, and i is the number of training samples that can be trained in each competition training step.
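The following is a hedged reconstruction of the computation of K from the listed symbols: sharing the total budget of R passes over I samples among N models, each consuming i samples per step, gives K = ⌈R·I / (N·i)⌉ (one consistent reading; the patent's exact expression is not recoverable from this text):

```python
import math


def competition_steps(n_models, total_samples, max_epochs, samples_per_step):
    """Number of competition training steps K in one competition stage:
    the total budget (max_epochs passes over total_samples) divided by
    what one step consumes (n_models models, each training on
    samples_per_step samples), rounded up.  A reconstruction, not the
    patent's verbatim formula."""
    return math.ceil((max_epochs * total_samples)
                     / (n_models * samples_per_step))
```

For instance, 10 competition models, 10 000 training samples, at most 2 iterations over the samples, and 100 samples per step yield K = 20.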
Here, any index may be used to measure the effect of the competition model, for example, any one or more of the accuracy, the loss rate, the derivative of the accuracy, or the derivative of the loss rate of the trained competition model on the current validation data set may be used as the evaluation criterion, and after the competition model trained by each competition hyperparameter combination is obtained at the end of the current competition training step, the effect of the trained competition model is ranked.
And S2120, adjusting the competition hyper-parameter combination entering the next competition training step and the corresponding competition model based on the obtained effect of each competition model, and obtaining the competition model with the best effect when the last competition training step is finished.
It can be seen that, according to the exemplary embodiment of the present invention, in the competition phase, after the effect of the competition model trained under each competition hyper-parameter combination is obtained at the end of the current competition training step, the competition hyper-parameter combinations and corresponding competition models entering the next training step can be obtained based on those effects; that is, no surrogate model is used during the competition-phase training. Consequently, at the end of the competition phase, not only the optimal competition hyper-parameter combination but also the best-effect competition model is produced, and that model corresponds to the optimal hyper-parameter combination.
In an embodiment of the present invention, the step S2120 of adjusting the competition hyper-parameter combination entering the next competition training step and the competition model corresponding to the competition super-parameter combination based on the obtained effect of each competition model, and obtaining the competition model with the best effect when the last competition training step ends may further include the following step S2121:
step S2121, when the obtained effect of each competition model indicates that the competition model is not in a stop state, if the current competition training step is not the last competition training step, obtaining a competition hyper-parameter combination entering the next competition training step and a corresponding competition model thereof, and if the current competition training step is the last competition training step, obtaining the competition model with the best effect.
The competition model not being in a stop state means that the optimal effect among the effects obtained at the end of the current competition training step is better than the optimal effect obtained at the end of the previous competition training step.
In this step S2121, at least one gradient update of the competition model is performed based on the first predetermined number of training samples in each competition training step.
The first predetermined number may be set according to a specific application scenario or a simulation experiment. In each competition training step, the plurality of competition models are trained simultaneously with the same number of training samples, but the training sample data used by each competition model may be the same or different. For example, although the competition models are each trained with 1000 training samples in one competition training step, the 1000 training samples used by each competition model may be the same or different.
In an embodiment of the present invention, the obtaining of the competition hyper-parameter combinations entering the next competition training step and the corresponding competition models thereof in step S2121 may further include any one or more of the following steps S2121-1 to S2121-7:
and S2121-1, removing the second preset number of competition models with the worst effect to obtain a competition hyper-parameter combination entering the next competition training step and a corresponding competition model thereof.
The second predetermined number may be set according to a specific application scenario or a simulation experiment, and may be a fixed value or a regularly changing value for each competition training step.
It is understood that after the second predetermined number of least effective competition models are removed, the number of competition models entering the next competition training step is smaller than the number of competition models entering the current competition training step.
It will be appreciated that the removal of the least effective models may be arranged so that, after the second predetermined number of models are removed at the last competition training step, only the one model with the best effect remains.
Step S2121-2, replacing the second predetermined number of competition models with the worst effect by copies of the third predetermined number of competition models with the best effect, and fine-tuning the respective competition hyper-parameter combinations of those replacement models, so as to obtain the competition hyper-parameter combinations entering the next competition training step and the corresponding competition models thereof.
The third predetermined number may be set according to a specific application scenario or a simulation experiment.
The above second predetermined number is greater than or equal to the third predetermined number.
In one example, in a case where the second predetermined number is equal to the third predetermined number, the number of the competition models entering the current competition training step is the same as the number of the competition models entering the next competition training step.
In another example, in the case that the second predetermined number is greater than the third predetermined number, the number of competition models entering the next competition training step is smaller than the number of competition models entering the current competition training step.
In this step S2121-2, the method for performing fine-tuning is not unique; for example, but not limited to, the current value of each competition hyper-parameter may be randomly increased or decreased by z% to obtain a new set of competition hyper-parameter combinations, where z may be set according to a specific application scenario or a simulation experiment.
And S2121-3, randomly removing a second preset number of competition models to obtain a competition hyper-parameter combination entering the next competition training step and a corresponding competition model thereof.
It is understood that after the second predetermined number of competition models are randomly removed, the number of competition models entering the next competition training step is smaller than the number of competition models entering the current competition training step.
Step S2121-4, replacing a randomly selected second predetermined number of competition models with copies of the third predetermined number of competition models with the best effect, and fine-tuning the respective hyper-parameter combinations of those replacement models, so as to obtain the competition hyper-parameter combinations entering the next competition training step and the corresponding competition models thereof.
And S2121-5, removing the second preset number of competition models with the longest existence time to obtain a competition hyper-parameter combination entering the next competition training step and a corresponding competition model thereof.
It is understood that after the second predetermined number of longest-lived competition models are removed, the number of competition models entering the next competition training step is smaller than the number of competition models entering the current competition training step.
Step S2121-6, replacing the second predetermined number of competition models with the longest existence time by copies of the third predetermined number of competition models with the best effect, and fine-tuning the respective hyper-parameter combinations of those replacement models, so as to obtain the competition hyper-parameter combinations entering the next competition training step and the corresponding competition models thereof.
Step S2121-7, directly taking the competition hyper-parameter combinations of the current competition training step and the competition models corresponding to them as the competition hyper-parameter combinations entering the next competition training step and the competition models corresponding to them.
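The elimination-and-replacement strategies of steps S2121-1 and S2121-2 can be sketched as follows. This is a minimal illustrative sketch, not the claimed implementation: each competition model is assumed to be a dict with hypothetical keys 'hparams', 'model', and 'effect' (higher is better), and fine-tuning is assumed to mean randomly raising or lowering each numeric hyper-parameter by z%, as described for step S2121-2.

```python
import copy
import random

def perturb(hparams, z=20.0, rng=random):
    """Fine-tune a hyper-parameter combination: randomly increase or
    decrease each numeric value by z% (non-numeric values pass through)."""
    return {k: v * (1 + rng.choice([-1, 1]) * z / 100.0)
            if isinstance(v, (int, float)) else v
            for k, v in hparams.items()}

def adjust_population(population, n_remove=2, n_clone=2, z=20.0, rng=random):
    """One S2121-1/S2121-2-style adjustment: drop the n_remove worst
    models, then clone the n_clone best models (parameter inheritance)
    with fine-tuned hyper-parameter combinations.  With
    n_remove >= n_clone the population never grows."""
    ranked = sorted(population, key=lambda m: m['effect'], reverse=True)
    survivors = ranked[:len(ranked) - n_remove]          # remove the worst
    clones = [{'hparams': perturb(best['hparams'], z, rng),
               'model': copy.deepcopy(best['model']),     # inherit weights
               'effect': best['effect']}
              for best in ranked[:n_clone]]               # clone the best
    return survivors + clones
```

With `n_remove == n_clone` the population size stays constant between competition training steps, matching the example of step S2121-2.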
In this embodiment, after the competition model with the best effect is obtained according to the above step S2100, the obtained competition model and its corresponding competition hyper-parameter combination can be used as the winning model and the winning hyper-parameter combination entering the growth stage, so that the winning model can be continuously and fully trained with the winning hyper-parameter combination in the growth stage.
Step S2200, in the growth stage, under the winning hyper-parameter combination obtained in the current round of the competition stage, continuing to train the winning model obtained in that competition stage and obtaining its effect; if the effect of the winning model indicates that the model effect has stopped growing, restarting the competition stage to continue training the updated competition models according to the machine learning algorithm under a plurality of updated competition hyper-parameter combinations; otherwise, continuing to train the winning model; and repeating the above process until a preset termination condition is met.
The main purpose of the growth stage is to fully train, with the winning hyper-parameter combination obtained in the competition stage, the winning model obtained in that competition stage. The number of training samples used in the competition stage is smaller than that used in the growth stage, so the winning hyper-parameter combination is found in the competition stage at an extremely small computational cost, while the time-consuming center of gravity of model training falls in the growth stage.
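As an illustration of the alternation between the two stages, the following toy sketch runs the competition/growth loop on a one-dimensional problem where each "model" is a scalar parameter w fitted by gradient descent on (w - 3)^2 and the only hyper-parameter is the learning rate. Every detail here (the objective, the perturbation range, the step counts) is an assumption for demonstration only.

```python
import random

def toy_auto_tune(rng, n_models=4, rounds=3):
    """Toy competition/growth alternation: cheap competition steps pick
    a winning learning rate, the growth stage trains the winner until
    its effect stops improving, then the competition restarts around
    the winner (parameter inheritance)."""
    def effect(w):                   # higher = better
        return -(w - 3.0) ** 2

    lrs = [rng.uniform(0.01, 0.5) for _ in range(n_models)]
    ws = [0.0] * n_models
    for _ in range(rounds):
        # competition stage: a few cheap gradient steps per combination
        for _ in range(5):
            ws = [w - lr * 2 * (w - 3.0) for w, lr in zip(ws, lrs)]
        best = max(range(n_models), key=lambda i: effect(ws[i]))
        w_win, lr_win = ws[best], lrs[best]
        # growth stage: train the winner until the effect stops growing
        prev = effect(w_win)
        while True:
            w_new = w_win - lr_win * 2 * (w_win - 3.0)
            if effect(w_new) <= prev:
                break
            w_win, prev = w_new, effect(w_new)
        # restart the competition around the winner
        ws = [w_win] * n_models
        lrs = [lr_win * rng.uniform(0.8, 1.2) for _ in range(n_models)]
    return w_win
```

The bulk of the gradient updates happen inside the growth loop, mirroring the claim that the time-consuming center of gravity lies in the growth stage.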
It can be seen that, according to the exemplary embodiment of the present invention, the competition stage and the growth stage iterate repeatedly, wherein, when returning from the growth stage to the competition stage, the plurality of updated competition hyper-parameter combinations are obtained based on the winning hyper-parameter combination of the previous growth stage, and the updated competition models are all initialized to the winning model obtained in the previous growth stage.
In one example, when returning from the growth stage to the competition stage, since the plurality of updated competition hyper-parameter combinations are obtained based on the winning hyper-parameter combination of the previous growth stage, at least one training hyper-parameter in each updated competition hyper-parameter combination of the restarted competition stage may be obtained through the following steps S2011 to S2014:
Step S2011, obtaining the value of each training hyper-parameter from the winning hyper-parameter combination of the previous growth stage.
In step S2011, when the process returns from the growth stage to the competition stage, the value of each training hyper-parameter may, for example, be obtained from the winning hyper-parameter combination of the previous growth stage, and the linear hyper-parameter space of each training hyper-parameter is updated according to that value.
Step S2012, setting the value of each training hyper-parameter as either the upper boundary or the lower boundary (chosen at random) of the linear hyper-parameter space of that training hyper-parameter, to obtain a new linear hyper-parameter space.
The linear hyper-parameter space of each training hyper-parameter can be narrowed through this step S2012.
And S2013, dividing the new linear hyper-parameter space into multiple parts, and taking a plurality of points in the middle as a value candidate set of each training hyper-parameter.
In step S2013, the new linear hyperparameter space of each training hyperparameter may be divided into N +2 parts or N +4 parts, and the middle N values are taken as a candidate value set corresponding to the training hyperparameter. The calculation of N is already given in detail in step S1010 above, and is not described herein again.
Step S2014, for each updated competition model, selecting a value from the candidate value set of each training hyper-parameter to form a set of configured training hyper-parameter combinations.
In step S2014, for each updated competition model, a value may, for example, be selected from the candidate value set of each training hyper-parameter with equal probability, or according to a preset probability distribution (for example, but not limited to, a Gaussian or Poisson distribution), as the hyper-parameter value; each training hyper-parameter is assigned one value in this way, forming a set of configured training hyper-parameter combinations.
Here, it is sufficient if it can be ensured that the training hyper-parameter combinations configured for each updated competition model are different as a whole, and for example, but not limited to, the hyper-parameter values of the same training hyper-parameter configured for each updated competition model are different from each other.
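Steps S2011 to S2014 can be sketched as follows. Which of the interior division points count as the "middle" points is an assumption (the first n of the n + 1 interior points are used here), and the function names are illustrative.

```python
import random

def candidate_set(win_value, lo, hi, n, rng=random):
    """S2012/S2013: the winning value randomly becomes the new upper or
    lower boundary of the linear hyper-parameter space; the new interval
    is divided into n + 2 equal parts and n interior division points
    form the candidate value set."""
    if rng.random() < 0.5:
        new_lo, new_hi = lo, win_value   # winning value as upper boundary
    else:
        new_lo, new_hi = win_value, hi   # winning value as lower boundary
    step = (new_hi - new_lo) / (n + 2)
    return [new_lo + step * (i + 1) for i in range(n)]

def configure(n_models, candidates_per_hparam, rng=random):
    """S2014: each updated competition model picks one value per training
    hyper-parameter (uniformly here; a Gaussian or Poisson draw also
    fits the text)."""
    return [{name: rng.choice(values)
             for name, values in candidates_per_hparam.items()}
            for _ in range(n_models)]
```

Drawing each hyper-parameter independently makes collisions between whole combinations unlikely, which is all the text requires.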
In another example, the training hyper-parameters include a learning rate. According to a learning rate reduction strategy, the learning rate in each updated competition hyper-parameter combination of the restarted competition stage may be obtained through the following steps S2021 to S2024:
step S2021, obtain the learning rate value from the win-win superparameter combination in the previous growth stage.
In step S2021, if the development stage is currently going to the competition stage, the value of the learning rate may be obtained from the winning-exceeding parameter combination of the previous development stage, for example.
Step S2022, setting the value of the learning rate as the upper boundary of the learning rate hyper-parameter space to obtain the updated learning rate hyper-parameter space.
The super parameter space of the learning rate can be reduced by this step S2022.
Step S2023, acquiring the intermediate value of the updated learning rate hyper-parameter space as a candidate value, and storing the candidate value in the candidate value set.
Step S2024, when the number of candidate values included in the candidate value set is greater than or equal to the number of updated competition models, randomly assigning a corresponding candidate value to each updated competition model as its learning rate.
In step S2024, when the number of candidate values included in the candidate value set is smaller than the number of updated competition models, the currently obtained candidate value is taken as the lower boundary of the learning rate hyper-parameter space to obtain an updated learning rate hyper-parameter space, and step S2023 is re-executed to obtain the intermediate value of the updated space as a new candidate value and store it in the candidate value set.
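The midpoint-based candidate generation of steps S2021 to S2024 can be sketched as follows, assuming the lower boundary of the learning-rate space is known and further candidates are generated by repeatedly raising the lower boundary to the last midpoint, as described above. Function names are illustrative.

```python
import random

def lr_candidates(win_lr, lower, n_models):
    """S2022/S2023: the winning learning rate is the upper boundary of
    the learning-rate space; midpoints are collected as candidates,
    each time raising the lower boundary to the last midpoint, until
    there are at least as many candidates as updated competition
    models (the count check of S2024)."""
    candidates = []
    lo, hi = lower, win_lr
    while len(candidates) < n_models:
        mid = (lo + hi) / 2.0
        candidates.append(mid)
        lo = mid                     # shrink the space from below
    return candidates

def assign_lrs(candidates, n_models, rng=random):
    """S2024: randomly give each updated competition model one distinct
    candidate learning rate."""
    return rng.sample(candidates, n_models)
```

All candidates lie strictly below the winning learning rate, which is what makes this a learning rate reduction strategy.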
In one example, the preset termination condition for ending the repeated iteration indicates any one or more of the following:
the effect of each competition model obtained at the end of each competition training step indicates that the competition model is in a stop state;
the training time reaches the time limit; and
the effect of the win model reaches the expected value.
The competition model being in a stop state means that the optimal effect among the effects obtained at the end of the current competition training step is not better than the optimal effect obtained at the end of the previous competition training step.
In this example, it may be determined whether the training time has reached a time limit, for example, when the effect of the winning model indicates that the model effect ceases to grow; also, for example, it may be determined whether the effect of the winning model reaches an expected value when the effect of the winning model indicates that the model effect stops growing.
In this example, the trained machine learning model may be obtained based on at least one most effective model obtained when the preset termination condition is satisfied, and the machine learning model may be output.
In one example, the preset termination condition for ending the repeated iteration may further indicate: the effect of each competition model obtained at the end of each competition training step indicates that the competition models are in a stop state; and/or the training time reaches the time limit.
In this example, it may be determined whether a machine learning model with the desired effect can be obtained based on the at least one best-performing model obtained when the preset termination condition is satisfied. If such a machine learning model can be obtained, it is output; if it cannot, the model hyper-parameters are reset and the competition stage is entered again. That is, when the machine learning model with the desired effect cannot be obtained through the repeated iteration of the competition stage and the growth stage on the basis of the initially set value of the at least one model hyper-parameter, the value of the at least one model hyper-parameter needs to be set again, the competition stage is entered again, and the machine learning model with the desired effect is obtained through further repeated iteration of the competition stage and the growth stage.
In an embodiment of the invention, in step S2200, in the growth stage, continuing to train, under the winning hyper-parameter combination obtained in the current round of the competition stage, the winning model obtained in that competition stage, and obtaining the effect of the winning model, may further include:
in the growth stage, under the winning hyper-parameter combination obtained in the current round of the competition stage, continuing to train the winning model obtained in that competition stage in growth training steps, and obtaining the effect of the winning model at each growth training step.
In this embodiment, at least one gradient update of the winning model is performed based on a fourth predetermined number of training samples at each growth training step.
The fourth predetermined number may be set according to a specific application scenario or a simulation experiment, and each growth training step may use the same or a different number of training samples. For example, every growth training step may use 10000 training samples; alternatively, 10000 samples may be used in the first growth training step, 8000 samples in the second growth training step, and so on.
In an embodiment of the present invention, whether the effect of the winning model indicates that the model effect has stopped growing may be determined based on the effects of the winning model obtained in a fifth predetermined number of growth training steps. When the model effect has stopped growing, the competition stage is restarted to continue training the updated competition models according to the machine learning algorithm under the plurality of updated competition hyper-parameter combinations; otherwise, the winning model continues to be trained, and the above process iterates repeatedly until the preset termination condition is satisfied.
In one example, whether the effect of the winning model indicates that the model effect has stopped growing can be determined, based on the effects obtained in the fifth predetermined number of growth training steps, by any one or more of the following modes:
Mode 1, when the effects of the winning model obtained in the fifth predetermined number of growth training steps show a downward trend, determining that the effect of the winning model indicates that the model effect has stopped growing.
The fifth predetermined number may be set according to a specific application scenario or a simulation experiment.
In this mode 1, whether a downward trend occurs may be determined based on the degree of decrease and/or the degree of oscillation of the effects obtained in the fifth predetermined number of consecutive growth training steps.
Mode 2, when the effects of the winning model obtained in the fifth predetermined number of growth training steps satisfy a preset attenuation condition, determining that the effect of the winning model indicates that the model effect has stopped growing.
In this mode 2, satisfying the attenuation condition means that there exist a sixth predetermined number of consecutive effects that are each lower than the average of the seventh predetermined number of effects preceding them.
The sixth predetermined number and the seventh predetermined number may be set according to a specific application scenario or a simulation experiment.
Illustratively, if the attenuation condition is that there exist two consecutive effects each lower than the average of the three effects preceding it, the attenuation condition may be written as:

v_i < (v_{i-1} + v_{i-2} + v_{i-3}) / 3 and v_j < (v_{j-1} + v_{j-2} + v_{j-3}) / 3    (3)

where v_i and v_j (with j = i + 1) represent any two consecutive effects in the fifth predetermined number of consecutive growth training steps, v_{i-1}, v_{i-2}, v_{i-3} represent the three consecutive effects preceding v_i, and v_{j-1}, v_{j-2}, v_{j-3} represent the three consecutive effects preceding v_j. When the attenuation condition of equation (3) is satisfied, it is determined that the model effect has stopped growing.
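The attenuation check of equation (3) can be sketched as follows, assuming effects is the per-growth-training-step sequence of winning-model effects, with higher values meaning better:

```python
def stopped_growing(effects, window=3):
    """Checks the mode-2 attenuation condition of equation (3): two
    consecutive effects v_i and v_{i+1}, each below the average of the
    three effects immediately preceding it."""
    for i in range(window, len(effects) - 1):
        avg_i = sum(effects[i - window:i]) / window          # mean before v_i
        avg_j = sum(effects[i + 1 - window:i + 1]) / window  # mean before v_{i+1}
        if effects[i] < avg_i and effects[i + 1] < avg_j:
            return True
    return False
```

Requiring two consecutive below-average effects, rather than one, makes the check robust to a single noisy evaluation.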
In an embodiment of the present invention, the effect of the winning model is obtained at every growth training step, and continuing to train the winning model includes: if the effect of the winning model indicates that the model effect has not stopped growing, continuing with the next growth training step.
In another embodiment of the present invention, the effects of the winning model obtained in the fifth predetermined number of consecutive growth training steps are obtained once every fifth predetermined number of growth training steps, and continuing to train the winning model includes: if the effect of the winning model indicates that the model effect has not stopped growing, continuing with the next fifth predetermined number of growth training steps.
According to the method of the embodiment of the present invention, based on the model parameter inheritance technique, one training process of a model is divided into a plurality of competition stages and growth stages. In a competition stage, a plurality of competition hyper-parameter combinations are used to train the corresponding competition models simultaneously, and the hyper-parameter combinations are continuously selected, eliminated, and evolved; the group of competition hyper-parameters that performs best at the end of the competition stage, together with its corresponding competition model, enters the growth stage. Hyper-parameter optimization is thus performed in the competition stage, and the corresponding competition model is continuously trained in the growth stage with the competition hyper-parameter combination obtained in the competition stage. When the model stops growing, the method enters a competition stage again to perform hyper-parameter optimization, and the competition stage and the growth stage alternate until the preset termination condition is met and training stops. No manpower is needed, the amount of calculation is small, and end-to-end automatic parameter tuning is realized.
On one hand, the competition stage and the growth stage are both in the process of one-time training, so that the optimal hyper-parameter combination and the optimal machine learning model can be obtained through one-time training period.
On the other hand, the model training process is discretized based on the model parameter inheritance technology, so that the calculation amount of the super-parameter optimization from the initialization is reduced, and the optimization selection of multiple groups of super-parameters in one-time model training becomes possible.
In the third aspect, in the prior art, when multiple machine learning models are simultaneously trained with multiple hyper-parameter combinations, N rounds of iterative selection are required, M models are trained in each round, and only after M × N training runs is a preset threshold reached and a satisfactory hyper-parameter combination obtained, so the amount of computation is huge and no final model is produced. In the present application, because the number of training samples used in the competition stage can be far smaller than that used in the growth stage, the time-consuming center of gravity of model training lies in the growth stage, and the amount of computation is only a small fraction of that of the existing scheme. The computation is thus greatly reduced, and the final model is directly generated at the end of training, thereby avoiding manual participation.
On the basis of the above embodiment, before performing step S2100, the method may further include steps S1100 to S1200 as shown below:
step S1100, providing a setting entry for setting an application scenario of the machine learning model.
The user can determine the specific application scene of the needed machine learning model according to the requirement of the user, and the application scene is input through the setting entrance.
Step S1200, acquiring an application scene input through the setting entry.
Then, step S1000 may further be: and acquiring a corresponding training sample set according to the input application scene.
Specifically, the electronic device implementing the embodiment of the present invention may store in advance training sample sets corresponding to a plurality of application scenarios, each training sample set being composed of data such as image data, text data, or voice data. According to the application scenario input through the provided setting entry, a training sample set matching the application scenario is obtained for machine learning training, so that the final machine learning model obtained can be applied to the input application scenario and perform the corresponding processing on the data.
After the final machine learning model is obtained through the above embodiment, the method may further include steps S1300 to S1500 shown as follows:
step S1300, determining an application scenario to which the final machine learning model is applied.
Step S1400, finding an application item matching the application scenario.
In step S1500, the final machine learning model is input to the application item.
In the embodiment, the final machine learning model is input to the application item matched with the application scene to which the final machine learning model is applied, so that the sample information in the application item is processed by the final machine learning model in the corresponding application item.
The third automatic parameter adjusting mode is as follows: respectively carrying out a round of hyper-parameter exploration training on a plurality of machine learning algorithms based on the same target data set, wherein each machine learning algorithm at least explores M groups of hyper-parameters in the round of exploration, and M is a positive integer greater than 1; calculating the performance score of each machine learning algorithm in the current round and calculating the future potential score of each machine learning algorithm based on the model evaluation indexes respectively corresponding to the plurality of groups of hyper-parameters explored by the plurality of machine learning algorithms in the current round; integrating the performance scores of the current round and the potential scores of the future of each machine learning algorithm, and determining a resource allocation scheme for allocating available resources to each machine learning algorithm; and carrying out corresponding resource scheduling in next round of hyper-parameter exploration training according to the resource allocation scheme.
Wherein the calculating the performance score of each machine learning model in the current round comprises: determining, among the model evaluation indexes respectively corresponding to the plurality of sets of hyper-parameters explored by the plurality of machine learning models in the current round, the first K best model evaluation indexes, wherein K is a positive integer; and, for each machine learning model, taking the proportion of those first K best model evaluation indexes that belong to that machine learning model as its performance score for the current round.
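The current-round performance score can be sketched as follows, assuming each algorithm's round is summarized by a list of evaluation indexes where higher is better (if lower were better, the sort order would flip):

```python
def round_performance_scores(evals_by_algo, k):
    """Performance score for the round: each algorithm's share of the
    k best model evaluation indexes across all algorithms."""
    pool = [(e, name) for name, evals in evals_by_algo.items() for e in evals]
    topk = sorted(pool, reverse=True)[:k]           # k best overall
    return {name: sum(1 for _, owner in topk if owner == name) / k
            for name in evals_by_algo}
```

The scores of all algorithms sum to 1, so they can be used directly as relative weights.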
Wherein the calculating the future potential score for each machine learning model comprises: storing model evaluation indexes respectively corresponding to a plurality of groups of hyper-parameters searched by each machine learning model in an array according to the sequence to obtain a plurality of arrays respectively corresponding to the plurality of machine learning models; for each machine learning model, extracting a monotone enhancement array from an array corresponding to the machine learning model, and taking the ratio of the length of the monotone enhancement array to the length of the array corresponding to the machine learning model as the future potential score of the machine learning model.
Wherein the plurality of machine learning models comprises at least two of a logistic regression machine learning model with a hyper-parameter selection mechanism, a naive Bayes machine learning model with a hyper-parameter selection mechanism, an ensemble learning model with a hyper-parameter selection mechanism, and a regression correlation machine learning model with a hyper-parameter selection mechanism.
Wherein the resources include at least one of a central processor, a memory space, and a thread.
Wherein, the step of respectively carrying out a round of hyper-parameter exploration training on a plurality of machine learning models based on the same target data set further comprises: determining whether at least one machine learning model of the plurality of machine learning models meets a condition of early stopping, wherein when at least one machine learning model is determined to meet the condition of early stopping, training of the at least one machine learning model is stopped, and the step of calculating the performance score of the current round and the future potential score is not performed on the at least one machine learning model.
The conditions for the early stop include: when the model evaluation indexes corresponding to the hyper-parameters explored by a machine learning model in the current round fail to set a new best for I consecutive times, the machine learning model meets the early stop condition; and/or, when the model evaluation indexes corresponding to J hyper-parameters explored by one machine learning model in the current round are all higher than the optimal evaluation index of another machine learning model in the current round, the other machine learning model meets the early stop condition.
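The two early-stop conditions can be sketched as separate checks, assuming evaluation indexes where higher is better; the parameter names n_no_improve (I) and n_dominating (J) are illustrative:

```python
def no_improvement_stop(evals, n_no_improve):
    """Condition 1: the last n_no_improve (I) evaluation indexes of the
    round set no new best."""
    if len(evals) <= n_no_improve:
        return False
    best_before = max(evals[:-n_no_improve])
    return all(e <= best_before for e in evals[-n_no_improve:])

def dominated_stop(rival_evals, own_best, n_dominating):
    """Condition 2: a rival algorithm already has n_dominating (J)
    evaluation indexes this round that beat this algorithm's best."""
    return sum(1 for e in rival_evals if e > own_best) >= n_dominating
```

An algorithm that meets either check stops training and is excluded from the score calculation, as the text describes.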
Wherein the array corresponding to the machine learning model sequentially includes a first model evaluation index to an Xth model evaluation index, wherein X is an integer greater than or equal to M; the step of extracting a monotone enhanced array from an array corresponding to the machine learning model includes: extracting the first model evaluation index as a first value in a monotone enhancement array; and for any model evaluation index from the second model evaluation index to the Xth model evaluation index, if the model evaluation index is superior to the maximum value in the current monotone enhancement array, extracting the model evaluation index as a new value in the monotone enhancement array.
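The monotone-enhancement extraction just described, and the future potential score it yields, can be sketched as follows (a higher evaluation index is assumed better):

```python
def future_potential_score(evals):
    """Extracts the monotone enhancement array (the first index, then
    every index beating the current maximum) and returns its length
    divided by the length of the full array of evaluation indexes."""
    mono = []
    for e in evals:
        if not mono or e > mono[-1]:   # new running best
            mono.append(e)
    return len(mono) / len(evals)
```

An algorithm whose evaluation indexes keep setting new bests gets a score near 1, signalling that further exploration is still paying off.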
Wherein the step of determining the resource allocation scheme comprises: calculating a comprehensive score of each machine learning model based on the performance score of the current round and the potential score of the future of each machine learning model; calculating the ratio of the comprehensive score of each machine learning model to the sum of all the comprehensive scores as the resource distribution coefficient of each machine learning model; determining the resource allocation scheme as the following resource allocation scheme: determining a resource corresponding to a product of the resource allocation coefficient of each machine learning model and the total resource to be allocated as a resource to be allocated to each machine learning model.
The step of determining a resource corresponding to a product of the resource allocation coefficient of each machine learning model and the total resource to be allocated as a resource to be allocated to each machine learning model includes: in all the machine learning models except the machine learning model with the highest resource distribution coefficient, from the machine learning model with the lowest resource distribution coefficient, rounding down the product of the resource distribution coefficient of the machine learning model and the total resource to be distributed and determining the value after rounding down as the number of the resources to be distributed to the machine learning model; and determining the resource which is not allocated to the machine learning model in the total resources to be allocated as the resource to be allocated to the machine learning model with the highest resource allocation coefficient.
The step of determining a resource corresponding to a product of the resource allocation coefficient of each machine learning model and the total resource to be allocated as a resource to be allocated to each machine learning model further includes: when the number of the resources allocated to each machine learning model has a value of zero and a value greater than one, sorting the number of the resources of the machine learning model for which the number of the resources allocated to the machine learning model is greater than one in an increasing order; and in the resources of the machine learning models which are ordered according to the ascending order, starting from the machine learning model with the least resources, reducing the resources of the machine learning model by one unit, allocating the reduced resources to one machine learning model in the machine learning models with the zero resource number, and returning to the step of ordering according to the ascending order until all the resources of the machine learning models are not zero.
The resource scheduling method further comprises: in response to a user's stop request, the total training time reaching a predetermined total training time, or the total number of training rounds reaching a predetermined total number of training rounds, stopping allocating resources to the machine learning model.
The method comprises the following steps of performing a round of hyper-parameter exploration training on a plurality of machine learning models respectively based on the same target data set: and respectively allocating the same number of resources to the multiple machine learning models, and respectively performing a round of hyper-parameter exploration training on the multiple machine learning models based on the same target data set by using the same number of resources.
In an exemplary embodiment, the model evaluation indexes respectively corresponding to the plurality of sets of hyper-parameters explored in the round of the logistic regression machine learning model lr, the gradient boosting decision tree machine learning model gbdt, and the deep sparse network machine learning model dsn may be represented as follows:
lr:[0.2,0.4,0.5,0.3,0.6,0.1,0.7,0.3]
gbdt:[0.5,0.2,0.1,0.4,0.2,0.6]
dsn:[0.61,0.67,0.63,0.72,0.8]
wherein a single value in a single array may indicate the training effect of a machine learning model having a certain set of hyper-parameters. For example, a single value (e.g., 0.2) in the arrays herein may indicate validation set accuracy. Further, in this example, the logistic regression machine learning model lr is trained with eight sets of hyper-parameters, the gradient boosting decision tree machine learning model gbdt is trained with six sets of hyper-parameters, and the deep sparse network machine learning model dsn is trained with five sets of hyper-parameters. In the present exemplary embodiment, the optimal model evaluation index and the 5th-best model evaluation index (in this example, J is 5, although the present invention is not limited thereto) of the logistic regression machine learning model lr are 0.7 and 0.3, respectively; the optimal and 5th-best model evaluation indexes of the gradient boosting decision tree machine learning model gbdt are 0.6 and 0.2, respectively; and the optimal and 5th-best model evaluation indexes of the deep sparse network machine learning model dsn are 0.8 and 0.61, respectively. Since the 5th-best model evaluation index 0.61 of the deep sparse network machine learning model dsn is greater than the optimal model evaluation index 0.6 of the gradient boosting decision tree machine learning model gbdt, the gradient boosting decision tree machine learning model gbdt is determined to satisfy the early-stop condition and no longer participates in model exploration. By determining whether a machine learning model satisfies the early-stop condition and stopping exploration for any model that does, the waste of resources can be reduced and exploration efficiency improved.
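The early-stop rule of this example can be sketched as follows (function names are illustrative, not from the original): a model is stopped when some other model's J-th best evaluation index already exceeds its best index.

```python
def jth_best(scores, j):
    """Return the j-th best (largest) model evaluation index, or None if
    fewer than j indexes have been recorded so far."""
    ordered = sorted(scores, reverse=True)
    return ordered[j - 1] if len(ordered) >= j else None

def early_stopped(models, j=5):
    """Return the names of models whose best index is beaten by another
    model's j-th best index (the early-stop condition of the example)."""
    stopped = []
    for name, scores in models.items():
        best = max(scores)
        for other, other_scores in models.items():
            if other == name:
                continue
            jb = jth_best(other_scores, j)
            if jb is not None and jb > best:
                stopped.append(name)
                break
    return stopped

models = {
    "lr":   [0.2, 0.4, 0.5, 0.3, 0.6, 0.1, 0.7, 0.3],
    "gbdt": [0.5, 0.2, 0.1, 0.4, 0.2, 0.6],
    "dsn":  [0.61, 0.67, 0.63, 0.72, 0.8],
}
print(early_stopped(models))  # → ['gbdt'] (dsn's 5th-best 0.61 > gbdt's best 0.6)
```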
The calculation of the performance score of each machine learning model in this exemplary embodiment is as follows: since the gradient boosting decision tree machine learning model gbdt satisfies the early-stop condition as described above, it does not participate in subsequent training exploration. In this case, the top 5 (here, K is 5, although the present invention does not limit this) best model evaluation indexes among all the model evaluation indexes of the logistic regression machine learning model lr and the deep sparse network machine learning model dsn are: 0.7, 0.67, 0.63, 0.72, and 0.8. Of these, "0.7" is a model evaluation index of the logistic regression machine learning model lr, so the proportion of lr's indexes among the top 5 best model evaluation indexes is 1/5. In contrast, "0.67", "0.63", "0.72", and "0.8" are model evaluation indexes of the deep sparse network machine learning model dsn, so the proportion of dsn's indexes among the top 5 best model evaluation indexes is 4/5. Thus, in the present exemplary embodiment, the current-round performance score of the logistic regression machine learning model lr is 1/5 and the current-round performance score of the deep sparse network machine learning model dsn is 4/5.
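The current-round performance score computation can be sketched as follows (a simplified illustration; function and variable names are assumptions): all evaluation indexes of the surviving models are pooled, the top K are taken, and each model is scored by its share of those top-K entries.

```python
def performance_scores(models, k=5):
    """Pool all evaluation indexes from the surviving models, take the top-k,
    and score each model by its share of those top-k entries."""
    pooled = []
    for name, scores in models.items():
        pooled.extend((s, name) for s in scores)
    top_k = sorted(pooled, reverse=True)[:k]
    return {name: sum(1 for _, n in top_k if n == name) / k for name in models}

# gbdt has been early-stopped, so only lr and dsn remain
surviving = {
    "lr":  [0.2, 0.4, 0.5, 0.3, 0.6, 0.1, 0.7, 0.3],
    "dsn": [0.61, 0.67, 0.63, 0.72, 0.8],
}
print(performance_scores(surviving))  # → {'lr': 0.2, 'dsn': 0.8}
```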
The calculation of the future potential score for each machine learning model in the present exemplary embodiment is as follows: as described above, the array corresponding to the model evaluation indexes of the logistic regression machine learning model lr is [0.2,0.4,0.5,0.3,0.6,0.1,0.7,0.3], and the array corresponding to the model evaluation indexes of the deep sparse network machine learning model dsn is [0.61,0.67,0.63,0.72,0.8]. Note that the monotone enhancement array is not necessarily a monotonically increasing array. In one example, when the training effect indicates validation set accuracy, the monotone enhancement array is a monotonically increasing array. In another example, when the training effect indicates mean square error, the monotone enhancement array is a monotonically decreasing array. In other words, "enhancement" of a value in the monotone enhancement array indicates an improvement or optimization of the training effect. For convenience of description, it is assumed below that the array corresponding to the machine learning model sequentially includes a first model evaluation index to an Xth model evaluation index, where X is an integer equal to or greater than M. The step of extracting the monotone enhancement array from the array corresponding to the machine learning model may include: extracting the first model evaluation index as the first value in the monotone enhancement array. For example, in the present example, the array corresponding to the model evaluation indexes of the logistic regression machine learning model lr is [0.2,0.4,0.5,0.3,0.6,0.1,0.7,0.3], so 0.2 is extracted as the first value in the monotone enhancement array.
Further, the step of extracting the monotone enhancement array from the array corresponding to the machine learning model may further include: for any model evaluation index from the second model evaluation index to the Xth model evaluation index, if the model evaluation index is superior to the maximum value in the current monotone enhancement array, extracting the model evaluation index as a new value in the monotone enhancement array. For example, for the second model evaluation index 0.4 of the logistic regression machine learning model lr, since 0.4 is greater than the maximum value (i.e., 0.2) in the current monotone enhancement array (which at this time includes only the first value), 0.4 is extracted as a new value (i.e., the second value) in the monotone enhancement array, which becomes [0.2,0.4]. Next, for the third model evaluation index 0.5, since 0.5 is greater than the maximum value (i.e., 0.4) in the current monotone enhancement array, 0.5 is extracted as a new value (i.e., the third value), and the monotone enhancement array becomes [0.2,0.4,0.5]. Next, for the fourth model evaluation index 0.3, since 0.3 is smaller than the maximum value (i.e., 0.5) in the current monotone enhancement array, 0.3 is not extracted as a new value, and the monotone enhancement array remains [0.2,0.4,0.5].
Subsequently, the fifth to eighth model evaluation indexes (0.6, 0.1, 0.7, 0.3) are processed in the same manner as the second to fourth model evaluation indexes, and the monotone enhancement array finally obtained is [0.2,0.4,0.5,0.6,0.7]. In the present invention, the length of an array indicates the number of values it includes. The length of the resulting monotone enhancement array of the logistic regression machine learning model lr is 5, and the length of the array [0.2,0.4,0.5,0.3,0.6,0.1,0.7,0.3] corresponding to lr is 8, so the future potential score of the logistic regression machine learning model lr is 5/8. Similarly, the length of the resulting monotone enhancement array [0.61,0.67,0.72,0.8] of the deep sparse network machine learning model dsn is 4, and the length of the array [0.61,0.67,0.63,0.72,0.8] corresponding to dsn is 5; therefore, the future potential score of the deep sparse network machine learning model dsn is 4/5.
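The extraction of the monotone enhancement array and the resulting future potential score can be sketched as follows (function names are illustrative; a higher value is assumed to mean a better training effect, as with validation set accuracy):

```python
def monotone_enhanced(indexes):
    """Extract the monotone enhancement array: keep the first index, then keep
    each subsequent index that beats the best value seen so far."""
    result = []
    for value in indexes:
        if not result or value > max(result):
            result.append(value)
    return result

def future_potential(indexes):
    """Future potential score: length of the enhancement array divided by the
    length of the full index array."""
    return len(monotone_enhanced(indexes)) / len(indexes)

lr = [0.2, 0.4, 0.5, 0.3, 0.6, 0.1, 0.7, 0.3]
dsn = [0.61, 0.67, 0.63, 0.72, 0.8]
print(monotone_enhanced(lr))  # → [0.2, 0.4, 0.5, 0.6, 0.7]
print(future_potential(lr))   # → 0.625 (5/8)
print(future_potential(dsn))  # → 0.8   (4/5)
```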
Thus, the composite score of the logistic regression machine learning model lr is 1/5 + 5/8 = 33/40, and the composite score of the deep sparse network machine learning model dsn is 4/5 + 4/5 = 8/5. The resource allocation coefficient of the logistic regression machine learning model lr is then (33/40) ÷ (33/40 + 8/5) = 33/97, and the resource allocation coefficient of the deep sparse network machine learning model dsn is (8/5) ÷ (33/40 + 8/5) = 64/97. A resource corresponding to the product of the resource allocation coefficient 33/97 of the logistic regression machine learning model lr and the total resource to be allocated is determined as the resource to be allocated to lr; a resource corresponding to the product of the resource allocation coefficient 64/97 of the deep sparse network machine learning model dsn and the total resource to be allocated is determined as the resource to be allocated to dsn.
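The composite score and resource allocation coefficient arithmetic above can be reproduced exactly with Python's `fractions` module (a sketch; the function name is illustrative):

```python
from fractions import Fraction

def allocation_coefficients(perf, potential):
    """Composite score = current-round performance score + future potential score;
    each model's coefficient is its composite score over the sum of all
    composite scores."""
    composite = {m: perf[m] + potential[m] for m in perf}
    total = sum(composite.values())
    return {m: composite[m] / total for m in composite}

perf = {"lr": Fraction(1, 5), "dsn": Fraction(4, 5)}        # current-round scores
potential = {"lr": Fraction(5, 8), "dsn": Fraction(4, 5)}   # future potential scores
coeffs = allocation_coefficients(perf, potential)
print(coeffs)  # → {'lr': Fraction(33, 97), 'dsn': Fraction(64, 97)}
```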
Examples of the allocation compensation mechanism are as follows. For convenience of explanation, assume the numbers of tasks allocated to six machine learning models a, b, c, d, e, f are [1,0,0,0,2,7]; however, the present invention is not limited thereto, and the number of machine learning models and the number of specifically allocated resources (e.g., the number of tasks) may be any other number. Since at least one machine learning model (i.e., machine learning models b through d) is allocated a task number of 0, the allocation compensation mechanism is triggered. Here, the machine learning models whose allocated resource counts are greater than one are sorted in increasing order of resource count. That is, in the present example, the resource counts of the machine learning models whose allocated task numbers are greater than one (i.e., machine learning model e and machine learning model f) are sorted in increasing order as [2,7]. In the present invention, a resource count of 1 does not necessarily mean a single resource; it may mean one unit of resource, where one unit corresponds to a predetermined number of resources. The resource count 2 of machine learning model e is reduced by 1, and the freed unit is allocated to one of the machine learning models with a resource count of 0 (e.g., machine learning model b). Since the resource count of machine learning model e has become 1, it is subsequently kept at 1; that is, resources are no longer taken from machine learning model e for other machine learning models.
At this time, since there are still two machine learning models (i.e., machine learning models c and d) whose resource count is 0, resources continue to be reallocated from the other machine learning models. Since the resource count of machine learning model e has already become 1, at most the resource count of the next machine learning model (i.e., machine learning model f) is reduced toward 1. The resource count of machine learning model f is reduced from 7 to 5, and the freed resources are allocated to machine learning model c and machine learning model d, respectively, so that the resource counts of both become 1. Through the allocation compensation mechanism, the resource counts of machine learning models a through f finally become [1,1,1,1,1,5]. Therefore, after the allocation compensation mechanism is adopted, every machine learning model from a to f is allocated resources: stronger models still receive more resources, while weaker models retain a chance to explore. This avoids permanently stopping the exploration of a model merely because it performed poorly in one round, thereby further improving the accuracy of the exploration.
Considerable space has been devoted above to describing several innovative operators. The following describes the process of encapsulating the content generated during processing to obtain new data, samples, models, and operators, and adding them to the node display list.
Specifically, data generated in the process of executing a data processing flow corresponding to the directed acyclic graph is encapsulated into nodes of which the types are data and stored; encapsulating samples generated in the process of executing the data processing flow corresponding to the directed acyclic graph into nodes of which the types are samples and storing the nodes; encapsulating a model generated in the process of executing the data processing flow corresponding to the directed acyclic graph into nodes of which the types are models and storing the nodes; and encapsulating the data processing flow itself corresponding to the execution directed acyclic graph into nodes with the types of operators and storing the nodes. And adding the encapsulated nodes into a node display list for subsequent editing or creating a directed acyclic graph.
The data and the samples are static data, and therefore the data and the samples are stored according to the format of the data/sample nodes, and the corresponding bound node symbols are generated and added to the node display list. These data or samples may then be dragged from the node list as nodes of the DAG.
In the embodiment of the present invention, the encapsulating the model generated in the process of executing the data processing flow corresponding to the directed acyclic graph into the node of which the type is the model includes the following three cases:
1) Writing the description information of the model itself into a file, and encapsulating the file into a node whose type is model file. Here, the description information of the model itself includes: the name of the model and the parameters of the model itself (i.e., the model parameters determined by model training).
When the description information of the model itself is written into a file and the file is encapsulated into a node whose type is model file, the code of the model's own processing logic is also encapsulated into the node. The node has two inputs: one is the model file, and the other is a sample. That is, the processing logic of the model algorithm encapsulated in the node requires the model file to supply the model parameters, so that the prediction logic of the trained model can be executed on the sample. In such cases it is generally required that the process of obtaining the input samples be consistent with the process of obtaining the samples used when training the model.
2) Writing the description information of the model itself, the description information of the data source input to the model, and the description information of the preprocessing and feature extraction processing of the data source into a file, and encapsulating the file into a node whose type is model file.
Again, the description information of the model itself includes: the name of the model and the parameters of the model itself (i.e., the model parameters determined by model training). The data source description information is the field information (schema) of one or more data tables serving as input data; for example, for a table of users' online shopping behavior, the fields may be purchase time, purchased commodity name, season, discount information of the purchased commodity, and the like. The preprocessing of the data source includes: normalization, null filling, and the like. The feature processing includes: which fields features are extracted from, how features are extracted, automatic feature combination, and the like.
When the description information of the model itself, the description information of the data source, and the description information of the preprocessing and feature extraction processing of the data source are written into a file and the file is encapsulated into a node whose type is model file, parsing code corresponding to the data processing code is generated for parsing the data source description information and the description information of the preprocessing and feature extraction processing, and both the parsing code and the code of the model's processing logic are encapsulated into the node. The node has two inputs: one is the model file, and the other is the data source. The node performs preprocessing and feature extraction on the input data source to obtain samples, combines the parameters in the model file with the model's processing logic to obtain the trained model, and then performs prediction on the samples according to the trained model and outputs the result.
3) Encapsulating the description information of the model itself, the description information of the data source, the code for preprocessing and feature extraction processing of the data source, and the code for executing the model's processing logic into a node whose type is model.
Again, the description information of the model itself includes: the name of the model and the parameters of the model itself (i.e., the model parameters determined by model training). The data source description information is the field information (schema) of one or more data tables serving as input data; for example, for a table of users' online shopping behavior, the fields may be purchase time, purchased commodity name, season, discount information of the purchased commodity, and the like. The preprocessing of the data source includes: normalization, null filling, and the like. The feature processing includes: which fields features are extracted from, how features are extracted, automatic feature combination, and the like.
When the description information of the model itself, the description information of the data source, the code for preprocessing and feature extraction processing of the data source, and the code for executing the model's processing logic are encapsulated into a node whose type is model, the node has a single input: the data source.
In an embodiment of the present disclosure, the execution main bodies corresponding to each node in the directed acyclic graph respectively transparently transmit the data source description information and the description information or the code of the data processing logic of each execution main body according to the execution order, so that each execution main body can output the data source description information and the description information or the code of the data processing logic of the execution main body and all the upper-level execution main bodies thereof.
Alternatively, in an embodiment of the present disclosure, each execution main body corresponding to each node in the directed acyclic graph has a corresponding information record file. If an execution main body has no upper-level execution main body, it saves the description information of its input data source and the description information or code of its own data processing logic into its information record file. If an execution main body has an upper-level execution main body, it reads the content of the upper-level execution main body's information record file and saves that content, together with the description information or code of its own data processing logic, into its own information record file.
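A minimal sketch of this record-file inheritance (the record contents shown are hypothetical): each execution main body's record is the concatenation of its upstream records plus its own description, so the last node in a chain can reproduce the full lineage.

```python
def write_record(upstream_records, own_info):
    """Build an execution main body's information record: the contents read from
    its upper-level bodies' record files (empty list if there is none),
    followed by its own data-processing description."""
    record = []
    for upstream in upstream_records:
        record.extend(upstream)
    record.append(own_info)
    return record

# chained records: each execution main body sees its full upstream lineage
r1 = write_record([], "schema: user_table")
r2 = write_record([r1], "preprocess: fill nulls, normalize")
r3 = write_record([r2], "feature extraction script")
print(r3)
# → ['schema: user_table', 'preprocess: fill nulls, normalize', 'feature extraction script']
```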
In an embodiment of the present invention, in response to the operation of running the directed acyclic graph in step S120, executing a data processing flow corresponding to the directed acyclic graph includes: acquiring configuration information of a directed acyclic graph; determining a previous node and a next node of each node in the directed acyclic graph according to the configuration information of the directed acyclic graph, and further determining the connection relation between the nodes; and executing the data processing flow corresponding to each node according to the connection relation among the nodes.
The configuration information of the directed acyclic graph comprises configuration information of each node, and the configuration information of each node comprises input slot information (input slot) and output slot information (output slot); the input slot information is used to describe information of a previous node, such as an identifier of the previous node, data information output by the previous node, and the like, and correspondingly, the output slot information is used to describe information of a next node, such as an identifier of the next node, data information output to the next node, and the like.
Determining the previous node and the next node of each node in the directed acyclic graph according to the configuration information of the directed acyclic graph includes: and determining the upper-level node and the lower-level node of each node in the directed acyclic graph according to the input slot information and the output slot information in the configuration information of each node in the directed acyclic graph.
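Assuming a simplified form of the slot information (the config shape below is hypothetical: each node lists the ids of its upstream and downstream nodes), determining node connections and then an execution order could look like:

```python
def build_edges(config):
    """Derive predecessor/successor relations from each node's input/output
    slot information. `config` maps node id -> {"inputs": [...], "outputs": [...]}."""
    predecessors = {node: list(info.get("inputs", [])) for node, info in config.items()}
    successors = {node: list(info.get("outputs", [])) for node, info in config.items()}
    return predecessors, successors

def execution_order(config):
    """Topologically order nodes so every node runs after all of its predecessors."""
    predecessors, successors = build_edges(config)
    ready = [n for n, ps in predecessors.items() if not ps]
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for nxt in successors[node]:
            predecessors[nxt].remove(node)
            if not predecessors[nxt]:  # all upstream nodes have run
                ready.append(nxt)
    return order

dag = {
    "data":  {"inputs": [], "outputs": ["split"]},
    "split": {"inputs": ["data"], "outputs": ["train"]},
    "train": {"inputs": ["split"], "outputs": []},
}
print(execution_order(dag))  # → ['data', 'split', 'train']
```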
The configuration information of each node further includes information for determining an operation mode of the node, specifically, an identifier indicating standalone operation or an identifier indicating distributed operation. The method shown in fig. 1 further comprises: and determining that each node operates in a stand-alone mode or in a distributed mode according to the configuration information of the node. Correspondingly, when the data processing flow corresponding to each node is executed, the data processing flow is executed in a single machine mode or a distributed mode.
FIG. 4 shows a schematic diagram of a system implementing data processing according to an embodiment of the invention. As shown in fig. 4, the system 400 for implementing data processing includes:
an operation unit 401 adapted to generate a corresponding directed acyclic graph in response to an operation of a user to generate the directed acyclic graph;
the running unit 402 is adapted to execute a data processing flow corresponding to the directed acyclic graph in response to the operation of running the directed acyclic graph.
The operating unit 401 is adapted to display a first graphical user interface including a node display area and a canvas area, where node types in the node display area include data, samples, models, and operators; in response to an operation of selecting a node in the node exhibition area, displaying the corresponding node in the canvas area, and in response to an operation of connecting the nodes, generating a connecting line between the corresponding nodes in the canvas area to generate the directed acyclic graph.
The node display area comprises an element list and an operator list, the element list comprises data, samples and models, and the operator list comprises various data processing operators related to machine learning; the node display area further comprises a file list, and the file list comprises a directed acyclic graph.
The nodes of the node display area also include directed acyclic graphs; the operating unit 401 is further adapted to perform at least one of the following: in response to an operation of selecting a directed acyclic graph in the node display area, displaying the selected directed acyclic graph in the canvas area for direct running or modification editing; in response to an operation of saving the directed acyclic graph in the canvas area, saving the directed acyclic graph and adding the saved directed acyclic graph to the node display area; in response to an operation to export a directed acyclic graph, outputting the corresponding directed acyclic graph to a specified export location.
The operating unit 401 is further adapted to perform at least one of the following: responding to the operation of importing elements from the outside, saving the corresponding elements and adding the elements into the node display area; saving elements generated in the process of executing the data processing flow corresponding to the directed acyclic graph, and adding the saved elements to the node display area; providing a management page for managing elements generated in the process of executing the data processing flow corresponding to the directed acyclic graph, so that a user can check and delete the intermediate elements through the management page; in response to an operation to export an element, outputting the corresponding element to the specified export location; wherein the element is a data, sample, or model.
The operating unit 401 is further adapted to perform at least one of the following: responding to the operation of importing operators from the outside, storing codes corresponding to the corresponding operators, and adding the corresponding operators to the node display area; and providing an operator code editing interface, acquiring and storing the input codes from the interface, and adding corresponding operators to the node display area.
The operation unit 401 is further adapted to perform the following operations: in response to an operation of selecting a node in the canvas area, displaying a configuration interface of the node, and completing the relevant configuration of the corresponding node according to the configuration operations on the configuration interface; when a node lacks necessary configuration or its configured parameters do not meet preset requirements, displaying a prompt mark at the corresponding node in the canvas area.
Wherein the operating unit 401 is further adapted to perform at least one of the following: displaying a graphic control running the directed acyclic graph in the first graphic user interface, responding to the operation of triggering the graphic control, and executing a data processing flow corresponding to the directed acyclic graph according to each node in the directed acyclic graph and the connection relation among the nodes; and displaying a timer on the first graphical user interface, wherein the timer is used for timing the time spent on executing the data processing flow corresponding to the directed acyclic graph in real time.
Wherein the operating unit 401 is further adapted to perform at least one of the following: displaying information used for representing the executed progress of the corresponding node on each node of the directed acyclic graph on a first graphical user interface in the process of executing the data processing flow corresponding to the directed acyclic graph; displaying an identifier in operation on each node of the directed acyclic graph on a first graphical user interface in the process of executing the data processing flow corresponding to the directed acyclic graph, and displaying the identifier after execution on the node when the data processing flow corresponding to the node is executed; and responding to the operation of checking the operation result of the node in the directed acyclic graph, and acquiring and displaying the operation result data corresponding to the node.
Wherein the operation unit 401 is further adapted to perform one or more of the following: the data, samples, and models in the canvas area each support one or more of the following operations: copy, delete, and preview; operators in the canvas area support one or more of the following operations: copying, deleting, previewing, running the current task, starting running from the current task, running to the current task, viewing a log, viewing details of the task.
For a directed acyclic graph whose run has finished in the canvas area, in response to an operation of clicking an operator, displaying product type marks respectively corresponding to the types of products output by the operator; in response to an operation of clicking a product type mark, displaying a product-related information interface, where the product-related information interface includes: a control for previewing the product, a control for exporting the product, a control for importing the product into the element list, basic information of the product, and path information for storing the product; wherein the product types output by operators include: data, samples, models, and reports.
Wherein, the node display region comprises one or more of the following operators:
a data splitting operator: the data splitting method provided in the configuration interface of the data splitting operator comprises one or more of splitting according to proportion, splitting according to rules and splitting after sequencing; when proportional splitting is selected, proportional sequential splitting, proportional random splitting and proportional layered splitting can be further selected, when random splitting is selected, an input area for setting random seed parameters is further provided on the configuration interface, and when layered splitting is selected, an input area for setting fields of layered basis is further provided on the configuration interface; providing an input area for inputting a split rule when splitting by the rule is selected; when the sorting is selected and then the splitting is carried out, a splitting ratio selection item, an input area for setting a sorting field and a sorting direction selection item are further provided on the configuration interface;
Feature extraction operator: the configuration interface of the feature extraction operator provides an interface for adding an input source and a script editing entry, and further provides at least one of: a sample random-ordering option, an exact feature statistics option, an option for whether to compress the output samples, a plaintext output option, a tag type option, and an output result storage type option;
Feature importance analysis operator: its input is a sample table with target values and its output is a feature importance evaluation report; the report contains the importance coefficient of each feature, and further comprises one or more of the feature count, the sample count, and basic statistics of each feature;
Automatic feature combination operator: the configuration interface of the automatic feature combination operator provides at least one of a feature option, a score index option, a learning rate setting, and a termination condition option, wherein the feature option determines the features used for feature combination, and the termination condition option includes the maximum number of feature pools to run and the maximum number of output features.
Automatic parameter tuning operator: the automatic parameter tuning operator searches a given parameter range for suitable parameters according to a tuning algorithm, trains a model with the found parameters, and evaluates the model. Its configuration interface provides at least one of a feature selection setting, a tuning method option, and a setting for the number of tuning attempts; the feature selection setting allows choosing all features or a custom feature selection, and the tuning method option allows choosing random search or grid search;
TensorFlow operator: the TensorFlow operator runs user-written TensorFlow code; its configuration interface provides an input source setting and a setting for the path of the TensorFlow code file;
Custom script operator: allows a user to write a custom operator in a specific scripting language; the configuration of the custom script operator provides an input source setting and a script editing entry.
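As a loose illustration of how a custom script operator might be wired in — a user-written function registered under a name and invoked with its configured input source — consider this hypothetical sketch (all names are invented for illustration, not the patent's mechanism):

```python
OPERATORS = {}

def register_operator(name):
    """Register a user-written function as a named operator node
    (hypothetical registry, invented for illustration)."""
    def wrap(fn):
        OPERATORS[name] = fn
        return fn
    return wrap

@register_operator("my_script")
def my_script(rows):
    # user-defined processing logic, edited via the script editing entry
    return [r * 2 for r in rows]

def run_operator(name, input_source):
    """Invoke a registered operator on its configured input source."""
    return OPERATORS[name](input_source)
```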
Wherein the feature importance analysis operator determines the importance of a feature by at least one of:
training at least one feature pool model based on the sample set, wherein a feature pool model is a machine learning model that provides a prediction result for the machine learning problem based on at least a part of the features contained in each sample; obtaining the effect of the at least one feature pool model, and determining the importance of each feature according to the obtained effects; wherein the feature pool model is trained after performing a discretization operation on at least one continuous feature among the at least a part of the features;
determining a basic feature subset of the samples and a plurality of target feature subsets whose importance is to be determined; for each of the target feature subsets, obtaining a corresponding composite machine learning model, where the composite machine learning model comprises a basic sub-model and an additional sub-model trained under a boosting framework, the basic sub-model being trained on the basic feature subset and the additional sub-model on the target feature subset; and determining the importance of the target feature subsets from the effects of the composite machine learning models;
pre-sorting at least one candidate feature of the samples by importance, and screening a part of the candidate features according to the pre-sorting result to form a candidate feature pool; then re-sorting the candidate features in the candidate feature pool by importance, and selecting, according to the re-sorting result, at least one candidate feature of higher importance from the pool as an important feature.
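The feature-pool idea in the first scheme can be illustrated with a leave-one-out sketch: a feature's importance is approximated by the drop in model effect when it is removed from the pool. Here `effect_of` is a placeholder standing in for training and evaluating a feature-pool model, not the patent's actual procedure:

```python
def feature_importance(features, effect_of):
    """Leave-one-out sketch: a feature's importance is the drop in model
    effect when it is removed from the feature pool. effect_of(subset)
    stands in for training and evaluating a feature-pool model."""
    baseline = effect_of(frozenset(features))
    return {f: baseline - effect_of(frozenset(features) - {f}) for f in features}

# toy "effect": the model score is the sum of the weights of kept features
weights = {"age": 0.3, "income": 0.5, "zip_code": 0.05}
importance = feature_importance(list(weights), lambda s: sum(weights[f] for f in s))
```

With this toy effect function, `income` comes out as the most important feature and `zip_code` the least, matching the weights that define the toy model.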
Wherein the automatic feature combination operator performs feature combination by at least one of:
performing at least one binning operation on each continuous feature in the samples to obtain a binning-group feature consisting of at least one binned feature, where each binning operation corresponds to one binned feature; and generating combined features of the machine learning samples by combining the binned features with each other and/or with other discrete features in the samples;
performing feature combination between at least one feature of the samples stage by stage according to a heuristic search strategy to generate candidate combined features, where, for each stage, a target combined feature is selected from a candidate combined feature set to serve as a combined feature of the machine learning samples;
obtaining combinable unit features from the samples; providing the user with a graphical interface for setting feature combination configuration items that define how feature combinations are to be made between unit features; receiving the input operations the user performs on the graphical interface and obtaining the feature combination configuration items set by those operations; and combining the unit features to be combined based on the obtained configuration items to generate combined features of the machine learning samples;
iteratively performing, according to a search strategy, feature combination between at least one discrete feature of the samples to generate candidate combined features, and selecting target combined features from the generated candidates as combined features; in each iteration, pre-sorting the candidate combined features in the candidate combined feature set by importance, screening a part of them according to the pre-sorting result to form a candidate combined feature pool, re-sorting the candidates in the pool by importance, and selecting, according to the re-sorting result, at least one candidate combined feature of higher importance from the pool as a target combined feature;
screening a plurality of key unit features from the features of the samples; obtaining at least one combined feature from the key unit features using an automatic feature combination algorithm, where each combined feature is formed by combining a corresponding subset of the key unit features; and taking the obtained combined features as automatically generated combined features.
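The binning-then-combining scheme above can be sketched minimally as equal-width binning of a continuous feature followed by a Cartesian combination with a discrete feature; both functions are illustrative simplifications, not the operator's actual binning modes:

```python
def bin_feature(values, n_bins):
    """Equal-width binning: one binning operation over a continuous feature
    produces one binned (discrete) feature."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0        # avoid zero width for constants
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def combine(feat_a, feat_b):
    """Cartesian combination of two discrete features into one combined
    feature, encoded as joined string values."""
    return [f"{a}_{b}" for a, b in zip(feat_a, feat_b)]
```

For ages [18, 25, 40, 64] with two bins, the binned feature is [0, 0, 0, 1]; combining it with a discrete gender feature yields values like "0_m".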
Wherein the automatic parameter tuning operator performs automatic tuning in any one of the following ways:
performing the following steps in each iteration: determining the currently available resources; scoring a plurality of hyper-parameter tuning strategies and allocating the currently available resources to them according to the scores, where each hyper-parameter tuning strategy selects hyper-parameter combinations for the machine learning model based on its own hyper-parameter selection strategy; and obtaining the one or more hyper-parameter combinations that each resource-allocated tuning strategy generates with its allocated resources;
in a competition stage, training corresponding competition models according to a machine learning algorithm under a plurality of competition hyper-parameter combinations, obtaining the competition model with the best effect, and taking that model and its competition hyper-parameter combination as the winning model and winning hyper-parameter combination entering a growth stage; in the growth stage, continuing to train the winning model of the current competition stage under the winning hyper-parameter combination and monitoring its effect: if the effect indicates that model growth has stalled, restarting the competition stage to train updated competition models under a plurality of updated competition hyper-parameter combinations according to the machine learning algorithm; otherwise, continuing to train the winning model. This process repeats until a preset termination condition is met. The updated competition hyper-parameter combinations are derived from the winning hyper-parameter combination of the previous growth stage, and the updated competition models are all the winning model obtained in the previous growth stage.
Performing one round of hyper-parameter exploration training for each of a plurality of machine learning algorithms, where each algorithm explores at least M groups of hyper-parameters per round, M being a positive integer greater than 1; based on the model evaluation indexes corresponding to the groups of hyper-parameters each algorithm explored in the current round, computing each algorithm's performance score for the round and its future potential score; combining the performance and potential scores to determine a resource allocation scheme for allocating available resources to each algorithm; and performing the corresponding resource scheduling in the next round of hyper-parameter exploration training according to that scheme.
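The competition/growth scheme above can be illustrated with a toy sketch: every candidate hyper-parameter combination gets a short competition run, then the winner keeps training until its effect stops improving. `toy_step` stands in for a real training step, and all names are illustrative; a real system would restart the competition stage when growth stalls, whereas this sketch simply stops:

```python
def toy_step(lr, state):
    """Placeholder training step: the 'model effect' approaches 1.0 at a
    rate set by the hyper-parameter lr. Purely illustrative."""
    score = 0.0 if state is None else state
    score += lr * (1.0 - score)
    return score, score      # (new_state, effect)

def tune(candidate_params, train_step, growth_rounds=3, competition_steps=2):
    """Toy sketch of the competition/growth scheme described above."""
    # competition stage: a short training run for every candidate combination
    results = []
    for p in candidate_params:
        state = effect = None
        for _ in range(competition_steps):
            state, effect = train_step(p, state)
        results.append((effect, p, state))
    best_effect, win_params, win_state = max(results, key=lambda t: t[0])
    # growth stage: keep training the winner while its effect still improves
    for _ in range(growth_rounds):
        new_state, effect = train_step(win_params, win_state)
        if effect <= best_effect:   # growth has stalled
            break
        best_effect, win_state = effect, new_state
    return win_params, best_effect
```

Running `tune([0.1, 0.5], toy_step)` selects the larger learning rate in the competition stage and then improves its effect through the growth rounds.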
Wherein the operation unit 401 is further adapted to perform one or more of the following:
encapsulating data generated during execution of the data processing flow corresponding to the directed acyclic graph into nodes whose type is data, and storing the nodes;
encapsulating samples generated during execution of that data processing flow into nodes whose type is sample, and storing the nodes;
encapsulating models generated during execution of that data processing flow into nodes whose type is model, and storing the nodes;
and encapsulating the executed data processing flow itself, corresponding to the directed acyclic graph, into a node whose type is operator, and storing the node.
The operation unit 401 is further adapted to add the encapsulated nodes to the node display list for subsequent editing or creation of directed acyclic graphs.
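A minimal sketch of encapsulating run artifacts into typed nodes and adding them to the node display list might look like the following; the `Node` structure and field names are assumptions, not the patent's format:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Node:
    """Illustrative typed node; the node types named above are data,
    sample, model, and operator."""
    name: str
    node_type: str           # "data" | "sample" | "model" | "operator"
    payload: Any

node_display_list = []

def encapsulate(name, node_type, payload):
    """Wrap a run artifact into a typed node and append it to the node
    display list so it can be reused when editing or creating a DAG."""
    node = Node(name, node_type, payload)
    node_display_list.append(node)
    return node
```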
The operation unit 401 is adapted to write the description information of the model itself into a file and encapsulate the file into a node whose type is model file; or to write the description information of the model, the description information of the data source, and the description information of the preprocessing and feature extraction applied to the data source into a file and encapsulate the file into a node whose type is model file; or to encapsulate the description information of the model, the description information of the data source, the code for preprocessing and feature extraction of the data source, and the code for executing the model's processing logic into a node whose type is model.
The execution bodies corresponding to the nodes of the directed acyclic graph pass the data source description information and the description information or code of each execution body's data processing logic downstream in execution order, so that each execution body can output the data source description information and the data-processing-logic description information or code of itself and of all its upstream execution bodies;
or, each execution body corresponding to a node in the directed acyclic graph has a corresponding information recording file: an execution body with no upstream execution body saves the description information of its data input source and the description information or code of its data processing logic into its own information recording file; an execution body with an upstream execution body reads the content of the upstream body's information recording file and saves that content, together with the description information or code of its own data processing logic, into its own information recording file.
Wherein the data source description information is the field information of the one or more data tables used as input data.
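The information-recording-file scheme above can be sketched as follows: each execution body's record is its upstream body's record plus its own data-processing-logic description, with the head body also recording the data source description first. The dictionary-based representation is illustrative:

```python
def record_lineage(dag, source_info, logic_info):
    """Sketch of the information-recording-file scheme: dag maps each node
    to its upstream node (None for the head). Each node's record inherits
    the upstream record and appends its own processing-logic description."""
    records = {}
    pending = list(dag)
    while pending:
        node = pending.pop(0)
        upstream = dag[node]
        if upstream is not None and upstream not in records:
            pending.append(node)   # upstream record not written yet; retry
            continue
        entry = [logic_info[node]]
        if upstream is None:
            # head body: record its data input source description first
            entry.insert(0, source_info[node])
        else:
            # read the upstream body's record and prepend it to our own
            entry = records[upstream] + entry
        records[node] = entry
    return records
```

For a three-node chain load → extract → train, the final node's record contains the data source description plus every upstream body's processing-logic description in execution order.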
The operation unit 401 is adapted, when the description information of the model itself is written into a file and the file is encapsulated into a node whose type is model file, to also encapsulate the code of the model's processing logic into the node, the input of the node being the model file; and, when the description information of the model, the data source description information of the model input, and the description information of the preprocessing and feature extraction of the data source are written into a file that is encapsulated into a node whose type is model file, to generate parsing code that parses the data source description information and the preprocessing and feature extraction description information into the corresponding data processing code, and to encapsulate that parsing code together with the code of the model's processing logic into the node, the input of the node being the model file.
The running unit 402 is adapted to obtain the configuration information of a directed acyclic graph; determine the upstream and downstream nodes of each node in the graph according to that configuration information, and thereby the connection relations between nodes; and execute the data processing flow corresponding to each node according to those connection relations.
The configuration information of the directed acyclic graph comprises configuration information of each node, and the configuration information of each node comprises input slot information and output slot information; the running unit 402 is adapted to determine a previous-level node and a next-level node of each node in the directed acyclic graph according to input slot information and output slot information in configuration information of each node in the directed acyclic graph.
The configuration information of the directed acyclic graph comprises configuration information of each node; the running unit 402 is further adapted to determine, according to each node's configuration information, whether that node runs in standalone mode or in distributed mode.
The information in each node's configuration that determines the node's running mode comprises: an identifier indicating standalone running or an identifier indicating distributed running.
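Recovering node connections from input/output slot information and executing the flow in dependency order could be sketched like this; the configuration layout and the one-result-per-node simplification are assumptions, not the patent's format:

```python
def run_dag(config, operators):
    """Minimal sketch of the running unit: recover inter-node connections
    from input/output slot info in each node's configuration, then execute
    nodes in dependency order."""
    # map each output slot id to the node that produces it
    producer = {slot: name for name, c in config.items() for slot in c["out"]}
    outputs, done = {}, set()

    def run(name):
        if name in done:
            return
        for slot in config[name]["in"]:
            run(producer[slot])            # execute upstream producers first
        ins = [outputs[s] for s in config[name]["in"]]
        result = operators[name](*ins)
        for slot in config[name]["out"]:   # one result shared per output slot
            outputs[slot] = result
        done.add(name)

    for name in config:
        run(name)
    return outputs
```

A two-node graph where a loader feeds a transform via slot "d1" runs the loader first, then the transform, regardless of the order the nodes appear in the configuration.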
A method and system for implementing data processing according to exemplary embodiments of the present disclosure have been described above with reference to fig. 1 to 4. However, it should be understood that: the devices and systems shown in the figures may each be configured to include software, hardware, firmware, or any combination thereof for performing the specified functions. For example, the systems and apparatuses may correspond to an application-specific integrated circuit, and may also correspond to a module in which software is combined with hardware. Further, one or more functions implemented by these systems or apparatuses may also be performed collectively by components in a physical entity device (e.g., a processor, a client, or a server, etc.).
Further, the above-mentioned method of implementing data processing may be implemented by instructions recorded on a computer-readable storage medium, for example, according to an exemplary embodiment of the present disclosure, there may be provided a computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the processes disclosed in the present document of implementing data processing. For example, the following steps are performed: responding to the operation of generating the directed acyclic graph of the user, and generating a corresponding directed acyclic graph; and responding to the operation of running the directed acyclic graph, and executing a data processing flow corresponding to the directed acyclic graph.
The instructions stored in the computer-readable storage medium described above may be executed in an environment deployed in a computer apparatus such as a client, a host, a proxy device, a server, etc., it being noted that the instructions may also be used to perform additional steps other than or in addition to the steps described above.
It should be noted that the system for implementing data processing according to the exemplary embodiments of the present disclosure may fully rely on the execution of computer programs or instructions to implement the corresponding functions, i.e., each device corresponds to each step in the functional architecture of the computer program, so that the whole system is called by a special software package (e.g., lib library) to implement the corresponding functions.
On the other hand, when the system of the present invention and its functions are implemented by software, firmware, middleware or microcode, program codes or code segments for performing the corresponding operations may be stored in a computer-readable medium such as a storage medium, so that at least one processor or at least one computing device may perform the corresponding operations by reading and executing the corresponding program codes or code segments.
For example, according to an exemplary embodiment of the present disclosure, a system may be provided that includes at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the processes disclosed herein that implement data processing. For example, the following steps are performed: responding to the operation of generating the directed acyclic graph of the user, and generating a corresponding directed acyclic graph; and responding to the operation of running the directed acyclic graph, and executing a data processing flow corresponding to the directed acyclic graph.
The at least one computing device may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special-purpose processor system, a microcontroller or microprocessor, a display device, or the like. By way of example, and not limitation, the at least one computing device may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like. The computing device may execute instructions or code stored in one of the storage devices, which may also store data. Instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory device may be integrated with the computing device, for example, by having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, the storage device may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The storage device and the computing device may be operatively coupled or may communicate with each other, such as through I/O ports, network connections, etc., so that the computing device can read instructions stored in the storage device.
While various exemplary embodiments of the present disclosure have been described above, it should be understood that the above description is exemplary only, and not exhaustive, and that the present disclosure is not limited to the disclosed exemplary embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the disclosure. Therefore, the protection scope of the present disclosure should be subject to the scope of the claims.

Claims (10)

1. A method of implementing data processing, wherein the method comprises:
generating a corresponding directed acyclic graph in response to a user operation for generating a directed acyclic graph;
and executing a data processing flow corresponding to the directed acyclic graph in response to an operation of running the directed acyclic graph.
2. The method of claim 1, wherein the generating a directed acyclic graph in response to the user's operation of generating a directed acyclic graph comprises:
displaying a first graphical user interface comprising a node display area and a canvas area, wherein node types in the node display area comprise data, samples, models, and operators;
in response to an operation of selecting a node in the node exhibition area, displaying the corresponding node in the canvas area, and in response to an operation of connecting the nodes, generating a connecting line between the corresponding nodes in the canvas area to generate the directed acyclic graph.
3. The method of claim 2, wherein,
the node display area comprises an element list and an operator list, the element list comprises data, samples and models, and the operator list comprises various data processing operators related to machine learning;
the node display area further comprises a file list, and the file list comprises a directed acyclic graph.
4. The method of claim 2, wherein the nodes of the node exposure area further comprise a directed acyclic graph;
the method further comprises at least one of:
in response to an operation of selecting a directed acyclic graph in the node display area, displaying the selected directed acyclic graph in the canvas area for direct running or for modification and editing;
in response to an operation of saving the directed acyclic graph in the canvas area, saving the directed acyclic graph and adding it to the node display area;
in response to an operation of exporting a directed acyclic graph, outputting the corresponding directed acyclic graph to a specified export location.
5. The method of claim 2, wherein the method further comprises at least one of:
responding to the operation of importing elements from the outside, saving the corresponding elements and adding the elements into the node display area;
saving elements generated in the process of executing the data processing flow corresponding to the directed acyclic graph, and adding the saved elements to the node display area;
providing a management page for managing elements generated in the process of executing the data processing flow corresponding to the directed acyclic graph, so that a user can check and delete the intermediate elements through the management page;
in response to an operation to export an element, outputting the corresponding element to the specified export location;
wherein the element is a data, sample, or model.
6. The method of claim 2, wherein the method further comprises at least one of:
responding to the operation of importing operators from the outside, storing codes corresponding to the corresponding operators, and adding the corresponding operators to the node display area;
and providing an operator code editing interface, acquiring and storing the input codes from the interface, and adding corresponding operators to the node display area.
7. The method of claim 2, wherein the method further comprises:
in response to an operation of selecting a node in the canvas area, displaying a configuration interface of the node, and completing the relevant configuration of the node according to the configuration operations performed on that interface;
when a node lacks necessary configuration or its configured parameters do not meet preset requirements, displaying a prompt mark at the corresponding node in the canvas area.
8. A computer-readable storage medium storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 7.
9. A system comprising at least one computing device and at least one storage device storing instructions that, when executed by the at least one computing device, cause the at least one computing device to perform the method of any of claims 1 to 7.
10. A data processing system, wherein the system comprises:
an operation unit adapted to generate a corresponding directed acyclic graph in response to a user operation for generating a directed acyclic graph;
and a running unit adapted to execute a data processing flow corresponding to the directed acyclic graph in response to an operation of running the directed acyclic graph.
CN201911061020.9A 2019-11-01 2019-11-01 Method and system for realizing data processing Active CN110956272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911061020.9A CN110956272B (en) 2019-11-01 2019-11-01 Method and system for realizing data processing


Publications (2)

Publication Number Publication Date
CN110956272A true CN110956272A (en) 2020-04-03
CN110956272B CN110956272B (en) 2023-08-08

Family

ID=69976491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911061020.9A Active CN110956272B (en) 2019-11-01 2019-11-01 Method and system for realizing data processing

Country Status (1)

Country Link
CN (1) CN110956272B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111506779A (en) * 2020-04-20 2020-08-07 东云睿连(武汉)计算技术有限公司 Object version and associated information management method and system facing data processing
CN111523676A (en) * 2020-04-17 2020-08-11 第四范式(北京)技术有限公司 Method and device for assisting machine learning model to be online
CN111625692A (en) * 2020-05-27 2020-09-04 北京字节跳动网络技术有限公司 Feature extraction method, device, electronic equipment and computer readable medium
CN111752555A (en) * 2020-05-18 2020-10-09 南京认知物联网研究院有限公司 Business scene driven visual insight support system, client and method
CN112115129A (en) * 2020-09-16 2020-12-22 浪潮软件股份有限公司 Retail terminal sample sampling method based on machine learning
CN112380216A (en) * 2020-11-17 2021-02-19 北京融七牛信息技术有限公司 Automatic feature generation method based on intersection
CN112508346A (en) * 2020-11-17 2021-03-16 四川新网银行股份有限公司 Method for realizing indexed business data auditing
CN112558938A (en) * 2020-12-16 2021-03-26 中国科学院空天信息创新研究院 Machine learning workflow scheduling method and system based on directed acyclic graph
CN112860655A (en) * 2020-12-10 2021-05-28 南京三眼精灵信息技术有限公司 Visual knowledge model construction method and device
CN113392422A (en) * 2021-08-16 2021-09-14 华控清交信息科技(北京)有限公司 Data processing method and device and data processing device
WO2022144534A1 (en) * 2020-12-31 2022-07-07 Seechange Technologies Limited Method and system for processing image data
WO2023115570A1 (en) * 2021-12-24 2023-06-29 深圳晶泰科技有限公司 Management method and apparatus for machine learning model, computer device and storage medium

Citations (17)

Publication number Priority date Publication date Assignee Title
US20140379619A1 (en) * 2013-06-24 2014-12-25 Cylance Inc. Automated System For Generative Multimodel Multiclass Classification And Similarity Analysis Using Machine Learning
CN107316082A (en) * 2017-06-15 2017-11-03 第四范式(北京)技术有限公司 For the method and system for the feature importance for determining machine learning sample
CN107392319A (en) * 2017-07-20 2017-11-24 第四范式(北京)技术有限公司 Generate the method and system of the assemblage characteristic of machine learning sample
CN107704871A (en) * 2017-09-08 2018-02-16 第四范式(北京)技术有限公司 Generate the method and system of the assemblage characteristic of machine learning sample
CN107729915A (en) * 2017-09-08 2018-02-23 第四范式(北京)技术有限公司 For the method and system for the key character for determining machine learning sample
CN107766946A (en) * 2017-09-28 2018-03-06 第四范式(北京)技术有限公司 Generate the method and system of the assemblage characteristic of machine learning sample
CN107909087A (en) * 2017-09-08 2018-04-13 第四范式(北京)技术有限公司 Generate the method and system of the assemblage characteristic of machine learning sample
CN108021984A (en) * 2016-11-01 2018-05-11 第四范式(北京)技术有限公司 Determine the method and system of the feature importance of machine learning sample
WO2018134248A1 (en) * 2017-01-17 2018-07-26 Catchoom Technologies, S.L. Classifying data
CN109242040A (en) * 2018-09-28 2019-01-18 第四范式(北京)技术有限公司 Automatically generate the method and system of assemblage characteristic
CN109284828A (en) * 2018-09-06 2019-01-29 沈文策 A kind of hyper parameter tuning method, device and equipment
CN109325808A (en) * 2018-09-27 2019-02-12 重庆智万家科技有限公司 Demand for commodity prediction based on Spark big data platform divides storehouse planing method with logistics
CN109375912A (en) * 2018-10-18 2019-02-22 腾讯科技(北京)有限公司 Model sequence method, apparatus and storage medium
CN109389143A (en) * 2018-06-19 2019-02-26 北京九章云极科技有限公司 A kind of Data Analysis Services system and method for automatic modeling
CN109726216A (en) * 2018-12-29 2019-05-07 北京九章云极科技有限公司 A kind of data processing method and processing system based on directed acyclic graph
CN109933834A (en) * 2018-12-26 2019-06-25 阿里巴巴集团控股有限公司 A kind of model creation method and device of time series data prediction
CN110321112A (en) * 2019-07-02 2019-10-11 北京百度网讯科技有限公司 AI ability research/development platform and data processing method


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
NICOLAS VASILACHE et al.: "Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions", arXiv, pages 1-37 *
ZIQI ZHANG et al.: "VATEX Captioning Challenge 2019: Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning", arXiv, 13 October 2019 (2019-10-13), pages 1-4 *
LIANG QINGQING: "Supervised AutoML performance optimization based on key hyper-parameter selection", China Masters' Theses Full-text Database: Information Science and Technology, 15 September 2019 (2019-09-15), pages 140-64 *
YIN XIANLIANG: "Research on cloud computing task scheduling for remote sensing big data applications", China Masters' Theses Full-text Database: Information Science and Technology, no. 1, pages 140-821 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523676A (en) * 2020-04-17 2020-08-11 4Paradigm (Beijing) Technology Co Ltd Method and apparatus for assisting online deployment of a machine learning model
CN111523676B (en) * 2020-04-17 2024-04-12 4Paradigm (Beijing) Technology Co Ltd Method and apparatus for assisting online deployment of a machine learning model
CN111506779B (en) * 2020-04-20 2021-03-16 Dongyun Ruilian (Wuhan) Computing Technology Co Ltd Object version and associated-information management method and system for data processing
CN111506779A (en) * 2020-04-20 2020-08-07 Dongyun Ruilian (Wuhan) Computing Technology Co Ltd Object version and associated-information management method and system for data processing
CN111752555A (en) * 2020-05-18 2020-10-09 Nanjing Cognitive Internet of Things Research Institute Co Ltd Business-scenario-driven visual insight support system, client and method
CN111752555B (en) * 2020-05-18 2021-08-20 Nanjing Cognitive Internet of Things Research Institute Co Ltd Business-scenario-driven visual insight support system, client and method
CN111625692B (en) * 2020-05-27 2023-08-22 Douyin Vision Co Ltd Feature extraction method and apparatus, electronic device, and computer-readable medium
CN111625692A (en) * 2020-05-27 2020-09-04 Beijing ByteDance Network Technology Co Ltd Feature extraction method and apparatus, electronic device, and computer-readable medium
CN112115129A (en) * 2020-09-16 2020-12-22 Inspur Software Co Ltd Machine-learning-based retail terminal sample sampling method
CN112380216A (en) * 2020-11-17 2021-02-19 Beijing Rongqiniu Information Technology Co Ltd Automatic feature generation method based on crossing
CN112508346A (en) * 2020-11-17 2021-03-16 Sichuan Xinwang Bank Co Ltd Method for implementing indexed business data auditing
CN112860655A (en) * 2020-12-10 2021-05-28 Nanjing Sanyanjingling Information Technology Co Ltd Visual knowledge model construction method and device
CN112860655B (en) * 2020-12-10 2024-01-30 Nanjing Sanyanjingling Information Technology Co Ltd Visual knowledge model construction method and device
CN112558938A (en) * 2020-12-16 2021-03-26 Aerospace Information Research Institute, Chinese Academy of Sciences Machine learning workflow scheduling method and system based on directed acyclic graphs
CN112558938B (en) * 2020-12-16 2021-11-09 Aerospace Information Research Institute, Chinese Academy of Sciences Machine learning workflow scheduling method and system based on directed acyclic graphs
WO2022144534A1 (en) * 2020-12-31 2022-07-07 Seechange Technologies Limited Method and system for processing image data
CN113392422A (en) * 2021-08-16 2021-09-14 Huakong TsingJiao Information Science (Beijing) Technology Co Ltd Data processing method and device, and data processing apparatus
WO2023115570A1 (en) * 2021-12-24 2023-06-29 Shenzhen Jingtai Technology Co Ltd Management method and apparatus for machine learning model, computer device and storage medium

Also Published As

Publication number Publication date
CN110956272B (en) 2023-08-08

Similar Documents

Publication Publication Date Title
CN110956272B (en) Method and system for realizing data processing
Shang et al. Democratizing data science through interactive curation of ML pipelines
Jiang et al. Characterizing structural regularities of labeled data in overparameterized models
US11468366B2 (en) Parallel development and deployment for machine learning models
US20190278600A1 (en) Tiled compressed sparse matrix format
CN106575246B (en) Machine learning service
CN106663224B (en) Interactive interface for machine learning model assessment
CN111291266A (en) Artificial intelligence based recommendation method and device, electronic equipment and storage medium
WO2020081229A1 (en) Automatic feature subset selection using feature ranking and scalable automatic search
WO2022057658A1 (en) Method and apparatus for training recommendation model, and computer device and storage medium
US20150302432A1 (en) Classifying, Clustering, and Grouping Demand Series
JP7245961B2 (en) interactive machine learning
US20210110298A1 (en) Interactive machine learning
CN111260073A (en) Data processing method, device and computer readable storage medium
CN114298323A (en) Method and system for generating combined features of machine learning samples
CN110737805A (en) Method and device for processing graph model data and terminal equipment
Deb et al. Generating uniformly distributed points on a unit simplex for evolutionary many-objective optimization
Tousi et al. Comparative analysis of machine learning models for performance prediction of the spec benchmarks
CN106708875B (en) Feature screening method and system
CN113762514B (en) Data processing method, device, equipment and computer readable storage medium
Layton Learning data mining with Python
Holdroyd TensorFlow 2.0 Quick Start Guide: Get up to speed with the newly introduced features of TensorFlow 2.0
KR102507014B1 (en) Method and apparatus for energy-aware deep neural network compression
KR102454317B1 (en) Augmenting virtual users and items in collaborative filtering for addressing cold start problems
CN111753992A (en) Screening method and screening system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant