WO2022161081A1 - Training method, apparatus and system for integrated learning model, and related device - Google Patents

Training method, apparatus and system for integrated learning model, and related device

Info

Publication number
WO2022161081A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
learning model
node
sub
nodes
Prior art date
Application number
PCT/CN2021/142240
Other languages
French (fr)
Chinese (zh)
Inventor
余思
贾佳峰
熊钦
王工艺
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022161081A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G06N 20/20 - Ensemble learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present application relates to the field of computer technology, and in particular, to a training method, apparatus, system, device, and computer-readable storage medium for an integrated learning model.
  • an ensemble learning model can be used to handle classification, regression, and other problems in the field of machine learning, in order to obtain better classification accuracy and prediction performance.
  • ensemble learning models are widely used in industries such as Anping (public safety), telecom operators, and finance, as well as in various production systems.
  • at present, in order to meet the performance and accuracy requirements of model training, an ensemble learning model is usually trained on a cluster including multiple servers; specifically, each server uses a set of training samples to train the ensemble learning model. Because each server can obtain only part of the training results of the ensemble learning model through its own model training, different servers also exchange the partial training results obtained by their respective training, so that the trained ensemble learning model is determined based on the aggregated training results.
  • however, when the data size of the training samples is large, the amount of partial training results that needs to be exchanged between different servers is also large. This makes the training efficiency of the ensemble learning model low because of the large amount of data exchanged between servers, and may even cause the training of the ensemble learning model to fail due to training timeout or bandwidth overload between servers, making it difficult to meet the user's application needs. Therefore, how to provide an efficient training method for an ensemble learning model has become an urgent technical problem to be solved.
  • the present application provides an integrated learning model training method, apparatus, device, system, computer-readable storage medium and computer program product, so as to improve the training efficiency and training success rate of the integrated learning model.
  • a training method for an integrated learning model is provided, and the method can be applied to a model training system including a control node and a working node.
  • when training the ensemble learning model, the control node obtains a training request for the ensemble learning model and generates a training task set according to the training request, where the generated training task set includes a plurality of training tasks. Then, the control node sends the training tasks in the training task set to a plurality of worker nodes in a worker node set, respectively. Each training task can be assigned to one worker node and executed by that worker node, each training task is used to train at least one sub-learning model in the ensemble learning model, and different training tasks are used to train different sub-learning models.
  • a worker node may be assigned the training task of only one sub-learning model, or may be assigned the training tasks of multiple sub-learning models.
  • because each sub-learning model in the ensemble learning model is trained by a single worker node, the training results of each sub-learning model can be processed entirely by one worker node, so that after the worker node completes the training of the sub-learning model, it does not need to obtain training results for this sub-learning model from other worker nodes. In this way, the amount of data that needs to be communicated between worker nodes in the process of training the sub-learning models can be effectively reduced, which not only reduces the resource consumption required for training the ensemble learning model, but also effectively improves the training efficiency and success rate of the ensemble learning model.
  • in a possible implementation, the training request includes an instruction to train the ensemble learning model and the number of sub-learning models in the ensemble learning model. After receiving the training request, the control node may trigger the training of the ensemble learning model based on the instruction in the training request, and generate an equal number of training tasks for training the sub-learning models according to the number of sub-learning models included in the training request.
  • of course, in other possible implementations, the number of sub-learning models in the ensemble learning model may be fixed; in this case, the training request may include only a training instruction for the ensemble learning model, and the control node may generate a fixed number of sub-learning models based on the training instruction to complete the training of the ensemble learning model.
  • in another possible implementation, when the control node generates the training task set according to the training request, it may specifically generate the training task set according to the number of sub-learning models included in the training request, where the number of training tasks included in the training task set is equal to the number of sub-learning models; for example, each training task may be used to train one sub-learning model.
  • of course, in other possible implementations, multiple training tasks may also be used to train one sub-learning model, which is not limited in this embodiment.
  • in another possible implementation, when the control node sends the training tasks in the training task set to the plurality of worker nodes in the worker node set, it may specifically obtain the load of each worker node in the worker node set and, according to the load of each worker node, send one training task to each worker node in a first part of the worker nodes, where the worker node set includes a second part of worker nodes in addition to the first part, and the load of each worker node in the first part that receives a training task is smaller than the load of each worker node in the second part.
  • for example, the control node can sort the worker nodes in the worker node set by load and assign training tasks to the first n worker nodes with lighter loads in turn, while the last m worker nodes with heavier loads may not be assigned training tasks (the worker node set includes m+n worker nodes).
  • in this way, in the process of training the ensemble learning model, it can be avoided that an excessively high load on some worker nodes lowers the training efficiency of the whole ensemble learning model or even causes the training process to fail.
  • other possible implementations may also be used to assign training tasks to the working nodes, etc., which is not limited in this embodiment of the present application.
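  • as an illustration of the load-based assignment described above, the following Python sketch dispatches one training task to each of the least-loaded worker nodes; the function and field names (assign_tasks_by_load, "load", "id") are illustrative assumptions, not terms from the present application.

```python
def assign_tasks_by_load(training_tasks, workers):
    """Assign each training task to one of the least-loaded worker nodes.

    training_tasks: list of task objects, one per sub-learning model.
    workers: list of dicts such as {"id": 3, "load": 0.42}.
    Returns a mapping {worker_id: [tasks...]}.
    """
    if not workers or not training_tasks:
        return {}
    # Sort worker nodes by current load (ascending).
    ranked = sorted(workers, key=lambda w: w["load"])
    # Only the n least-loaded nodes receive tasks; the remaining heavily
    # loaded nodes are skipped, as described above.
    n = min(len(training_tasks), len(ranked))
    assignment = {ranked[i]["id"]: [] for i in range(n)}
    for i, task in enumerate(training_tasks):
        # Cycle over the n selected nodes so that one node may hold the
        # training tasks of several sub-learning models if needed.
        worker_id = ranked[i % n]["id"]
        assignment[worker_id].append(task)
    return assignment
```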
  • the sub-learning model in the ensemble learning model may specifically be a decision tree model.
  • the sub-learning model may also be other types of models, which are not limited in this embodiment of the present application.
  • in another possible implementation, the training termination condition of the sub-learning model includes at least one of the following conditions: the number of training samples corresponding to each leaf node of the sub-learning model is less than or equal to a number threshold, or the impurity index of the training sample set used to train the sub-learning model is less than an impurity threshold, or the depth of the nodes in the sub-learning model is greater than or equal to a depth threshold.
  • when the training termination condition is satisfied, the control node may end the training of the ensemble learning model.
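  • a minimal sketch of such a termination check is given below; the function name meets_termination_condition and the threshold values are illustrative assumptions rather than values prescribed by the present application.

```python
def meets_termination_condition(tree, sample_counts, impurity,
                                count_threshold=10,
                                impurity_threshold=0.01,
                                depth_threshold=16):
    """Return True if any of the three termination conditions holds.

    tree: object exposing leaf_nodes() and depth().
    sample_counts: dict mapping leaf node id -> number of training samples.
    impurity: impurity index (e.g. Gini) of the training sample set.
    """
    # Condition 1: every leaf node holds at most count_threshold samples.
    if all(sample_counts[leaf] <= count_threshold for leaf in tree.leaf_nodes()):
        return True
    # Condition 2: the impurity of the training sample set is already low.
    if impurity < impurity_threshold:
        return True
    # Condition 3: the tree has reached the maximum allowed depth.
    if tree.depth() >= depth_threshold:
        return True
    return False
```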
  • the present application also provides a training apparatus for an ensemble learning model, where the training apparatus includes the individual modules for performing the training method of the ensemble learning model in the first aspect or any possible implementation of the first aspect.
  • the present application also provides a device, comprising a processor and a memory; the memory is used to store instructions, and when the device is running, the processor executes the instructions stored in the memory, so that the device executes the training method of the ensemble learning model in the first aspect or any implementation of the first aspect; the memory may be integrated in the processor, or may be independent of the processor.
  • optionally, the device may also include a bus, through which the processor is connected to the memory.
  • the memory may include read-only memory and random access memory.
  • the present application also provides a model training system, where the model training system includes a control node and worker nodes; the control node is used to execute the training method of the ensemble learning model in the first aspect or any implementation of the first aspect, and the worker nodes are used to execute the training tasks sent by the control node.
  • the present application provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer is caused to execute the method described in the first aspect or any one of the implementations of the first aspect.
  • the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to execute the method described in the first aspect or any one of the implementations of the first aspect.
  • on the basis of the implementations provided by the above aspects, the present application may further combine them to provide more implementations.
  • FIG. 1 is a schematic diagram of the architecture of an exemplary model training system provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of an exemplary application scenario provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a training method for an integrated learning model provided by an embodiment of the present application
  • FIG. 4 is a schematic diagram of an exemplary configuration interface provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of each worker node using a training sample set to train each sub-learning model;
  • FIG. 6 is a schematic structural diagram of a training device for an integrated learning model provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a hardware structure of a device provided by an embodiment of the present application.
  • the model training system includes a control node (driver) 100 and a plurality of worker nodes (workers). Among them, the control node 100 and the worker nodes cooperate to complete the training of the integrated learning model, and the control node 100 is used to generate the training task of the integrated learning model, and send the above training tasks to the worker nodes; the worker nodes are used to execute the received training tasks.
  • the purpose of executing the training tasks is to obtain the sub-learning models in the ensemble learning model. It is worth noting that FIG. 1 takes a model training system including 10 worker nodes, namely worker nodes 201 to 210, as an example for illustration.
  • in practical applications, the model training system may include any number (more than one) of worker nodes; meanwhile, the number of control nodes 100 is also not limited to one.
  • the control node 100 and each working node may interact through an intermediate device (such as a switch, etc., not shown in FIG. 1 ).
  • in a possible application scenario, the model training system shown in FIG. 1 can be deployed in a cluster including multiple servers, where the control node 100 and worker node 201 to worker node 210 may each be deployed on servers in the cluster, and the servers in the cluster are divided into control nodes and worker nodes according to the functions they perform.
  • server 1 to server n form a cluster, in which a big data platform such as a database (for example, Hadoop) or a computing engine (for example, Spark) can be deployed, and the model training system for training the ensemble learning model can be deployed based on the above big data platform to implement training and inference of the ensemble learning model.
  • the model training system can train an ensemble learning model on the big data platform, and the ensemble learning model can include multiple sub-learning models; then, the ensemble learning model can be used to perform inference on known input data. Specifically, the known input data is input into the multiple sub-learning models respectively, each sub-learning model performs inference on the known input data and outputs a corresponding inference result, and finally the inference result of the ensemble learning model is determined from the inference results of the multiple sub-learning models by voting; for example, the inference result that receives the most votes among the inference results of the multiple sub-learning models may be taken as the inference result of the ensemble learning model.
  • the ensemble learning model may be a random-forest-based ensemble learning model, such as the random forest implementations in Spark ML or Spark MLlib; the ensemble learning model may also be another type of ensemble learning model, which is not limited in this application.
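  • as an illustration of the voting-based inference described above, the following sketch aggregates the outputs of the sub-learning models by majority vote; the function name predict_by_voting and the assumed predict() method on each sub-model are illustrative assumptions, not the API of any particular library.

```python
from collections import Counter

def predict_by_voting(sub_models, input_data):
    """Infer with the ensemble: each sub-learning model votes, majority wins.

    sub_models: list of trained sub-learning models, each with a predict() method.
    input_data: a single sample (feature vector) to classify.
    """
    # Collect one inference result (class label) from every sub-learning model.
    votes = [model.predict(input_data) for model in sub_models]
    # The label with the most votes is the ensemble's inference result.
    label, _count = Counter(votes).most_common(1)[0]
    return label
```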
  • it should be noted that the application scenario of deploying the model training system shown in FIG. 2 is only an exemplary illustration, and the model training system can also be deployed in other possible scenarios in practical applications. For example, the control node 100 and the worker nodes in FIG. 1 may also be deployed on a single server; in this case, the control node 100 and the worker nodes may specifically be different computing units on that server, divided into control node and worker nodes by function. In this embodiment, the applicable application scenarios of the model training system are not limited.
  • in the process of training the ensemble learning model in the model training system, each sub-learning model can be trained by one worker node, so that the training results of each sub-learning model can all be located on one worker node; thus, after the worker node completes the training of the sub-learning model, it does not need to obtain the training results for that sub-learning model from other worker nodes. In this way, the amount of data that needs to be communicated between worker nodes in the process of training the sub-learning models can be effectively reduced, which not only reduces the resource consumption (mainly communication resources) required for training the ensemble learning model, but also effectively improves the training efficiency and success rate of the ensemble learning model.
  • specifically, when training the ensemble learning model, the model training system can generate one training task for each sub-learning model. For example, when the model training system receives a training request for the ensemble learning model, it can parse from the training request the number of sub-learning models included in the ensemble learning model and generate the same number of training tasks; the model training system can then assign each training task to a worker node for execution. Optionally, the model training system can also assign the training tasks of multiple sub-learning models to one worker node for execution.
  • in a further possible implementation, the model training system may allocate training tasks to corresponding worker nodes according to the load of the worker nodes. For example, the model training system (specifically, the control node 100 in the model training system) obtains the load of each worker node and sorts the worker nodes by load; then, the model training system can preferentially allocate training tasks to the worker nodes with smaller loads, while the worker nodes with larger loads may not be allocated training tasks, so that load balance can be achieved in the model training system. Of course, in other possible examples, the model training system can also directly issue the training tasks in sequence according to the numbering order of the worker nodes; the specific implementation is not limited.
  • the sub-learning model in the ensemble learning model may be a decision tree model.
  • the decision tree model refers to a tree diagram composed of decision points, strategy points (event points), and results, and can be applied to sequential decision-making. In practical applications, the maximum expected benefit value or the lowest expected cost can be used as the decision criterion, the decision results of various schemes under different conditions are solved graphically, and the final decision is then given by comparing the decision results of the various schemes.
  • the sub-learning model may also be other models having a tree structure, etc., which is not limited in this embodiment.
  • FIG. 3 is a schematic flowchart of the training method of the integrated learning model provided by the embodiment of the present application.
  • in the following description, the sub-learning model being a decision tree model is taken as an example; the method can be applied to the model training system shown in FIG. 1 above, or to other applicable model training systems.
  • the method may include:
  • the control node 100 obtains a training request of the ensemble learning model.
  • the model training system may trigger the training process of the ensemble learning model when receiving a training request for the ensemble learning model.
  • in an exemplary implementation, the model training system may have a communication connection with a user terminal, the user may perform a trigger operation for training the ensemble learning model on the user terminal, and the user terminal generates a corresponding training request for the ensemble learning model based on the operation and then sends it to the model training system, so that the control node 100 in the model training system obtains the training request and triggers the execution of the subsequent model training process.
  • the control node 100 generates a training task set according to the received training request, where the training task set includes multiple training tasks, each training task in the multiple training tasks is executed by one worker node, each training task is used for training at least one sub-learning model in the ensemble learning model, and different training tasks are used for training different sub-learning models.
  • in this embodiment, the control node 100 in the model training system may obtain training samples for training the multiple sub-learning models in the ensemble learning model, where the number of training samples is usually more than one; the control node 100 may further generate multiple training sample sets from these training samples, and the samples included in different training sample sets may differ.
  • control node 100 may sample the training samples by means of sampling with replacement to obtain P training sample sets.
  • control node 100 may also generate multiple training sample sets in other manners, which is not limited in this embodiment.
  • the number (P) of training sample sets sampled by the control node 100 may be determined by the user. For example, the control node can present the configuration interface shown in FIG. 4 to the user, so that the user can configure the parallelism parameter of model training to P on the configuration interface, that is, P model training processes are executed at the same time; the control node 100 can then generate P training sample sets based on the user's configuration, each of which can support the training of one sub-learning model, as shown in FIG. 4.
  • in this embodiment, the nodes that execute the training process are specifically the worker nodes. Therefore, the control node 100 can distribute the generated multiple training sample sets to different worker nodes respectively; for example, each training sample set can be distributed to one worker node.
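  • a minimal sketch of the sampling-with-replacement step described above is given below; the name build_sample_sets and the use of Python's random module are illustrative assumptions, and the present application does not prescribe a specific API.

```python
import random

def build_sample_sets(training_samples, p):
    """Generate p training sample sets by sampling with replacement (bootstrap).

    training_samples: list of all available training samples.
    p: parallelism parameter configured by the user (number of sample sets,
       one per sub-learning model to be trained).
    """
    n = len(training_samples)
    sample_sets = []
    for _ in range(p):
        # Each set has the same size as the original data and is drawn with
        # replacement, so different sets generally contain different samples.
        sample_sets.append([random.choice(training_samples) for _ in range(n)])
    return sample_sets
```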
  • in addition, the control node 100 can also generate a training task set based on the received training request, where the training task set includes a plurality of training tasks, and each training task can be used to instruct a worker node to execute the training process of one sub-learning model based on the training sample set it has received.
  • in an exemplary implementation, the training request received by the control node 100 may specifically include an instruction to train the ensemble learning model and the number of sub-learning models included in the ensemble learning model, so that, after receiving the training request, the control node 100 can start the training process of the ensemble learning model based on the instruction in the training request and generate an equal number of training tasks according to the number of sub-learning models, where each training task is used to instruct a worker node to complete the training process of one sub-learning model, and different training tasks train different sub-learning models.
  • the control node 100 respectively sends the training tasks in the training task set to the plurality of working nodes in the working node set.
  • after the control node 100 generates the training task set, it can issue the training tasks in the training task set to the worker nodes, where each training task can be issued to one worker node for execution, so that the training of the sub-learning model corresponding to the training task can be completed entirely by that one worker node.
  • in an exemplary implementation, the control node 100 may select, based on the load of each worker node, the worker nodes that execute the training tasks.
  • specifically, the control node 100 may determine a first part of worker nodes and a second part of worker nodes in the worker node set according to the load ranking of each worker node in the set, where the load of each worker node in the first part is less than or equal to the load of each worker node in the second part; then, the control node 100 can issue the training tasks one by one to the worker nodes in the first part, while, because the load of the worker nodes in the second part is relatively high, the control node 100 may not assign training tasks to them.
  • of course, in other embodiments, the model training system may also randomly select worker nodes to perform the training tasks, or issue the training tasks sequentially in the order of the worker node numbers, etc.; in this embodiment, the specific implementation of which worker node each training task is delivered to for execution is not limited.
  • in practical applications, each worker node may include multiple executors, where an executor may be, for example, a logical execution unit in the worker node, and the executors are used to execute the training tasks received by the worker node.
  • meanwhile, each training task may include multiple subtasks (tasks), and each subtask may be used to instruct a worker node to execute part of the model training process of one sub-learning model, so that the control node 100 can use multiple executors on the worker node to execute different subtasks in the training task. In this way, by executing multiple different subtasks in parallel with multiple executors, the efficiency with which a worker node trains a single sub-learning model can be improved.
  • for example, each executor on the worker node can use part of the samples (a sample block) in training sample set 1 to train a tree node, and different executors use different sample blocks when training the tree node.
  • by aggregating the training results of the tree node of decision tree model 1 obtained on each sample block, the complete training result of decision tree model 1 on training sample set 1 for that tree node can be obtained, as illustrated in the sketch below.
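  • the following sketch illustrates, under assumed names (partial_histogram, train_tree_node_in_parallel, partial results represented as counters), how the partial results produced by each executor on its own sample block could be merged into the complete training result for a tree node; it is a simplification for illustration only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partial_histogram(sample_block, attribute):
    """One executor's work: count attribute values within its own sample block."""
    return Counter(sample[attribute] for sample in sample_block)

def train_tree_node_in_parallel(sample_blocks, attribute):
    """Train one tree node: each executor processes a different sample block,
    and the partial histograms are merged into the complete training result."""
    merged = Counter()
    if not sample_blocks:
        return merged
    with ThreadPoolExecutor(max_workers=len(sample_blocks)) as pool:
        for partial in pool.map(lambda b: partial_histogram(b, attribute),
                                sample_blocks):
            merged.update(partial)  # aggregation stays on the same worker node
    return merged
```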
  • in this embodiment, the number of training tasks can be the same as the number of training sample sets, so that when multiple worker nodes train the sub-learning models, each worker node can use the training sample set it has received and perform the corresponding model training process according to the subtasks included in a single training task; moreover, the number of subtasks in each training task can be determined according to the number of sample blocks into which the training sample set is divided.
  • for example, when a training sample set is divided into 5 sample blocks, the control node 100 can generate 5 subtasks according to these 5 sample blocks, with each subtask corresponding to one sample block and different subtasks corresponding to different sample blocks; the 5 generated subtasks can constitute one training task, so that the control node 100 can generate different training tasks based on different training sample sets, as shown in the sketch below.
  • the number of sample blocks for each training sample set can be configured by the user on the configuration interface shown in FIG. 4, which is not limited in this embodiment.
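  • a sketch of how a training task might be assembled from sample blocks is given below; the names TrainingTask, split_into_blocks, and build_training_task are hypothetical and used purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class TrainingTask:
    sub_model_id: int                               # the sub-learning model this task trains
    subtasks: list = field(default_factory=list)    # one subtask per sample block

def split_into_blocks(sample_set, num_blocks):
    """Partition one training sample set into num_blocks contiguous blocks."""
    block_size = (len(sample_set) + num_blocks - 1) // num_blocks
    return [sample_set[i:i + block_size]
            for i in range(0, len(sample_set), block_size)]

def build_training_task(sub_model_id, sample_set, num_blocks):
    """Generate one training task whose subtasks each cover one sample block."""
    task = TrainingTask(sub_model_id=sub_model_id)
    for block_id, block in enumerate(split_into_blocks(sample_set, num_blocks)):
        task.subtasks.append({"block_id": block_id, "samples": block})
    return task
```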
  • the number of training tasks and the number of worker nodes may be the same or different.
  • each worker node can execute all the subtasks in one training task.
  • one worker node can execute all subtasks in multiple training tasks, that is, one worker node can complete the training of multiple sub-learning models.
  • it should be noted that the model training system usually performs multiple rounds of iterative training when training the ensemble learning model, and during each round of model training, the control node 100 can regenerate multiple training tasks and send them to the worker nodes for execution. During the multiple rounds of iterative training, the training sample set used to train each sub-learning model can remain unchanged, while the content of the subtasks included in the training task for a sub-learning model in the current round may differ from the subtasks included in the training task for that sub-learning model in the previous round.
  • for example, when the sub-learning model is a decision tree model, a subtask in the previous round of training may be used to train tree node 1 in the decision tree model, while a subtask in the current round of training is used to train a different tree node of the decision tree model.
  • the current round of training refers to the round that the model training system is currently performing in the process of training the ensemble learning model; for example, if the model training system is performing the second round of model training on the ensemble learning model, the second round of model training is the current round of training.
  • it should be noted that the control node 100 may first allocate one sub-learning model to each training task, that is, the subtasks in one training task are used to train one sub-learning model. When the number of sub-learning models is greater than the number of training tasks, after allocating one sub-learning model to each training task, the control node 100 continues to allocate another sub-learning model to each training task from the remaining sub-learning models; in this case, the subtasks in one training task can be used to train multiple sub-learning models.
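  • a sketch of this allocation strategy, cycling over the training tasks when there are more sub-learning models than tasks, is shown below; the function name allocate_sub_models is an assumption used only for illustration.

```python
def allocate_sub_models(sub_model_ids, num_tasks):
    """Allocate sub-learning models to training tasks.

    Each task first receives one sub-learning model; if there are more
    sub-learning models than tasks, the remaining models are distributed
    over the tasks again, so one task may train several sub-learning models.
    Returns a list of length num_tasks, each entry a list of sub-model ids.
    """
    allocation = [[] for _ in range(num_tasks)]
    for i, model_id in enumerate(sub_model_ids):
        allocation[i % num_tasks].append(model_id)
    return allocation

# Example: 5 sub-learning models and 3 training tasks
# allocate_sub_models([0, 1, 2, 3, 4], 3) -> [[0, 3], [1, 4], [2]]
```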
  • the sub-task in the training task may specifically be used to determine the best split point of a tree node in the decision tree model.
  • determining the best split point refers to determining a tree node in the decision tree model that is suitable for sample splitting, where the training samples contained in the two child nodes obtained after the tree node is split fall into different attribute-value ranges respectively.
  • the control node 100 may create a list of tree nodes to be trained (to-be-trained tree node list) and initialize the tree nodes in the list as the root nodes of each decision tree model; that is, when the number of decision tree models is x, the tree node list includes x root nodes.
  • when performing each round of model training, the control node 100 can select tree nodes from the to-be-trained tree node list and add them to the current round training tree node list (cur-tree node list).
  • specifically, the control node 100 may add the tree nodes in the to-be-trained tree node list to the current round training tree node list one by one according to the order of the tree nodes' index numbers.
  • further, the control node 100 may limit the number of nodes added to the current round training node list so that it does not exceed a node number threshold, that is, limit the length of the current round training node list, while the nodes in the to-be-trained tree node list that do not participate in the current round of training can participate in the next round of model training; in this way, when there are too many nodes in the to-be-trained tree node list, the control node 100 can control the training to be performed in batches, as sketched below.
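  • a sketch of this batched selection is given below; the function name select_current_round_nodes and the default node threshold are illustrative assumptions.

```python
def select_current_round_nodes(to_be_trained, node_threshold=64):
    """Move at most node_threshold tree nodes from the to-be-trained list
    into the current round training list; the rest wait for later rounds."""
    # Nodes are taken in index order, as described above.
    current_round = to_be_trained[:node_threshold]
    remaining = to_be_trained[node_threshold:]
    return current_round, remaining
```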
  • control node 100 can generate the mapping relationship between the subtask and the tree node to be trained, and broadcast it to each worker node.
  • the control node 100 may further deliver the subtasks in the generated training tasks to the worker nodes, for example, by delivering the subtasks to the worker nodes through a round-robin mechanism.
  • the worker node can perform the corresponding model training task by using the training sample set corresponding to the training task according to the received training task.
  • in specific implementation, the executors on each worker node can determine the tree node to be trained according to the mapping relationship, broadcast by the control node 100, between the subtasks and the tree nodes to be trained, together with the subtasks issued by the control node 100, and use the training samples corresponding to the subtask to train that tree node. Specifically, the executor first determines a sample attribute used for splitting the tree node, such as age, and then determines, for the samples in the sample block corresponding to the subtask, which training samples are classified into one class, for example those whose attribute value of the age attribute is greater than 23, and which training samples are classified into another class, for example those whose attribute value of the age attribute is smaller than 23.
  • the sample attribute may be selected by a preset random algorithm, or selected in other ways. In this way, after the multiple executors on the worker node have each completed their model training, the worker node can obtain the training result of all training samples in the entire training sample set for that tree node, and the training result may be, for example, a distribution histogram of the training samples for that tree node.
  • because all the subtasks in one training task are completed by the executors on one worker node, the worker node can directly obtain the training results of the current round of training for the tree nodes without needing to obtain them from other worker nodes, so that different worker nodes do not need to exchange their training results for the decision tree model. In this way, the amount of data communicated between worker nodes during the training of the ensemble learning model can be effectively reduced.
  • after the worker node obtains the complete training result for a tree node in the current round of training, it can calculate the best split point for that tree node, where the best split point is used to split the training samples contained in the tree node (for the root node, the corresponding training samples are all the training samples in the entire training sample set).
  • specifically, the complete training result may indicate the sample value (that is, the attribute value) of each training sample for a predetermined sample attribute, and the worker node may determine, from the sample values of all training samples for that sample attribute, the sample value that achieves the maximum information gain; taking this sample value as the boundary, the training samples are divided into two parts, and the sample value that achieves the maximum information gain is the best split point.
  • in an exemplary implementation, the worker node can determine, from the obtained complete training result, the sample values of all training samples in the training sample set for the above attribute feature, and determine by traversal which of these sample values is the best split point.
  • specifically, let the best split point be a variable s whose value is one of the above sample values. The training sample set D of size N is divided into two sets, namely a left training sample set D_left and a right training sample set D_right; for example, the training samples whose sample values are smaller than the variable s are classified into the left training sample set, and the training samples whose sample values are greater than or equal to the variable s are classified into the right training sample set.
  • in this way, the worker node can calculate the information gain IG(D, s) corresponding to the value of the variable s, where the information gain IG can be calculated, for example, by the following formulas (1) and (2):

    IG(D, s) = Impurity(D) - (|D_left| / |D|) * Impurity(D_left) - (|D_right| / |D|) * Impurity(D_right)    (1)

    Impurity(D) = 1 - sum_{i=1}^{K} (p_i)^2    (2)

  • here, Impurity refers to the impurity of the training sample set, which can also be called the "impurity index", |D| = N is the total number of training samples, |D_left| and |D_right| are the numbers of training samples in D_left and D_right, K is the number of sample categories in the training sample set, and p_i is the probability of the i-th sample category in the training sample set. It is worth noting that Impurity in formula (2) is given using the Gini index as an example; in practical applications, Impurity may also be measured by entropy, variance, etc., which is not limited in this embodiment.
  • the worker node can traverse and calculate the information gain corresponding to each candidate value of the variable s, so that the value of s corresponding to the maximum information gain can be used as the best split point.
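  • putting formulas (1) and (2) together, a worker node's search for the best split point could look like the following sketch; the function names gini and best_split_point are illustrative, and the Gini index is used as the impurity measure as in formula (2).

```python
from collections import Counter

def gini(labels):
    """Impurity of a sample set, formula (2): 1 - sum_i (p_i)^2 (Gini index)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split_point(values, labels):
    """Traverse the candidate values of s and return the one with maximum
    information gain IG(D, s), formula (1)."""
    n = len(values)
    base_impurity = gini(labels)
    best_s, best_gain = None, float("-inf")
    for s in sorted(set(values)):
        # Samples with value < s go to D_left, the rest to D_right.
        left = [lab for v, lab in zip(values, labels) if v < s]
        right = [lab for v, lab in zip(values, labels) if v >= s]
        if not left or not right:
            continue
        gain = (base_impurity
                - len(left) / n * gini(left)
                - len(right) / n * gini(right))
        if gain > best_gain:
            best_s, best_gain = s, gain
    return best_s, best_gain
```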
  • the worker node then feeds back the calculated best split point to the control node 100 as the final training result of the current round of training, so that the control node 100 can obtain, from the multiple worker nodes, the training results of the multiple decision tree models for the current round of training (for example, the first round of training results).
  • for each decision tree model, the control node 100 may split a node according to the best split point corresponding to the decision tree model, for example splitting the root node into a left node and a right node, where the training samples corresponding to the left node may be the training samples in the above left training sample set D_left, and the training samples corresponding to the right node may be the training samples in the above right training sample set D_right.
  • then, the control node 100 can judge whether the decision tree models obtained after the current round of training satisfy the training termination condition; if so, the multiple decision tree models in the ensemble learning model are the decision tree models obtained in the current round of training, and if not, the control node 100 may continue to perform the next round of training on the decision tree models.
  • the training termination condition includes at least one of the following:
  • Mode 1: the numbers of training samples corresponding to the leaf nodes of the decision tree model are all smaller than a number threshold.
  • Mode 2: the impurity index of the training sample set used for training the sub-learning model is smaller than an impurity threshold.
  • Mode 3: the depth of the nodes in the sub-learning model is greater than or equal to a depth threshold.
  • control node 100 can stop the model training process, that is, complete the training of the ensemble learning model.
  • the above training termination condition is only an example, and in practical application, the training termination condition may also be implemented in other ways.
  • control node 100 may continue to perform the next round of training process for each decision tree model.
  • specifically, the control node 100 may remove, from the to-be-trained tree node list and the current round training tree node list, the tree nodes that have already been split, and add the nodes obtained by splitting each decision tree model in the previous round to the to-be-trained tree node list. Then, the control node 100 adds the tree nodes in the current to-be-trained tree node list to the current round training tree node list and, based on a process similar to the above, uses the worker nodes to split the tree nodes in the current round training tree node list again. In this way, each decision tree model of the ensemble learning model is trained through multiple iterations until each decision tree model satisfies the training termination condition and the training of the ensemble learning model is completed.
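  • the overall multi-round loop described above can be summarized by the following sketch; all names are illustrative assumptions, and split_node stands in for the per-node training and splitting that, in the system described here, is carried out on the worker nodes.

```python
def train_decision_trees(roots, split_node, meets_termination_condition,
                         node_threshold=64):
    """Iteratively train the decision tree models of the ensemble.

    roots: list of root nodes, one per decision tree model.
    split_node: callable(node) -> (left_child, right_child) or None,
                performed on the worker nodes in a real system.
    meets_termination_condition: callable() -> bool, checked each round.
    """
    to_be_trained = list(roots)
    while to_be_trained and not meets_termination_condition():
        # Select at most node_threshold nodes for the current round (batching).
        current_round = to_be_trained[:node_threshold]
        to_be_trained = to_be_trained[node_threshold:]
        for node in current_round:
            children = split_node(node)  # best-split computation on a worker
            if children:
                # Newly split child nodes are trained in later rounds.
                to_be_trained.extend(children)
    return roots
```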
  • based on the above implementation, each sub-learning model in the ensemble learning model is trained by one worker node, so that the training results of each sub-learning model can all be located on one worker node, and the worker node therefore does not need to obtain the training results for the sub-learning model from other worker nodes. In this way, the amount of data that needs to be communicated between worker nodes in the process of training the sub-learning models can be effectively reduced, which not only reduces the resource consumption required for training the ensemble learning model, but also effectively improves the training efficiency and success rate of the ensemble learning model.
  • it is worth noting that, in the above embodiments, the control node and the worker nodes being deployed in a cluster including multiple servers is taken as an example for illustration; in other embodiments, the above training process of the ensemble learning model can also be implemented by a cloud service provided by a cloud data center.
  • specifically, the user can send a training request for the ensemble learning model to the cloud data center through a corresponding terminal device, so as to request the cloud data center to train the ensemble learning model and feed the trained model back to the user.
  • the cloud data center can call the corresponding computing resources to complete the training of the integrated learning model.
  • for example, the cloud data center can invoke part of its computing resources (such as one server supporting the cloud service) to realize the functions of the above control node 100, and invoke another part of its computing resources (such as multiple servers supporting the cloud service) to realize the functions of the above multiple worker nodes.
  • the cloud data center completes the training process of the integrated learning model based on the invoked computing resources, and reference may be made to the relevant descriptions in the above embodiments, which will not be repeated here.
  • after the cloud data center completes the training of the ensemble learning model, it can send the trained ensemble learning model to the terminal device on the user side, so that the user can obtain the required ensemble learning model.
  • the training method of the ensemble learning model provided by the present application is described in detail above with reference to FIG. 1 to FIG. 5; the training apparatus for the ensemble learning model and the device for training the ensemble learning model provided by the present application will be described below with reference to FIG. 6 and FIG. 7.
  • FIG. 6 is a schematic structural diagram of an integrated learning model training device provided by the present application.
  • the integrated learning model training device 600 can be applied to a control node in a model training system, and the model training system further includes a working node.
  • the device 600 includes:
  • a generating module 602, configured to generate a training task set according to the training request, where the training task set includes multiple training tasks, each training task in the multiple training tasks is executed by one worker node, each training task is used for training at least one sub-learning model in the ensemble learning model, and different training tasks are used for training different sub-learning models;
  • the communication module 603 is configured to send the training tasks in the training task set to a plurality of working nodes in the working node set respectively.
  • it should be understood that the apparatus 600 in this embodiment of the present application may be implemented by a central processing unit (CPU), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD), where the above PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the apparatus 600 and its respective modules can also be software modules.
  • the training request includes an instruction to train the ensemble learning model and the number of sub-learning models in the ensemble learning model.
  • the generating module 602 is specifically configured to generate the training task set according to the number of sub-learning models included in the training request, where the number of training tasks included in the training task set is equal to the number of sub-learning models.
  • the communication module 603 specifically includes:
  • a load obtaining unit configured to obtain the load of each working node in the set of working nodes
  • the sending unit is configured to send a training task to each worker node in the first part of worker nodes according to the load of each worker node, where the worker node set includes the first part of worker nodes and the second part of worker nodes, and the load of each worker node in the first part is smaller than the load of each worker node in the second part.
  • the sub-learning model includes a decision tree model.
  • the training termination condition of the sub-learning model includes at least one of the following conditions:
  • the number of training samples corresponding to the leaf nodes of the sub-learning model is less than the number threshold; or,
  • the impurity index of the training sample set used to train the sub-learning model is less than the impurity threshold; or,
  • the depth of the nodes in the sub-learning model is greater than or equal to the depth threshold.
  • in this way, each sub-learning model in the ensemble learning model is trained by one worker node, so that the training results of each sub-learning model can all be located on one worker node, and after completing the training of the sub-learning model, the worker node does not need to obtain the training results for that sub-learning model from other worker nodes. As a result, the amount of data that needs to be communicated between worker nodes in the process of training the sub-learning models can be effectively reduced, which not only reduces the resource consumption required for training the ensemble learning model, but also effectively improves the training efficiency and success rate of the ensemble learning model.
  • the training apparatus 600 for the ensemble learning model according to the embodiments of the present application may correspond to executing the methods described in the embodiments of the present application, and the above and other operations and/or functions of the respective modules of the training apparatus 600 are respectively intended to implement the corresponding processes of the methods in FIG. 3; for the sake of brevity, details are not repeated here.
  • FIG. 7 is a schematic diagram of a device 700 provided by this application.
  • the device 700 includes a processor 701 , a memory 702 , and a communication interface 703 .
  • the processor 701 , the memory 702 , and the communication interface 703 communicate through the bus 704 , and can also communicate through other means such as wireless transmission.
  • the memory 702 is used for storing instructions
  • the processor 701 is used for executing the instructions stored in the memory 702 .
  • optionally, the device 700 may further include a memory unit 705, and the memory unit 705 may be connected to the processor 701, the memory 702, and the communication interface 703 through the bus 704.
  • the memory 702 stores program codes
  • the processor 701 can call the program codes stored in the memory 702 to perform the following operations:
  • a training task set is generated according to the training request, where the training task set includes a plurality of training tasks, each training task in the plurality of training tasks is executed by one worker node, each training task is used to train at least one sub-learning model in the ensemble learning model, and different training tasks are used to train different sub-learning models;
  • the training tasks in the training task set are respectively sent to a plurality of working nodes in the working node set.
  • it should be understood that, in this embodiment of the present application, the processor 701 may be a CPU, and the processor 701 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), an artificial intelligence (AI) chip, or at least one of these.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the memory 702, which may include read-only memory and random access memory, provides instructions and data to the processor 701.
  • Memory 702 may also include non-volatile random access memory.
  • memory 702 may also store device type information.
  • the memory 702 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically programmable Erase programmable read-only memory (electrically EPROM, EEPROM) or flash memory.
  • the volatile memory may be random access memory (RAM), which acts as an external cache. By way of example rather than limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory, and direct rambus RAM (DR RAM).
  • the communication interface 703 is used to communicate with other devices connected to the device 700 .
  • the bus 704 may also include a power bus, a control bus, a status signal bus, and the like.
  • the various buses are labeled as bus 704 in the figure.
  • it should be understood that the device 700 according to this embodiment of the present application may correspond to the training apparatus 600 of the ensemble learning model in the embodiments of the present application, and may correspond to the control node 100 that executes the method shown in FIG. 3; the above and other operations and/or functions implemented by the device 700 are respectively intended to implement the corresponding processes of the methods in FIG. 3, and are not repeated here for brevity.
  • as a possible embodiment, the device provided by the present application may also be composed of multiple devices as shown in FIG. 7 that communicate with one another through a network, and these devices are used to implement the corresponding processes of the methods in FIG. 3 above; for the sake of brevity, details are not repeated here.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or in a wireless manner (for example, infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media.
  • the usable media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media.
  • the semiconductor medium may be a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A training method for an integrated learning model, which method is applied to a model training system comprising a control node and a working node. The method comprises: when training an integrated learning model, a control node acquiring a training request for the integrated learning model; generating, according to the training request, a training task set comprising a plurality of training tasks; and then, the control node respectively sending the training tasks in the training task set to a plurality of working nodes in a working node set, wherein each training task is executed by one working node, each training task is used for training at least one learning sub-model in the integrated learning model, and different training tasks are used for training different learning sub-models. A training result of each learning sub-model can be processed by a working node, such that the amount of data needing to be communicated between working nodes during the process of training the learning sub-models can be effectively reduced, and the training efficiency and success rate of an integrated learning model are improved.

Description

Training method, device, system and related equipment for ensemble learning model

Technical Field
The present application relates to the field of computer technology, and in particular, to a training method, apparatus, system, device, and computer-readable storage medium for an ensemble learning model.
Background
An ensemble learning model can be used to handle classification, regression, and other problems in the field of machine learning, in order to obtain better classification accuracy and prediction performance. Ensemble learning models are widely used in industries such as Anping (public safety), telecom operators, and finance, as well as in various production systems.
At present, in order to meet the performance and accuracy requirements of model training, an ensemble learning model is usually trained on a cluster including multiple servers; specifically, each server uses a set of training samples to train the ensemble learning model. Because each server can obtain only part of the training results of the ensemble learning model through its own model training, different servers also exchange the partial training results obtained by their respective training, so that the trained ensemble learning model is determined based on the aggregated training results. However, when the data size of the training samples is large, the amount of partial training results that needs to be exchanged between different servers is also large, which makes the training efficiency of the ensemble learning model low because of the large amount of data exchanged between servers, and may even cause the training of the ensemble learning model to fail due to training timeout or bandwidth overload between servers, making it difficult to meet the user's application needs. Therefore, how to provide an efficient training method for an ensemble learning model has become an urgent technical problem to be solved.
Summary of the Invention
The present application provides a training method, apparatus, device, system, computer-readable storage medium, and computer program product for an ensemble learning model, so as to improve the training efficiency and training success rate of the ensemble learning model.
In a first aspect, a training method for an ensemble learning model is provided, and the method can be applied to a model training system including a control node and worker nodes. When training the ensemble learning model, the control node obtains a training request for the ensemble learning model and generates a training task set according to the training request, where the generated training task set includes a plurality of training tasks. Then, the control node sends the training tasks in the training task set to a plurality of worker nodes in a worker node set, respectively. Each training task can be assigned to one worker node and executed by that worker node, each training task is used to train at least one sub-learning model in the ensemble learning model, and different training tasks are used to train different sub-learning models. Of course, a worker node may be assigned the training task of only one sub-learning model, or may be assigned the training tasks of multiple sub-learning models.
Since each sub-learning model in the ensemble learning model is trained by one working node, the training results of each sub-learning model can all be handled by that working node, so that after the working node completes the training of a sub-learning model, it does not need to obtain training results for that sub-learning model from other working nodes. In this way, the amount of data that needs to be communicated between working nodes during the training of the sub-learning models can be effectively reduced, which not only reduces the resource consumption required for training the ensemble learning model, but also effectively improves the training efficiency and success rate of the ensemble learning model.
In a possible implementation, the training request includes an instruction to train the ensemble learning model and the number of sub-learning models in the ensemble learning model. After receiving the training request, the control node can trigger the training of the ensemble learning model based on the instruction in the training request, and generate an equal number of training tasks for training the sub-learning models according to the number of sub-learning models included in the training request. Of course, in other possible implementations, the number of sub-learning models in the ensemble learning model may be fixed; in this case, the training request may include only a training instruction for the ensemble learning model, and the control node may generate a fixed number of sub-learning models based on that training instruction, so as to complete the training of the ensemble learning model.
In another possible implementation, when the control node generates the training task set according to the training request, it may specifically generate the training task set according to the number of sub-learning models included in the training request, and the number of training tasks included in the training task set is equal to the number of sub-learning models, e.g., each training task may be used to train one sub-learning model. Of course, in other possible implementations, multiple training tasks may also be used to train one sub-learning model, which is not limited in this embodiment.
In another possible implementation, when the control node sends the training tasks in the training task set to the plurality of working nodes in the working node set, it may specifically obtain the load of each working node in the working node set and, according to the loads of the working nodes, send one training task to each working node in a first part of working nodes, where, in addition to the first part of working nodes, the working node set further includes a second part of working nodes, and the load of each working node in the first part of working nodes that receive training tasks is smaller than the load of each working node in the second part. For example, the control node may sort the working nodes in the working node set by load and assign the training tasks to the n working nodes with the smaller loads, while the m working nodes with the larger loads are not assigned training tasks (the working node set includes m+n working nodes). In this way, during the training of the ensemble learning model, it can be avoided that an overloaded working node drags down the training efficiency of the whole ensemble learning model or even causes the training process to fail. Of course, in practical applications, other possible implementations may also be used to assign training tasks to the working nodes, which is not limited in this embodiment of the present application.
在另一种可能的实施方式中,集成学习模型中的子学习模型具体可以是决策树模型。当然,实际应用中,该子学习模型也可以是其它类型的模型,本申请实施例对此并不进行限定。In another possible implementation, the sub-learning model in the ensemble learning model may specifically be a decision tree model. Of course, in practical applications, the sub-learning model may also be other types of models, which are not limited in this embodiment of the present application.
In another possible implementation, when the sub-learning model is specifically a decision tree model, the training termination condition of the sub-learning model includes at least one of the following conditions: the numbers of training samples corresponding to the leaf nodes of the sub-learning model are all smaller than a number threshold; or the impurity index of the training sample set used to train the sub-learning model is smaller than an impurity threshold; or the depth of the nodes in the sub-learning model is greater than or equal to a depth threshold. When each sub-learning model in the ensemble learning model satisfies at least one of the above conditions, the control node may end the training of the ensemble learning model.
In a second aspect, the present application further provides a training apparatus for an ensemble learning model, where the training apparatus includes modules for performing the training method of the ensemble learning model in the first aspect or any possible implementation of the first aspect.
In a third aspect, the present application further provides a device, including a processor and a memory. The memory is used to store instructions, and when the device runs, the processor executes the instructions stored in the memory, so that the device performs the training method of the ensemble learning model in the first aspect or any implementation of the first aspect. It should be noted that the memory may be integrated in the processor, or may be independent of the processor. The device may further include a bus, through which the processor is connected to the memory. The memory may include a readable memory and a random access memory.
In a fourth aspect, the present application further provides a model training system. The model training system includes a control node and working nodes, where the control node is configured to perform the training method of the ensemble learning model in the first aspect or any implementation of the first aspect, and the working nodes are configured to execute the training tasks sent by the control node.
In a fifth aspect, the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the method described in the first aspect or any implementation of the first aspect.
第六方面,本申请提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使 得计算机执行上述第一方面以及第一方面中任意一种实施方式所述的方法。In a sixth aspect, the present application provides a computer program product comprising instructions, which, when run on a computer, cause the computer to execute the method described in the first aspect and any one of the embodiments of the first aspect.
本申请在上述各方面提供的实现方式的基础上,还可以进行进一步组合以提供更多实现方式。On the basis of the implementation manners provided by the above aspects, the present application may further combine to provide more implementation manners.
附图说明Description of drawings
图1为本申请实施例提供的一示例性模型训练系统的架构示意图;FIG. 1 is a schematic diagram of the architecture of an exemplary model training system provided by an embodiment of the present application;
图2为本申请实施例提供的一示例性应用场景示意图;FIG. 2 is a schematic diagram of an exemplary application scenario provided by an embodiment of the present application;
图3为本申请实施例提供的一种集成学习模型的训练方法的流程示意图;3 is a schematic flowchart of a training method for an integrated learning model provided by an embodiment of the present application;
图4为本申请实施例提供的一示例性配置界面的示意图;4 is a schematic diagram of an exemplary configuration interface provided by an embodiment of the present application;
图5为各工作节点利用训练样本集训练各个子学习模型的示意图;Fig. 5 is the schematic diagram that each work node utilizes training sample set to train each sub-learning model;
图6为本申请实施例提供的一种集成学习模型的训练装置的结构示意图;6 is a schematic structural diagram of a training device for an integrated learning model provided by an embodiment of the present application;
图7为本申请实施例提供的一种设备的硬件结构示意图。FIG. 7 is a schematic diagram of a hardware structure of a device provided by an embodiment of the present application.
具体实施方式Detailed ways
下面将结合本申请实施例中的附图,对本申请中的技术方案进行描述。The technical solutions in the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
As shown in FIG. 1, an embodiment of the present application provides a model training system. The model training system includes a control node (driver) 100 and a plurality of worker nodes (workers). The control node 100 and the working nodes cooperate to complete the training of the ensemble learning model: the control node 100 is used to generate the training tasks of the ensemble learning model and send the training tasks to the working nodes, and the working nodes are used to execute the training tasks they receive to obtain the sub-learning models in the ensemble learning model. It is worth noting that FIG. 1 takes a model training system including 10 working nodes, namely working node 201 to working node 210, as an example for illustration; in practical applications, the system may include any number (more than one) of working nodes, and the number of control nodes 100 is likewise not limited to one. The control node 100 and each working node may interact through an intermediate device (such as a switch, not shown in FIG. 1).
For example, in an enterprise application scenario, in order to meet the performance and accuracy requirements of model training, the model training system shown in FIG. 1 can be deployed in a cluster including multiple servers, and the control node 100 and working nodes 201 to 210 of the model training system can each be deployed on servers in the cluster, with the servers in the cluster divided into control nodes and working nodes according to the functions they perform. For example, server 1 to server n form a cluster in which a big data platform such as a database (for example, Hadoop) or a computing engine (for example, Spark) can be deployed, and the model training system used to train the ensemble learning model can be deployed on this big data platform to implement training and inference of the ensemble learning model. Specifically, the model training system can train an ensemble learning model on the big data platform, and the ensemble learning model can include multiple sub-learning models; the ensemble learning model is then used to perform inference on known input data. Specifically, the known input data is input into each of the multiple sub-learning models, each sub-learning model performs inference on the known input data and outputs a corresponding inference result, and the inference result of the ensemble learning model is finally determined from the inference results of the multiple sub-learning models by voting; for example, the inference result that receives the most votes among the inference results of the multiple sub-learning models may be taken as the inference result of the ensemble learning model.
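As a minimal sketch of the voting step described above (assuming each trained sub-learning model exposes a hypothetical predict() method returning a class label, which is an illustrative interface rather than anything prescribed by the embodiments), the majority vote over the sub-model outputs could be computed as follows:

```python
from collections import Counter

def ensemble_predict(sub_models, x):
    """Run one input through every sub-learning model and take a majority vote."""
    votes = [model.predict(x) for model in sub_models]   # one inference result per sub-model
    label, _count = Counter(votes).most_common(1)[0]     # the most-voted result wins
    return label
```

For instance, if three sub-models return "A", "B" and "A" for the same input, the ensemble result under this sketch is "A".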
具体实施时,集成学习模型可以是基于随机森林的集成学习模型,如Spark ML、Spark MLlib等;集成学习模型也可以是其它类型的集成学习模型,本申请对此并不进行限定。并且,图2所示的部署模型训练系统的应用场景仅作为一种示例性说明,实际应用时该模型训练系统也可以是部署于其它可能的场景中,比如,图1中的控制节点100以及各个工作节点也可以是部署于一个服务器上,此时,控制节点100以及各个工作节点,具体可以是该服务器上的不同计算单元,并且通过功能将其划分为控制节点以及工作节点等。本实施例中,对 于模型训练系统所适用的应用场景并不进行限定。During specific implementation, the ensemble learning model may be an ensemble learning model based on random forests, such as Spark ML, Spark MLlib, etc.; the ensemble learning model may also be other types of ensemble learning models, which are not limited in this application. In addition, the application scenario of deploying the model training system shown in FIG. 2 is only an exemplary illustration, and the model training system can also be deployed in other possible scenarios in practical application, for example, the control node 100 in FIG. 1 and Each worker node may also be deployed on a server. In this case, the control node 100 and each worker node may specifically be different computing units on the server, and are divided into control nodes and worker nodes by function. In this embodiment, the applicable application scenarios of the model training system are not limited.
To improve the training efficiency of the ensemble learning model, during the training of the ensemble learning model by the model training system, each sub-learning model can be trained by one working node. In this way, the training results of each sub-learning model can all be located on one working node, so that after completing the training of a sub-learning model, the working node does not need to obtain training results for that sub-learning model from other working nodes. This effectively reduces the amount of data that needs to be communicated between working nodes during the training of the sub-learning models, which not only reduces the resource consumption (mainly the consumption of communication resources) required for training the ensemble learning model, but also effectively improves the training efficiency and success rate of the ensemble learning model. In addition, when the ensemble learning model needs to go through multiple rounds of training, each sub-learning model can also be trained by one working node in each round of training according to the above method, effectively reducing the amount of data that needs to be communicated between working nodes during the training of the sub-learning models.
When training the ensemble learning model, the model training system can generate one training task for each sub-learning model. For example, when the model training system receives a training request for the ensemble learning model, it can parse the number of sub-learning models included in the ensemble learning model from the training request and generate the same number of training tasks, and the model training system can assign each training task to one working node for execution. Optionally, the model training system can also assign the training of multiple sub-learning models to one working node.
In one example of assigning training tasks, the model training system may assign training tasks to the corresponding working nodes according to the loads of the working nodes. For example, the model training system (specifically, the control node 100 in the model training system) obtains the load of each working node and sorts the working nodes by load; then, the model training system can preferentially assign training tasks to the working nodes with smaller loads, while the working nodes with larger loads may not be assigned training tasks, so that load balancing can be achieved in the model training system. Of course, in other possible examples, the model training system can also issue the training tasks in sequence directly according to the numbering order of the working nodes; in this embodiment, the specific implementation of which working node each training task is issued to for execution is not limited.
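A minimal sketch of such load-aware assignment, assuming the per-worker load has already been collected into a simple mapping (the function and parameter names are illustrative assumptions, not part of the embodiments):

```python
def assign_tasks_by_load(tasks, worker_loads):
    """Give each training task to one of the least-loaded workers.

    tasks: list of training-task identifiers.
    worker_loads: dict mapping worker id -> current load (smaller means idler).
    Workers beyond the first len(tasks) entries (the more heavily loaded ones)
    receive no task, mirroring the first/second partition described above.
    """
    idle_first = sorted(worker_loads, key=worker_loads.get)   # lightest load first
    return dict(zip(idle_first, tasks))                       # worker id -> assigned task
```

With loads {"w1": 0.9, "w2": 0.1, "w3": 0.4} and two tasks, for example, w2 and w3 would receive the tasks while w1 receives none.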
Exemplarily, the sub-learning model in the ensemble learning model may be a decision tree model. A decision tree model is a tree diagram composed of decision points, strategy points (event points) and outcomes, and can be applied to sequential decision-making. In practical applications, the maximum expected benefit or the lowest expected cost can be used as the decision criterion, the decision results of various schemes under different conditions can be solved graphically, and the final decision result can then be given by comparing the decision results of the various schemes. Of course, the sub-learning model may also be another model with a tree structure, which is not limited in this embodiment.
The training method of the ensemble learning model provided by the present application is further introduced below with reference to FIG. 3. FIG. 3 is a schematic flowchart of a training method for an ensemble learning model provided by an embodiment of the present application. For ease of understanding, the training process of the ensemble learning model is described below taking the case where the sub-learning model is specifically a decision tree model as an example. The method can be applied to the model training system shown in FIG. 1 above, or to other applicable model training systems. The method may specifically include:
S301:控制节点100获取集成学习模型的训练请求。S301: The control node 100 obtains a training request of the ensemble learning model.
In this embodiment, the model training system may trigger the training process of the ensemble learning model when it receives a training request for the ensemble learning model. As an example, the model training system may have a communication connection with a user terminal; the user may perform, on the user terminal, a trigger operation for training the ensemble learning model, and the user terminal generates a corresponding training request for the ensemble learning model based on the operation and sends it to the model training system, so that the control node 100 in the model training system obtains the training request and triggers the execution of the subsequent model training process.
S302: The control node 100 generates a training task set according to the received training request, where the training task set includes a plurality of training tasks, each of the plurality of training tasks is executed by one working node, each training task is used for training at least one sub-learning model in the ensemble learning model, and different training tasks are used for training different sub-learning models.
After receiving the training request, the control node 100 in the model training system may obtain training samples for training the multiple sub-learning models in the ensemble learning model, where the number of samples is usually more than one; and the control node 100 may further generate multiple training sample sets from these training samples, and the samples included in different training sample sets may differ.
In one example, after obtaining the training samples (which may, for example, be provided to the control node 100 by the user), the control node 100 may sample the training samples with replacement to obtain P training sample sets. Of course, the control node 100 may also generate the multiple training sample sets in other ways, which is not limited in this embodiment.
The number (P) of training sample sets obtained by sampling by the control node 100 may be determined by the user. For example, the control node may present to the user the configuration interface shown in FIG. 4, so that the user can set the parallelism parameter of model training to P on the configuration interface, i.e., P model training processes are executed simultaneously; the control node 100 can then generate P training sample sets based on the user's configuration, and each training sample set can support the training of one sub-learning model, as shown in FIG. 4. In practical applications, the nodes that execute the training process may specifically be the working nodes; therefore, the control node 100 can distribute the generated training sample sets to different working nodes, for example, each training sample set can be distributed to one working node.
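A minimal sketch of the sampling-with-replacement step is given below. It assumes each returned sample set has the same size as the original sample list, which is a common convention for random-forest style training and not something the text above prescribes:

```python
import random

def bootstrap_sample_sets(samples, p, seed=0):
    """Draw P training sample sets by sampling the original samples with replacement."""
    rng = random.Random(seed)
    return [
        [rng.choice(samples) for _ in range(len(samples))]   # with replacement
        for _ in range(p)                                    # one set per sub-learning model
    ]
```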
Meanwhile, the control node 100 can also generate a training task set based on the received training request. The training task set includes multiple training tasks, and each training task can be used to instruct a working node to perform the training process of one sub-learning model based on the training sample set it receives. As an example, the training request received by the control node 100 may specifically include an instruction to train the ensemble learning model and the number of sub-learning models included in the ensemble learning model, so that after receiving the training request, the control node 100 can start the training process of the ensemble learning model based on the instruction in the training request and generate an equal number of training tasks according to the number of sub-learning models, where each training task is used to instruct a working node to complete the training process of one sub-learning model, and different training tasks instruct the training of different sub-learning models.
S303:控制节点100向工作节点集合中多个工作节点分别发送该训练任务集合中训练任务。S303: The control node 100 respectively sends the training tasks in the training task set to the plurality of working nodes in the working node set.
After generating the training task set, the control node 100 can issue the training tasks in the training task set to the working nodes, and each training task can be issued to one working node for execution, so that the training of the sub-learning model corresponding to a training task can be completed entirely by one working node. In practical applications, the control node 100 may select the working nodes that execute the training tasks based on the loads of the working nodes. For example, for the working node set in the model training system, the control node 100 can determine, according to the load ranking of the working nodes in the set, a first part of working nodes and a second part of working nodes, where the load of each working node in the first part is less than or equal to the load of each working node in the second part; then, the control node 100 can issue the training tasks one by one to the working nodes in the first part, while the control node 100 may not assign training tasks to the more heavily loaded second part of working nodes. In this way, it can be avoided as much as possible that some working nodes in the model training system are overloaded and thereby lower the model training efficiency and success rate, achieving load balancing among the working nodes. Of course, in other possible examples, the model training system may also randomly select the working nodes that execute the training tasks, or issue the training tasks in sequence according to the numbering order of the working nodes; in this embodiment, the specific implementation of which working node each training task is issued to for execution is not limited.
In some practical application scenarios, each working node may include multiple executors. An executor may, for example, be a logical execution unit in the working node and is used to execute the training tasks that the working node needs to execute. Based on this, in a possible implementation, each training task may include multiple subtasks (tasks), and each subtask may be used to instruct the working node to perform part of the model training process of one sub-learning model, so that after the control node 100 issues a training task to a working node, the multiple executors on that working node can execute different subtasks of the training task. In this way, by having multiple executors execute multiple different subtasks in parallel, the efficiency with which a working node trains a single sub-learning model can be improved.
As an example, when the sub-learning model is specifically a decision tree model, as shown in FIG. 5, each executor on a working node can use part of the samples in training sample set 1 to train a tree node of decision tree model 1, and the block of samples used by each executor when training that tree node is different. In this way, on one working node, the training results of the tree node of decision tree model 1 based on each block of samples can be obtained, that is, the complete training result of decision tree model 1 using training sample set 1 can be obtained.
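A minimal sketch of running the subtasks of one training task in parallel on a single working node is shown below; the train_block callable is a hypothetical stand-in for the real per-block training routine and is not part of the embodiments:

```python
from concurrent.futures import ThreadPoolExecutor

def run_training_task(sample_blocks, train_block, num_executors=4):
    """Execute the subtasks of one training task with several executors in parallel.

    sample_blocks: one block of samples per subtask.
    train_block: callable returning the partial training result for one block.
    All partial results stay on the same working node, so no cross-node
    exchange is needed to assemble the complete training result.
    """
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        return list(pool.map(train_block, sample_blocks))
```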
The number of training tasks may be the same as the number of training sample sets, so that when multiple working nodes train the sub-learning models, each working node can use the training sample set it receives and perform the corresponding model training process according to the subtasks included in a single training task. The number of subtasks in each training task may be determined according to the number of blocks into which the training samples in the training sample set are divided. For example, when the training samples in the training sample set are divided into 5 equal parts, the control node 100 can generate 5 subtasks according to the 5 parts of training samples, where each subtask corresponds to one of the sample blocks, different subtasks correspond to different sample blocks, and the 5 generated subtasks can constitute one training task; in this way, the control node 100 can generate different training tasks based on different training sample sets. The number of blocks for each training sample set can be configured by the user on the configuration interface shown in FIG. 4, or it can be a default value preset by technical personnel, which is not limited in this embodiment.
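The block-to-subtask mapping above can be sketched as follows (the dict layout of a subtask is an illustrative assumption):

```python
def build_training_task(sample_set, num_blocks):
    """Split one training sample set into blocks and create one subtask per block."""
    block_size = (len(sample_set) + num_blocks - 1) // num_blocks   # ceiling division
    return [
        {"subtask_id": i, "block": sample_set[i * block_size:(i + 1) * block_size]}
        for i in range(num_blocks)
    ]
```

The subtasks returned for one sample set together form one training task, and different sample sets yield different training tasks.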
示例性的,训练任务的数量与工作节点的数量可以相同,也可以是不同。例如,当训练任务的数量与工作节点相同时,每个工作节点可以执行一个训练任务中的所有任务。而当训练任务的数量与工作节点不同时,一个工作节点可以执行多个训练任务中的所有子任务,即一个工作节点可以完成多个子学习模型的训练。Exemplarily, the number of training tasks and the number of worker nodes may be the same or different. For example, when the number of training tasks is the same as worker nodes, each worker node can execute all the tasks in one training task. When the number of training tasks is different from that of worker nodes, one worker node can execute all subtasks in multiple training tasks, that is, one worker node can complete the training of multiple sub-learning models.
It is worth noting that when training the ensemble learning model, the model training system usually performs multiple rounds of an iterative training process, and in each round of model training the control node 100 can regenerate multiple training tasks and issue them to the working nodes for execution. During the multiple rounds of iterative training, the training sample set used to train each sub-learning model can remain unchanged, while the content of the subtasks included in the training task for a sub-learning model in a given round may differ from the subtasks included in the training task for that sub-learning model in the previous round. For example, when the sub-learning model is specifically a decision tree model, the subtasks in the previous round of training are used to train tree node 1 of the decision tree model, while the subtasks in the current round of training are used to train tree node 2 and tree node 3 of the decision tree model. Here, the current round of training refers to the round of training that the model training system is currently performing on the ensemble learning model; for example, when the model training system is performing the second round of model training on the ensemble learning model, that second round of model training is the current round of training.
In each round of model training, if the number of training tasks to be generated is fixed (for example, it may be determined according to the number of working nodes), the control node 100 may first assign one sub-learning model to each training task, i.e., the subtasks in each training task are used to train one sub-learning model. When the number of sub-learning models is greater than the number of training tasks, after assigning one sub-learning model to each training task, the control node 100 continues to assign one more sub-learning model to each training task from the remaining sub-learning models; in this case, the subtasks in one training task can be used to train multiple sub-learning models.
In one example, when the sub-learning model is specifically a decision tree model, a subtask in a training task may specifically be used to determine the best split point of a tree node of the decision tree model. Determining the best split point means determining, in the decision tree model, a tree node suitable for sample splitting, such that the training samples contained in the two child nodes obtained after splitting that tree node fall into different ranges of a certain attribute.
In the first round of training the decision tree models, the control node 100 may create a list of tree nodes to be trained (tree node list) and initialize the tree nodes in the list as the root nodes of the decision tree models, i.e., when the number of decision tree models is x, the tree node list includes x root nodes.
Since the decision tree models undergo multiple rounds of iterative training, and the task of each round of training a decision tree model is to split tree nodes in the decision tree model, the control node 100 may, in each round of model training, select tree nodes from the list of tree nodes to be trained and add them to the current-round training tree node list (cur-tree node list). Specifically, the control node 100 may add the tree nodes in the list of tree nodes to be trained to the current-round training tree node list one by one in the order of the tree nodes' index numbers. Optionally, after multiple rounds of training, a decision tree model may have split into a relatively large number of nodes. To prevent too many nodes from being added to the current-round training tree node list, which would make the load of the working nodes splitting these nodes too heavy, in a possible implementation the control node 100 may limit the number of nodes added to the current-round training node list so that it does not exceed a node number threshold, i.e., limit the length of the current-round training node list; the nodes in the list of tree nodes to be trained that do not participate in the current round of training can participate in the next round of model training, so that when there are too many nodes in the list of tree nodes to be trained, the control node 100 can control their training in batches.
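A minimal sketch of this batching step, assuming the pending list is already ordered by node index (the function name is an illustrative assumption):

```python
def select_nodes_for_round(pending_nodes, max_nodes_per_round):
    """Move at most max_nodes_per_round nodes into the current-round list.

    Returns (current_round_nodes, remaining_nodes); the remaining nodes
    wait for a later round, which realizes the batch-by-batch training.
    """
    return pending_nodes[:max_nodes_per_round], pending_nodes[max_nodes_per_round:]
```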
然后,控制节点100可以生成子任务与待训练的树节点的映射关系,并将其广播至各个工作节点。并且,控制节点100可以进一步将生成的训练任务中的子任务下发至工作节点,如通过轮询(round-robin)机制下发子任务至每个工作节点等。Then, the control node 100 can generate the mapping relationship between the subtask and the tree node to be trained, and broadcast it to each worker node. In addition, the control node 100 may further deliver the subtasks in the generated training task to the working nodes, for example, through a round-robin mechanism to deliver the subtasks to each working node and the like.
A working node can perform the corresponding model training task according to the training task it receives, using the training sample set corresponding to that training task. In a specific implementation, the executors on each working node can determine the tree node to be trained according to the mapping between subtasks and tree nodes to be trained broadcast by the control node 100 and the subtasks issued by the control node 100, and train that tree node using the training samples corresponding to the subtask. Specifically, the executor can first determine the sample attribute used to split the tree node, such as age, and then determine the sample distribution of the block of samples corresponding to the subtask with respect to that sample attribute, i.e., determine which training samples in the block belong to one class (for example, training samples whose age attribute value is greater than 23 belong to one class) and which training samples belong to another class (for example, training samples whose age attribute value is smaller than 23 belong to another class). The sample attribute may be selected by a preset random algorithm, or selected in another way. In this way, after the multiple executors on the working node have each completed their model training, the working node can obtain the training result of all training samples in the entire training sample set for that tree node; the training result may, for example, be a distribution histogram of the training samples for that tree node.
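A simplified stand-in for the per-block distribution statistics described above is sketched below; it assumes each training sample is represented as a dict of attribute values plus a "label" key, which is an illustrative layout and not something the embodiments prescribe:

```python
def block_class_counts(block, attribute, threshold):
    """Count, per class label, how many samples of one block fall on each side
    of a candidate split value of the chosen attribute (e.g. age < 23 vs >= 23)."""
    counts = {"left": {}, "right": {}}
    for sample in block:
        side = "left" if sample[attribute] < threshold else "right"
        label = sample["label"]
        counts[side][label] = counts[side].get(label, 0) + 1
    return counts
```

The per-block counts produced by the different executors can then be summed on the same working node to obtain the complete distribution for the tree node.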
When all the subtasks in a training task are completed by the executors on one working node, that working node can directly obtain the training result of the current round of training for the tree node without obtaining it from other working nodes, so that different working nodes do not need to exchange their respective training results for the decision tree models. In this way, the amount of data communicated between working nodes during the training of the ensemble learning model can be effectively reduced.
After obtaining the complete training result for the tree node in the current round of training, the working node can calculate the best split point of that tree node; the best split point is used to split the training samples contained in the tree node (for the root node, the training samples corresponding to the root node are all the training samples in the entire training sample set). In a specific implementation, the complete training result can indicate the sample value (i.e., the attribute value) of each training sample for the predetermined sample attribute, and the working node can determine, from all the sample values of the training samples for that sample attribute, the sample value that achieves the maximum information gain, so as to divide the training samples into two parts using the determined sample value as the boundary; the sample value achieving the maximum information gain is the best split point.
As an example of determining the best split point, the working node can, based on the obtained complete training result, determine the sample values of the above-mentioned sample attribute for all training samples in the training sample set, and determine by traversal which of these sample values serves as the best split point. Specifically, suppose the best split point is a variable s, whose value is any one of the above sample values. According to the value of the variable s, the training sample set D of size N is divided into two sets, namely the left training sample set D_left and the right training sample set D_right; for example, the training samples whose sample value is smaller than the variable s are placed in the left training sample set, and the training samples whose sample value is greater than or equal to the variable s are placed in the right training sample set. Then, the working node can calculate the information gain IG(D, s) for that value of the variable s, where the information gain IG can, for example, be calculated by the following formulas (1) and (2):
$$IG(D, s) = \mathrm{Impurity}(D) - \frac{|D_{left}|}{N}\,\mathrm{Impurity}(D_{left}) - \frac{|D_{right}|}{N}\,\mathrm{Impurity}(D_{right}) \qquad (1)$$

$$\mathrm{Impurity}(D) = \sum_{i=1}^{K} p_i\,(1 - p_i) \qquad (2)$$
Here, Impurity is an index of the impurities contained in the training sample set, which may also be called the "impurity index"; K is the number of sample classes in the training sample set, and p_i is the probability of the i-th sample class in the training sample set. It is worth noting that the Impurity in formula (2) takes Gini as an example; in practical applications, the Impurity may also be entropy, variance, etc., which is not limited in this embodiment.
Through the above formulas (1) and (2), the working node can traverse and calculate the information gain corresponding to each value of the variable s, so that the value of s corresponding to the maximum information gain can be used as the best split point.
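A minimal sketch of this traversal, again assuming each training sample is a dict holding its attribute values and a "label" key (an illustrative layout, not part of the embodiments), with Gini as the impurity of formula (2):

```python
def gini(labels):
    """Gini impurity as in formula (2): sum over classes of p_i * (1 - p_i)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return sum((c / n) * (1 - c / n) for c in counts.values())


def best_split(samples, attribute):
    """Scan every observed value of the attribute as candidate split point s and
    return the value with the largest information gain IG(D, s) per formula (1)."""
    labels = [s["label"] for s in samples]
    parent_impurity = gini(labels)
    best_value, best_gain = None, float("-inf")
    for candidate in sorted({s[attribute] for s in samples}):
        left = [s["label"] for s in samples if s[attribute] < candidate]
        right = [s["label"] for s in samples if s[attribute] >= candidate]
        if not left or not right:
            continue                      # this value does not actually split the set
        gain = (parent_impurity
                - len(left) / len(samples) * gini(left)
                - len(right) / len(samples) * gini(right))
        if gain > best_gain:
            best_value, best_gain = candidate, gain
    return best_value, best_gain
```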
The working node feeds back the calculated best split point to the control node 100 as the final training result of the current round of training, so that the control node 100 can obtain the current-round training results of the multiple decision tree models from the multiple working nodes, i.e., in the first round of training, the best split point of the root node of each decision tree model.
For each decision tree model, the control node 100 can calculate the split nodes according to the best split point corresponding to that decision tree model, for example splitting the root node into a left node and a right node, where the training samples corresponding to the left node may be the training samples in the above left training sample set D_left, and the training samples corresponding to the right node may be the training samples in the above right training sample set D_right. In this way, based on the above process, one round of the training process for each decision tree model can be completed.
Then, the control node 100 can judge whether the decision tree models obtained after the current round of training satisfy the training termination condition; if so, the multiple decision tree models in the ensemble learning model are the decision tree models obtained in the current round of training, and if not, the control node 100 can continue to perform the next round of training on the decision tree models.
在一些示例中,训练终止条件包括以下方式中至少一种:In some examples, the training termination condition includes at least one of the following:
Manner 1: the numbers of training samples corresponding to the leaf nodes of the decision tree model are all smaller than a number threshold.
方式2,用于训练子学习模型的训练样本集的不纯度指标小于不纯度阈值。Mode 2, the impurity index of the training sample set used for training the sub-learning model is smaller than the impurity threshold.
方式3,子学习模型中节点的深度大于等于深度阈值。In mode 3, the depth of the nodes in the sub-learning model is greater than or equal to the depth threshold.
当集成学习模型中的每个决策树模型,均满足上述训练终止条件中的任意一种或多种时,控制节点100可以停止模型训练过程,即完成对于集成学习模型的训练。当然,上述训练终止条件仅作为一种示例,实际应用时,该训练终止条件也可以是采用其它方式进行实现。When each decision tree model in the ensemble learning model satisfies any one or more of the above training termination conditions, the control node 100 can stop the model training process, that is, complete the training of the ensemble learning model. Of course, the above training termination condition is only an example, and in practical application, the training termination condition may also be implemented in other ways.
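A minimal sketch of this check for a single decision tree is given below; it assumes the per-leaf sample counts, the impurity of the training sample set and the maximum node depth have already been computed elsewhere (how they are obtained is outside this sketch):

```python
def tree_training_finished(leaf_sample_counts, training_set_impurity, max_node_depth,
                           count_threshold, impurity_threshold, depth_threshold):
    """Return True if the decision tree satisfies at least one of the three
    termination conditions listed above (manner 1, 2 or 3)."""
    if all(n < count_threshold for n in leaf_sample_counts):    # manner 1
        return True
    if training_set_impurity < impurity_threshold:              # manner 2
        return True
    if max_node_depth >= depth_threshold:                        # manner 3
        return True
    return False
```

The control node would stop the overall training only when every decision tree in the ensemble passes such a check.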
当确定当轮训练得到的各个决策树模型不满足训练终止条件时,控制节点100可以继续执行下一轮针对于各个决策树模型的训练过程。When it is determined that each decision tree model obtained by the round of training does not meet the training termination condition, the control node 100 may continue to perform the next round of training process for each decision tree model.
Specifically, the control node 100 can clear the list of tree nodes to be trained and the tree nodes in the current-round training tree node list that have already been split, and add the nodes split in the previous round for each decision tree model to the list of tree nodes to be trained. Then, the control node 100 adds the tree nodes in the current list of tree nodes to be trained to the current-round training tree node list and, based on a process similar to the above, uses the working nodes to split the tree nodes in the current-round training tree node list again. In this way, the decision tree models of the ensemble learning model are trained through multiple iterations until each decision tree model satisfies the training termination condition, at which point the training of the ensemble learning model is completed.
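A skeleton of this round-by-round loop, as driven by the control node, might look as follows; split_round and finished are hypothetical callables standing in for one round of task dispatch plus worker-side splitting and for the termination check, respectively:

```python
def train_decision_trees(initial_nodes, split_round, finished):
    """Round-by-round training loop sketch.

    initial_nodes: the root nodes of the decision trees (the first round's work).
    split_round: given the nodes of the current round, splits them on the
        workers (growing the trees as a side effect) and returns the newly
        created child nodes.
    finished: applies the termination conditions to every decision tree.
    """
    pending = list(initial_nodes)
    while pending and not finished():
        current_round = pending            # nodes split in this round
        children = split_round(current_round)
        pending = list(children)           # children join the next round
```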
In this embodiment, since each sub-learning model in the ensemble learning model is trained by one working node when the ensemble learning model is trained, the training results of each sub-learning model can all be located on one working node, so that after completing the training of a sub-learning model, the working node does not need to obtain training results for that sub-learning model from other working nodes. In this way, the amount of data that needs to be communicated between working nodes during the training of the sub-learning models can be effectively reduced, which not only reduces the resource consumption required for training the ensemble learning model, but also effectively improves the training efficiency and success rate of the ensemble learning model.
It is worth noting that the above embodiments are described by taking the case where the control node and the working nodes are deployed in a cluster including multiple servers as an example; in other possible embodiments, the above training process for the ensemble learning model can also be implemented by a cloud service provided by a cloud data center.
Specifically, the user can send a training request for the ensemble learning model to the cloud data center through a corresponding terminal device, so as to request the cloud data center to train the ensemble learning model and feed it back to the user. After receiving the training request, the cloud data center can invoke corresponding computing resources to complete the training of the ensemble learning model; specifically, it can invoke part of the computing resources (such as one server supporting the cloud service) to implement the functions implemented by the above control node 100, and invoke another part of the computing resources (such as multiple servers supporting the cloud service) to implement the functions of the above multiple working nodes. For the process in which the cloud data center completes the training of the ensemble learning model based on the invoked computing resources, reference may be made to the relevant descriptions in the above embodiments, which are not repeated here. After completing the training of the ensemble learning model, the cloud data center can send the trained ensemble learning model to the terminal device on the user side, so that the user can obtain the required ensemble learning model.
值得注意的是,上述实施例仅作为对于本申请技术方案的示例性说明,本领域的技术人员根据以上描述的内容,能够想到的其他合理的步骤组合,也属于本申请的保护范围内。其次,本领域技术人员也应该熟悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作并不一定是本申请所必须的。It is worth noting that the above-mentioned embodiments are only illustrative of the technical solutions of the present application, and other reasonable step combinations that those skilled in the art can think of based on the above descriptions also fall within the protection scope of the present application. Secondly, those skilled in the art should also be familiar with that, the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the present application.
The training method of the ensemble learning model provided by the present application has been described in detail above with reference to FIG. 1 to FIG. 5. The training apparatus for the ensemble learning model and the device for training the ensemble learning model provided by the present application are described below with reference to FIG. 6 to FIG. 7.
图6为本申请提供的一种集成学习模型的训练装置的结构示意图,该集成学习模型的训练装置600可以应用于模型训练系统中的控制节点,并且,该模型训练系统还包括工作节点。其中,该装置600包括:FIG. 6 is a schematic structural diagram of an integrated learning model training device provided by the present application. The integrated learning model training device 600 can be applied to a control node in a model training system, and the model training system further includes a working node. Wherein, the device 600 includes:
获取模块601,用于获取集成学习模型的训练请求;Obtaining module 601, for obtaining the training request of the integrated learning model;
a generating module 602, configured to generate a training task set according to the training request, where the training task set includes multiple training tasks, each of the multiple training tasks is executed by one working node, each training task is used for training at least one sub-learning model in the ensemble learning model, and different training tasks are used for training different sub-learning models;
通信模块603,用于向工作节点集合中多个工作节点分别发送所述训练任务集合中训练任务。The communication module 603 is configured to send the training tasks in the training task set to a plurality of working nodes in the working node set respectively.
It should be understood that the apparatus 600 in this embodiment of the present application may be implemented by a central processing unit (CPU), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD), where the PLD may be a complex programmable logical device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof. When the training method of the ensemble learning model shown in FIG. 3 is implemented by software, the apparatus 600 and its modules may also be software modules.
可选地,所述训练请求包括训练所述集成学习模型的指示以及所述集成学习模型中子学习模型的数量。Optionally, the training request includes an instruction to train the ensemble learning model and the number of sub-learning models in the ensemble learning model.
Optionally, the generating module 602 is specifically configured to generate the training task set according to the number of sub-learning models included in the training request, where the number of training tasks included in the training task set is equal to the number of sub-learning models.
可选地,所述通信模块603,具体包括:Optionally, the communication module 603 specifically includes:
负载获取单元,用于获取所述工作节点集合中各个工作节点的负载;a load obtaining unit, configured to obtain the load of each working node in the set of working nodes;
a sending unit, configured to send one training task to each working node in a first part of working nodes according to the loads of the working nodes, where the working node set includes the first part of working nodes and a second part of working nodes, and the load of each working node in the first part of working nodes is smaller than the load of each working node in the second part of working nodes.
Optionally, the sub-learning model includes a decision tree model.
Optionally, the training termination condition of the sub-learning model includes at least one of the following conditions:
the number of training samples corresponding to each leaf node of the sub-learning model is smaller than a number threshold; or
the impurity index of the training sample set used to train the sub-learning model is smaller than an impurity threshold; or
the depth of a node in the sub-learning model is greater than or equal to a depth threshold.
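A minimal sketch of checking these termination conditions for a decision tree sub-model is shown below; the threshold defaults are illustrative assumptions, not values taken from the application.

```python
def should_stop(leaf_sample_counts, impurity, node_depth,
                sample_threshold=10, impurity_threshold=0.01, depth_threshold=16):
    """Return True if any of the three termination conditions is met.

    The threshold values here are illustrative defaults, not values from the application.
    """
    # Condition 1: every leaf node holds fewer training samples than the number threshold.
    if all(count < sample_threshold for count in leaf_sample_counts):
        return True
    # Condition 2: the impurity index (e.g. Gini) of the training sample set is below the threshold.
    if impurity < impurity_threshold:
        return True
    # Condition 3: the node depth has reached the depth threshold.
    if node_depth >= depth_threshold:
        return True
    return False
```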
Because each sub-learning model in the ensemble learning model is trained on a single worker node when the apparatus 600 trains the ensemble learning model, all training results of each sub-learning model reside on one worker node. After a worker node finishes training a sub-learning model, it therefore does not need to obtain training results for that sub-learning model from other worker nodes. In this way, the amount of data that the worker nodes need to exchange while training the sub-learning models can be effectively reduced, which not only reduces the resource consumption required for training the ensemble learning model, but also effectively improves the training efficiency and success rate of the ensemble learning model.
The training apparatus 600 for the ensemble learning model according to this embodiment of the present application may correspond to performing the methods described in the embodiments of the present application, and the foregoing and other operations and/or functions of the modules of the training apparatus 600 are respectively intended to implement the corresponding procedures of the methods in FIG. 3. For brevity, details are not repeated here.
FIG. 7 is a schematic diagram of a device 700 provided by this application. As shown in FIG. 7, the device 700 includes a processor 701, a memory 702, and a communication interface 703. The processor 701, the memory 702, and the communication interface 703 communicate through a bus 704, and may also communicate by other means such as wireless transmission. The memory 702 is configured to store instructions, and the processor 701 is configured to execute the instructions stored in the memory 702. Further, the device 700 may also include a memory unit 705, and the memory unit 705 may be connected to the processor 701, the memory 702, and the communication interface 703 through the bus 704. The memory 702 stores program code, and the processor 701 may invoke the program code stored in the memory 702 to perform the following operations:
obtaining a training request for an ensemble learning model;
generating a training task set according to the training request, where the training task set includes a plurality of training tasks, each of the plurality of training tasks is executed by one worker node, each training task is used to train at least one sub-learning model in the ensemble learning model, and different training tasks are used to train different sub-learning models; and
separately sending the training tasks in the training task set to a plurality of worker nodes in a worker node set.
It should be understood that, in this embodiment of the present application, the processor 701 may be a CPU, or the processor 701 may be at least one of another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, a graphics processing unit (GPU), a neural processing unit (NPU), a tensor processing unit (TPU), an artificial intelligence (AI) chip, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 702 may include a read-only memory and a random access memory, and provides instructions and data to the processor 701. The memory 702 may further include a non-volatile random access memory. For example, the memory 702 may also store information about a device type.
The memory 702 may be a volatile memory or a non-volatile memory, or may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), and a direct rambus random access memory (DR RAM).
The communication interface 703 is configured to communicate with other devices connected to the device 700. In addition to a data bus, the bus 704 may further include a power bus, a control bus, a status signal bus, and the like. However, for clarity of description, the various buses are all labeled as the bus 704 in the figure.
It should be understood that the device 700 according to this embodiment of the present application may correspond to the training apparatus 600 for the ensemble learning model in the embodiments of the present application, and may correspond to the control node 100 that performs the method shown in FIG. 3 according to the embodiments of the present application. The foregoing and other operations and/or functions implemented by the device 700 are respectively intended to implement the corresponding procedures of the methods in FIG. 3. For brevity, details are not repeated here.
As a possible embodiment, the device provided by the present application may also be composed of a plurality of devices as shown in FIG. 7, where the plurality of devices communicate with each other through a network and are used to implement the corresponding procedures of the methods in FIG. 3. For brevity, details are not repeated here.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, the foregoing embodiments may be implemented completely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the procedures or functions according to the embodiments of the present application are generated completely or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid state drive (SSD).
The foregoing descriptions are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any equivalent modification or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

  1. A training method for an ensemble learning model, wherein the method is applied to a model training system, the model training system comprises a control node and worker nodes, and the method comprises:
    obtaining, by the control node, a training request for the ensemble learning model;
    generating, by the control node, a training task set according to the training request, wherein the training task set comprises a plurality of training tasks, each of the plurality of training tasks is executed by one worker node, each training task is used to train at least one sub-learning model in the ensemble learning model, and different training tasks are used to train different sub-learning models; and
    separately sending, by the control node, the training tasks in the training task set to a plurality of worker nodes in a worker node set.
  2. The method according to claim 1, wherein the training request comprises an indication to train the ensemble learning model and the number of sub-learning models in the ensemble learning model.
  3. The method according to claim 2, wherein the generating, by the control node, a training task set according to the training request comprises:
    generating, by the control node, the training task set according to the number of sub-learning models included in the training request, wherein the number of training tasks included in the training task set is equal to the number of sub-learning models.
  4. The method according to any one of claims 1 to 3, wherein the separately sending, by the control node, the training tasks in the training task set to a plurality of worker nodes in a worker node set comprises:
    obtaining, by the control node, the load of each worker node in the worker node set; and
    sending, by the control node according to the load of each worker node, one training task to each worker node in a first part of worker nodes, wherein the worker node set comprises the first part of worker nodes and a second part of worker nodes, and the load of each worker node in the first part of worker nodes is smaller than the load of each worker node in the second part of worker nodes.
  5. The method according to any one of claims 1 to 4, wherein the sub-learning model comprises a decision tree model.
  6. The method according to claim 5, wherein the training termination condition of the sub-learning model comprises at least one of the following conditions:
    the number of training samples corresponding to each leaf node of the sub-learning model is smaller than a number threshold; or
    the impurity index of the training sample set used to train the sub-learning model is smaller than an impurity threshold; or
    the depth of a node in the sub-learning model is greater than or equal to a depth threshold.
  7. A training apparatus for an ensemble learning model, wherein the apparatus is applied to a control node in a model training system, the model training system further comprises worker nodes, and the apparatus comprises:
    an obtaining module, configured to obtain a training request for the ensemble learning model;
    a generating module, configured to generate a training task set according to the training request, wherein the training task set comprises a plurality of training tasks, each of the plurality of training tasks is executed by one worker node, each training task is used to train at least one sub-learning model in the ensemble learning model, and different training tasks are used to train different sub-learning models; and
    a communication module, configured to separately send the training tasks in the training task set to a plurality of worker nodes in a worker node set.
  8. The apparatus according to claim 7, wherein the training request comprises an indication to train the ensemble learning model and the number of sub-learning models in the ensemble learning model.
  9. The apparatus according to claim 8, wherein the generating module is specifically configured to generate the training task set according to the number of sub-learning models included in the training request, wherein the number of training tasks included in the training task set is equal to the number of sub-learning models.
  10. The apparatus according to any one of claims 7 to 9, wherein the communication module specifically comprises:
    a load obtaining unit, configured to obtain the load of each worker node in the worker node set; and
    a sending unit, configured to send, according to the load of each worker node, one training task to each worker node in a first part of worker nodes, wherein the worker node set comprises the first part of worker nodes and a second part of worker nodes, and the load of each worker node in the first part of worker nodes is smaller than the load of each worker node in the second part of worker nodes.
  11. The apparatus according to any one of claims 7 to 10, wherein the sub-learning model comprises a decision tree model.
  12. The apparatus according to claim 11, wherein the training termination condition of the sub-learning model comprises at least one of the following conditions:
    the number of training samples corresponding to each leaf node of the sub-learning model is smaller than a number threshold; or
    the impurity index of the training sample set used to train the sub-learning model is smaller than an impurity threshold; or
    the depth of a node in the sub-learning model is greater than or equal to a depth threshold.
  13. A device, comprising a processor and a memory, wherein
    the memory is configured to store computer instructions; and
    the processor is configured to perform, according to the computer instructions, the operation steps of the method according to any one of claims 1 to 6.
  14. A model training system, wherein the model training system comprises the control node according to any one of claims 1 to 6 and worker nodes.
  15. A computer-readable storage medium, comprising instructions, wherein the instructions are used to implement the operation steps of the method according to any one of claims 1 to 6.
PCT/CN2021/142240 2021-01-28 2021-12-28 Training method, apparatus and system for integrated learning model, and related device WO2022161081A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110121743.4 2021-01-28
CN202110121743.4A CN114819195A (en) 2021-01-28 2021-01-28 Training method, device and system of ensemble learning model and related equipment

Publications (1)

Publication Number Publication Date
WO2022161081A1 true WO2022161081A1 (en) 2022-08-04

Family

ID=82526675

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/142240 WO2022161081A1 (en) 2021-01-28 2021-12-28 Training method, apparatus and system for integrated learning model, and related device

Country Status (2)

Country Link
CN (1) CN114819195A (en)
WO (1) WO2022161081A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024031986A1 (en) * 2022-08-12 2024-02-15 华为云计算技术有限公司 Model management method and related device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815644A (en) * 2017-01-26 2017-06-09 北京航空航天大学 Machine learning method and from node
US20180374105A1 (en) * 2017-05-26 2018-12-27 Get Attached, Inc. Leveraging an intermediate machine learning analysis
CN109409738A (en) * 2018-10-25 2019-03-01 平安科技(深圳)有限公司 Method, the electronic device of deep learning are carried out based on block platform chain
CN111444019A (en) * 2020-03-31 2020-07-24 中国科学院自动化研究所 Cloud-end-collaborative deep learning model distributed training method and system
CN111768006A (en) * 2020-06-24 2020-10-13 北京金山云网络技术有限公司 Artificial intelligence model training method, device, equipment and storage medium
CN111860835A (en) * 2020-07-17 2020-10-30 苏州浪潮智能科技有限公司 Neural network model training method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SONG CHUANGCHUANG, FANG YONG; HUANG CHENG; LIU LIANG: "Password strength estimation model based on ensemble learning", JOURNAL OF COMPUTER APPLICATIONS, JISUANJI YINGYONG, CN, vol. 38, no. 5, 10 May 2018 (2018-05-10), CN , pages 1383 - 1388, XP055954980, ISSN: 1001-9081, DOI: 10.11772/j.issn.1001-9081.2017102516 *

Also Published As

Publication number Publication date
CN114819195A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
US20180144251A1 (en) Server and cloud computing resource optimization method thereof for cloud big data computing architecture
CN111625331B (en) Task scheduling method, device, platform, server and storage medium
US20200351207A1 (en) Method and system of limiting traffic
WO2022151668A1 (en) Data task scheduling method and apparatus, storage medium, and scheduling tool
CN107908536B (en) Performance evaluation method and system for GPU application in CPU-GPU heterogeneous environment
CN108270805B (en) Resource allocation method and device for data processing
CN111190703B (en) Real-time data processing method and device, computer equipment and storage medium
CN110347515B (en) Resource optimization allocation method suitable for edge computing environment
WO2023087658A1 (en) Task scheduling method, apparatus and device, and readable storage medium
WO2023066084A1 (en) Computing power distribution method and apparatus, and computing power server
CN111638948B (en) Multi-channel high-availability big data real-time decision making system and decision making method
CN112463390A (en) Distributed task scheduling method and device, terminal equipment and storage medium
WO2022161081A1 (en) Training method, apparatus and system for integrated learning model, and related device
CN112367363A (en) Information sharing method, device, server and storage medium
US11675515B2 (en) Intelligent partitioning engine for cluster computing
WO2021147815A1 (en) Data calculation method and related device
CN112631754A (en) Data processing method, data processing device, storage medium and electronic device
CN116775041A (en) Big data real-time decision engine based on stream computing framework and RETE algorithm
CN106502842A (en) Data reconstruction method and system
CN115658311A (en) Resource scheduling method, device, equipment and medium
US20220229692A1 (en) Method and device for data task scheduling, storage medium, and scheduling tool
CN112732451A (en) Load balancing system in cloud environment
CN117667602B (en) Cloud computing-based online service computing power optimization method and device
CN114443258B (en) Resource scheduling method, device, equipment and storage medium for virtual machine
CN117891584B (en) Task parallelism scheduling method, medium and device based on DAG grouping

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21922670; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21922670; Country of ref document: EP; Kind code of ref document: A1)