CN106156786B - Random forest training method based on multiple GPUs - Google Patents

Random forest training method based on multiple GPUs

Info

Publication number
CN106156786B
CN106156786B (application number CN201510183467.9A)
Authority
CN
China
Prior art keywords
gpu
training
number
calculation
tree
Prior art date
Application number
CN201510183467.9A
Other languages
Chinese (zh)
Other versions
CN106156786A (en)
Inventor
张京梅
Original Assignee
北京典赞科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京典赞科技有限公司
Priority to CN201510183467.9A
Publication of CN106156786A
Application granted
Publication of CN106156786B

Abstract

The invention discloses a random forest training method based on multiple GPUs, which comprises the following steps: controlling a plurality of GPUs to compute a decision tree, with each GPU unit computing the decision of one sample; when the decision tree computation reaches a leaf, stopping that computation and releasing the GPU units at the leaf node; and having the GPU units released at the leaves start the computation of the next decision tree, repeating these steps until the computation is completed. The method keeps the GPUs fully loaded throughout training and thereby improves training efficiency.

Description

Random forest training method based on multiple GPUs

Technical Field

The invention relates to random forest training based on multiple GPUs, and in particular to a multi-GPU random forest training method that improves training efficiency.

Background

In machine learning, a random forest (RF) is a classifier consisting of multiple decision trees whose output class is the mode of the classes output by the individual trees. The algorithm was developed by Leo Breiman and Adele Cutler. For many kinds of input data or training samples, a random forest balances errors across its trees and yields a classifier with high accuracy, but training many trees makes the process complex and time-consuming. A single traditional CPU cannot meet the random forest training requirements of practical applications: the data scale is generally huge, especially in the big-data era, and a single CPU or a multi-core CPU (usually with no more than 10 cores) may not even be adequate for training a single tree. As trees in practical applications become more complex and the forest grows larger, training complexity increases further, so parallel random forest training on GPUs has become a popular choice and a hot-spot technology.

Existing multi-GPU random forest training methods generally treat different decision trees as the parallel computing tasks: each tree is assigned to a single GPU or a group of GPUs, each GPU or GPU group trains its decision tree independently, and finally all decision trees are synchronized and combined into the final random forest.

The parallel scale of the existing method is limited by the mismatch between the number of decision trees and the number of GPU units. GPU units typically number in the thousands, so when a random forest has only a few decision trees the GPU resources cannot be fully utilized and training efficiency is low. Moreover, when a single GPU or GPU group trains a tree independently, leaves are continuously produced as the tree grows deeper; once a leaf is reached, the computation for its samples is finished, so the computational demand shrinks with depth, and this imbalance means the GPUs in the group cannot remain at full load throughout training.

A new method is needed to solve this technical problem.

Disclosure of Invention

In order to overcome the defects of conventional multi-GPU random forest training methods, the invention provides a random forest training method based on multiple GPUs that keeps the GPUs fully loaded during training and thereby improves training efficiency.

In order to achieve this purpose, the technical scheme of the invention is as follows: a random forest training method based on multiple GPUs comprises the following steps:

(1) controlling a plurality of GPUs to compute a decision tree, with each GPU unit computing the decision of one sample;

(2) comparing the number of GPU units with the number of training samples;

(3) if the number of GPU units is less than the number of training samples, having each GPU unit compute the decision of one sample; if the number of GPU units is greater than the number of training samples, grouping the GPU units so that the number of units in each group is less than or equal to the number of training samples, with the groups training different decision trees in parallel (a sketch of this grouping decision follows the list);

(4) when the decision tree computation reaches a leaf, stopping that computation and releasing the GPU units at the leaf node;

(5) having the GPU units released at the leaves start the computation of the next decision tree, and repeating the above steps until the computation is completed.
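
As an illustration of steps (2)-(3), the following is a minimal Python sketch of the grouping decision; the function name plan_gpu_assignment and the even-split rule are assumptions made for this sketch only, since the description merely requires that each group contain no more GPU units than there are training samples.

```python
def plan_gpu_assignment(num_gpu_units, num_samples):
    """Sketch of steps (2)-(3): compare the number of GPU units with the
    number of training samples and decide whether to group the units.
    Returns a list of groups (lists of GPU unit ids); each group trains a
    different decision tree in parallel."""
    units = list(range(num_gpu_units))
    if num_gpu_units <= num_samples:
        # Fewer units than samples: a single group, one unit per sample decision.
        return [units]
    # More units than samples: split into groups of at most num_samples units,
    # so that every unit in a group can be given its own sample.
    groups = []
    for start in range(0, num_gpu_units, num_samples):
        groups.append(units[start:start + num_samples])
    return groups


# Regime of Example 1 below (G < N): a single group of 256 units.
print(len(plan_gpu_assignment(256, 1000)), "group(s)")
# Regime of Example 2 below (G > N): 1000 units and 600 samples give two groups.
print([len(g) for g in plan_gpu_assignment(1000, 600)])
```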

Preferably, the decision tree computation comprises: on each node in each layer, each GPU trains one sample of the training data;

for all samples in each node of each layer of the decision tree, the multiple GPUs process different samples in parallel and store the training results.

Preferably, each GPU unit is responsible for the computation of one sample until a leaf node is reached; after reaching the leaf node, the GPU unit either initiates the computation of the next tree (the GPU that reaches a leaf node first) or directly joins the computation of the next tree (the other GPUs that reach leaf nodes later), so that the GPUs are always in a full-load state.

Preferably, the training result is stored in the local storage of the GPU first, and is stored in global storage only if the local storage conflicts.
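
The storage preference can be pictured with the following toy Python sketch; the GpuUnit class, the fixed-size local buffer, and the dictionary standing in for global storage are assumptions made purely for illustration, as the description does not name a particular GPU memory API.

```python
class GpuUnit:
    """Toy model of one GPU unit with a small local result buffer (assumed)."""

    def __init__(self, unit_id, local_capacity=4):
        self.unit_id = unit_id
        self.local = {}                  # fast per-unit storage
        self.local_capacity = local_capacity

    def store_result(self, key, result, global_store):
        # Prefer the unit's local storage; fall back to global storage on a
        # conflict, modeled here as the buffer being full or the slot taken.
        if key not in self.local and len(self.local) < self.local_capacity:
            self.local[key] = result
        else:
            global_store[key] = result


global_store = {}
unit = GpuUnit(unit_id=0, local_capacity=2)
for node in range(4):
    unit.store_result(("tree0", node), {"split_feature": node}, global_store)
print(len(unit.local), "results kept local,", len(global_store), "spilled to global")
```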

The invention has the beneficial effects that:

1. The invention does not simply treat decision trees as the unit of parallel training; it jointly considers the number of decision trees, the number of training samples, and the number of available GPU units, and at the start matches the number of decision trees and/or training samples and/or features to the number of GPU units. Following the principle of fully or maximally using the multiple GPU units in parallel, the GPUs can be kept at full load during training and training efficiency is improved.

Drawings

FIG. 1 is a flowchart of a method for multi-GPU based random forest training according to a first embodiment of the present invention;

FIG. 2 is a flowchart illustrating a random forest training method based on multiple GPUs according to a second embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and the following detailed description. It should be understood that the description is intended to be exemplary only and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present invention.

The invention does not simply treat decision trees as the unit of parallel training; it jointly considers the number of decision trees, the number of training samples, and the number of available GPU units, and at the start matches the number of decision trees and/or training samples and/or features to the number of GPU units, so that the multiple GPU units can be used fully or maximally in parallel.

Example 1

As shown in fig. 1, assume there are N samples, each sample has d features, and the number of decision trees in the random forest RF is M. Without loss of generality, in this example the number G of GPU units is smaller than N; with the decisions of individual samples as the training tasks, the multiple GPU units can be utilized in parallel to the maximum extent.

A) Control the plurality of GPUs to compute the first decision tree, with each GPU unit computing the decision of one sample;

B) As the tree depth increases, the decision tree computation reaches leaves and the computation for those leaves stops;

C) The GPU units at the leaf nodes are released;

D) The released GPU units start the second decision tree and compute and train it according to steps A-C;

E) The other GPU units subsequently released from the first decision tree join the computation of the second decision tree;

F) Similarly, the GPU units released at the leaves of the second decision tree start the computation of the third decision tree;

G) The remaining trees follow by analogy (a toy simulation of this schedule is sketched below).
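
As a toy illustration of this schedule, the following standalone Python sketch assigns each GPU unit one sample, draws a random leaf depth for it, and reassigns the unit to the next decision tree as soon as its sample reaches a leaf; the numbers G = 4, N = 4, M = 3 and the leaf_layer helper are assumptions, not values taken from the embodiment.

```python
import random

random.seed(0)

# Toy numbers (assumed): G = 4 GPU units, N = 4 samples per tree, M = 3 trees.
G, N, M = 4, 4, 3
work = [(t, s) for t in range(M) for s in range(N)]   # (tree, sample) decisions


def leaf_layer(current_layer):
    """Layer at which a sample will reach a leaf (random stand-in)."""
    return current_layer + random.randint(1, 3)


# All units start on the first tree, one sample each.
active = {u: (work.pop(0), leaf_layer(0)) for u in range(G)}
layer = 0
while active:
    layer += 1
    for unit, ((tree, _sample), finish) in list(active.items()):
        if finish <= layer:                       # the sample reached a leaf
            if work:
                nxt = work.pop(0)                 # released unit takes up the next tree
                active[unit] = (nxt, leaf_layer(layer))
                print(f"layer {layer}: unit {unit} leaves tree {tree}, starts tree {nxt[0]}")
            else:
                del active[unit]
print("all samples routed to leaves by layer", layer)
```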

Example 2

As shown in fig. 2, assume similarly that there are N samples, each sample has d features, and the number of decision trees in the random forest RF is M. In this example the number G of GPU units is greater than N, so to keep the GPUs at full load the units are grouped. Without loss of generality, take two groups as an example and assume G/2 <= N < G: the GPU units in each group are then fully loaded, the two groups work in parallel, the entire set of GPUs is fully loaded, and the multiple GPU units are utilized in parallel to the maximum extent.

A) Control the multiple GPUs in each group to compute a decision tree, with each GPU unit computing the decision of one sample, and control the groups to proceed in parallel;

B) As the tree depth increases, the first group's decision tree computation reaches leaves and the computation for those leaves stops;

C) The GPU units at the first group's leaf nodes are released;

D) The released GPU units start the third decision tree and compute and train it according to steps A-C;

E) The other GPU units subsequently released by the first group's decision tree join the computation of the third decision tree;

F) Similarly, the GPU units released at the leaves of the second group's decision tree start the computation of the fourth decision tree;

G) The third and fourth decision trees, and those that follow, are handled in the same way (a standalone simulation of this grouped schedule is sketched below).
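
A standalone Python simulation of the grouped schedule follows, under the same caveats as the earlier sketches: the toy numbers, the alternating assignment of trees to the two groups, and the random leaf depths are assumptions used only to show that units released inside one group move on to that group's next tree without idling (the simulation walks through the two groups one after the other, whereas in the embodiment they run in parallel).

```python
import random

random.seed(1)

# Toy numbers (assumed): G = 8 GPU units, N = 4 samples, M = 6 trees.
# Since G > N, the units are split into two groups of N = 4; group 0 starts on
# tree 0, group 1 starts on tree 1, and inside each group a unit released at a
# leaf immediately starts on the group's next tree.
G, N, M = 8, 4, 6
groups = {0: list(range(0, N)), 1: list(range(N, 2 * N))}
tree_queues = {0: list(range(0, M, 2)), 1: list(range(1, M, 2))}  # alternating trees

for g, units in groups.items():
    work = [(t, s) for t in tree_queues[g] for s in range(N)]
    active = {u: (work.pop(0), random.randint(1, 3)) for u in units}
    layer = 0
    while active:
        layer += 1
        for u, ((tree, _sample), finish) in list(active.items()):
            if finish <= layer:                   # leaf reached: release the unit
                if work:
                    nxt = work.pop(0)
                    active[u] = (nxt, layer + random.randint(1, 3))
                    print(f"group {g}, layer {layer}: unit {u} moves from tree {tree} to tree {nxt[0]}")
                else:
                    del active[u]
    print(f"group {g}: all of its trees finished by layer {layer}")
```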

It is to be understood that the above-described embodiments of the present invention merely illustrate or explain the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention shall be included in the protection scope of the present invention. Further, the appended claims are intended to cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims (4)

1. A random forest training method based on multiple GPUs, characterized by comprising the following steps:
(1) controlling a plurality of GPUs to compute a decision tree, with each GPU unit computing the decision of one sample;
(2) comparing the number of GPU units with the number of training samples;
(3) if the number of GPU units is less than the number of training samples, having each GPU unit compute the decision of one sample; if the number of GPU units is greater than the number of training samples, grouping the GPU units so that the number of units in each group is less than or equal to the number of training samples, controlling the multiple GPUs in each group to compute a decision tree with each GPU unit computing the decision of one sample, and controlling the groups to proceed in parallel;
(4) when the decision tree computation reaches a leaf, stopping that computation and releasing the GPU units at the leaf node;
(5) having the GPU units released at the leaf nodes start the computation of the next decision tree, and so on until the computation is finished.
2. The multi-GPU based random forest training method of claim 1, wherein the decision tree computation comprises: on each node in each layer, each GPU trains one sample of the training data;
for all samples in each node of each layer of the decision tree, the multiple GPUs process different samples in parallel and store the training results.
3. The multi-GPU based random forest training method of claim 1, wherein each GPU unit is responsible for the computation of one sample until a leaf node is reached, whereupon the GPU unit starts the computation of the next tree or directly joins the computation of the next tree, so that the GPUs are always in a full-load state.
4. The multi-GPU based random forest training method of claim 2, wherein the training results are stored preferentially in the local storage of the GPUs, and in global storage if the local storage conflicts.
CN201510183467.9A · Priority date 2015-04-19 · Filing date 2015-04-19 · Random forest training method based on multiple GPUs · CN106156786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510183467.9A CN106156786B (en) 2015-04-19 2015-04-19 Random forest training method based on multiple GPUs

Publications (2)

Publication Number Publication Date
CN106156786A CN106156786A (en) 2016-11-23
CN106156786B 2019-12-27

Family

ID=58057703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510183467.9A CN106156786B (en) 2015-04-19 2015-04-19 Random forest training method based on multiple GPUs

Country Status (1)

Country Link
CN (1) CN106156786B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908536B (en) * 2017-11-17 2020-05-19 华中科技大学 Performance evaluation method and system for GPU application in CPU-GPU heterogeneous environment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226540A (en) * 2013-05-21 2013-07-31 中国人民解放军国防科学技术大学 CFD (Computational Fluid Dynamics) accelerating method for multi-region structured grids on GPU (Ground Power Unit) based on grouped multi-streams
CN103336718A (en) * 2013-07-04 2013-10-02 北京航空航天大学 GPU thread scheduling optimization method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8860715B2 (en) * 2010-09-22 2014-10-14 Siemens Corporation Method and system for evaluation using probabilistic boosting trees
US9171264B2 (en) * 2010-12-15 2015-10-27 Microsoft Technology Licensing, Llc Parallel processing machine learning decision tree training
CN102214213B (en) * 2011-05-31 2013-06-19 中国科学院计算技术研究所 Method and system for classifying data by adopting decision tree
CN104391970B (en) * 2014-12-04 2017-11-24 深圳先进技术研究院 A kind of random forest data processing method of attribute subspace weighting

Also Published As

Publication number Publication date
CN106156786A (en) 2016-11-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant