CN108875955B - Gradient boosting decision tree implementation method based on parameter server and related equipment - Google Patents

Gradient boosting decision tree implementation method based on parameter server and related equipment

Info

Publication number
CN108875955B
Authority
CN
China
Prior art keywords
parameter server
optimal
point
nodes
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710326930.XA
Other languages
Chinese (zh)
Other versions
CN108875955A (en)
Inventor
江佳伟
崔斌
肖品
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201710326930.XA priority Critical patent/CN108875955B/en
Priority to PCT/CN2018/081900 priority patent/WO2018205776A1/en
Publication of CN108875955A publication Critical patent/CN108875955A/en
Application granted granted Critical
Publication of CN108875955B publication Critical patent/CN108875955B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083: Techniques for rebalancing the load in a distributed system
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 99/00: Subject matter not provided for in other groups of this subclass
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a method for implementing a gradient boosting decision tree based on a parameter server, which comprises the following steps: sending optimal split point acquisition requests to P parameter server nodes respectively through an interface, preset by the parameter server, for acquiring the optimal split point, where the parameter server comprises the P parameter server nodes and each parameter server node stores M/P features; receiving the information of the optimal split points respectively sent by the P parameter server nodes, to obtain the information of P optimal split points, where the P optimal split points are calculated by the P parameter server nodes from their respectively stored M/P features through a preset GBDT optimization algorithm; and comparing the objective function gains of the P optimal split points and selecting the split point with the largest objective function gain as the global optimal split point. The invention also discloses a computing node device, a parameter server node device, and a system for implementing the gradient boosting decision tree based on the parameter server.

Description

Gradient boosting decision tree implementation method based on parameter server and related equipment
Technical Field
The invention relates to the field of computers, in particular to a gradient boosting decision tree implementation method based on a parameter server, a computing node device, a parameter server node, a parameter server, and a gradient boosting decision tree implementation system based on a parameter server.
Background
Machine Learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and other subjects. It studies how a computer can simulate or implement human learning behaviour in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. In recent years, machine learning has been applied in a wide variety of areas, such as data mining, computer vision, natural language processing, biometric identification, search engines, medical diagnosis, credit card fraud detection, stock market analysis, deoxyribonucleic acid (DNA) sequencing, speech and handwriting recognition, strategy games, and robotics.
In the current era of big data, a single machine is far from sufficient for the storage and computation required, so distributed machine learning techniques are becoming increasingly important. Existing large-scale data sets are often high-dimensional: each training sample may have millions or even hundreds of millions of features. Processing such high-dimensional data sets in a distributed environment incurs high communication overhead, which poses a challenge to distributed machine learning systems.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a method for implementing a gradient boosting decision tree based on a parameter server, a computing node device, a parameter server node, a parameter server, and a system for implementing a gradient boosting decision tree based on a parameter server, so as to solve the single-point bottleneck problem that arises in the prior art when processing high-dimensional training data and to greatly reduce communication overhead.
In order to solve the above technical problem, a first aspect of the embodiments of the present invention discloses a method for implementing a gradient boosting decision tree based on a parameter server, including:
sending optimal split point acquisition requests to P parameter server nodes respectively through an interface, preset by the parameter server, for acquiring the optimal split point; the parameter server comprises the P parameter server nodes, each parameter server node stores M/P features, and M is the number of features of the training data;
receiving the information of the optimal split points respectively sent by the P parameter server nodes, to obtain the information of P optimal split points; the P optimal split points are the optimal split points calculated by the P parameter server nodes from their respectively stored M/P features through a preset Gradient Boosting Decision Tree (GBDT) optimization algorithm;
and comparing the objective function gains of the P optimal split points according to the information of the P optimal split points, and selecting the split point with the largest objective function gain as the global optimal split point.
A second aspect of the embodiments of the present invention discloses a method for implementing a gradient boosting decision tree based on a parameter server, including:
the parameter server node receives, through an interface preset by the parameter server for acquiring the optimal split point, an optimal split point acquisition request sent by a computing node;
the parameter server node calculates, according to the optimal split point acquisition request, the optimal split point from its stored M/P features through a preset gradient boosting decision tree (GBDT) optimization algorithm; M is the number of features of the training data, and P is the number of parameter server nodes included in the parameter server;
and the parameter server node sends the information of the optimal split point to the computing node.
A third aspect of the embodiments of the present invention discloses a computing node device, including a processor, a memory, an input module, and an output module, where the memory stores a plurality of instructions that are loaded and executed by the processor to perform the following operations:
sending optimal split point acquisition requests to P parameter server nodes respectively through the output module and an interface, preset by the parameter server, for acquiring the optimal split point; the parameter server comprises the P parameter server nodes, each parameter server node stores M/P features, and M is the number of features of the training data;
receiving, through the input module, the information of the optimal split points respectively sent by the P parameter server nodes, to obtain the information of P optimal split points; the P optimal split points are the optimal split points calculated by the P parameter server nodes from their respectively stored M/P features through a preset Gradient Boosting Decision Tree (GBDT) optimization algorithm;
and comparing the objective function gains of the P optimal split points according to the information of the P optimal split points, and selecting the split point with the largest objective function gain as the global optimal split point.
A fourth aspect of the embodiments of the present invention discloses a parameter server node device, including a processor, a memory, an input module, and an output module, where the memory stores a plurality of instructions that are loaded and executed by the processor to perform the following operations:
receiving, through the input module and an interface preset by the parameter server for acquiring the optimal split point, an optimal split point acquisition request sent by a computing node;
calculating, according to the optimal split point acquisition request, the optimal split point from the stored M/P features through a preset gradient boosting decision tree (GBDT) optimization algorithm; M is the number of features of the training data, and P is the number of parameter server nodes included in the parameter server;
and sending the information of the optimal split point to the computing node through the output module.
A fifth aspect of the embodiments of the present invention discloses a parameter server, which comprises P parameter server node devices, where P is greater than 1 and each parameter server node device is the parameter server node device described above.
A sixth aspect of the embodiments of the present invention discloses a system for implementing a gradient boosting decision tree based on a parameter server, which comprises a parameter server and at least one computing node device, where the parameter server is the parameter server described above and the computing node device is the computing node device described above.
When the embodiments of the present invention are implemented, a computing node searching for the optimal split point sends optimal split point acquisition requests to the P parameter server nodes through the interface, preset by the parameter server, for acquiring the optimal split point; after obtaining the optimal split point calculated by each parameter server node from its stored M/P features through the GBDT optimization algorithm, the computing node compares the objective function gains of the P optimal split points and selects the split point with the largest objective function gain as the global optimal split point. No computing node needs to obtain the complete global gradient histogram; each only needs to compare the information of the candidate optimal split points returned by the parameter server nodes, which greatly reduces communication overhead. Moreover, the global gradient histogram is partitioned across multiple parameter server nodes for storage, which solves the single-point bottleneck problem encountered when processing high-dimensional training data and accelerates the aggregation of the local gradient histograms. In addition, after one computing node is designated to select the optimal split point, the optimal split point can be stored in the parameter server, and the parameter server sends the global optimal split point to the other computing nodes, so that repeated computation by the other computing nodes is avoided and their computation is greatly reduced. Furthermore, the number of computing nodes and the number of parameter server nodes in the embodiments of the present invention can be scaled according to user requirements; the invention is a scalable PS architecture, thereby solving the single-point bottleneck and limited scalability problems that Spark and XGBoost suffer from in the prior art when processing high-dimensional training data.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is an architecture diagram of a system for implementing a gradient boosting decision tree based on a parameter server according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for implementing a gradient boosting decision tree based on a parameter server according to an embodiment of the present invention;
FIG. 3a is a schematic flow chart diagram illustrating another embodiment of a method for implementing a gradient boosting decision tree based on a parameter server according to the present invention;
FIG. 3b is a schematic diagram of a decision tree provided by the present invention;
FIG. 4 is a flowchart illustrating another embodiment of a method for implementing a gradient boosting decision tree based on a parameter server according to the present invention;
FIG. 5 is a flowchart illustrating another embodiment of a method for implementing a gradient boosting decision tree based on a parameter server according to the present invention;
FIG. 6 is a schematic diagram of parameter storage of a parameter server provided by an embodiment of the present invention;
FIG. 7 is a flowchart illustrating another embodiment of a method for implementing a gradient boosting decision tree based on a parameter server according to the present invention;
FIG. 8 is a schematic diagram of a gradient histogram provided by an embodiment of the present invention;
FIG. 9 is a distributed architecture of a gradient boosting decision tree (GBDT) implemented based on Map-Reduce;
FIG. 10 is a distributed architecture of a GBDT implemented based on the All-Reduce operation of the Message Passing Interface (MPI);
FIG. 11 is a schematic structural diagram of an apparatus for implementing a gradient boosting decision tree based on a parameter server according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a computing node device according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of an implementation apparatus of a gradient boosting decision tree based on a parameter server according to another embodiment of the present invention;
fig. 14 is a schematic structural diagram of a parameter server node device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
As shown in fig. 1, an architecture diagram of a system for implementing a gradient boosting decision tree based on a Parameter Server (PS) according to an embodiment of the present invention may include a parameter server and at least one computing node device. The PS may include a plurality of parameter server nodes and stores a copy of the global model parameters; each computing node device maintains a copy of the model parameters, updates the parameter copy with its assigned data subset in one iteration, sends the updates to the PS, and then starts the next iteration by obtaining the latest global parameters from the PS. In the figure, data transmission between one of the computing nodes and the parameter server nodes is taken as an example, and the transmission lines for updating and acquiring parameters are drawn (the data transmission mode between each computing node and the parameter server nodes is the same, so the transmission lines of the other computing nodes are not drawn): the computing node divides its local parameters into P parts, sends them to the P parameter server nodes respectively, and acquires the information of P optimal split points from the P parameter server nodes.
Further, in the embodiment of the present invention, the system for implementing a gradient boosting decision tree based on a parameter server may further include a main control node, configured to monitor the parameter server and the computing node devices. It should be noted that, in the embodiment of the present invention, the parameter server nodes, the computing node devices, and the main control node may be physical machines. The number of parameter server nodes and the number of computing node devices may be determined by users (including developers, etc.), and the embodiments of the present invention are not limited in this respect. The computing nodes in the embodiments of the present invention are equivalent to computing node devices, and the parameter server nodes are equivalent to parameter server node devices.
The system for implementing a gradient boosting decision tree based on a parameter server can complete machine learning classification tasks such as advertisement recommendation, user gender prediction and image classification, as well as machine learning regression tasks such as user age prediction and user consumption prediction. The user only needs to submit the machine learning task and finally obtains the processing result without knowing the specific processing flow, i.e. the entire distributed processing mechanism is transparent to the user.
Specifically, a specific processing flow of machine learning according to an embodiment of the present invention is described in detail below with reference to flow diagrams of an implementation method of a gradient boosting decision tree based on a parameter server, which are shown in fig. 2 to fig. 10, so as to describe how to implement the gradient boosting decision tree based on the parameter server.
Fig. 2 shows a flowchart of a method for implementing a gradient boosting decision tree based on a parameter server according to an embodiment of the present invention, which is described from a computing node side, and may include the following steps:
step S200: acquiring an interface of an optimal split point through a preset parameter server, and respectively sending optimal split point acquisition requests to P parameter server nodes;
specifically, the parameter server in the embodiment of the present invention is preset with an interface for acquiring the optimal split point, and provides the interface for each computing node. When a computing node searches for an optimal split point, sending an optimal split point acquisition request to P parameter server nodes through an interface for acquiring the optimal split point, wherein the parameter server comprises P parameter server nodes, each parameter server node stores M/P characteristics, and M is the characteristic quantity of training data; p may be a natural number greater than 1, and M may also be a natural number greater than 1.
It should be noted that the optimal split point in the embodiment of the present invention includes the optimal (best) split feature and split feature value, and may also be called the best split point, the optimal split result, or the best split result; the present invention is not limited in this respect, as long as the information or parameters of the optimal split feature and split feature value are included.
Step S202: receiving the information of the optimal split points respectively sent by the P parameter server nodes, to obtain the information of P optimal split points; the P optimal split points are the optimal split points calculated by the P parameter server nodes from their respectively stored M/P features through a preset GBDT optimization algorithm;
Specifically, after receiving an optimal split point acquisition request sent by a computing node, a parameter server node calculates the optimal split point from the M/P features stored on that node by using a preset GBDT optimization algorithm, and returns the information of the optimal split point to the computing node. Since the P parameter server nodes respectively return the information of their own optimal split points to the computing node, the computing node obtains the information of P optimal split points.
Step S204: comparing the objective function gains of the P optimal split points according to the information of the P optimal split points, and selecting the split point with the largest objective function gain as the global optimal split point.
Specifically, the computing node compares the objective function gains of the P optimal split points, so as to select the split point with the largest objective function gain as the global optimal split point; subsequently, leaf nodes may be created and the training data of the current node may be split between the two leaf nodes.
It should be noted that, in the process of training its own training data, each computing node in the embodiment of the present invention does not need to obtain the complete global gradient histogram; it only needs to compare the information of the candidate optimal split points returned by the parameter server nodes, which greatly reduces communication overhead. In the embodiment of the present invention, the candidate optimal split points are the P optimal split points whose information is respectively returned by the P parameter server nodes, and the computing node selects the best of these P candidates as the global optimal split point.
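To make this selection step concrete, the following is a minimal Python sketch of how a computing node might collect the P candidate optimal split points and keep the one with the largest objective function gain; the `ps_clients` list and its `get_best_split` call are assumed names used for illustration, not an interface defined by the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SplitInfo:
    feature: int      # split feature index
    value: float      # split feature value
    gain: float       # objective function gain brought by this split
    server_id: int    # which parameter server node proposed it

def find_global_best_split(ps_clients, tree_node_id: int) -> SplitInfo:
    """Collect one candidate split per parameter server node and keep the best one."""
    candidates: List[SplitInfo] = []
    for server_id, client in enumerate(ps_clients):
        # Each PS node scans only its own M/P features and returns one candidate.
        feature, value, gain = client.get_best_split(tree_node_id)   # assumed RPC
        candidates.append(SplitInfo(feature, value, gain, server_id))
    # Only P small replies are compared; no global gradient histogram is pulled.
    return max(candidates, key=lambda s: s.gain)
```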
Further, with reference to the flow diagram of another embodiment of the method for implementing a gradient boosting decision tree based on a parameter server shown in fig. 3a, the method is described in more detail from the computing node side. It should be noted that the embodiment of the present invention is described by taking one computing node as an example (the processing flow of the other computing nodes is the same), and may include the following steps:
step S300: calculating candidate split points;
specifically, the computing node sequentially reads the training data distributed to itself, and computes a candidate splitting characteristic value for each characteristic, thereby obtaining candidate splitting points.
It should be noted that the training data of the present invention is historical data used for constructing a machine learning model, and the training data assigned to each computing node may be different.
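As an illustration only, a computing node could propose the K candidate split feature values per feature from its local data as quantiles; the patent does not fix the exact procedure, so this sketch is an assumption.

```python
import numpy as np

def propose_candidate_splits(X_local: np.ndarray, K: int) -> np.ndarray:
    """X_local: (N, M) local training data; returns (M, K) candidate split values."""
    quantile_levels = np.linspace(0.0, 1.0, K + 1)[1:]      # K levels in (0, 1]
    return np.quantile(X_local, quantile_levels, axis=0).T  # one row of K values per feature
```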
Step S302: creating a decision tree;
specifically, the compute nodes create a new tree and perform initialization work including initializing a tree structure, computing first and second order gradients of training data, initializing a queue of tree nodes to be processed, adding a root node of the tree to the queue, and the like.
Step S304: calculating a gradient histogram;
specifically, the compute node fetches the pending tree node from the queue of pending tree nodes. For each tree node to be processed, respectively calculating a local gradient histogram corresponding to each tree node according to training data of the node, and dividing each local gradient histogram into P block local gradient histograms; the P block local gradient histograms correspond to the P parameter server nodes one by one; and then updating an interface of the gradient histogram through a preset parameter server, and sending each block local gradient histogram of each tree node to be processed to the corresponding parameter server node.
It should be noted that the local gradient histogram in the embodiment of the present invention may include a first-order gradient histogram and a second-order gradient histogram; that is, dividing each local gradient histogram into P blocks includes dividing each first-order gradient histogram into P blocks and dividing each second-order gradient histogram into P blocks, which are then sent to the corresponding parameter server nodes. The parameter server distributes the global gradient histogram (i.e. the whole gradient histogram) across the P parameter server nodes, i.e. each parameter server node stores the first- and second-order gradient histograms of M/P features, which solves the single-point bottleneck problem for high-dimensional training data and accelerates the aggregation of the local gradient histograms.
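The blocking and pushing of step S304 could look like the following hedged sketch; the 2KM per-feature layout follows the storage description given later (fig. 6), and `update_histogram` is an assumed name for the parameter server's gradient-histogram update interface.

```python
import numpy as np

def push_blocked_histograms(G: np.ndarray, H: np.ndarray, ps_clients, tree_node_id: int):
    """G, H: (M, K) first-/second-order local gradient histograms of one tree node."""
    # Per-feature layout: [K first-order bins | K second-order bins] -> length 2K,
    # concatenated over the M features and divided evenly across the P PS nodes.
    flat = np.concatenate([G, H], axis=1).reshape(-1)       # length 2*K*M
    for client, block in zip(ps_clients, np.array_split(flat, len(ps_clients))):
        client.update_histogram(tree_node_id, block)        # assumed interface
```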
Step S306: searching for the optimal split point;
Specifically, the computing node obtains, through the parameter server's interface for acquiring the optimal split point, the information of the optimal split point of each parameter server node, compares the objective function gains of the P split points, and selects the split point with the largest gain as the global optimal split point. For details, reference may be made to steps S202 to S204, which are not repeated here.
It should be noted that data is usually represented as a vector; each dimension of the vector is called a feature of the data and represents a certain property of the data, and the value of that property is called a feature value of the data. Fig. 3b is a schematic diagram of a decision tree provided by the present invention, in which each node is a tree node: the starting node ("owns property") is the root node, and other nodes such as "married" and "monthly income" may be leaf nodes.
Further, the information of the optimal split point in the embodiment of the present invention may include the split feature, the split feature value, and the objective function gain; in that case, the computing node selecting the split point with the largest objective function gain as the global optimal split point may specifically include: taking the split feature and split feature value of the split point with the largest objective function gain as the global optimal split point. Thus, each computing node does not need to obtain the complete global gradient histogram, but only needs to compare the candidate optimal split points returned by the parameter server nodes; each parameter server node in the embodiment of the present invention needs to return only three numbers (split feature, split feature value, and objective function gain) to the computing node, which greatly reduces communication overhead.
Step S308: splitting tree nodes;
specifically, the calculation node creates leaf nodes according to the calculated optimal split point, and splits the training data of the node to two leaf nodes.
Step S310: judging whether the height of the tree reaches the maximum limit;
specifically, when the height of the tree is not determined to reach the maximum limit, adding two leaf nodes into a queue of the tree node to be processed, and then jumping to the step S304; when it is determined that the height of the tree has reached the maximum limit, step S312 is performed.
Step S312: judging whether the number of trees reaches the maximum limit;
specifically, when it is determined that the number of trees does not reach the maximum limit, the method jumps to step S302, and a new decision tree is created again. When it is judged that the number of trees reaches the maximum limit, step S314 is performed. The compute node iteratively invokes steps S302 through S308 until the building of all trees is completed.
Step S314: and finishing the training.
Specifically, the computing node finishes training all the decision trees, calculates and outputs performance metrics (accuracy, error, and the like), and can output the trained model.
It should be further noted that, in the embodiment of the present invention, the information of the optimal split point may instead include the objective function gain and identification information, where the identification information indicates the parameter server node corresponding to the optimal split point. That is to say, after receiving an optimal split point acquisition request sent by a computing node, each parameter server node calculates the split feature, split feature value, and objective function gain of the optimal split point from its stored M/P features through the preset GBDT optimization algorithm, but sends only the objective function gain (and identification information) to the computing node. After the computing node compares the objective function gains of the P optimal split points and selects the largest objective function gain, it requests the split feature and split feature value from the corresponding parameter server node according to the identification information of the split point with the largest objective function gain, and then receives the split feature and split feature value returned by that parameter server node as the global optimal split point. Each computing node does not need to obtain the complete global gradient histogram, but only needs to compare the candidate optimal split points returned by the parameter server nodes; each parameter server node in the embodiment of the present invention needs to return only two numbers (identification information and objective function gain) to the computing node, and after the computing node determines the largest objective function gain, it obtains the split feature and split feature value from the corresponding parameter server node, which greatly reduces communication overhead.
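A hedged sketch of this two-round variant on the computing-node side is shown below; `get_best_gain` and `get_split_detail` are assumed RPC names, and parameter server node ids are assumed to equal their indices in `ps_clients`.

```python
def fetch_global_best_split_two_round(ps_clients, tree_node_id: int):
    # Round 1: each PS node returns only two numbers (objective function gain, its id).
    replies = [client.get_best_gain(tree_node_id) for client in ps_clients]  # assumed RPC
    best_gain, best_id = max(replies)        # largest gain wins
    # Round 2: only the winning node is asked for the split feature and value.
    feature, value = ps_clients[best_id].get_split_detail()                  # assumed RPC
    return feature, value, best_gain
```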
The following describes, with reference to the flowchart of fig. 4, another embodiment of the method for implementing a gradient boosting decision tree based on a parameter server according to the present invention, correspondingly from the parameter server node side; it may include the following steps:
step S400: the parameter server node acquires an interface of the optimal split point through a preset parameter server, and receives an optimal split point acquisition request sent by the computing node;
specifically, when the computing node searches for the optimal split point, it may send an optimal split point acquisition request to the P parameter server nodes through the interface for acquiring the optimal split point, and the parameter server nodes receive the optimal split point acquisition request sent by the computing node.
Step S402: the parameter server node obtains a request according to the optimal split point, and calculates the optimal split point from the stored M/P characteristics through a preset gradient lifting decision tree GBDT optimization algorithm; the M is the characteristic quantity of the training data, and the P is the quantity of the parameter server nodes included in the parameter server;
step S404: and the parameter server node sends the information of the optimal splitting point to the computing node.
More specifically, with reference to the flowchart of fig. 5, which illustrates another embodiment of the method for implementing a gradient boosting decision tree based on a parameter server according to the present invention, the method is described in more detail from the side of a parameter server node, and may include the following steps:
step S500: the parameter server node updates an interface of the gradient histogram through a preset parameter server and receives a blocked local gradient histogram sent by the computing node;
the blocked local gradient histograms are local gradient histograms corresponding to the calculation nodes aiming at each tree node to be processed respectively according to training data of the nodes, and each local gradient histogram is divided into P blocks; and the local gradient histograms of the P blocks are in one-to-one correspondence with the P parameter server nodes.
Specifically, fig. 6 is a schematic diagram of the parameter storage of a parameter server provided by an embodiment of the present invention: the parameter server stores the global gradient histogram across the P parameter server nodes. T is the total number of decision trees to be trained and d is the maximum height of each tree, so each tree has at most 2^d - 1 tree nodes. In the embodiment of the present invention, the global gradient histogram of one feature on one tree node is represented as a vector of length 2K, where the first K entries are the first-order gradient histogram and the last K entries are the second-order gradient histogram. M is the number of features of the training data, and each feature corresponds to one first-order and one second-order gradient histogram. The gradient histograms of all M features are concatenated into a vector of length 2KM, divided evenly into P blocks, and stored on the P parameter server nodes respectively, so that each parameter server node stores the global gradient histograms of M/P features, with length 2KM/P. Because there are at most 2^d - 1 tree nodes, each parameter server node stores a global gradient histogram of total size 2KM(2^d - 1)/P. In the embodiment of the present invention, the global gradient histogram is partitioned across multiple parameter server nodes for storage, which solves the single-point bottleneck problem encountered when processing high-dimensional training data.
It should be noted that, in order to reuse the global gradient histogram while building multiple decision trees, the entire global gradient histogram is reset to 0 before each decision tree is built.
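A minimal sketch of this per-node storage, under the assumption that one shard simply keeps a zero-initialized array of size 2KM(2^d - 1)/P and accumulates the blocks pushed by the computing nodes:

```python
import numpy as np

class HistogramShard:
    """One parameter server node's slice of the global gradient histogram."""
    def __init__(self, M: int, K: int, P: int, d: int):
        self.block_len = 2 * K * M // P            # one tree node's slice on this shard
        self.max_nodes = 2 ** d - 1                # at most 2^d - 1 tree nodes per tree
        self.data = np.zeros((self.max_nodes, self.block_len))

    def reset(self):
        # Reused across trees: zeroed before each new decision tree is built.
        self.data.fill(0.0)

    def add(self, tree_node_id: int, block: np.ndarray):
        # Accumulate a computing node's local histogram block (steps S500/S502).
        self.data[tree_node_id] += block
```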
Step S502: the parameter server node adds the blocked local gradient histogram to the corresponding global gradient histogram;
Specifically, after each parameter server node receives a local gradient histogram block sent by a computing node, it determines which tree node is being processed and accumulates the block into the corresponding global gradient histogram, thereby updating the global gradient histogram.
Step S504: the parameter server node receives, through an interface preset by the parameter server for acquiring the optimal split point, an optimal split point acquisition request sent by a computing node;
Step S506: the parameter server node calculates, according to the optimal split point acquisition request, the optimal split point from its stored M/P features through a preset GBDT optimization algorithm;
Step S508: the parameter server node sends the information of the optimal split point to the computing node.
Specifically, for steps S504 to S508, reference may be made to steps S400 to S404, which are not repeated here. It should be noted that the gradient histogram updating operations of steps S500 and S502 and the optimal split point processing operations of steps S504 to S508 do not have to occur in a fixed order: when the optimal split point is calculated, the latest global gradient histogram is used, and the global gradient histogram is updated continuously.
It should be noted that the information of the optimal split point in the embodiment of the present invention includes the split feature, the split feature value, and the objective function gain. Alternatively, the information of the optimal split point may include the objective function gain and identification information, where the identification information indicates the parameter server node; in that case, after the parameter server node sends the information of the optimal split point to the designated computing node, the method may further include: the parameter server node receives a request from the computing node for acquiring the split feature and split feature value, and sends the split feature and split feature value of the optimal split point to the computing node according to that request.
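Combining the pieces above, one possible shape of a parameter server node is sketched below; the method names and the injected `split_finder` (for example the split-search routine sketched later) are illustrative assumptions, not the patent's actual API.

```python
class ParameterServerNode:
    """Accumulates histogram blocks and answers optimal split point requests."""
    def __init__(self, node_id, shard, candidates, split_finder):
        self.node_id = node_id          # identification information of this PS node
        self.shard = shard              # e.g. a HistogramShard as sketched earlier
        self.candidates = candidates    # candidate split values of its M/P features
        self.split_finder = split_finder
        self._last_split = None

    def update_histogram(self, tree_node_id, block):
        self.shard.add(tree_node_id, block)

    def get_best_gain(self, tree_node_id):
        G, H = self.shard.as_histograms(tree_node_id)   # assumed helper on the shard
        gain, feature, value = self.split_finder(G, H, self.candidates)
        self._last_split = (feature, value)
        return gain, self.node_id       # round 1: only two numbers are returned

    def get_split_detail(self):
        return self._last_split         # round 2: split feature and split feature value
```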
In the following, referring to the flowchart of fig. 7, which illustrates another embodiment of the method for implementing a gradient boosting decision tree based on a parameter server according to the present invention, one round of gradient histogram updating and one acquisition of the optimal split point by a computing node is taken as an example, and the method is described from both the computing node side and the parameter server node side; it may include the following steps:
step S700: the calculation node executes a GBDT algorithm by using the distributed data subset to obtain a local gradient histogram;
specifically, in the GBDT algorithm, for each tree, the following objective function is minimized:
Figure BDA0001291462030000111
wherein x i Is training data, y i Is a true label of the training data, f t (x i ) Is the predicted value given by the t-th tree to the training data,
Figure BDA0001291462030000112
is the predicted value of the training data after the t-th tree, l is the cost function (given the true label and the predicted value, one cost value is calculated), and Ω is the regular term to prevent model overfitting.
The GBDT algorithm uses the gradient histogram as side information to find the best split point. For each training data, its first order gradient g to the objective function l is calculated i And a second order gradient h i Then, the distribution of the first order gradient and the second order gradient of all the training data is counted, and a gradient histogram is generated for each feature of the training data. As shown in fig. 8, which is a schematic diagram of the principle of the gradient histogram provided by the embodiment of the present invention, the abscissa is an interval of the feature value of a certain feature, and the ordinate is the sum of the gradients of the training data of the feature value in this interval. For example, training data with eigenvalues between 0 and 0.2, we accumulate their first order gradients as the first bin (bin) of the first order gradient histogram of fig. 8.
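Under these definitions, a computing node's local histogram construction might look like the following sketch; the bin boundaries are the K candidate split values per feature, and the variable names are illustrative.

```python
import numpy as np

def build_local_histograms(X, grad, hess, candidates):
    """X: (N, M) local data; grad, hess: length-N g_i and h_i;
    candidates: (M, K) sorted bin upper bounds per feature."""
    N, M = X.shape
    K = candidates.shape[1]
    G = np.zeros((M, K))   # first-order gradient histogram, G[m, k]
    H = np.zeros((M, K))   # second-order gradient histogram, H[m, k]
    for m in range(M):
        bins = np.searchsorted(candidates[m], X[:, m], side="left")
        bins = np.clip(bins, 0, K - 1)   # values above the last candidate go to the last bin
        np.add.at(G[m], bins, grad)      # accumulate g_i into each sample's bin
        np.add.at(H[m], bins, hess)      # accumulate h_i into each sample's bin
    return G, H
```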
Step S702: the calculation node divides the local gradient histogram into P blocks;
specifically, P block local gradient histograms correspond one-to-one to P parameter server nodes.
Step S704: respectively sending each block local gradient histogram to a corresponding parameter server node;
step S706: the parameter server nodes summarize local gradient histograms of all the computing nodes to generate a global gradient histogram;
step S708: the computing node requests an optimal split point from each parameter server node;
step S710: after each parameter server node receives the request, returning the information of the optimal split point in the characteristics in the M/P stored by the node to the computing node;
specifically, in order to find the optimal split point at each tree node, the embodiment of the present invention may adopt a heuristic optimization method, and calculate the optimal split point according to the first-order and second-order gradient histograms:
(1) Suppose there are M characteristics, M is trainingTraining the feature quantity of data, and selecting K candidate splitting feature values S for each feature m ={S m1 ,S m2 ,...,S mK }:
for m=1to M do
generate K split candidates S m ={S m1 ,S m2 ,...,S mk }
end for
(2) We scan the N training samples in turn to build the first- and second-order gradient histograms. We use G_mk to denote the k-th bin of the first-order gradient histogram of the m-th feature, and H_mk to denote the k-th bin of the second-order gradient histogram of the m-th feature:
for m = 1 to M do
    loop over the N instances to generate gradient histograms with K bins
    G_mk = Σ g_i where s_m(k-1) < x_im < s_mk
    H_mk = Σ h_i where s_m(k-1) < x_im < s_mk
end for
The gradient histogram of each feature has K bins, and the K candidate split feature values are the maximum values of the feature-value intervals corresponding to the bins, i.e. the feature-value interval corresponding to the k-th bin is [s_m(k-1), s_mk].
(3) From left to right, we read the K bins of the first- and second-order gradient histograms in turn and find the best split point (split feature and split feature value) according to the following algorithm.
G is the sum of the values of all bins of the first-order gradient histogram, and H is the sum of the values of all bins of the second-order gradient histogram. G_L accumulates the values of the bins of the first-order gradient histogram that have already been read, and H_L accumulates the values of the bins of the second-order gradient histogram that have already been read; G_R is the sum of the values of the bins of the first-order gradient histogram that have not yet been read, and H_R is the sum of the values of the bins of the second-order gradient histogram that have not yet been read.
G = Σ_k G_mk, H = Σ_k H_mk
for m = 1 to M do
    G_L = 0, H_L = 0
    for k = 1 to K do
        G_L = G_L + G_mk, H_L = H_L + H_mk
        G_R = G - G_L, H_R = H - H_L
        gain = G_L^2 / (H_L + λ) + G_R^2 / (H_R + λ) - G^2 / (H + λ)
    end for
end for
where λ is the regularization constant of the regularization term Ω.
M is the number of features of the training data, so there are 2M gradient histograms in total, each feature corresponding to one first-order and one second-order gradient histogram. When finding the optimal split point of a certain feature, the first- and second-order gradient histograms of that feature are needed, and the optimal split point is found from these two histograms. In the above algorithm, the first- and second-order gradient histograms of the M features are processed in turn; for each candidate split feature value of each feature, the gain in the objective function F^(t) that taking it as the split point would bring is calculated, and the feature and feature value with the largest gain among all MK candidate split points are selected as the optimal split point.
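The histogram-based search above can be written compactly as follows; λ (lam) and γ (gamma) are assumed regularization constants coming from the regularization term Ω, and the gain expression used here is the standard second-order (XGBoost-style) split gain, offered as a stand-in sketch rather than the patent's exact formula.

```python
import numpy as np

def find_best_split(G, H, candidates, lam=1.0, gamma=0.0):
    """G, H: (M', K) first-/second-order histograms of the M' = M/P features on one PS node;
    candidates: (M', K) candidate split values s_mk. Returns (best_gain, feature, value)."""
    M_, K = G.shape
    best_gain, best_feature, best_value = -np.inf, -1, 0.0
    for m in range(M_):
        G_tot, H_tot = G[m].sum(), H[m].sum()     # G and H for this feature
        GL = HL = 0.0
        for k in range(K):
            GL += G[m, k]
            HL += H[m, k]
            GR, HR = G_tot - GL, H_tot - HL
            gain = 0.5 * (GL * GL / (HL + lam)
                          + GR * GR / (HR + lam)
                          - G_tot * G_tot / (H_tot + lam)) - gamma
            if gain > best_gain:
                best_gain, best_feature, best_value = gain, m, candidates[m, k]
    return best_gain, best_feature, best_value
```

Each parameter server node runs this search over its own M/P features only and returns just the resulting gain, feature and value (or the gain and its id in the two-round variant) to the requesting computing node.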
Step S712: the computing node receives the information of the optimal split points respectively returned by the P parameter server nodes, and selects the split point that brings the largest gain to the objective function;
it should be noted that, as shown in fig. 9, in a distributed architecture of a Gradient Boosting Decision Tree (GBDT) implemented based on Map-Reduce (Map-Reduce), as used by a compute engine Spark, each worker node (i.e., compute node) sends a local Gradient histogram to a node through a Reduce operation after generating the local Gradient histogram; the node collects the gradient histograms of all the computing nodes, generates a global gradient histogram, searches for an optimal split point, and broadcasts the optimal split point to all the computing nodes through Map operation. However, when the feature dimension of the training data is large, the local gradient histogram generated by each computation node is also large. In this case, a single-point bottleneck problem is encountered in Reduce operation, and the nodes summarizing the local gradient histograms may slow down the processing speed of the entire system due to network congestion. The distributed architecture of GBDT based on All-Reduce implementation of Multi-Point Interface (MPI) as shown in fig. 10, as used by XGBoost, all computing nodes are organized into a binary tree structure; after each computing node generates a local gradient histogram, sending the local gradient histogram to a parent computing node; after the parent computing node collects the gradient histograms of the two child computing nodes, the gradient histograms are sent to the parent computing node; after the root computing node obtains the global gradient histogram, the optimal split point is searched, and then the optimal split point is sequentially sent to all the computing nodes through the tree structure. However, this scheme requires multi-step transmission, which brings large traffic when the feature dimension of the training data is large. And with the increase of computing nodes, more transmission steps are needed to complete the summarization of the local gradient histogram.
By implementing the embodiments of the present invention, a computing node searching for the optimal split point sends optimal split point acquisition requests to the P parameter server nodes through the interface, preset by the parameter server, for acquiring the optimal split point; after obtaining the optimal split point calculated by each parameter server node from its stored M/P features through the GBDT optimization algorithm, the computing node compares the objective function gains of the P optimal split points and selects the split point with the largest objective function gain as the global optimal split point. No computing node needs to obtain the complete global gradient histogram; each only needs to compare the information of the candidate optimal split points returned by the parameter server nodes, which greatly reduces communication overhead. Moreover, the global gradient histogram is partitioned across multiple parameter server nodes for storage, which solves the single-point bottleneck encountered when processing high-dimensional training data and accelerates the aggregation of the local gradient histograms. In addition, the number of computing nodes and the number of parameter server nodes in the embodiments of the present invention can be scaled according to user requirements; the invention is a scalable PS architecture, thereby solving the single-point bottleneck and limited scalability problems that Spark and XGBoost suffer from in the prior art when processing high-dimensional training data.
Step S714: and the computing node splits the currently processed tree node according to the optimal split point.
Specifically, the computing node iterates through the above processing until all the trees are built.
It should be further noted that, in the embodiment of the present invention, each computing node may search for the optimal split point according to the embodiments of fig. 3 to fig. 4, or one of the computing nodes may be designated to search for the optimal split point according to the embodiments of fig. 3 to fig. 4. If one computing node is designated to search for the optimal split point, then after selecting the split point with the largest objective function gain as the global optimal split point, the designated computing node may further send the global optimal split point to the parameter server, so that the parameter server sends the global optimal split point to all the computing nodes. The parameter server may store all the acquired global optimal split points on the P parameter server nodes, i.e. the other computing nodes may also obtain the split feature and split feature value of the optimal split point directly from the parameter server nodes. In this way, repeated calculation by the other computing nodes can be avoided, which greatly reduces the computation of the computing nodes.
In order to better implement the above-mentioned solution of the embodiment of the present invention, the present invention further provides a device for implementing a gradient boosting decision tree based on a parameter server, which is described in detail below with reference to the accompanying drawings:
as shown in fig. 11, which is a schematic structural diagram of an implementation apparatus for gradient boosting decision tree based on a parameter server according to an embodiment of the present invention, the implementation apparatus 11 for gradient boosting decision tree based on a parameter server may include: a request module 110, an information receiving module 112, and an optimal split point selecting module 114, wherein,
the request module 110 is configured to obtain an interface of an optimal split point through a preset parameter server, and send optimal split point obtaining requests to P parameter server nodes respectively; the parameter server comprises P parameter server nodes, each parameter server node stores M/P features, and M is the feature quantity of training data;
the information receiving module 112 is configured to receive information of the optimal split points respectively sent by the P parameter server nodes, so as to obtain information of the P optimal split points; the P optimal splitting points are the optimal splitting points calculated by the P parameter server nodes from the M/P characteristics stored in the P parameter server nodes respectively through a preset Gradient Boosting Decision Tree (GBDT) optimization algorithm;
the optimal splitting point selecting module 114 is configured to compare the target function gains of the P optimal splitting points according to the information of the P optimal splitting points, and select the splitting point with the largest target function gain as the global optimal splitting point.
In particular, the implementation apparatus 11 for a gradient boosting decision tree based on a parameter server may further include a partitioning module and a sending module, wherein,
the partitioning module is configured to calculate, for each tree node to be processed, the corresponding local gradient histogram according to the training data belonging to that node, and to divide each local gradient histogram into P blocks; the P blocks of the local gradient histogram correspond one-to-one to the P parameter server nodes;
the sending module is configured to send each block of the local gradient histogram of each tree node to be processed to the corresponding parameter server node through an interface, preset by the parameter server, for updating the gradient histogram.
More specifically, the information of each optimal split point may include the split feature, the split feature value, and the objective function gain;
the optimal split point selecting module 114 is then specifically configured to take the split feature and split feature value of the split point with the largest objective function gain as the global optimal split point.
Alternatively, the information of each optimal split point includes the objective function gain and identification information, where the identification information indicates the parameter server node corresponding to the optimal split point;
the optimal split point selecting module 114 is then specifically configured to request the split feature and split feature value from the corresponding parameter server node according to the identification information of the split point with the largest objective function gain, and to receive the split feature and split feature value returned by the corresponding parameter server node as the global optimal split point.
More specifically, the apparatus 11 for implementing a gradient boosting decision tree based on a parameter server may further include a node sending module, configured to send the global optimal split point to the parameter server after the optimal split point selecting module 114 selects the split point with the largest objective function gain as the global optimal split point, so that the parameter server sends the global optimal split point to all the computing nodes.
Still further, as shown in fig. 12, which is a schematic structural diagram of a computing node device provided by an embodiment of the present invention, the computing node device 120 may include: at least one processor 1201 (e.g. a CPU), an input module 1202, an output module 1203, a memory 1204, and at least one communication bus 1205. The communication bus 1205 is used to enable connection and communication between these components. The memory 1204 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; in the embodiment of the present invention the memory 1204 includes flash memory. The memory 1204 may optionally be at least one storage system located remotely from the processor 1201. As shown in fig. 12, the memory 1204, as a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and an implementation program of the gradient boosting decision tree based on a parameter server.
In the computing node device 120 shown in fig. 12, the processor 1201 may be configured to invoke an implementation program of the parameter server-based gradient boosting decision tree stored in the memory 1204, and perform the following operations:
sending optimal split point acquisition requests to the P parameter server nodes respectively through the output module 1203 and an interface, preset by the parameter server, for acquiring the optimal split point; the parameter server comprises the P parameter server nodes, each parameter server node stores M/P features, and M is the number of features of the training data;
receiving, through the input module 1202, the information of the optimal split points respectively sent by the P parameter server nodes, to obtain the information of P optimal split points; the P optimal split points are the optimal split points calculated by the P parameter server nodes from their respectively stored M/P features through a preset Gradient Boosting Decision Tree (GBDT) optimization algorithm;
and comparing the objective function gains of the P optimal split points according to the information of the P optimal split points, and selecting the split point with the largest objective function gain as the global optimal split point.
Specifically, the processor 1201 may further perform the following operations:
for each tree node to be processed, calculating the corresponding local gradient histogram according to the training data held by the computing node itself, and dividing each local gradient histogram into P blocks; the P blocked local gradient histograms correspond one-to-one to the P parameter server nodes;
sending, through a preset gradient-histogram update interface of the parameter server and via the output module 1203, each blocked local gradient histogram of each tree node to be processed to the corresponding parameter server node.
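A minimal sketch of how such a local gradient histogram can be built and partitioned is given below. It assumes, as in XGBoost-style GBDT, that each training instance carries a first-order gradient g and a second-order gradient h and that feature values have already been bucketed into a fixed number of bins; the helper names and the simple contiguous assignment of features to the P parameter server nodes are illustrative assumptions, not the patent's prescribed layout.

```python
import numpy as np

def build_local_histogram(binned_features, grads, hessians, num_bins):
    """binned_features: (n_samples, M) integer bin indices; returns per-feature, per-bin (sum_g, sum_h)."""
    n, M = binned_features.shape
    hist = np.zeros((M, num_bins, 2))                     # [feature, bin, (sum_g, sum_h)]
    for j in range(M):
        np.add.at(hist[j, :, 0], binned_features[:, j], grads)
        np.add.at(hist[j, :, 1], binned_features[:, j], hessians)
    return hist

def partition_histogram(hist, P):
    """Split the M-feature histogram into P blocks, one block per parameter server node."""
    return np.array_split(hist, P, axis=0)                # block p covers roughly M/P features
```

Each computing node then pushes block p to parameter server node p, so every parameter server node only ever receives the slice of the histogram covering the M/P features it owns.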
Specifically, the information of each optimal splitting point includes a splitting characteristic, a splitting characteristic value and an objective function gain;
the selecting, by the processor 1201, of the split point with the largest objective function gain as the global optimal split point may include:
taking the splitting characteristic and the splitting characteristic value of the split point with the maximum objective function gain as the global optimal split point.
Alternatively, the information of each optimal split point includes an objective function gain and identification information, where the identification information indicates the parameter server node corresponding to that optimal split point;
the selecting, by the processor 1201, of the split point with the maximum objective function gain as the global optimal split point includes:
requesting, via the output module 1203, the splitting characteristic and the splitting characteristic value from the corresponding parameter server node according to the identification information of the split point with the maximum objective function gain;
and receiving, via the input module 1202, the splitting characteristic and the splitting characteristic value returned by that parameter server node as the global optimal split point.
Specifically, after the processor 1201 selects the split point with the maximum objective function gain as the global optimal split point, the following steps may be further performed:
the global optimal split point is sent to the parameter server through the output module 1203, so that the parameter server sends the global optimal split point to all the computing nodes.
It should be noted that, in the embodiment of the present invention, for the functions of each module in the parameter-server-based gradient boosting decision tree implementation apparatus 11 and in the computing node device 120, reference may be made to the specific implementations of any of the embodiments of fig. 3 to fig. 10 in the foregoing method embodiments, and details are not repeated here. The implementation apparatus 11 and the computing node device 120 of the gradient boosting decision tree based on the parameter server may include, but are not limited to, a computer or another physical machine.
The invention also provides another implementation device of the gradient boosting decision tree based on the parameter server, which is described in detail in the following with reference to the accompanying drawings:
as shown in fig. 13, which is a schematic structural diagram of another embodiment of the apparatus for implementing a gradient boosting decision tree based on a parameter server according to the present invention, the apparatus 13 for implementing a gradient boosting decision tree based on a parameter server may include: a request receiving module 130, an optimal split point calculating module 132, and an information transmitting module 134, wherein,
the request receiving module 130 is configured to receive, through a preset optimal-split-point acquisition interface of the parameter server, an optimal split point acquisition request sent by a computing node;
the optimal split point calculating module 132 is configured to calculate, according to the optimal split point acquisition request, the optimal split point from the stored M/P features through a preset gradient boosting decision tree (GBDT) optimization algorithm; M is the number of features of the training data, and P is the number of parameter server nodes included in the parameter server;
the information sending module 134 is configured to send the information of the optimal splitting point to the computing node.
Specifically, the implementation apparatus 13 for gradient boosting decision tree based on parameter server may further include: a local gradient histogram receiving module and a histogram updating module, wherein,
the local gradient histogram receiving module is configured to receive, through a preset gradient-histogram update interface of the parameter server, the blocked local gradient histogram sent by the computing node; the blocked local gradient histogram is obtained by the computing node calculating, for each tree node to be processed, the corresponding local gradient histogram from the training data held by that computing node and dividing each local gradient histogram into P blocks; the P blocked local gradient histograms correspond one-to-one to the P parameter server nodes;
the histogram updating module is configured to accumulate the blocked local gradient histogram into the corresponding global gradient histogram.
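The Python sketch below illustrates this accumulation on one parameter server node, reusing the hypothetical block layout of the earlier sketch (one block of roughly M/P features, each with per-bin first- and second-order gradient sums). The dictionary keyed by tree node id and the class name are illustrative assumptions rather than the patent's actual data structures.

```python
import numpy as np

class GradientHistogramStore:
    """Per parameter-server-node store: one global histogram block per pending tree node."""
    def __init__(self):
        self.global_hist = {}   # tree_node_id -> array of shape (M/P, num_bins, 2)

    def accumulate(self, tree_node_id, local_block):
        """Add one computing node's blocked local histogram into the global histogram."""
        if tree_node_id not in self.global_hist:
            self.global_hist[tree_node_id] = np.zeros_like(local_block)
        self.global_hist[tree_node_id] += local_block
```

Because every computing node contributes an additive (sum_g, sum_h) block, the order in which the blocks arrive does not matter, and the parameter server node can merge them as they come in.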
Specifically, the information of the optimal splitting point includes a splitting characteristic, a splitting characteristic value, and an objective function gain.
Alternatively, the information of the optimal split point comprises an objective function gain and identification information, where the identification information indicates the parameter server node;
the implementation apparatus 13 for the gradient boosting decision tree based on the parameter server may further include a split point information sending module, configured to receive, after the information sending module 134 sends the information of the optimal split point to the computing node, a request from the computing node for obtaining the splitting characteristic and the splitting characteristic value, and to send the splitting characteristic and the splitting characteristic value of the optimal split point to the computing node according to that request.
Still further, as shown in fig. 14, which is a schematic structural diagram of a parameter server node device provided in an embodiment of the present invention, the parameter server node device 140 may include: at least one processor 1401 (e.g., a CPU), an input module 1402, an output module 1403, a memory 1404, and at least one communication bus 1405. The communication bus 1405 is used to enable connection and communication between these components. The memory 1404 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; in the embodiment of the present invention, the memory 1404 includes a flash memory. Optionally, the memory 1404 may also be at least one storage system located remotely from the processor 1401. As shown in fig. 14, the memory 1404, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an implementation program of a gradient boosting decision tree based on a parameter server.
In the parameter server node apparatus 140 shown in fig. 14, a processor 1401 may be configured to invoke an implementation program of a parameter server-based gradient boosting decision tree stored in a memory 1404, and perform the following operations:
receiving, through a preset optimal-split-point acquisition interface of the parameter server and via the input module 1402, an optimal split point acquisition request sent by a computing node;
calculating, according to the optimal split point acquisition request, the optimal split point from the stored M/P features through a preset gradient boosting decision tree (GBDT) optimization algorithm; M is the number of features of the training data, and P is the number of parameter server nodes included in the parameter server;
and sending the information of the optimal split point to the computing node through the output module 1403.
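As an illustration of what such a GBDT optimization step can look like, the sketch below scans the accumulated global histogram block of one parameter server node and scores every bin boundary with the standard second-order (XGBoost-style) split gain. The patent itself only refers to a preset GBDT optimization algorithm and an objective function gain, so the concrete gain formula and the regularization constants lam and gamma here are assumptions.

```python
import numpy as np

def best_local_split(hist_block, feature_offset, lam=1.0, gamma=0.0):
    """hist_block: (M/P, num_bins, 2) per-bin (sum_g, sum_h); returns the best local candidate."""
    def leaf_score(g, h):
        return g * g / (h + lam)

    best = {"gain": -np.inf, "feature": None, "bin": None}
    for j in range(hist_block.shape[0]):                 # each locally stored feature
        G, H = hist_block[j, :, 0].sum(), hist_block[j, :, 1].sum()
        g_left = h_left = 0.0
        for b in range(hist_block.shape[1] - 1):         # candidate split after bin b
            g_left += hist_block[j, b, 0]
            h_left += hist_block[j, b, 1]
            gain = 0.5 * (leaf_score(g_left, h_left)
                          + leaf_score(G - g_left, H - h_left)
                          - leaf_score(G, H)) - gamma
            if gain > best["gain"]:
                best = {"gain": gain, "feature": feature_offset + j, "bin": b}
    return best
```

Each parameter server node runs this only over the M/P features it stores and then returns a single small record (gain, splitting characteristic, splitting characteristic value) to the requesting computing node, which is the information of the optimal split point described above.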
Specifically, processor 1401 may also perform:
receiving, through a preset gradient-histogram update interface of the parameter server and via the input module 1402, the blocked local gradient histogram sent by the computing node; the blocked local gradient histogram is obtained by the computing node calculating, for each tree node to be processed, the corresponding local gradient histogram from the training data held by that computing node and dividing each local gradient histogram into P blocks; the P blocked local gradient histograms correspond one-to-one to the P parameter server nodes;
and accumulating the blocked local gradient histogram into the corresponding global gradient histogram.
Specifically, the information of the optimal splitting point includes a splitting characteristic, a splitting characteristic value, and an objective function gain.
Alternatively, the information of the optimal split point includes an objective function gain and identification information, where the identification information indicates the parameter server node;
after the processor 1401 sends the information of the optimal split point to the computing node through the output module 1403, it may further perform:
receiving, through the input module 1402, the request of the computing node for obtaining the splitting characteristic and the splitting characteristic value, and sending, through the output module 1403 and according to that request, the splitting characteristic and the splitting characteristic value of the optimal split point to the computing node.
It should be noted that, in the embodiment of the present invention, for the functions of each module in the parameter-server-based gradient boosting decision tree implementation apparatus 13 and in the parameter server node device 140, reference may be made to the specific implementations of any of the embodiments of fig. 3 to fig. 10 in the foregoing method embodiments, and details are not repeated here. The implementation apparatus 13 of the gradient boosting decision tree based on the parameter server and the parameter server node device 140 may include, but are not limited to, a computer or another physical machine.
In addition, an embodiment of the present invention further provides a parameter server, where the parameter server includes P parameter server node devices, and the parameter server node devices may be the parameter server node devices 140 in the embodiment shown in fig. 14.
When the embodiment of the invention is implemented, a computing node searching for the optimal split point sends, through a preset optimal-split-point acquisition interface of the parameter server, an optimal split point acquisition request to each of the P parameter server nodes. After obtaining the optimal split point that each parameter server node has calculated, through the GBDT optimization algorithm, from the M/P features it stores, the computing node compares the objective function gains of the P optimal split points and selects the split point with the maximum objective function gain as the global optimal split point. Each computing node therefore does not need to pull the complete global gradient histogram; it only needs to compare the candidate optimal split points returned by the parameter server nodes, which greatly reduces the communication overhead. Moreover, the global gradient histogram is divided across multiple parameter server nodes for storage, which removes the single-point bottleneck encountered when processing high-dimensional training data and speeds up the aggregation of the local gradient histograms. In addition, after one designated computing node selects the optimal split point, the optimal split point can be stored in the parameter server, and the parameter server sends the global optimal split point to the other computing nodes, so that repeated computation by the other computing nodes is avoided and the computation load of the computing nodes is greatly reduced. Furthermore, the number of computing nodes and the number of parameter server nodes in the embodiment of the invention can be scaled according to user requirements; the result is a scalable parameter server (PS) architecture, which solves the single-point bottleneck and the limited scalability that Spark and XGBoost encounter in the prior art when processing high-dimensional training data.
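To tie the previous sketches together, the hypothetical driver below shows one round of split finding for a single tree node: every computing node pushes its blocked histograms, one designated node picks the global best split, and the parameter server hands that split to everyone else. All names reuse the earlier illustrative helpers; none of them come from the patent or from a real library.

```python
def find_split_for_tree_node(tree_node_id, compute_nodes, ps_clients, designated):
    # 1. Every computing node builds its local histogram and pushes one block per server node.
    for node in compute_nodes:
        hist = node.build_local_histogram_for(tree_node_id)
        for p, block in enumerate(node.partition(hist, len(ps_clients))):
            ps_clients[p].push_histogram_block(tree_node_id, block)

    # 2. The designated computing node asks each parameter server node for its local best
    #    candidate and keeps the one with the largest objective function gain.
    best = designated.select_global_best(ps_clients, tree_node_id)

    # 3. The best split is stored in the parameter server, which broadcasts it, so the
    #    other computing nodes do not repeat the comparison.
    ps_clients[0].store_and_broadcast_global_best(tree_node_id, best)
    return best
```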
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (14)

1. A gradient boosting decision tree implementation method based on a parameter server is characterized by comprising the following steps:
for each tree node to be processed, calculating the corresponding local gradient histogram according to the training data of the computing node itself, and dividing each local gradient histogram into P blocked local gradient histograms; the P blocked local gradient histograms correspond one-to-one to P parameter server nodes;
sending, through a preset gradient-histogram update interface of the parameter server, each blocked local gradient histogram of each tree node to be processed to the corresponding parameter server node;
sending, through a preset optimal-split-point acquisition interface of the parameter server, an optimal split point acquisition request to each of the P parameter server nodes; the parameter server comprises the P parameter server nodes, each parameter server node stores M/P features and a gradient histogram of each feature, M is the number of features of the training data, and P is the number of parameter server nodes included in the parameter server;
receiving the information of the optimal split points respectively sent by the P parameter server nodes to obtain information of P optimal split points; the P optimal split points are obtained by the P parameter server nodes each processing, through a preset gradient boosting decision tree (GBDT) optimization algorithm, the gradient histograms of the features it stores, calculating the objective function gain of each feature when that feature is used as a split point, and selecting, from the M/P features it stores, the feature with the maximum objective function gain as the optimal split point;
and comparing the objective function gains of the P optimal split points according to the information of the P optimal split points, and selecting the split point with the maximum objective function gain as the global optimal split point.
2. The method of claim 1, wherein the information for each optimal split point includes a split characteristic, a split characteristic value, and an objective function gain;
the selecting of the split point with the maximum objective function gain as the global optimal split point comprises:
taking the splitting characteristic and the splitting characteristic value of the split point with the maximum objective function gain as the global optimal split point.
3. The method according to any one of claims 1-2, wherein after selecting the split point with the maximum objective function gain as the global optimal split point, the method further comprises:
and sending the global optimal splitting point to the parameter server so that the parameter server sends the global optimal splitting point to all the computing nodes.
4. A gradient boosting decision tree implementation method based on a parameter server is characterized by comprising the following steps:
a parameter server node receives, through a preset gradient-histogram update interface of the parameter server, a blocked local gradient histogram sent by a computing node; the blocked local gradient histogram is obtained by the computing node calculating, for each tree node to be processed, the corresponding local gradient histogram from the training data held by that computing node and dividing each local gradient histogram into P blocks; the P blocked local gradient histograms correspond one-to-one to P parameter server nodes;
the parameter server node accumulates the blocked local gradient histogram into the corresponding global gradient histogram;
the parameter server node receives, through a preset optimal-split-point acquisition interface of the parameter server, an optimal split point acquisition request sent by the computing node;
the parameter server node, according to the optimal split point acquisition request, sequentially processes the gradient histograms of the stored features through a preset gradient boosting decision tree (GBDT) optimization algorithm, calculates the objective function gain when each feature is used as a split point, and selects, from the stored features, the feature with the maximum objective function gain as the optimal split point; each parameter server node stores M/P features and a gradient histogram of each feature, M is the number of features of the training data, and P is the number of parameter server nodes included in the parameter server;
and the parameter server node sends the information of the optimal splitting point to the computing node.
5. A computing node device, comprising a processor, a memory, an input module, and an output module, wherein the memory stores a plurality of instructions, and the instructions are loaded and executed by the processor to perform:
for each tree node to be processed, calculating the corresponding local gradient histogram according to the training data of the computing node itself, and dividing each local gradient histogram into P blocked local gradient histograms; the P blocked local gradient histograms correspond one-to-one to P parameter server nodes;
sending, through a preset gradient-histogram update interface of the parameter server and via the output module, each blocked local gradient histogram of each tree node to be processed to the corresponding parameter server node;
sending, through a preset optimal-split-point acquisition interface of the parameter server and via the output module, an optimal split point acquisition request to each of the P parameter server nodes; the parameter server comprises the P parameter server nodes, each parameter server node stores M/P features and a gradient histogram of each feature, M is the number of features of the training data, and P is the number of parameter server nodes included in the parameter server;
receiving, through the input module, the information of the optimal split points respectively sent by the P parameter server nodes to obtain information of P optimal split points; the P optimal split points are obtained by the P parameter server nodes each processing, through a preset gradient boosting decision tree (GBDT) optimization algorithm, the gradient histograms of the features it stores, calculating the objective function gain of each feature when that feature is used as a split point, and selecting, from the M/P features it stores, the feature with the maximum objective function gain as the optimal split point;
and comparing the objective function gains of the P optimal split points according to the information of the P optimal split points, and selecting the split point with the maximum objective function gain as the global optimal split point.
6. The device of claim 5, wherein the information of each optimal split point comprises a splitting characteristic, a splitting characteristic value, and an objective function gain;
the selecting, by the processor, of the split point with the maximum objective function gain as the global optimal split point comprises:
taking the splitting characteristic and the splitting characteristic value of the split point with the maximum objective function gain as the global optimal split point.
7. The device of any one of claims 5-6, wherein the processor, after selecting the split point with the largest objective function gain as the global optimal split point, further performs:
sending, by the output module, the global optimal split points to the parameter server, so that the parameter server sends the global optimal split points to all the computing nodes.
8. A parameter server node device, comprising a processor, a memory, an input module, and an output module, wherein the memory stores a plurality of instructions, and the instructions are loaded and executed by the processor to perform:
receiving, through a preset gradient-histogram update interface of the parameter server and via the input module, a blocked local gradient histogram sent by a computing node; the blocked local gradient histogram is obtained by the computing node calculating, for each tree node to be processed, the corresponding local gradient histogram from the training data held by that computing node and dividing each local gradient histogram into P blocks; the P blocked local gradient histograms correspond one-to-one to P parameter server nodes;
accumulating the blocked local gradient histogram into the corresponding global gradient histogram;
receiving, through a preset optimal-split-point acquisition interface of the parameter server and via the input module, an optimal split point acquisition request sent by the computing node;
according to the optimal split point acquisition request, sequentially processing the gradient histograms of the features stored in the parameter server node device through a preset gradient boosting decision tree (GBDT) optimization algorithm, calculating the objective function gain when each feature is used as a split point, and selecting, from the features stored in the parameter server node device, the feature with the maximum objective function gain as the optimal split point; the parameter server node device stores M/P features and a gradient histogram of each feature, M is the number of features of the training data, and P is the number of parameter server nodes included in the parameter server;
and sending the information of the optimal splitting point to the computing node through the output module.
9. The device of claim 8, wherein the information of the optimal split point comprises a splitting characteristic, a splitting characteristic value, and an objective function gain.
10. An apparatus for implementing a gradient boosting decision tree based on a parameter server, the apparatus comprising: a blocking module, a sending module, a request module, an information receiving module, and an optimal split point selecting module, wherein,
the blocking module is configured to calculate, for each tree node to be processed, the corresponding local gradient histogram according to the training data of the computing node itself, and to divide each local gradient histogram into P blocks; the P blocked local gradient histograms correspond one-to-one to P parameter server nodes;
the sending module is configured to send, through a preset gradient-histogram update interface of the parameter server, each blocked local gradient histogram of each tree node to be processed to the corresponding parameter server node;
the request module is configured to send, through a preset optimal-split-point acquisition interface of the parameter server, an optimal split point acquisition request to each of the P parameter server nodes; the parameter server comprises the P parameter server nodes, each parameter server node stores M/P features and a gradient histogram of each feature, M is the number of features of the training data, and P is the number of parameter server nodes included in the parameter server;
the information receiving module is configured to receive the information of the optimal split points respectively sent by the P parameter server nodes to obtain information of P optimal split points; the P optimal split points are obtained by the P parameter server nodes each processing, through a preset gradient boosting decision tree (GBDT) optimization algorithm, the gradient histograms of the features it stores, calculating the objective function gain of each feature when that feature is used as a split point, and selecting, from the M/P features it stores, the feature with the maximum objective function gain as the optimal split point;
and the optimal split point selecting module is configured to compare the objective function gains of the P optimal split points according to the information of the P optimal split points, and to select the split point with the maximum objective function gain as the global optimal split point.
11. An apparatus for implementing a gradient boosting decision tree of a parameter server, the apparatus comprising: a local gradient histogram receiving module, a histogram updating module, a request receiving module, an optimal split point calculating module and an information sending module, wherein,
the local gradient histogram receiving module is configured to receive, through a preset gradient-histogram update interface of the parameter server, the blocked local gradient histogram sent by the computing node; the blocked local gradient histogram is obtained by the computing node calculating, for each tree node to be processed, the corresponding local gradient histogram from the training data held by that computing node and dividing each local gradient histogram into P blocks; the P blocked local gradient histograms correspond one-to-one to P parameter server nodes;
the histogram updating module is configured to accumulate the blocked local gradient histogram into the corresponding global gradient histogram;
the request receiving module is configured to receive, through a preset optimal-split-point acquisition interface of the parameter server, an optimal split point acquisition request sent by the computing node;
the optimal split point calculating module is configured to sequentially process, according to the optimal split point acquisition request, the gradient histograms of the features stored in the parameter server through a preset gradient boosting decision tree (GBDT) optimization algorithm, to calculate the objective function gain when each feature is used as a split point, and to select, from the features stored in the parameter server, the feature with the maximum objective function gain as the optimal split point; the parameter server stores M/P features and a gradient histogram of each feature, M is the number of features of the training data, and P is the number of parameter server nodes included in the parameter server;
and the information sending module is used for sending the information of the optimal splitting point to the computing node.
12. A parameter server, characterized by comprising P parameter server node devices, wherein P is greater than 1, and each parameter server node device is the parameter server node device of any one of claims 8, 9 or 11.
13. A gradient boosting decision tree implementation system based on a parameter server is characterized by comprising the parameter server and at least one computing node device;
wherein the parameter server is the parameter server of claim 12; the computing node device is the computing node device of any one of claims 5, 6, 7, or 8.
14. A computer-readable storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method according to any one of claims 1 to 4.