WO2018205776A1 - Parameter server-based gradient boosting decision tree implementation method and related devices - Google Patents

Parameter server-based gradient boosting decision tree implementation method and related devices

Info

Publication number
WO2018205776A1
Authority
WO
WIPO (PCT)
Prior art keywords
parameter server
optimal
point
splitting
node
Prior art date
Application number
PCT/CN2018/081900
Other languages
English (en)
French (fr)
Inventor
江佳伟
崔斌
肖品
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Publication of WO2018205776A1 publication Critical patent/WO2018205776A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00Subject matter not provided for in other groups of this subclass
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • Machine learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers can simulate or implement human learning behaviors so as to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance.
  • In recent years, machine learning has been widely applied, for example in data mining, computer vision, natural language processing, biometrics, search engines, medical diagnosis, credit card fraud detection, securities market analysis, deoxyribonucleic acid (DNA) sequencing, speech and handwriting recognition, strategy games, and robotics.
  • An embodiment of the present invention discloses a method for implementing a gradient boosting decision tree based on a parameter server, which includes: sending an optimal split point acquisition request to each of P parameter server nodes through an interface of a preset parameter server for obtaining an optimal split point, wherein the parameter server includes P parameter server nodes, each parameter server node stores M/P features, M is the number of features of the training data, and M and P are both natural numbers greater than 1; receiving the information of the optimal split points respectively sent by the P parameter server nodes to obtain the information of P optimal split points, wherein the P optimal split points are the optimal split points calculated by the P parameter server nodes from their respectively stored M/P features through a preset gradient boosting decision tree (GBDT) optimization algorithm; and comparing the objective function gains of the P optimal split points according to the information of the P optimal split points, and selecting the optimal split point with the largest objective function gain as the global optimal split point.
  • Another aspect of the embodiment of the present invention discloses a method for implementing a gradient boosting decision tree based on a parameter server, including: a parameter server node receives, through an interface of a preset parameter server for obtaining an optimal split point, an optimal split point acquisition request sent by a computing node; the parameter server node calculates, according to the optimal split point acquisition request, an optimal split point from the stored M/P features through a preset gradient boosting decision tree (GBDT) optimization algorithm, where M is the number of features of the training data, P is the number of parameter server nodes included in the parameter server, and M and P are both natural numbers greater than 1; and the parameter server node sends the information of the optimal split point to the computing node.
  • Yet another aspect of an embodiment of the present invention discloses a computing node device, including a processor, a memory, an input module, and an output module, the memory storing a plurality of instructions that are loaded and executed by the processor to: send, through the output module and through an interface of a preset parameter server for obtaining an optimal split point, an optimal split point acquisition request to each of P parameter server nodes, wherein the parameter server includes P parameter server nodes, each parameter server node stores M/P features, M is the number of features of the training data, and M and P are both natural numbers greater than 1; receive, through the input module, the information of the optimal split points respectively sent by the P parameter server nodes to obtain the information of P optimal split points, wherein the P optimal split points are the optimal split points calculated by the P parameter server nodes from their respectively stored M/P features through a preset gradient boosting decision tree (GBDT) optimization algorithm; and compare the objective function gains of the P optimal split points according to the information of the P optimal split points, and select the optimal split point with the largest objective function gain as the global optimal split point.
  • Yet another aspect of the embodiment of the present invention discloses a parameter server node device, including a processor, a memory, an input module, and an output module, where the memory stores a plurality of instructions that are loaded and executed by the processor to: receive, through an interface of a preset parameter server for obtaining an optimal split point and through the input module, an optimal split point acquisition request sent by a computing node; calculate, according to the optimal split point acquisition request, an optimal split point from the stored M/P features through a preset gradient boosting decision tree (GBDT) optimization algorithm, where M is the number of features of the training data, P is the number of parameter server nodes included in the parameter server, and M and P are both natural numbers greater than 1; and send the information of the optimal split point to the computing node through the output module.
  • A further aspect of the embodiment of the present invention discloses a parameter server, including P parameter server node devices, where P is a natural number greater than 1, and each parameter server node device is the parameter server node device described above.
  • A further aspect of the embodiments of the present invention discloses a parameter server-based gradient boosting decision tree implementation system, including a parameter server and at least one computing node device, wherein the parameter server is the parameter server described above, and the computing node device is the computing node device described above.
  • A further aspect of the embodiments of the present invention provides a method for implementing a gradient boosting decision tree based on a parameter server, where the method is performed by a computing node, and the method includes: sending an optimal split point acquisition request to each of P parameter server nodes through an interface of a preset parameter server for obtaining an optimal split point, wherein the parameter server includes P parameter server nodes, each parameter server node stores M/P features, M is the number of features of the training data, and M and P are both natural numbers greater than 1; receiving the information of the optimal split points respectively sent by the P parameter server nodes to obtain the information of P optimal split points, wherein the P optimal split points are the optimal split points calculated by the P parameter server nodes from their respectively stored M/P features through a preset gradient boosting decision tree (GBDT) optimization algorithm; and comparing the objective function gains of the P optimal split points according to the information of the P optimal split points, and selecting the optimal split point with the largest objective function gain as the global optimal split point.
  • A still further aspect of the embodiments of the present invention provides a parameter server-based gradient boosting decision tree implementation apparatus, including at least one memory and at least one processor, wherein the at least one memory stores at least one instruction module configured to be executed by the at least one processor, and the at least one instruction module includes: a requesting module, configured to send an optimal split point acquisition request to each of P parameter server nodes through an interface of a preset parameter server for obtaining an optimal split point, wherein the parameter server includes P parameter server nodes, each parameter server node stores M/P features, M is the number of features of the training data, and M and P are both natural numbers greater than 1; an information receiving module, configured to receive the information of the optimal split points respectively sent by the P parameter server nodes to obtain the information of P optimal split points, where the P optimal split points are the optimal split points calculated by the P parameter server nodes from their respectively stored M/P features through a preset gradient boosting decision tree (GBDT) optimization algorithm; and a split point selection module, configured to compare the objective function gains of the P optimal split points according to the information of the P optimal split points, and to select the optimal split point with the largest objective function gain as the global optimal split point.
  • A still further aspect of the embodiments of the present invention provides a parameter server-based gradient boosting decision tree implementation apparatus, including at least one memory and at least one processor, wherein the at least one memory stores at least one instruction module configured to be executed by the at least one processor, and the at least one instruction module includes: a request receiving module, configured to receive, through an interface of a preset parameter server for obtaining an optimal split point, an optimal split point acquisition request sent by a computing node; an optimal split point calculation module, configured to calculate, according to the optimal split point acquisition request, an optimal split point from the stored M/P features through a preset gradient boosting decision tree (GBDT) optimization algorithm, where M is the number of features of the training data, P is the number of parameter server nodes included in the parameter server, and M and P are both natural numbers greater than 1; and an information sending module, configured to send the information of the optimal split point to the computing node.
  • Yet another aspect of an embodiment of the present invention discloses a computer readable storage medium having a computer program stored thereon, where the above methods can be implemented when a processor executes the computer program.
  • FIG. 1 is an architecture diagram of an implementation system of a parameter server-based gradient boosting decision tree according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a method for implementing a parameter server-based gradient boosting decision tree according to an embodiment of the present invention.
  • FIG. 3a is a schematic flowchart of another embodiment of a method for implementing a parameter server-based gradient boosting decision tree according to the present invention.
  • FIG. 3b is a schematic diagram of the principle of the decision tree provided by the present invention.
  • FIG. 4 is a schematic flowchart of another embodiment of a method for implementing a parameter server-based gradient boosting decision tree according to the present invention.
  • FIG. 5 is a schematic flowchart of another embodiment of a method for implementing a parameter server-based gradient boosting decision tree according to the present invention.
  • FIG. 6 is a schematic diagram of the principle of parameter storage of a parameter server according to an embodiment of the present invention.
  • FIG. 7 is a schematic flowchart of another embodiment of a method for implementing a parameter server-based gradient boosting decision tree provided by the present invention.
  • FIG. 8 is a schematic diagram of the principle of a gradient histogram according to an embodiment of the present invention.
  • GBDT: gradient boosting decision tree.
  • FIG. 11 is a schematic structural diagram of an apparatus for implementing a parameter server-based gradient boosting decision tree according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of a computing node device according to an embodiment of the present invention.
  • FIG. 13 is a schematic structural diagram of another embodiment of an apparatus for implementing a parameter server-based gradient boosting decision tree according to the present invention.
  • FIG. 14 is a schematic structural diagram of a parameter server node device according to an embodiment of the present invention.
  • A parameter server-based gradient boosting decision tree implementation system may include a parameter server (PS) and at least one computing node device. The PS may include a plurality of parameter server nodes and stores the global model parameters. Each computing node device maintains a copy of the model parameters; in one iteration, it updates this copy with its assigned subset of data, sends the parameter updates to the PS, and then obtains the latest global parameters from the PS to start the next iteration.
  • PS: Parameter Server.
  • In the figure, only the data transmission for parameter update and acquisition of one computing node is drawn (the data transmission mode between each computing node and the parameter server nodes is the same, so not all computing nodes are drawn here).
  • the computing node divides its local parameters into P parts and sends them to the P parameter server nodes, and acquires information of the P best split points from the P parameter server nodes.
  • The parameter server-based gradient boosting decision tree implementation system in the embodiment of the present invention may further include a master node for monitoring the parameter server and the computing node devices.
  • the parameter server node, the computing node device, and the master node in the embodiment of the present invention may all be physical machines.
  • the number of the parameter server nodes and the number of the computing node devices may be determined by the user (including the developer, etc.), which is not limited by the embodiment of the present invention.
  • the computing node in various embodiments of the present invention is equivalent to a computing node device, and the parameter server node is equivalent to a parameter server node device.
  • The parameter server-based gradient boosting decision tree implementation system of the embodiment of the present invention can complete machine learning classification tasks such as advertisement recommendation, user gender prediction, and image classification, as well as machine learning regression tasks such as user age prediction and user consumption prediction.
  • the user only needs to submit the machine learning task, and finally obtain the processing result without knowing the specific processing flow, that is, the entire distributed processing mechanism is transparent to the user.
  • FIG. 2 is a schematic flowchart of a method for implementing a parameter server-based gradient boosting decision tree according to an embodiment of the present invention, described first from the computing node side; the method may include the following steps:
  • Step S200: Through the interface of the preset parameter server for obtaining the optimal split point, send an optimal split point acquisition request to each of the P parameter server nodes.
  • the parameter server in the embodiment of the present invention is preset with an interface for obtaining an optimal split point, and is provided to each computing node.
  • The optimal split point acquisition request can be sent to the P parameter server nodes through the interface for obtaining the optimal split point, where the parameter server includes P parameter server nodes, each parameter server node stores M/P features, and M is the number of features of the training data; both M and P can be natural numbers greater than 1.
  • The optimal split point in the embodiment of the present invention includes an optimal split feature and a split feature value.
  • The optimal split point may also be called the best split point, the optimal split result, and so on; the naming is not limited in the present invention as long as the information or parameters of the optimal split feature and split feature value are included.
  • Step S202: Receive the information of the optimal split points respectively sent by the P parameter server nodes, and obtain the information of P optimal split points, where the P optimal split points are the optimal split points calculated by the P parameter server nodes from their respectively stored M/P features through the preset GBDT optimization algorithm.
  • Each parameter server node calculates the optimal split point from its stored M/P features through the preset GBDT optimization algorithm and returns the information of the optimal split point to the computing node. Since the P parameter server nodes each return the information of their respective optimal split points, the computing node obtains the information of P optimal split points.
  • Step S204: Compare the objective function gains of the P optimal split points according to the information of the P optimal split points, and select the split point with the largest objective function gain as the global optimal split point.
  • The computing node compares the objective function gains of the P optimal split points and selects the split point with the largest objective function gain as the global optimal split point; it can subsequently create leaf nodes and split the training data of the current node onto the two leaf nodes.
  • Each computing node does not need to obtain the complete global gradient histogram while training its own training data; it only needs to compare the information of the candidate optimal split points returned by the parameter server nodes, which greatly reduces the communication overhead. The candidate optimal split points are the P candidate optimal split points whose information is respectively returned by the P parameter server nodes, and the computing node selects one of them as the global optimal split point for splitting, as illustrated by the sketch below.
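  • The following minimal sketch illustrates the compute-node side of steps S200 to S204 under stated assumptions: `ps_client` and its `request_best_split` RPC are hypothetical placeholders for the preset parameter server interface, not part of the patent's disclosure.

```python
# Illustrative sketch (not the patent's actual code) of the compute-node side
# of steps S200-S204.
from dataclasses import dataclass

@dataclass
class SplitInfo:
    feature: int          # global index of the split feature
    value: float          # split feature value
    gain: float           # objective function gain of this split

def find_global_best_split(ps_client, ps_node_ids, tree_node_id):
    # S200: send an optimal-split-point acquisition request to each of the P parameter server nodes.
    # S202: each node returns the best split among its own M/P features.
    candidates = [ps_client.request_best_split(node_id, tree_node_id)  # -> SplitInfo
                  for node_id in ps_node_ids]
    # S204: compare the P objective-function gains and keep the largest one
    # as the global optimal split point.
    return max(candidates, key=lambda s: s.gain)
```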
  • FIG. 3a is a schematic flowchart of another embodiment of a method for implementing a parameter server-based gradient boosting decision tree provided by the present invention, described in more detail from the computing node side. It should be noted that the embodiment of the present invention is described by taking one computing node as an example (the processing flow of other computing nodes is the same), and may include the following steps:
  • Step S300 calculating candidate split points
  • the computing node sequentially reads the training data assigned to itself, and calculates a candidate splitting feature value for each feature, thereby obtaining a candidate splitting point.
  • training data of the present invention is historical data used to construct a machine learning model, and the training data assigned to each computing node may be different.
  • Step S302 Create a decision tree.
  • The computing node creates a new tree and performs initialization work, including initializing the tree structure, calculating the first-order and second-order gradients of the training data, initializing the queue of tree nodes to be processed, and adding the root node of the tree to the queue.
  • Step S304: Calculate the gradient histograms.
  • The computing node takes a tree node to be processed out of the queue of tree nodes to be processed.
  • For each tree node to be processed, the corresponding local gradient histograms are calculated according to the training data on this computing node, and each local gradient histogram is divided into P block local gradient histograms, where the P blocks correspond one-to-one to the P parameter server nodes; then, through the preset parameter server interface for updating the gradient histogram, each block local gradient histogram of each tree node to be processed is sent to the corresponding parameter server node.
  • The local gradient histograms in the embodiment of the present invention may include a first-order gradient histogram and a second-order gradient histogram; that is, dividing each local gradient histogram into P blocks includes dividing each first-order gradient histogram into P blocks and dividing each second-order gradient histogram into P blocks, and then sending the blocks to the corresponding parameter server nodes.
  • The parameter server divides the global gradient histogram (i.e., the entire gradient histogram) over the P parameter server nodes, that is, each parameter server node saves the first-order and second-order gradient histograms of M/P features, which solves the single-point bottleneck problem encountered with large-dimensional training data and accelerates the summarization of the local gradient histograms. A minimal sketch of this partition-and-push step follows.
  • Step S306 Find an optimal split point
  • Through the parameter server interface for obtaining the optimal split point, the computing node obtains the information of the optimal split point of each parameter server node from the parameter server, then compares the objective function gains of the P split points and selects the split point with the largest gain as the global optimal split point. For details, refer to steps S202 to S204, which are not described herein again.
  • FIG. 3b is a schematic diagram of the principle of the decision tree provided by the embodiment of the present invention.
  • Each node in the figure is a tree node, where the starting node "owns real estate" is the root node, and "whether married", "monthly income", etc. are other tree nodes; the remaining nodes can be leaf nodes.
  • The information of the optimal split point in the embodiment of the present invention may include a split feature, a split feature value, and an objective function gain. The computing node selecting the split point with the largest objective function gain as the global optimal split point may then specifically include: taking the split feature and split feature value of the split point with the largest objective function gain as the global optimal split point. Therefore, each computing node does not need to obtain the complete global gradient histogram; it only needs to compare the candidate optimal split points returned by the parameter server nodes.
  • Each parameter server node may only need to return three numbers (the split feature, the split feature value, and the objective function gain) to the computing node, which greatly reduces the communication overhead.
  • Step S308 splitting the tree node
  • The computing node creates leaf nodes according to the calculated optimal split point, and splits the training data of the current node onto the two leaf nodes.
  • Step S310: Determine whether the height of the tree reaches the maximum limit.
  • When the height of the tree has not reached the maximum limit, the two leaf nodes are added to the queue of tree nodes to be processed, and the process jumps to step S304; when the height of the tree reaches the maximum limit, step S312 is performed.
  • Step S312: Determine whether the number of trees reaches the maximum limit.
  • When the number of trees has not reached the maximum limit, the process jumps to step S302 to create a new decision tree; when the number of trees reaches the maximum limit, step S314 is performed.
  • the compute node iteratively invokes steps S302 through S308 until all trees are established.
  • Step S314 The training is completed.
  • The computing node trains all decision trees, calculates and outputs performance indicators (accuracy, error, etc.), and can output the trained model.
  • The information of the optimal split point in the embodiment of the present invention may instead include an objective function gain and identification information, where the identification information is used to indicate the parameter server node corresponding to the optimal split point. That is, each parameter server node calculates the split feature, split feature value, and objective function gain of its optimal split point from its stored M/P features through the preset GBDT optimization algorithm, but sends only the objective function gain (with its identification information) to the computing node.
  • The computing node compares the objective function gains of the P optimal split points, selects the largest objective function gain, and requests the split feature and split feature value from the corresponding parameter server node according to the identification information of the split point with the largest objective function gain.
  • Each computing node does not need to obtain the complete global gradient histogram; it only needs to compare the candidate optimal split points returned by the parameter server nodes, and each parameter server node may only need to return two numbers (the identification information and the objective function gain) to the computing node. After the computing node determines the largest objective function gain, it obtains the split feature and split feature value from the corresponding parameter server node, which greatly reduces the communication overhead. A minimal sketch of this two-round variant follows.
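  • A minimal sketch of the two-round, reduced-communication variant described above; the `request_best_gain` and `request_split_detail` RPC names are hypothetical placeholders.

```python
# Illustrative sketch: each parameter server node first returns only its
# objective-function gain plus identification information, and the split feature
# and feature value are fetched afterwards from the winning node only.
def find_global_best_split_two_round(ps_client, ps_node_ids, tree_node_id):
    # Round 1: every node returns (gain, node_id) -- two numbers per node.
    gains = [(ps_client.request_best_gain(node_id, tree_node_id), node_id)
             for node_id in ps_node_ids]
    best_gain, best_node = max(gains)
    # Round 2: ask only the winning node for the split feature and feature value.
    feature, value = ps_client.request_split_detail(best_node, tree_node_id)
    return feature, value, best_gain
```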
  • FIG. 4 is a schematic flowchart of another embodiment of a method for implementing a parameter server-based gradient boosting decision tree provided by the present invention, correspondingly described from the parameter server node side; the method may include the following steps:
  • Step S400: The parameter server node receives, through the interface of the preset parameter server for obtaining the optimal split point, an optimal split point acquisition request sent by the computing node.
  • When searching for an optimal split point, the computing node can send an optimal split point acquisition request to the P parameter server nodes through the interface for obtaining the optimal split point, and each parameter server node receives the optimal split point acquisition request sent by the computing node.
  • Step S402: The parameter server node calculates, according to the optimal split point acquisition request, an optimal split point from the stored M/P features through the preset gradient boosting decision tree (GBDT) optimization algorithm, where M is the number of features of the training data and P is the number of parameter server nodes included in the parameter server.
  • Step S404 The parameter server node sends the information of the optimal split point to the computing node.
  • FIG. 5 is a schematic flowchart of another embodiment of a method for implementing a parameter server-based gradient boosting decision tree provided by the present invention, described in more detail from the parameter server node side; the method may include the following steps:
  • Step S500: The parameter server node receives, through the preset parameter server interface for updating the gradient histogram, the block local gradient histograms sent by the computing node.
  • A block local gradient histogram is obtained as follows: for each tree node to be processed, the computing node calculates the corresponding local gradient histograms according to the training data on its own node and divides each local gradient histogram into P block local gradient histograms, where the P blocks correspond one-to-one to the P parameter server nodes.
  • the parameter server stores the global gradient histogram through the P parameter server nodes.
  • T is the total number of decision trees to be trained.
  • d is the maximum height of each tree, so there are at most (2^d − 1) tree nodes in a tree.
  • The global gradient histogram of one feature on one tree node is represented as a vector of length 2K, where the first K entries are the first-order gradient histogram and the last K entries are the second-order gradient histogram.
  • M is the number of features of the training data, and each feature corresponds to one first-order gradient histogram and one second-order gradient histogram.
  • The gradient histograms of all M features are concatenated into a vector of length 2KM, which is equally divided into P blocks stored on the P parameter server nodes, so each parameter server node stores the global gradient histograms of M/P features, with length 2KM/P per tree node. Since there are at most (2^d − 1) tree nodes, the total size of the global gradient histogram stored by each parameter server node is 2KM(2^d − 1)/P. A small worked example of these sizes follows.
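  • A small worked example of the storage layout described above (the values of M, K, d, and P below are illustrative choices, not values from the patent):

```python
# Storage per parameter server node: 2*K*M/P entries per tree node,
# 2*K*M*(2**d - 1)/P entries in total.
M, K, d, P = 1_000_000, 20, 8, 10

per_tree_node_global_hist = 2 * K * M                         # all features, first + second order
per_ps_node_per_tree_node = per_tree_node_global_hist // P    # = 2*K*M/P, i.e. M/P features
max_tree_nodes = 2 ** d - 1
per_ps_node_total = per_ps_node_per_tree_node * max_tree_nodes

print(per_ps_node_per_tree_node)  # 4,000,000 histogram entries per tree node
print(per_ps_node_total)          # 1,020,000,000 entries in total per parameter server node
```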
  • the embodiment of the present invention solves the single-point bottleneck problem encountered in processing large-dimensional training data by dividing the global gradient histogram into multiple parameter server nodes for storage.
  • Step S502: The parameter server node accumulates the block local gradient histogram onto the corresponding global gradient histogram.
  • Each parameter server node determines the tree node being processed and adds the received block local gradient histogram to the corresponding global gradient histogram, thereby updating the global gradient histogram, as in the sketch below.
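  • A minimal sketch of this accumulation on one parameter server node, assuming the node keeps its slice of the global gradient histogram as a dense array indexed by tree node:

```python
# Illustrative sketch of step S502 on a parameter server node: the received block
# local gradient histogram is summed into the node's slice of the global gradient
# histogram for the corresponding tree node (in-memory layout is an assumption).
import numpy as np

class PSNodeHistograms:
    def __init__(self, max_tree_nodes: int, block_len: int):
        # one global-histogram block per tree node, all initially zero
        self.global_hist = np.zeros((max_tree_nodes, block_len))

    def accumulate(self, tree_node_id: int, block_local_hist: np.ndarray):
        # sum the local histograms pushed by all computing nodes
        self.global_hist[tree_node_id] += block_local_hist
```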
  • Step S504 The parameter server node obtains an interface of an optimal splitting point through a preset parameter server, and receives an optimal splitting point acquisition request sent by the computing node;
  • Step S506 The parameter server node calculates an optimal split point from the stored M/P features by using a preset GBDT optimization algorithm according to the optimal split point acquisition request.
  • Step S508 The parameter server node sends the information of the optimal split point to the computing node.
  • Steps S504 to S508 may refer to the foregoing steps S400 to S404 and are not described herein again. It should be noted that the gradient histogram update operations in steps S500 and S502 and the optimal split point processing operations in steps S504 to S508 may be performed in any order; the latest global gradient histogram is used when calculating the optimal split point, while the global gradient histogram is updated continuously.
  • the information of the optimal splitting point in the embodiment of the present invention includes a splitting feature, a splitting feature value, and an objective function gain.
  • The information of the optimal split point may also include an objective function gain and identification information, the identification information being used to indicate the parameter server node. In that case, after the parameter server node sends the information of the optimal split point to the computing node, the parameter server node receives a request from the computing node for acquiring the split feature and split feature value, and, according to this request, sends the split feature and split feature value of the optimal split point to the computing node.
  • FIG. 7 is a schematic flowchart of another embodiment of a method for implementing a parameter server-based gradient boosting decision tree according to the present invention, taking as an example a computing node updating the gradient histograms and obtaining the optimal split point; it is described from both the computing node side and the parameter server node side, and may include the following steps:
  • Step S700: The computing node executes the GBDT algorithm on its allocated subset of the data to obtain local gradient histograms. The objective function of the t-th tree can be written as F^(t) = Σ_i l(y_i, ŷ_i^(t−1) + f_t(x_i)) + Ω(f_t), where:
  • x_i is a training sample;
  • y_i is the real label of the training sample;
  • ŷ_i^(t−1) is the prediction of the previously trained trees, and f_t(x_i) is the predicted value given by the t-th tree for the training sample;
  • l is the cost function (given a real label and a predicted value, it computes a cost);
  • Ω(f_t) is a regularization term that prevents the model from overfitting.
  • The GBDT algorithm uses gradient histograms as auxiliary information to find the best split point. For each training sample, the first-order gradient g_i and the second-order gradient h_i of the objective function l are obtained, and then the distributions of the first-order and second-order gradients of all training samples are calculated, generating a gradient histogram for each feature of the training data. As shown in the schematic diagram of the gradient histogram (FIG. 8) provided by the embodiment of the present invention, the abscissa is an interval of the feature values of a certain feature, and the ordinate is the sum of the gradients of the training samples whose feature values fall in that interval. For example, for training samples with feature values between 0 and 0.2, their first-order gradients are accumulated into the first bin of the first-order gradient histogram of FIG. 8. A minimal sketch of this step follows.
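  • A minimal sketch of computing g_i and h_i and accumulating them into per-feature first-order and second-order gradient histograms; squared error is used here only as an example cost function, and the bin boundaries are assumed to be the candidate split values from step S300.

```python
# Illustrative sketch of step S700 (an assumption-laden example, not the patent's code).
import numpy as np

def local_gradient_histograms(X, y, y_pred, bin_edges, K):
    """X: (n, M) feature matrix; y, y_pred: (n,) labels and current predictions;
    bin_edges: list of M arrays with K-1 inner bin boundaries per feature."""
    g = y_pred - y            # first-order gradient of 0.5*(y_pred - y)^2
    h = np.ones_like(y)       # second-order gradient of the squared error
    n, M = X.shape
    first_order = np.zeros((M, K))
    second_order = np.zeros((M, K))
    for m in range(M):
        bins = np.searchsorted(bin_edges[m], X[:, m])   # bin index of each sample
        np.add.at(first_order[m], bins, g)              # sum of g_i per bin
        np.add.at(second_order[m], bins, h)             # sum of h_i per bin
    return first_order, second_order
```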
  • Step S702 The computing node divides the local gradient histogram into P blocks
  • the P block partial gradient histograms are in one-to-one correspondence with the P parameter server nodes.
  • Step S704 Send each block local gradient histogram to the corresponding parameter server node respectively;
  • Step S706 The parameter server node summarizes the local gradient histograms of all the computing nodes, and generates a global gradient histogram
  • Step S708 The computing node requests an optimal splitting point from each parameter server node.
  • Step S710: After receiving the request, each parameter server node returns to the computing node the information of the best split point among the M/P features saved by that node.
  • The embodiment of the present invention may adopt a heuristic optimization method to calculate the optimal split point from the first-order and second-order gradient histograms:
  • The gradient histogram of each feature has K bins, and the K candidate split feature values of feature m are the maximum values of the feature value intervals corresponding to the bins; that is, the feature value interval corresponding to the k-th bin is [s_{m,k−1}, s_{m,k}], and its candidate split value is s_{m,k}.
  • G_L accumulates the sum of the values of the bins of the first-order gradient histogram that have already been read, and H_L accumulates the sum of the values of the bins of the second-order gradient histogram that have already been read; G_R is the sum of the values of the bins of the first-order gradient histogram that have not yet been read, and H_R is the sum of the values of the bins of the second-order gradient histogram that have not yet been read.
  • When the k-th bin of feature m is read, the accumulators are updated as G_L ← G_L + G_{mk} and H_L ← H_L + H_{mk}, with G_R = G − G_L and H_R = H − H_L, where G and H are the sums over all bins of the first-order and second-order histograms of this feature.
  • M is the number of features of the training data, so the total number of gradient histograms is 2M, each feature corresponding to one first-order gradient histogram and one second-order gradient histogram; to find the best split point of a feature, only the first-order and second-order gradient histograms of that feature are needed.
  • For each candidate split feature value of each feature, the gain of the objective function F^(t) that would result from splitting there is evaluated; in the standard GBDT derivation this gain is G_L²/(H_L + λ) + G_R²/(H_R + λ) − (G_L + G_R)²/(H_L + H_R + λ), where λ is the regularization coefficient. Among all MK candidate split points, the feature and feature value with the largest gain are selected as the optimal split point. A minimal sketch of this search follows.
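  • A minimal sketch of this heuristic search as it could run on one parameter server node over the M/P features it stores; the regularization constant `lam` and the omission of constant factors in the gain are assumptions consistent with the standard GBDT derivation, not details taken from the patent.

```python
# Illustrative sketch of the split search of step S710 on one parameter server node.
import numpy as np

def best_split_on_node(first_order, second_order, candidate_values, lam=1.0):
    """first_order, second_order: (M_p, K) per-feature gradient histograms;
    candidate_values: (M_p, K) candidate split value for each bin (interval maxima)."""
    best = (None, None, -np.inf)           # (feature, value, gain)
    for m in range(first_order.shape[0]):
        G, H = first_order[m].sum(), second_order[m].sum()
        G_L = H_L = 0.0
        for k in range(first_order.shape[1]):
            G_L += first_order[m, k]       # G_L <- G_L + G_mk
            H_L += second_order[m, k]      # H_L <- H_L + H_mk
            G_R, H_R = G - G_L, H - H_L
            # proportional to the standard GBDT split gain (constant factors omitted)
            gain = (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam)
                    - G**2 / (H + lam))
            if gain > best[2]:
                best = (m, candidate_values[m, k], gain)
    return best
```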
  • Step S712: The computing node receives the information of the optimal split points respectively returned by the P parameter server nodes, and selects the split point whose objective function gain is the largest.
  • The distributed gradient boosting decision tree architecture based on a Map-Reduce implementation, as shown in FIG. 9, is used by the computing engine Spark.
  • Each working node (i.e., computing node) generates a local gradient histogram; through a Reduce operation, one node summarizes the gradient histograms of all computing nodes, generates the global gradient histogram, and finds the best split point; the best split point is then broadcast to all computing nodes through a Map operation.
  • When the feature dimension of the training data is large, the local gradient histogram generated by each computing node is also large, so a single-point bottleneck problem arises in the Reduce operation: the node that summarizes the local gradient histograms may slow down the processing of the entire system due to network congestion.
  • The MPI (Message Passing Interface)-based XGBoost organizes all computing nodes into a binary tree structure.
  • Each computing node sends its local gradient histogram to its parent computing node; the parent computing node aggregates the gradient histograms of its two child computing nodes and sends the result to its own parent computing node; after the root computing node obtains the global gradient histogram, it finds the best split point and then sends it down to all computing nodes along the tree structure.
  • This scheme requires multi-step transmission, and when the feature dimension of the training data is large, multi-step transmission brings a large amount of traffic. Moreover, as the number of computing nodes increases, more transmission steps are needed to complete the aggregation of the local gradient histograms.
  • By dividing the global gradient histogram over multiple parameter server nodes for storage, the single-point bottleneck problem encountered when processing large-dimensional training data is solved, and the speed of summarizing the local gradient histograms is accelerated.
  • The numbers of computing nodes and parameter server nodes in the embodiment of the present invention can be scaled according to the requirements of the user; this is an extensible PS architecture, which solves the single-point bottleneck and limited-scalability problems that Spark and XGBoost in the prior art encounter when processing large-dimensional training data.
  • Step S714 The computing node splits the currently processed tree node according to the optimal splitting point.
  • the compute nodes are iteratively processed until all trees are established.
  • Each computing node may perform the search for the optimal split point in the implementation manners of FIG. 3 to FIG. 4, or one of the computing nodes may be designated to perform the search for the optimal split point in the embodiments of FIG. 3 to FIG. 4. If one computing node is designated to perform the search, the designated computing node may further, after selecting the split point with the largest objective function gain as the global optimal split point, send the global optimal split point to the parameter server, so that the parameter server sends the global optimal split point to all computing nodes.
  • the parameter server can save the obtained global optimal splitting points in the P parameter server nodes, that is, other computing nodes can also directly obtain the splitting feature and the splitting feature value of the optimal splitting point from the parameter server node. This can avoid repeated calculations by other computing nodes, which greatly saves the computational load of the computing nodes.
  • FIG. 11 is a schematic structural diagram of an implementation apparatus of a parameter server-based gradient boosting decision tree according to an embodiment of the present invention.
  • The parameter server-based gradient boosting decision tree implementation apparatus 11 may include: at least one memory; at least one processor; wherein the at least one memory stores at least one instruction module configured to be executed by the at least one processor, and the at least one instruction module includes:
  • a requesting module 110, an information receiving module 112, and an optimal split point selection module 114, wherein
  • the requesting module 110 is configured to send an optimal split point acquisition request to each of the P parameter server nodes through the interface of the preset parameter server for obtaining the optimal split point, where the parameter server includes P parameter server nodes, each parameter server node stores M/P features, and M is the number of features of the training data;
  • the information receiving module 112 is configured to receive the information of the optimal split points respectively sent by the P parameter server nodes to obtain the information of P optimal split points, where the P optimal split points are the optimal split points calculated by the P parameter server nodes from their respectively stored M/P features through the preset gradient boosting decision tree (GBDT) optimization algorithm;
  • the optimal split point selection module 114 is configured to compare the objective function gains of the P optimal split points according to the information of the P optimal split points, and to select the split point with the largest objective function gain as the global optimal split point.
  • The parameter server-based gradient boosting decision tree implementation apparatus 11 may further include a blocking module and a sending module, where
  • the blocking module is configured to, for each tree node to be processed, calculate the corresponding local gradient histograms according to the training data on its own node and divide each local gradient histogram into P blocks, where the P block local gradient histograms correspond one-to-one to the P parameter server nodes;
  • the sending module is configured to send, through the preset parameter server interface for updating the gradient histogram, each block local gradient histogram of each tree node to be processed to the corresponding parameter server node.
  • each optimal splitting point may include a splitting feature, a splitting feature value, and an objective function gain
  • the optimal split point selection module 114 is specifically configured to use the splitting feature and the splitting feature value in the splitting point that maximizes the gain of the objective function as the global optimal splitting point.
  • each optimal splitting point includes an objective function gain and identification information, where the identification information is used to indicate a parameter server node corresponding to the optimal splitting point;
  • the optimal split point selection module 114 is specifically configured to request the split feature and split feature value from the corresponding parameter server node according to the identification information of the split point with the largest objective function gain, and to receive the split feature and split feature value returned by the corresponding parameter server node as the global optimal split point.
  • The parameter server-based gradient boosting decision tree implementation apparatus 11 may further include a node sending module, configured to, after the optimal split point selection module 114 selects the split point with the largest objective function gain as the global optimal split point, send the global optimal split point to the parameter server, so that the parameter server sends the global optimal split point to all computing nodes.
  • the computing node device 120 may include: at least one processor 1201, such as a CPU, an input module 1202, an output module 1203, and a memory 1204. At least one communication bus 1205. Among them, the communication bus 1205 is used to implement connection communication between these components.
  • the memory 1204 may be a high speed RAM memory or a non-volatile memory such as at least one disk memory, and the memory 1204 includes a flash in the embodiment of the present invention.
  • the memory 1204 can also optionally be at least one storage system located remotely from the aforementioned processor 1201.
  • The memory 1204, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an implementation program of a parameter server-based gradient boosting decision tree.
  • The processor 1201 may be configured to invoke the parameter server-based gradient boosting decision tree implementation program stored in the memory 1204 and perform the following operations:
  • sending, through the output module 1203 and through the interface of the preset parameter server for obtaining the optimal split point, an optimal split point acquisition request to each of the P parameter server nodes, where the parameter server includes P parameter server nodes, each parameter server node stores M/P features, and M is the number of features of the training data; receiving, through the input module 1202, the information of the optimal split points respectively sent by the P parameter server nodes to obtain the information of P optimal split points;
  • comparing the objective function gains of the P optimal split points according to the information of the P optimal split points, and selecting the split point with the largest objective function gain as the global optimal split point.
  • the processor 1201 further performs:
  • for each tree node to be processed, calculating the corresponding local gradient histograms according to the training data on its own node, and dividing each local gradient histogram into P blocks, where the P block local gradient histograms correspond one-to-one to the P parameter server nodes;
  • sending, through the preset parameter server interface for updating the gradient histogram and through the output module 1203, each block local gradient histogram of each tree node to be processed to the corresponding parameter server node.
  • the information of each optimal splitting point includes a splitting feature, a splitting feature value, and an objective function gain
  • the processor 1201 selects the split point with the largest gain of the objective function as the global optimal split point, which may include:
  • taking the split feature and split feature value of the split point with the largest objective function gain as the global optimal split point.
  • the information of each optimal split point includes an objective function gain and identification information, where the identifier information is used to indicate a parameter server node corresponding to the optimal split point;
  • the processor 1201 selecting the split point with the largest objective function gain as the global optimal split point may include:
  • requesting, through the output module 1203 and according to the identification information of the split point with the largest objective function gain, the split feature and split feature value from the corresponding parameter server node;
  • receiving, through the input module 1202, the split feature and split feature value returned by the corresponding parameter server node as the global optimal split point.
  • After the processor 1201 selects the split point with the largest objective function gain as the global optimal split point, it can also perform:
  • the global best split point is sent by the output module 1203 to the parameter server to cause the parameter server to send the global best split point to all compute nodes.
  • the parameter server-based gradient boost decision tree implementation device 11 and compute node device 120 may include, but are not limited to, a physical machine such as a computer.
  • An embodiment of the present invention further provides an apparatus for implementing a parameter server-based gradient boosting decision tree, which is described in detail below with reference to the accompanying drawings.
  • FIG. 13 is a schematic structural diagram of another embodiment of an apparatus for implementing a parameter server-based gradient boosting decision tree according to the present invention.
  • The parameter server-based gradient boosting decision tree implementation apparatus 13 may include: at least one memory; at least one processor; wherein the at least one memory stores at least one instruction module configured to be executed by the at least one processor, and the at least one instruction module includes: a request receiving module 130, an optimal split point calculation module 132, and an information sending module 134, wherein
  • the request receiving module 130 is configured to receive, through the interface of the preset parameter server for obtaining the optimal split point, the optimal split point acquisition request sent by the computing node;
  • the optimal split point calculation module 132 is configured to calculate, according to the optimal split point acquisition request, the optimal split point from the stored M/P features through the preset gradient boosting decision tree (GBDT) optimization algorithm, where M is the number of features of the training data and P is the number of parameter server nodes included in the parameter server;
  • the information sending module 134 is configured to send the information of the optimal split point to the computing node.
  • The parameter server-based gradient boosting decision tree implementation apparatus 13 may further include a local gradient histogram receiving module and a histogram update module, where
  • the local gradient histogram receiving module is configured to receive, through the preset parameter server interface for updating the gradient histogram, the block local gradient histograms sent by the computing node, where a block local gradient histogram is obtained as follows: for each tree node to be processed, the computing node calculates the corresponding local gradient histograms according to the training data on its own node and divides each local gradient histogram into P block local gradient histograms, and the P block local gradient histograms correspond one-to-one to the P parameter server nodes;
  • the histogram update module is configured to accumulate the block local gradient histogram onto the corresponding global gradient histogram.
  • The information of the optimal split point includes a split feature, a split feature value, and an objective function gain.
  • the information of the optimal splitting point includes an objective function gain and identification information, where the identifier information is used to indicate the parameter server node;
  • The parameter server-based gradient boosting decision tree implementation apparatus 13 may further include a split point information sending module, configured to, after the information sending module 134 sends the information of the optimal split point to the computing node, receive a request from the computing node for acquiring the split feature and split feature value, and send, according to this request, the split feature and split feature value of the optimal split point to the computing node.
  • the parameter server node device 140 may include: at least one processor 1401, such as a CPU, an input module 1402, an output module 1403, and a memory. 1404, at least one communication bus 1405. Among them, the communication bus 1405 is used to implement connection communication between these components.
  • the memory 1404 may be a high speed RAM memory or a non-volatile memory such as at least one disk memory, and the memory 1404 includes a flash in the embodiment of the present invention.
  • the memory 1404 can optionally also be at least one storage system located remotely from the aforementioned processor 1401.
  • The memory 1404, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and an implementation program of a parameter server-based gradient boosting decision tree.
  • The processor 1401 may be configured to invoke the parameter server-based gradient boosting decision tree implementation program stored in the memory 1404 and perform the following operations:
  • receiving, through the interface of the preset parameter server for obtaining the optimal split point and through the input module 1402, the optimal split point acquisition request sent by the computing node; calculating, according to the optimal split point acquisition request, the optimal split point from the stored M/P features through the preset gradient boosting decision tree (GBDT) optimization algorithm, where M is the number of features of the training data and P is the number of parameter server nodes included in the parameter server; and sending the information of the optimal split point to the computing node through the output module 1403.
  • processor 1401 can also execute:
  • receiving, through the preset parameter server interface for updating the gradient histogram and through the input module 1402, the block local gradient histograms sent by the computing node, where a block local gradient histogram is obtained as follows: for each tree node to be processed, the computing node calculates the corresponding local gradient histograms according to the training data on its own node and divides each local gradient histogram into P block local gradient histograms, and the P block local gradient histograms correspond one-to-one to the P parameter server nodes;
  • accumulating the block local gradient histogram onto the corresponding global gradient histogram.
  • the information of the optimal splitting point includes a splitting feature, a splitting feature value, and an objective function gain.
  • the information of the optimal splitting point includes an objective function gain and identification information, and the identifier information is used to indicate the parameter server node;
  • After sending the information of the optimal split point to the computing node, the processor 1401 may further perform:
  • receiving, through the input module 1402, a request from the computing node for acquiring the split feature and split feature value, and sending, through the output module 1403, the split feature and split feature value of the optimal split point to the computing node.
  • the parameter server-based gradient boost decision tree implementation device 13 and parameter server node device 140 may include, but are not limited to, a physical machine such as a computer.
  • The embodiment of the present invention further provides a parameter server, where the parameter server includes P parameter server node devices, and each parameter server node device may be the parameter server node device 140 in the embodiment of FIG. 14.
  • In summary, in the embodiment of the present invention, when searching for the optimal split point, the computing node sends an optimal split point acquisition request to each of the P parameter server nodes through the interface of the preset parameter server for obtaining the optimal split point, obtains the optimal split point calculated by each parameter server node from its stored M/P features through the GBDT optimization algorithm, compares the objective function gains of the P optimal split points, and selects the split point with the largest objective function gain as the global optimal split point.
  • Each computing node does not need to obtain the complete global gradient histogram; it only needs to compare the information of the candidate optimal split points returned by the parameter server nodes, which greatly reduces the communication overhead.
  • By dividing the global gradient histogram over multiple parameter server nodes for storage, the single-point bottleneck problem encountered when processing large-dimensional training data is solved, and the speed of summarizing the local gradient histograms is accelerated.
  • The optimal split point can further be stored in the parameter server, and the global optimal split point is sent by the parameter server to the other computing nodes, thereby avoiding repeated calculations by the other computing nodes and greatly saving the computational load of the computing nodes.
  • The numbers of computing nodes and parameter server nodes in the embodiment of the present invention can be scaled according to the requirements of the user; this is an extensible PS architecture, which solves the single-point bottleneck and limited-scalability problems that Spark and XGBoost in the prior art encounter when processing large-dimensional training data.
  • The aforementioned program may be stored in a computer-readable storage medium, and the storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM). Accordingly, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when a processor executes the computer program, the above method can be implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Computer And Data Communications (AREA)

Abstract

Disclosed are a parameter-server-based gradient boosting decision tree implementation method, a computing node device, a parameter server node, a parameter server, a parameter-server-based gradient boosting decision tree implementation apparatus, a system, and a storage medium. The method includes: sending an optimal-split-point acquisition request to each of P parameter server nodes through an interface, in a preset parameter server, for acquiring an optimal split point (S200); receiving the optimal-split-point information sent by each of the P parameter server nodes to obtain information of P optimal split points, where the P optimal split points are computed by the P parameter server nodes from the M/P features stored on each node by means of a preset gradient boosting decision tree (GBDT) optimization algorithm (S202); and, according to the information of the P optimal split points, comparing the objective-function gains of the P optimal split points and selecting the optimal split point with the largest objective-function gain as the global optimal split point (S204).

Description

基于参数服务器的梯度提升决策树的实现方法及相关设备
本申请要求于2017年05月10日提交中国专利局、申请号为201710326930.X、发明名称为“基于参数服务器的梯度提升决策树的实现方法及相关设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明实施例涉及计算机领域,尤其涉及基于参数服务器的梯度提升决策树的实现方法、计算节点设备、参数服务器节点、参数服务器、基于参数服务器的梯度提升决策树的实现装置、系统、存储介质。
背景
机器学习(Machine Learning,ML)是一门多领域交叉学科,涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。专门研究计算机怎样模拟或实现人类的学习行为,以获取新的知识或技能,重新组织已有的知识结构使之不断改善自身的性能。近年来,机器学习已经有了十分广泛的应用,例如:数据挖掘、计算机视觉、自然语言处理、生物特征识别、搜索引擎、医学诊断、检测信用卡欺诈、证券市场分析、脱氧核糖核酸(Deoxyribonucleic acid,DNA)序列测序、语音和手写识别、战略游戏和机器人运用。
技术内容
本发明实施例提供一种基于参数服务器的梯度提升决策树的实现方法、计算节点设备、参数服务器节点、参数服务器、基于参数服务器的梯度提升决策树的实现装置、系统、存储介质。
本发明实施例一方面公开了一种基于参数服务器的梯度提升决策树的实现方法,包括:通过预设的参数服务器中用于获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求;其中,所述参数服务器包括P个参数服务器节点,每个参数服务器节点各自存储有M/P个特征,所述M为训练数据的特征数量;M和P均为大于1的 自然数;接收所述P个参数服务器节点分别发送的最佳分裂点的信息,得到P个最佳分裂点的信息;其中,所述P个最佳分裂点为所述P个参数服务器节点通过预设的梯度提升决策树GBDT优化算法从各自存储的M/P个特征中计算出的最佳分裂点;根据所述P个最佳分裂点的信息,比较所述P个最佳分裂点的目标函数增益,并选取目标函数增益最大的最佳分裂点作为全局最佳分裂点。
本发明实施例另一方面公开了一种基于参数服务器的梯度提升决策树的实现方法,包括:参数服务器节点通过预设的参数服务器中用于获取最佳分裂点的接口,接收计算节点发送的最佳分裂点获取请求;所述参数服务器节点根据所述最佳分裂点获取请求,通过预设的梯度提升决策树GBDT优化算法从存储的M/P个特征中计算出最佳分裂点;所述M为训练数据的特征数量,所述P为所述参数服务器包括的参数服务器节点的数量;M和P均为大于1的自然数;所述参数服务器节点将所述最佳分裂点的信息发送给所述计算节点。
本发明实施例又一方面公开了一种计算节点设备,包括处理器、存储器、输入模块和输出模块,所述存储器存储多条指令,所述指令由所述处理器加载并执行:通过所述输出模块,并通过预设的参数服务器中用于获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求;其中,所述参数服务器包括P个参数服务器节点,每个参数服务器节点各自存储有M/P个特征,所述M为训练数据的特征数量;M和P均为大于1的自然数;通过所述输入模块接收所述P个参数服务器节点分别发送的最佳分裂点的信息,得到P个最佳分裂点的信息;其中所述P个最佳分裂点为所述P个参数服务器节点通过预设的梯度提升决策树GBDT优化算法从各自存储的M/P个特征中计算出的最佳分裂点;根据所述P个最佳分裂点的信息,比较所述P个最佳分裂点的目标函数增益,并选取目标函数增益最大的最佳分裂点作为全局最佳分裂点。
本发明实施例又一方面公开了一种参数服务器节点设备,包括处理器、存储器、输入模块和输出模块,所述存储器存储多条指令,所述指令由所述处理器加载并执行:通过预设的参数服务器中用于获取最佳分 裂点的接口,并通过所述输入模块接收计算节点发送的最佳分裂点获取请求;根据所述最佳分裂点获取请求,通过预设的梯度提升决策树GBDT优化算法从存储的M/P个特征中计算出最佳分裂点;所述M为训练数据的特征数量,所述P为所述参数服务器包括的参数服务器节点的数量;M和P均为大于1的自然数;通过所述输出模块将所述最佳分裂点的信息发送给所述计算节点。
本发明实施例又一方面公开了一种参数服务器,包括P个参数服务器节点设备,所述P为大于1的自然数;所述参数服务器节点设备为上述的参数服务器节点设备。
本发明实施例又一方面公开了一种基于参数服务器的梯度提升决策树的实现系统,包括参数服务器和至少一个计算节点设备;其中,所述参数服务器为上述的参数服务器;所述计算节点设备为上述的计算节点设备。
本发明实施例又一方面提供一种基于参数服务器的梯度提升决策树的实现方法,该方法由计算节点执行,该方法包括:通过预设的参数服务器中用于获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求;其中,所述参数服务器包括P个参数服务器节点,每个参数服务器节点各自存储有M/P个特征,所述M为训练数据的特征数量;M和P均为大于1的自然数;接收所述P个参数服务器节点分别发送的最佳分裂点的信息,得到P个最佳分裂点的信息;其中,所述P个最佳分裂点为所述P个参数服务器节点通过预设的梯度提升决策树GBDT优化算法从各自存储的M/P个特征中计算出的最佳分裂点;根据所述P个最佳分裂点的信息,比较所述P个最佳分裂点的目标函数增益,并选取目标函数增益最大的最佳分裂点作为全局最佳分裂点。
本发明实施例又一方面提供一种基于参数服务器的梯度提升决策树的实现装置,包括:至少一个存储器;至少一个处理器;其中,所述至少一个存储器存储有至少一个指令模块,经配置由所述至少一个处理器执行;其中,所述至少一个指令模块包括:请求模块,用于通过预设的参数服务器中用于获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求;其中,所述参数服务器包括P个参数服务 器节点,每个参数服务器节点各自存储有M/P个特征,所述M为训练数据的特征数量;M和P均为大于1的自然数;信息接收模块,用于接收所述P个参数服务器节点分别发送的最佳分裂点的信息,得到P个最佳分裂点的信息;其中,所述P个最佳分裂点为所述P个参数服务器节点通过预设的梯度提升决策树GBDT优化算法从各自存储的M/P个特征中计算出的最佳分裂点;最佳分裂点选取模块,用于根据所述P个最佳分裂点的信息,比较所述P个最佳分裂点的目标函数增益,并选取目标函数增益最大的最佳分裂点作为全局最佳分裂点。
本发明实施例又一方面提供一种基于参数服务器的梯度提升决策树的实现装置,包括:至少一个存储器;至少一个处理器;其中,所述至少一个存储器存储有至少一个指令模块,经配置由所述至少一个处理器执行;其中,所述至少一个指令模块包括:请求接收模块,用于通过预设的参数服务器中用于获取最佳分裂点的接口,接收计算节点发送的最佳分裂点获取请求;最佳分裂点计算模块,用于根据所述最佳分裂点获取请求,通过预设的梯度提升决策树GBDT优化算法从存储的M/P个特征中计算出最佳分裂点;所述M为训练数据的特征数量,所述P为所述参数服务器包括的参数服务器节点的数量;M和P均为大于1的自然数;信息发送模块,用于将所述最佳分裂点的信息发送给所述计算节点。
本发明实施例又一方面公开了一种计算机可读存储介质,其上存储有计算机程序,在处理器执行所述计算机程序时可实现上述实现方法。
附图简要说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本发明实施例提供的基于参数服务器的梯度提升决策树的实现系统的架构图;
图2是本发明实施例提供的基于参数服务器的梯度提升决策树的实现方法的流程示意图;
图3a是本发明提供的基于参数服务器的梯度提升决策树的实现方法的另一实施例的流程示意图;
图3b是本发明提供的决策树的原理示意图;
图4是本发明提供的基于参数服务器的梯度提升决策树的实现方法的另一实施例的流程示意图;
图5是本发明提供的基于参数服务器的梯度提升决策树的实现方法的另一实施例的流程示意图;
图6是本发明实施例提供的参数服务器的参数存储的原理示意图；
图7是本发明提供的基于参数服务器的梯度提升决策树的实现方法的另一实施例的流程示意图；
图8是本发明实施例提供的梯度直方图的原理示意图;
图9是基于映射-归约(Map-Reduce)实现的梯度提升决策树(Gradient boosting decision tree,GBDT)的分布式架构;
图10是基于多点接口(Multi Point Interface,MPI)的全-归约(All-Reduce)实现的GBDT的分布式架构;
图11是本发明实施例提供的基于参数服务器的梯度提升决策树的实现装置的结构示意图;
图12是本发明实施例提供的计算节点设备的结构示意图;
图13是本发明提供的基于参数服务器的梯度提升决策树的实现装置的另一实施例的结构示意图;
图14是本发明实施例提供的参数服务器节点设备的结构示意图。
实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行描述。
如图1示出的本发明实施例提供的基于参数服务器的梯度提升决策树的实现系统的架构图,基于参数服务器的梯度提升决策树的实现系统可以包括参数服务器(Parameter Server,PS)以及至少一个计算节点设 备,其中PS可以包括多个参数服务器节点,PS存储一份全局模型参数;每个计算节点设备保存一份模型参数的副本,在一轮迭代中,用分配的数据子集更新参数副本,将对参数的更新发送到PS,然后从PS获取最新的全局参数开始下一轮迭代。图中以其中一个计算节点与参数服务器节点的数据传输为例,画出了参数更新和获取的传输线(图中每个计算节点与参数服务器节点的数据传输方式相同,这里没有画出全部计算节点与参数服务器节点的数据传输线),该计算节点将自身的局部参数分成P份分别发送给P个参数服务器节点,并从P个参数服务器节点中获取P个最佳分裂点的信息。
进一步地,本发明实施例中基于参数服务器的梯度提升决策树的实现系统还可以包括一个主控节点,用于监控参数服务器和计算节点设备。需要说明的是,本发明实施例中的参数服务器节点、计算节点设备以及主控节点都可以为物理机。参数服务器节点的数量以及计算节点设备的数量可以由用户(包括研发人员等)自己来决定,本发明实施例不作限制。本发明各个实施例中的计算节点等同于计算节点设备,参数服务器节点等同于参数服务器节点设备。
通过本发明实施例的基于参数服务器的梯度提升决策树的实现系统可以完成广告推荐、用户性别预测、图片分类等机器学习分类任务,以及用户年龄预测、用户消费预测等机器学习回归任务。用户只需要提交机器学习任务,最后获得处理结果,而不需要了解具体的处理流程,即整个分布式处理机制对于用户是透明的。
下面具体地,结合图2至图10示出的本发明实施例提供的基于参数服务器的梯度提升决策树的实现方法的流程示意图来详细说明本发明实施例的机器学习具体的处理流程,说明如何实现基于参数服务器的梯度提升决策树。
如图2示出的本发明实施例提供的基于参数服务器的梯度提升决策树的实现方法的流程示意图,先从计算节点侧来描述,可以包括如下步骤:
步骤S200:通过预设的参数服务器获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求;即,通过预设的参数服 务器中用于获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求;
具体地,本发明实施例中的参数服务器预先设置有获取最佳分裂点的接口,并提供给各个计算节点。计算节点在寻找最佳分裂点的时候,即可以通过该获取最佳分裂点的接口向P个参数服务器节点发送最佳分裂点获取请求,其中,该参数服务器包括P个参数服务器节点,每个参数服务器节点各自存储有M/P个特征,该M为训练数据的特征数量;P可以为大于1的自然数,M也可以为大于1的自然数。
需要说明的是,本发明实施例中的最佳分裂点包括最佳或最优的分裂特征和分裂特征值,该最佳分裂点也可以命名为最优分裂点、最佳分裂结果、或最优分裂结果等,本发明不作限制,只要包括最佳或最优的分裂特征和分裂特征值的信息或参数即可。
步骤S202:接收所述P个参数服务器节点分别发送的最佳分裂点的信息,得到P个最佳分裂点的信息;其中所述P个最佳分裂点为所述P个参数服务器节点通过预设的GBDT优化算法从各自存储的M/P个特征中计算出的最佳分裂点;
具体地,参数服务器节点接收到计算节点发送的最佳分裂点获取请求后,即通过预设的GBDT优化算法从自身存储的M/P个特征中计算出最佳分裂点,并将最佳分裂点的信息返回给计算节点。由于P个参数服务器节点分别向该计算节点返回各自的最佳分裂点的信息,因此该计算节点得到了P个最佳分裂点的信息。
步骤S204:根据所述P个最佳分裂点的信息,比较所述P个最佳分裂点的目标函数增益,并选取目标函数增益最大的分裂点作为全局最佳分裂点。
具体地,计算节点通过比较P个最佳分裂点的目标函数增益,从而选取出目标函数增益最大的分裂点作为全局最佳分裂点,后续可以创建叶子节点,将本节点的训练数据切分到两个叶子节点上。
需要说明的是,每个计算节点在对各自的训练数据进行训练的过程中,通过本发明实施例,每个计算节点无需获取完整的全局梯度直方图,只需要比较参数服务器节点返回的候选最佳分裂点的信息,大大减少了 通信开销。本发明实施例中该候选最佳分裂点即为P个参数服务器节点分别返回的P个候选的最佳分裂点的信息,计算节点从该P个候选的最佳分裂点的信息中选取其中一个作为进行分裂的全局最佳分裂点。
进一步地,结合图3a示出的本发明提供的基于参数服务器的梯度提升决策树的实现方法的另一实施例的流程示意图,再次更加详细地从计算节点侧来描述,需要说明的是,本发明实施例以一个计算节点为例进行说明(其他计算节点的处理流程也相同),可以包括如下步骤:
步骤S300:计算候选分裂点;
具体地,计算节点依次读取分配给自己的训练数据,对每种特征计算候选分裂特征值,从而得到候选分裂点。
需要说明的是,本发明训练数据是用来构建机器学习模型的历史数据,每个计算节点分配到的训练数据都可以不同。
步骤S302:创建决策树;
具体地,计算节点创建新的树,进行初始化的工作,包括初始化树结构、计算训练数据的一阶和二阶梯度、初始化一个待处理树节点的队列、将树的根节点加入队列等。
步骤S304:计算梯度直方图;
Specifically, the computing node takes pending tree nodes from the queue of tree nodes to be processed. For each pending tree node, it computes the corresponding local gradient histograms according to the training data held by the computing node itself, and divides each local gradient histogram into P partitioned local gradient histograms, where the P partitions correspond one-to-one to the P parameter server nodes; it then sends each partition of each pending tree node to the corresponding parameter server node through the interface of the preset parameter server for updating gradient histograms.
It should be noted that the local gradient histograms in this embodiment of the present invention may include first-order gradient histograms and second-order gradient histograms; that is, dividing each local gradient histogram into P partitions means dividing each first-order gradient histogram into P partitions and dividing each second-order gradient histogram into P partitions, and then sending the partitions to the corresponding parameter server nodes (a sketch of this partition-and-push step is given below). The parameter server splits the global gradient histogram (that is, the entire gradient histogram) across the P parameter server nodes, so that each parameter server node stores the first-order and second-order gradient histograms of M/P features, which resolves the single-point bottleneck encountered with high-dimensional training data and accelerates the aggregation of the local gradient histograms.
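A minimal sketch of the partition-and-push step on the computing node side, assuming the local first- and second-order histograms for one pending tree node are stored as arrays of shape (M, K) and that the parameter server exposes a push-style update call; the names split_into_partitions and ps_client.update_histogram are hypothetical placeholders for the patent's update-gradient-histogram interface:

    import numpy as np

    def split_into_partitions(hist, num_servers):
        """Split an (M, K) local histogram into P blocks along the feature axis.

        Partition p holds the rows for its contiguous block of M/P features,
        matching the one-to-one mapping between partitions and server nodes.
        """
        return np.array_split(hist, num_servers, axis=0)

    def push_local_histograms(ps_client, tree_node, grad_hist, hess_hist, num_servers):
        grad_parts = split_into_partitions(grad_hist, num_servers)
        hess_parts = split_into_partitions(hess_hist, num_servers)
        for server_id in range(num_servers):
            # One push per parameter server node, carrying only its M/P features.
            ps_client.update_histogram(server_id, tree_node,
                                       grad_parts[server_id], hess_parts[server_id])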
步骤S306:寻找最佳分裂点;
具体地,计算节点通过参数服务器获取最佳分裂点的接口,从参数服务器获取每个参数服务器节点的最佳分裂点的信息,然后比较P个分裂点的目标函数增益,选取增益最大的分裂点作为全局最佳分裂点。具体可以参考步骤S202至步骤S204,这里不再赘述。
需要说明的是,数据通常被表示为向量,向量的每一维度叫做数据的特征,代表数据的某种性质,数据某种性质的数值叫做数据的特征值。如图3b示出的本发明实施例提供的决策树的原理示意图,图中每一个节点都为树节点,其中起始节点“拥有房产”为根节点,“是否结婚”、“月收入”等其他节点可以为叶子节点。
进一步地,本发明实施例中最佳分裂点的信息可以包括分裂特征、分裂特征值以及目标函数增益;那么计算节点选取目标函数增益最大的分裂点作为全局最佳分裂点可以具体包括:将目标函数增益最大的分裂点的分裂特征和分裂特征值作为全局最佳分裂点。因此,每个计算节点不需要获取完整的全局梯度直方图,而只需要比较参数服务器节点返回的候选最佳分裂点,本发明实施例每个参数服务器节点可以只需要返回三个数(分裂特征、分裂特征值以及目标函数增益)给计算节点即可,大大减小了通信开销。
步骤S308:分裂树节点;
具体地,计算节点根据计算得到的最佳分裂点,创建叶子节点,将本节点的训练数据切分到两个叶子节点上。
步骤S310:判断树的高度是否达到最大限制;
具体地,当判断树的高度没有达到最大限制,则将两个叶子节点加入到待处理树节点的队列,然后跳转到步骤S304;当判断树的高度达到了最大限制,则执行步骤S312。
步骤S312:判断树的数量是否达到最大限制;
具体地,当判断树的数量没有达到最大限制,则跳转到步骤S302,再次创建新的决策树。当判断树的数量达到了最大限制,则执行步骤S314。计算节点迭代地调用步骤S302至步骤S308,直到完成所有树的 建立。
步骤S314:完成训练。
具体地,计算节点训练完所有决策树,计算并输出性能指标(准确率、误差等),可以输出训练模型。
It should also be noted that, in this embodiment of the present invention, the information of the optimal split point may instead include the objective-function gain and identification information, where the identification information indicates the parameter server node to which the optimal split point belongs. That is, after receiving the optimal-split-point acquisition request from the computing node, each parameter server node computes, by means of the preset GBDT optimization algorithm and from the M/P features it stores, the split feature, the split feature value, and the objective-function gain of its optimal split point, but sends only the objective-function gain to the computing node. After comparing the objective-function gains of the P optimal split points and selecting the largest gain, the computing node requests the split feature and split feature value from the corresponding parameter server node according to the identification information of the split point with the largest gain, and receives the split feature and split feature value returned by that parameter server node as the global optimal split point. Each computing node does not need to obtain the complete global gradient histogram and only needs to compare the candidate optimal split points returned by the parameter server nodes; in this embodiment each parameter server node may return only two numbers (the identification information and the objective-function gain) to the computing node, and after the computing node has determined the largest objective-function gain it fetches the split feature and split feature value from the corresponding parameter server node, which greatly reduces the communication overhead.
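A minimal sketch of this two-number variant, written from the computing node's perspective. The ps_client calls (get_best_split_gain, get_split_feature_value) are hypothetical names for the patent's acquire-optimal-split-point interface, and the return shapes are assumptions used only for illustration:

    def find_global_best_split(ps_client, tree_node, num_servers):
        """Two-round protocol: gains only first, then one fetch from the winner."""
        # Round 1: each parameter server node returns only its best gain.
        gains = [(server_id, ps_client.get_best_split_gain(server_id, tree_node))
                 for server_id in range(num_servers)]
        best_server, best_gain = max(gains, key=lambda pair: pair[1])

        # Round 2: fetch the split feature and feature value from the winner only.
        feature, value = ps_client.get_split_feature_value(best_server, tree_node)
        return feature, value, best_gain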
下面结合图4示出的本发明提供的基于参数服务器的梯度提升决策树的实现方法的另一实施例的流程示意图,对应地从参数服务器节点侧来描述,可以包括如下步骤:
步骤S400:参数服务器节点通过预设的参数服务器获取最佳分裂点的接口,接收计算节点发送的最佳分裂点获取请求;即,参数服务器节点通过预设的参数服务器中用于获取最佳分裂点的接口,接收计算节点发送的最佳分裂点获取请求;
具体地,计算节点在寻找最佳分裂点的时候,即可以通过该获取最佳分裂点的接口向P个参数服务器节点发送最佳分裂点获取请求,参数服务器节点即接收到计算节点发送的该最佳分裂点获取请求。
步骤S402:所述参数服务器节点根据所述最佳分裂点获取请求,通过预设的梯度提升决策树GBDT优化算法从存储的M/P个特征中计算出最佳分裂点;所述M为训练数据的特征数量,所述P为所述参数服务器包括的参数服务器节点的数量;
步骤S404:所述参数服务器节点将所述最佳分裂点的信息发送给所述计算节点。
再具体地,结合图5示出的本发明提供的基于参数服务器的梯度提升决策树的实现方法的另一实施例的流程示意图,再次更加详细地从参数服务器节点侧来描述,可以包括如下步骤:
步骤S500:参数服务器节点通过预设的参数服务器更新梯度直方图的接口,接收计算节点发送的分块局部梯度直方图;即,所述参数服务器节点通过所述参数服务器中用于更新梯度直方图的接口,接收计算节点发送的分块局部梯度直方图;
其中,所述分块局部梯度直方图为所述计算节点针对每个待处理树节点,根据自身节点的训练数据分别计算各自对应的局部梯度直方图,并将每个局部梯度直方图分成P个分块的局部梯度直方图;且P个分块的局部梯度直方图与P个参数服务器节点一一对应。
Specifically, as shown in FIG. 6, a schematic diagram of the parameter storage of the parameter server according to an embodiment of the present invention, the parameter server stores the global gradient histogram across P parameter server nodes. Let T be the total number of decision trees to be trained and d be the maximum height of each tree; a tree then contains at most (2^d - 1) tree nodes. In this embodiment, the global gradient histogram of one feature at one tree node is represented as a vector of length 2K, of which the first K entries are the first-order gradient histogram and the last K entries are the second-order gradient histogram. M is the number of features of the training data, and each feature corresponds to one first-order and one second-order gradient histogram. The gradient histograms of all M features are concatenated into a vector of length 2KM, which is evenly split into P partitions stored on the P parameter server nodes, so that each parameter server node stores the global gradient histograms of M/P features, with length 2KM/P. Since there are at most (2^d - 1) tree nodes, the total size of the global gradient histogram stored on each parameter server node is 2KM(2^d - 1)/P. By splitting the global gradient histogram across multiple parameter server nodes for storage, this embodiment of the present invention resolves the single-point bottleneck encountered when processing high-dimensional training data.
It should be noted that, in order to reuse the global gradient histogram while building multiple decision trees, the entire global gradient histogram is reset to zero before each new decision tree is built.
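To make the storage layout above concrete, the sketch below computes how large each server node's slice is for one tree and where the 2K-entry histogram of a given feature on a given tree node starts inside that slice. The function names and the mapping of contiguous blocks of M/P features to servers are illustrative only, and P is assumed to divide M as in the patent's M/P split:

    def shard_size(M, K, d, P):
        """Per-server storage for one tree: (2^d - 1) tree nodes, M/P features, 2K entries each."""
        tree_nodes = (1 << d) - 1          # at most 2^d - 1 tree nodes per tree
        per_node = 2 * K * (M // P)        # slice length 2KM/P per tree node
        return tree_nodes * per_node       # total 2KM(2^d - 1)/P

    def locate(feature, tree_node, M, K, P):
        """Map a global feature index to (server_id, offset of its 2K-entry histogram)."""
        features_per_server = M // P
        server_id = feature // features_per_server
        local_feature = feature % features_per_server
        offset = (tree_node * features_per_server + local_feature) * 2 * K
        return server_id, offset

For example, with M = 1000 features, K = 20 bins, tree height d = 6 and P = 10 server nodes, shard_size gives 63 x 4000 = 252,000 histogram entries per server node for one tree.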
步骤S502:参数服务器节点将所述分块局部梯度直方图累加到对应的全局梯度直方图上;
具体地,每个参数服务器节点在接收到计算节点发送的局部梯度直方图后,确定处理的树节点,将其累加到对应的全局梯度直方图上,从而更新全局梯度直方图。
步骤S504:参数服务器节点通过预设的参数服务器获取最佳分裂点的接口,接收计算节点发送的最佳分裂点获取请求;
步骤S506:参数服务器节点根据所述最佳分裂点获取请求,通过预设的GBDT优化算法从存储的M/P个特征中计算出最佳分裂点;
步骤S508:参数服务器节点将所述最佳分裂点的信息发送给所述计算节点。
具体地,步骤S504至步骤S508可以参考上述步骤S400至步骤S406,这里不再赘述。需要说明的是,步骤S500、S502中更新梯度直方图的动作与步骤S504至步骤S508中最佳分裂点的处理动作可以不分先后顺序,在计算最佳分裂点时,使用的是最新的全局梯度直方图,而全局梯度直方图的更新是持续性的。
需要说明的是,本发明实施例最佳分裂点的信息包括分裂特征、分裂特征值以及目标函数增益。最佳分裂点的信息也可以包括目标函数增益和标识信息,所述标识信息用于指示所述参数服务器节点;那么参数服务器节点将所述最佳分裂点的信息发送给该指定的计算节点之后,还可以包括:该参数服务器节点接收该计算节点用于获取分裂特征和分裂特征值的请求,并根据所述取分裂特征和分裂特征值的请求,将所述最佳分裂点的分裂特征和分裂特征值发送给所述计算节点。
下面结合图7示出的本发明提供的基于参数服务器的梯度提升决策树的实现方法的另一实施例的流程示意图,以一个计算节点更新一次梯度直方图以及获取一次最佳分裂点为例,从计算节点以及参数服务器节 点两侧来描述,可以包括如下步骤:
步骤S700:计算节点使用分配的数据子集执行GBDT算法,得到本地的局部梯度直方图;
Specifically, in the GBDT algorithm, for each tree the following objective function is to be minimized:

    F^{(t)} = Σ_i l( y_i, ŷ_i^{(t-1)} + f_t(x_i) ) + Ω(f_t)

where x_i is a training instance, y_i is the true label of the training instance, f_t(x_i) is the prediction given by the t-th tree for the training instance, ŷ_i^{(t)} = ŷ_i^{(t-1)} + f_t(x_i) is the prediction for the training instance after the t-th tree, l is the cost function (given a true label and a prediction, it computes a cost value), and Ω is a regularization term that prevents the model from overfitting.
The GBDT algorithm uses gradient histograms as auxiliary information to find the optimal split point. For each training instance, its first-order gradient g_i and second-order gradient h_i with respect to the cost function l are computed; the distributions of the first-order and second-order gradients over all training instances are then aggregated, and one gradient histogram is generated for each feature of the training data. FIG. 8 is a schematic diagram of a gradient histogram provided by an embodiment of the present invention: the horizontal axis is an interval of feature values of some feature, and the vertical axis is the sum of the gradients of the training instances whose feature values fall in that interval. For example, for the training instances whose feature value lies between 0 and 0.2, their first-order gradients are accumulated and form the first bin of the first-order gradient histogram of FIG. 8.
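As an illustration of how such first- and second-order gradient histograms can be built, the sketch below bins the instances of one feature by the candidate split values and accumulates g_i and h_i per bin. It assumes, purely for illustration, a squared-error cost l(y, p) = (p - y)^2 / 2, for which g_i = p_i - y_i and h_i = 1; the patent itself does not fix a particular cost function:

    import numpy as np

    def gradients(y_true, y_pred):
        """First/second-order gradients for a squared-error cost (illustrative choice)."""
        g = y_pred - y_true
        h = np.ones_like(y_true, dtype=float)
        return g, h

    def gradient_histogram(feature_values, g, h, split_candidates):
        """Accumulate g_i and h_i into K bins bounded by the K sorted candidate split values."""
        K = len(split_candidates)
        grad_hist = np.zeros(K)
        hess_hist = np.zeros(K)
        # Bin k holds instances with feature value above split_candidates[k-1]
        # and at or below split_candidates[k]; larger values go to the last bin.
        bins = np.minimum(np.searchsorted(split_candidates, feature_values), K - 1)
        np.add.at(grad_hist, bins, g)
        np.add.at(hess_hist, bins, h)
        return grad_hist, hess_hist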
步骤S702:计算节点将该局部梯度直方图分成P个分块;
具体地,P个分块局部梯度直方图与P个参数服务器节点一一对应。
步骤S704:分别将每个分块局部梯度直方图发送给对应的参数服务器节点;
步骤S706:参数服务器节点汇总所有计算节点的局部梯度直方图,生成全局梯度直方图;
步骤S708:计算节点向每个参数服务器节点请求最佳分裂点;
步骤S710:每个参数服务器节点接收到请求后,向计算节点返回本节点保存的M/P中特征中最佳分裂点的信息;
Specifically, in order to find the optimal split point at each tree node, an embodiment of the present invention may use a heuristic optimization method that computes the optimal split point from the first-order and second-order gradient histograms:
(1) Assume there are M features, M being the number of features of the training data. For each feature, K candidate split feature values S_m = {s_{m,1}, s_{m,2}, ..., s_{m,K}} are selected:

    for m = 1 to M do
        generate K split candidates S_m = {s_{m,1}, s_{m,2}, ..., s_{m,K}}
    end for

(2) The N training instances are then scanned in turn to obtain the first-order and second-order gradient histograms. G_{mk} denotes the k-th bin of the first-order gradient histogram of the m-th feature, and H_{mk} denotes the k-th bin of the second-order gradient histogram of the m-th feature:

    for m = 1 to M do
        loop over the N instances to generate gradient histograms with K bins
            G_{mk} = Σ g_i   where s_{m,k-1} < x_{im} < s_{m,k}
            H_{mk} = Σ h_i   where s_{m,k-1} < x_{im} < s_{m,k}
    end for

The gradient histogram of each feature has K bins, and the K candidate split feature values are the maxima of the feature-value intervals corresponding to the bins; that is, the feature-value interval corresponding to the k-th bin is [s_{m,k-1}, s_{m,k}].
(3) The K bins of the first-order and second-order gradient histograms are read from left to right, and the optimal split point (split feature and split feature value) is obtained by the following procedure. G is the sum of all bins of the first-order gradient histogram and H is the sum of all bins of the second-order gradient histogram; G_L accumulates the sum of the already-read bins of the first-order gradient histogram, H_L accumulates the sum of the already-read bins of the second-order gradient histogram, G_R is the sum of the not-yet-read bins of the first-order gradient histogram, and H_R is the sum of the not-yet-read bins of the second-order gradient histogram. Here λ denotes the regularization coefficient contributed by the regularization term Ω.

    G = Σ_{k=1..K} G_{mk},  H = Σ_{k=1..K} H_{mk}
    for m = 1 to M do
        G_L = 0, H_L = 0
        for k = 1 to K do
            G_L = G_L + G_{mk},  H_L = H_L + H_{mk}
            G_R = G - G_L,  H_R = H - H_L
            gain(m, k) = G_L^2 / (H_L + λ) + G_R^2 / (H_R + λ) - G^2 / (H + λ)
        end for
    end for

Here M is the number of features of the training data, and the total number of gradient histograms is 2M, since each feature corresponds to one first-order and one second-order gradient histogram. When searching for the optimal split point of a feature, the first-order and second-order gradient histograms of that feature are required, and the optimal split point is found from these two histograms. In the procedure above, the first-order and second-order gradient histograms of the M features are processed in turn, and for each candidate split feature value of each feature, the gain that splitting at this point would bring to the objective function F^{(t)} is computed as

    gain = G_L^2 / (H_L + λ) + G_R^2 / (H_R + λ) - G^2 / (H + λ)

Among all the MK candidate split points, the feature and feature value with the largest gain are selected as the optimal split point.
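The following is a minimal runnable sketch of step (3), assuming the histograms of all M features for one tree node are stacked as (M, K) arrays and lam is the regularization coefficient; it returns the best (feature, bin index, gain) triple over all MK candidates, as the procedure above describes. It is an illustration of the scan, not the patent's implementation:

    import numpy as np

    def best_split(grad_hist, hess_hist, lam=1.0):
        """Scan (M, K) first/second-order histograms and return the best split.

        grad_hist[m, k] = G_mk, hess_hist[m, k] = H_mk. Returns (m, k, gain),
        where the split value is the k-th candidate split feature value s_{m,k}.
        """
        best = (-1, -1, -np.inf)
        for m in range(grad_hist.shape[0]):
            G, H = grad_hist[m].sum(), hess_hist[m].sum()
            G_L = H_L = 0.0
            for k in range(grad_hist.shape[1]):
                G_L += grad_hist[m, k]
                H_L += hess_hist[m, k]
                G_R, H_R = G - G_L, H - H_L
                gain = G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam) - G**2 / (H + lam)
                if gain > best[2]:
                    best = (m, k, gain)
        return best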
步骤S712:计算节点接收到P个参数服务器分别返回的最佳分裂点的信息,选取其中对目标函数带来的增益gain最大的分裂点;
需要说明的是,如图9示出的基于映射-归约(Map-Reduce)实现的梯度提升决策树(Gradient boosting decision tree,GBDT)的分布式架构,如计算引擎Spark所使用的,每个工作节点(即计算节点)在生成局部梯度直方图之后,通过Reduce操作发送到一个节点;此节点汇总所有计算节点的梯度直方图,生成全局的梯度直方图,寻找最佳的分裂点,然后通过Map操作将最佳分裂点广播到所有的计算节点。然而,在训练数据的特征维度很大时,每个计算节点产生的局部梯度直方图也很大。在这种情况下,在Reduce操作中会遇到单点瓶颈问题,汇总局部梯度直方图的节点可能因为网络阻塞而使得整个系统的处理速度变慢。如图10示出的基于多点接口(Multi Point Interface,MPI)的全-归约(All-Reduce)实现的GBDT的分布式架构,如XGBoost所使用的,所有计算节点被组织成一个二叉树结构;每个计算节点在生成局部梯度直方图之后,将其发送给父计算节点;该父计算节点汇总两个子计算节点的梯度直方图后,再发送给它的父计算节点;根计算节点得到全局的梯度直方图后,寻找最佳的分裂点,然后通过树形结构依次发送给所有的计算节点。然而,这种方案需要多步传输,在训练数据的特征维度很大 时,多步传输带来较大的通信量。而且随着计算节点的增多,需要更多的传输步数来完成局部梯度直方图的汇总。
而通过实施本发明实施例,计算节点在寻找最佳分裂点时,通过预设的参数服务器获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求,在获取到每个参数服务器节点通过GBDT优化算法从各自存储的M/P个特征中计算出的最佳分裂点后,比较P个最佳分裂点的目标函数增益,并选取目标函数增益最大的分裂点作为全局最佳分裂点。每个计算节点无需获取完整的全局梯度直方图,只需要比较参数服务器节点返回的候选最佳分裂点的信息,大大减少了通信开销。并且通过将全局梯度直方图切分到多个参数服务器节点上存储,解决了在处理大维度训练数据时遇到的单点瓶颈问题,加速了汇总局部梯度直方图的速度。并且,本发明实施例的计算节点以及参数服务器节点的数量可以按照用户的需求进行扩展,为可扩展的PS架构,解决了现有技术中Spark和XGBoost的在处理大维度训练数据时存在的单点瓶颈,以及可扩展性受限的问题。
步骤S714:计算节点根据最佳分裂点分裂当前处理的树节点。
具体地,计算节点通过迭代处理,直到完成所有树的建立。
还需要说明的是,本发明实施例中可以每个计算节点都以图3至图4的实施方式来寻找最佳分裂点,也可以指定其中一个计算节点来以图3至图4的实施方式执行寻找最佳分裂点。若指定其中一个计算节点来执行寻找最佳分裂点,那么该指定的计算节点在选取目标函数增益最大的分裂点作为全局最佳分裂点之后,还可以包括:将所述全局最佳分裂点发送给所述参数服务器,以使所述参数服务器将所述全局最佳分裂点发送给所有的计算节点。参数服务器可以将获取的全局最佳分裂点都保存在P个参数服务器节点中,也就是说,其他计算节点也可以从参数服务器节点中直接获取最佳分裂点的分裂特征和分裂特征值。这样能够避免其他计算节点重复计算,大大节省了计算节点的计算量。
为了便于更好地实施本发明实施例的上述方案,本发明还对应提供了一种基于参数服务器的梯度提升决策树的实现装置,下面结合附图来进行详细说明:
如图11示出的本发明实施例提供的基于参数服务器的梯度提升决策树的实现装置的结构示意图,基于参数服务器的梯度提升决策树的实现装置11可以包括:至少一个存储器;至少一个处理器;其中,所述至少一个存储器存储有至少一个指令模块,经配置由所述至少一个处理器执行;其中,所述至少一个指令模块包括:
请求模块110、信息接收模块112和最佳分裂点选取模块114,其中,
请求模块110用于通过预设的参数服务器获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求;其中,所述参数服务器包括P个参数服务器节点,每个参数服务器节点各自存储有M/P个特征,所述M为训练数据的特征数量;
信息接收模块112用于接收所述P个参数服务器节点分别发送的最佳分裂点的信息,得到P个最佳分裂点的信息;其中,所述P个最佳分裂点为所述P个参数服务器节点通过预设的梯度提升决策树GBDT优化算法从各自存储的M/P个特征中计算出的最佳分裂点;
最佳分裂点选取模块114用于根据所述P个最佳分裂点的信息,比较所述P个最佳分裂点的目标函数增益,并选取目标函数增益最大的分裂点作为全局最佳分裂点。
具体地,基于参数服务器的梯度提升决策树的实现装置11还可以包括分块模块和发送模块,其中,
分块模块用于针对每个待处理树节点,根据自身节点的训练数据分别计算各自对应的局部梯度直方图,并将每个局部梯度直方图分成P个分块;其中,P个分块局部梯度直方图与P个参数服务器节点一一对应;
发送模块用于通过预设的参数服务器更新梯度直方图的接口,将每个待处理树节点的每个分块局部梯度直方图发送给对应的参数服务器节点。
再具体地,每个最佳分裂点的信息可以包括分裂特征、分裂特征值以及目标函数增益;
最佳分裂点选取模块114具体用于将目标函数增益最大的分裂点中的分裂特征和分裂特征值作为全局最佳分裂点。
或者,每个最佳分裂点的信息包括目标函数增益和标识信息,所述 标识信息用于指示所述最佳分裂点对应的参数服务器节点;
最佳分裂点选取模块114具体用于根据目标函数增益最大的分裂点的标识信息,向对应的参数服务器节点请求分裂特征和分裂特征值;接收所述对应的参数服务器节点返回的分裂特征和分裂特征值,作为全局最佳分裂点。
再具体地,基于参数服务器的梯度提升决策树的实现装置11还可以包括节点发送模块,用于在最佳分裂点选取模块114选取目标函数增益最大的分裂点作为全局最佳分裂点之后,将所述全局最佳分裂点发送给所述参数服务器,以使所述参数服务器将所述全局最佳分裂点发送给所有的计算节点。
再进一步地,如图12示出的本发明实施例提供的计算节点设备的结构示意图,计算节点设备120可以包括:至少一个处理器1201,例如CPU,输入模块1202,输出模块1203,存储器1204,至少一个通信总线1205。其中,通信总线1205用于实现这些组件之间的连接通信。存储器1204可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器,存储器1204包括本发明实施例中的flash。存储器1204可选的还可以是至少一个位于远离前述处理器1201的存储系统。如图12所示,作为一种计算机存储介质的存储器1204中可以包括操作系统、网络通信模块、用户接口模块以及基于参数服务器的梯度提升决策树的实现程序。
在图12所示的计算节点设备120中,处理器1201可以用于调用存储器1204中存储的基于参数服务器的梯度提升决策树的实现程序,并执行以下操作:
通过预设的参数服务器获取最佳分裂点的接口,通过输出模块1203分别向P个参数服务器节点发送最佳分裂点获取请求;其中,所述参数服务器包括P个参数服务器节点,每个参数服务器节点各自存储有M/P个特征,所述M为训练数据的特征数量;
通过输入模块1202接收所述P个参数服务器节点分别发送的最佳分裂点的信息,得到P个最佳分裂点的信息;其中所述P个最佳分裂点为所述P个参数服务器节点通过预设的梯度提升决策树GBDT优化算法 从各自存储的M/P个特征中计算出的最佳分裂点;
根据所述P个最佳分裂点的信息,比较所述P个最佳分裂点的目标函数增益,并选取目标函数增益最大的分裂点作为全局最佳分裂点。
具体地,处理器1201还执行:
针对每个待处理树节点,根据自身节点的训练数据分别计算各自对应的局部梯度直方图,并将每个局部梯度直方图分成P个分块;其中,P个分块局部梯度直方图与P个参数服务器节点一一对应;
通过预设的参数服务器更新梯度直方图的接口,通过输出模块1203将每个待处理树节点的每个分块局部梯度直方图发送给对应的参数服务器节点。
具体地,每个最佳分裂点的信息包括分裂特征、分裂特征值以及目标函数增益;
处理器1201选取目标函数增益最大的分裂点作为全局最佳分裂点可以包括:
将目标函数增益最大的分裂点中的分裂特征和分裂特征值作为全局最佳分裂点。
具体地,每个最佳分裂点的信息包括目标函数增益和标识信息,所述标识信息用于指示所述最佳分裂点对应的参数服务器节点;
处理器1201选取目标函数增益最大的分裂点作为全局最佳分裂点包括:
根据目标函数增益最大的分裂点的标识信息,通过输出模块1203向对应的参数服务器节点请求分裂特征和分裂特征值;
通过所述输入模块接收所述对应的参数服务器节点返回的分裂特征和分裂特征值,作为全局最佳分裂点。
具体地,处理器1201选取目标函数增益最大的分裂点作为全局最佳分裂点之后,还可以执行:
通过输出模块1203将所述全局最佳分裂点发送给所述参数服务器,以使所述参数服务器将所述全局最佳分裂点发送给所有的计算节点。
需要说明的是,本发明实施例中的基于参数服务器的梯度提升决策树的实现装置11和计算节点设备120中各模块的功能可对应参考上述 各方法实施例中图3至图10任意实施例的具体实现方式,这里不再赘述。基于参数服务器的梯度提升决策树的实现装置11和计算节点设备120可以包括但不限于计算机等物理机。
本发明实施例还对应提供了另一种基于参数服务器的梯度提升决策树的实现装置,下面结合附图来进行详细说明:
如图13示出的本发明提供的基于参数服务器的梯度提升决策树的实现装置的另一实施例的结构示意图,基于参数服务器的梯度提升决策树的实现装置13可以包括:至少一个存储器;至少一个处理器;其中,所述至少一个存储器存储有至少一个指令模块,经配置由所述至少一个处理器执行;其中,所述至少一个指令模块包括:请求接收模块130、最佳分裂点计算模块132和信息发送模块134,其中,
请求接收模块130用于通过预设的参数服务器获取最佳分裂点的接口,接收计算节点发送的最佳分裂点获取请求;
最佳分裂点计算模块132用于根据所述最佳分裂点获取请求,通过预设的梯度提升决策树GBDT优化算法从存储的M/P个特征中计算出最佳分裂点;所述M为训练数据的特征数量,所述P为所述参数服务器包括的参数服务器节点的数量;
信息发送模块134用于将所述最佳分裂点的信息发送给所述计算节点。
具体地,基于参数服务器的梯度提升决策树的实现装置13还可以包括:局部梯度直方图接收模块和直方图更新模块,其中,
局部梯度直方图接收模块用于通过预设的参数服务器更新梯度直方图的接口,接收计算节点发送的分块局部梯度直方图;其中,所述分块局部梯度直方图为所述计算节点针对每个待处理树节点,根据自身节点的训练数据分别计算各自对应的局部梯度直方图,并将每个局部梯度直方图分成P个分块的局部梯度直方图;且P个分块的局部梯度直方图与P个参数服务器节点一一对应;
直方图更新模块用于将所述分块局部梯度直方图累加到对应的全局梯度直方图上。
具体地,所述最佳分裂点的信息包括分裂特征、分裂特征值以及目 标函数增益。
或者,所述最佳分裂点的信息包括目标函数增益和标识信息,所述标识信息用于指示所述参数服务器节点;
基于参数服务器的梯度提升决策树的实现装置13还可以包括:分裂点信息发送模块,用于在信息发送模块134将所述最佳分裂点的信息发送给所述计算节点之后,接收所述计算节点用于获取分裂特征和分裂特征值的请求,并根据所述取分裂特征和分裂特征值的请求,将所述最佳分裂点的分裂特征和分裂特征值发送给所述计算节点。
再进一步地,如图14示出的本发明实施例提供的参数服务器节点设备的结构示意图,参数服务器节点设备140可以包括:至少一个处理器1401,例如CPU,输入模块1402,输出模块1403,存储器1404,至少一个通信总线1405。其中,通信总线1405用于实现这些组件之间的连接通信。存储器1404可以是高速RAM存储器,也可以是非不稳定的存储器(non-volatile memory),例如至少一个磁盘存储器,存储器1404包括本发明实施例中的flash。存储器1404可选的还可以是至少一个位于远离前述处理器1401的存储系统。如图14所示,作为一种计算机存储介质的存储器1404中可以包括操作系统、网络通信模块、用户接口模块以及基于参数服务器的梯度提升决策树的实现程序。
在图14所示的参数服务器节点设备140中,处理器1401可以用于调用存储器1404中存储的基于参数服务器的梯度提升决策树的实现程序,并执行以下操作:
通过预设的参数服务器获取最佳分裂点的接口,通过输入模块1402接收计算节点发送的最佳分裂点获取请求;
根据所述最佳分裂点获取请求,通过预设的梯度提升决策树GBDT优化算法从存储的M/P个特征中计算出最佳分裂点;所述M为训练数据的特征数量,所述P为所述参数服务器包括的参数服务器节点的数量;
通过输出模块1403将所述最佳分裂点的信息发送给所述计算节点。
具体地,处理器1401还可以执行:
通过预设的参数服务器更新梯度直方图的接口,通过输入模块1402接收计算节点发送的分块局部梯度直方图;其中,所述分块局部梯度直 方图为所述计算节点针对每个待处理树节点,根据自身节点的训练数据分别计算各自对应的局部梯度直方图,并将每个局部梯度直方图分成P个分块的局部梯度直方图;且P个分块的局部梯度直方图与P个参数服务器节点一一对应;
将所述分块局部梯度直方图累加到对应的全局梯度直方图上。
具体地,所述最佳分裂点的信息包括分裂特征、分裂特征值以及目标函数增益。
具体地,所述最佳分裂点的信息包括目标函数增益和标识信息,所述标识信息用于指示所述参数服务器节点;
处理器1401通过所述输出模块将所述最佳分裂点的信息发送给所述计算节点之后,还可以执行:
通过输入模块1402接收所述计算节点用于获取分裂特征和分裂特征值的请求,并根据所述取分裂特征和分裂特征值的请求,通过输出模块1403将所述最佳分裂点的分裂特征和分裂特征值发送给所述计算节点。
需要说明的是,本发明实施例中的基于参数服务器的梯度提升决策树的实现装置13和参数服务器节点设备140中各模块的功能可对应参考上述各方法实施例中图3至图10任意实施例的具体实现方式,这里不再赘述。基于参数服务器的梯度提升决策树的实现装置13和参数服务器节点设备140可以包括但不限于计算机等物理机。
另外,本发明实施例还提供了一种参数服务器,该参数服务器包括P个参数服务器节点设备,该参数服务器节点设备可以为如图14实施例中的参数服务器节点设备140。
实施本发明实施例,计算节点在寻找最佳分裂点时,通过预设的参数服务器获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求,在获取到每个参数服务器节点通过GBDT优化算法从各自存储的M/P个特征中计算出的最佳分裂点后,比较P个最佳分裂点的目标函数增益,并选取目标函数增益最大的分裂点作为全局最佳分裂点。每个计算节点无需获取完整的全局梯度直方图,只需要比较参数服务器节点返回的候选最佳分裂点的信息,大大减少了通信开销。并且通 过将全局梯度直方图切分到多个参数服务器节点上存储,解决了在处理大维度训练数据时遇到的单点瓶颈问题,加速了汇总局部梯度直方图的速度。另外,可以在指定一个计算节点来选取出最佳分裂点后,将该最佳分裂点存储在参数服务器,由该参数服务器将该全局最佳分裂点发送给其他计算节点,能够避免其他计算节点重复计算,大大节省了计算节点的计算量。并且,本发明实施例的计算节点以及参数服务器节点的数量可以按照用户的需求进行扩展,为可扩展的PS架构,解决了现有技术中Spark和XGBoost的在处理大维度训练数据时存在的单点瓶颈,以及可扩展性受限的问题。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。故,本发明实施例还提供一种计算机可读存储介质,其上存储有计算机程序,在处理器执行所述计算机程序时可实现以上方法。
以上所揭露的仅为本发明较佳实施例而已,当然不能以此来限定本发明之权利范围,因此依本发明权利要求所作的等同变化,仍属本发明所涵盖的范围。

Claims (19)

  1. 一种基于参数服务器的梯度提升决策树的实现方法,包括:
    通过预设的参数服务器中用于获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求;其中,所述参数服务器包括P个参数服务器节点,每个参数服务器节点各自存储有M/P个特征,所述M为训练数据的特征数量;M和P均为大于1的自然数;
    接收所述P个参数服务器节点分别发送的最佳分裂点的信息,得到P个最佳分裂点的信息;其中,所述P个最佳分裂点为所述P个参数服务器节点通过预设的梯度提升决策树GBDT优化算法从各自存储的M/P个特征中计算出的最佳分裂点;
    根据所述P个最佳分裂点的信息,比较所述P个最佳分裂点的目标函数增益,并选取目标函数增益最大的最佳分裂点作为全局最佳分裂点。
  2. 如权利要求1所述的方法,其中,还包括:
    针对每个待处理树节点,根据自身节点的训练数据分别计算各自对应的局部梯度直方图,并将每个局部梯度直方图分成P个分块局部梯度直方图;其中,P个分块局部梯度直方图与P个参数服务器节点一一对应;
    通过所述参数服务器中用于更新梯度直方图的接口,将每个待处理树节点的每个分块局部梯度直方图发送给对应的参数服务器节点。
  3. 如权利要求1所述的方法,其中,每个最佳分裂点的信息包括分裂特征、分裂特征值以及目标函数增益;
    所述选取目标函数增益最大的最佳分裂点作为全局最佳分裂点包括:将目标函数增益最大的最佳分裂点中的分裂特征和分裂特征值作为全局最佳分裂点。
  4. 如权利要求1-3任一项所述的方法,其中,所述选取目标函数增益最大的最佳分裂点作为全局最佳分裂点之后,还包括:
    将所述全局最佳分裂点发送给所述参数服务器,以使所述参数服务器将所述全局最佳分裂点发送给所有的计算节点。
  5. 一种基于参数服务器的梯度提升决策树的实现方法,包括:
    参数服务器节点通过预设的参数服务器中用于获取最佳分裂点的接口,接收计算节点发送的最佳分裂点获取请求;
    所述参数服务器节点根据所述最佳分裂点获取请求,通过预设的梯度提升决策树GBDT优化算法从存储的M/P个特征中计算出最佳分裂点;所述M为训练数据的特征数量,所述P为所述参数服务器包括的参数服务器节点的数量;M和P均为大于1的自然数;
    所述参数服务器节点将所述最佳分裂点的信息发送给所述计算节点。
  6. 如权利要求5所述的方法,其中,还包括:
    所述参数服务器节点通过所述参数服务器中用于更新梯度直方图的接口,接收计算节点发送的分块局部梯度直方图;其中,所述分块局部梯度直方图为所述计算节点针对每个待处理树节点,根据自身节点的训练数据分别计算各自对应的局部梯度直方图,并将每个局部梯度直方图分成P个分块的局部梯度直方图;且P个分块的局部梯度直方图与P个参数服务器节点一一对应;
    所述参数服务器节点将所述分块局部梯度直方图累加到对应的全局梯度直方图上。
  7. 一种计算节点设备,包括处理器、存储器、输入模块和输出模块,所述存储器存储多条指令,所述指令由所述处理器加载并执行:
    通过所述输出模块,并通过预设的参数服务器中用于获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求;其中,所述参数服务器包括P个参数服务器节点,每个参数服务器节点各自存储有M/P个特征,所述M为训练数据的特征数量;M和P均为大于1的自然数;
    通过所述输入模块接收所述P个参数服务器节点分别发送的最佳分裂点的信息,得到P个最佳分裂点的信息;其中所述P个最佳分裂点为所述P个参数服务器节点通过预设的梯度提升决策树GBDT优化算法从各自存储的M/P个特征中计算出的最佳分裂点;
    根据所述P个最佳分裂点的信息,比较所述P个最佳分裂点的目标 函数增益,并选取目标函数增益最大的最佳分裂点作为全局最佳分裂点。
  8. 如权利要求7所述的设备,其中,所述处理器还执行:
    针对每个待处理树节点,根据自身节点的训练数据分别计算各自对应的局部梯度直方图,并将每个局部梯度直方图分成P个分块;其中,P个分块局部梯度直方图与P个参数服务器节点一一对应;
    通过所述输出模块,并通过所述参数服务器中用于更新梯度直方图的接口,将每个待处理树节点的每个分块局部梯度直方图发送给对应的参数服务器节点。
  9. 如权利要求7所述的设备,其中,每个最佳分裂点的信息包括分裂特征、分裂特征值以及目标函数增益;
    所述处理器选取目标函数增益最大的最佳分裂点作为全局最佳分裂点包括:将目标函数增益最大的最佳分裂点中的分裂特征和分裂特征值作为全局最佳分裂点。
  10. 如权利要求7-9任一项所述的设备,其中,所述处理器选取目标函数增益最大的最佳分裂点作为全局最佳分裂点之后,还执行:通过所述输出模块将所述全局最佳分裂点发送给所述参数服务器,以使所述参数服务器将所述全局最佳分裂点发送给所有的计算节点。
  11. 一种参数服务器节点设备,包括处理器、存储器、输入模块和输出模块,所述存储器存储多条指令,所述指令由所述处理器加载并执行:
    通过预设的参数服务器中用于获取最佳分裂点的接口,并通过所述输入模块接收计算节点发送的最佳分裂点获取请求;
    根据所述最佳分裂点获取请求,通过预设的梯度提升决策树GBDT优化算法从存储的M/P个特征中计算出最佳分裂点;所述M为训练数据的特征数量,所述P为所述参数服务器包括的参数服务器节点的数量;M和P均为大于1的自然数;
    通过所述输出模块将所述最佳分裂点的信息发送给所述计算节点。
  12. 如权利要求11所述的设备,其中,所述处理器还执行:
    通过所述参数服务器中用于更新梯度直方图的接口,并通过所述输入模块接收计算节点发送的分块局部梯度直方图;其中,所述分块局部梯度直方图为所述计算节点针对每个待处理树节点,根据自身节点的训练数据分别计算各自对应的局部梯度直方图,并将每个局部梯度直方图分成P个分块的局部梯度直方图;且P个分块的局部梯度直方图与P个参数服务器节点一一对应;
    将所述分块局部梯度直方图累加到对应的全局梯度直方图上。
  13. 如权利要求11或12所述的设备,其中,所述最佳分裂点的信息包括分裂特征、分裂特征值以及目标函数增益。
  14. 一种参数服务器,包括P个参数服务器节点设备,所述P为大于1的自然数;所述参数服务器节点设备为如权利要求11-13任一项所述的参数服务器节点设备。
  15. 一种基于参数服务器的梯度提升决策树的实现系统,包括参数服务器和至少一个计算节点设备;其中,所述参数服务器为如权利要求14所述的参数服务器;所述计算节点设备为如权利要求7-10任一项所述的计算节点设备。
  16. 一种基于参数服务器的梯度提升决策树的实现方法,该方法由计算节点执行,该方法包括:
    通过预设的参数服务器中用于获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求;其中,所述参数服务器包括P个参数服务器节点,每个参数服务器节点各自存储有M/P个特征,所述M为训练数据的特征数量;M和P均为大于1的自然数;
    接收所述P个参数服务器节点分别发送的最佳分裂点的信息,得到P个最佳分裂点的信息;其中,所述P个最佳分裂点为所述P个参数服务器节点通过预设的梯度提升决策树GBDT优化算法从各自存储的M/P个特征中计算出的最佳分裂点;
    根据所述P个最佳分裂点的信息,比较所述P个最佳分裂点的目标函数增益,并选取目标函数增益最大的最佳分裂点作为全局最佳分裂点。
  17. 一种基于参数服务器的梯度提升决策树的实现装置,包括:至 少一个存储器;至少一个处理器;其中,所述至少一个存储器存储有至少一个指令模块,经配置由所述至少一个处理器执行;其中,所述至少一个指令模块包括:
    请求模块,用于通过预设的参数服务器中用于获取最佳分裂点的接口,分别向P个参数服务器节点发送最佳分裂点获取请求;其中,所述参数服务器包括P个参数服务器节点,每个参数服务器节点各自存储有M/P个特征,所述M为训练数据的特征数量;M和P均为大于1的自然数;
    信息接收模块,用于接收所述P个参数服务器节点分别发送的最佳分裂点的信息,得到P个最佳分裂点的信息;其中,所述P个最佳分裂点为所述P个参数服务器节点通过预设的梯度提升决策树GBDT优化算法从各自存储的M/P个特征中计算出的最佳分裂点;
    最佳分裂点选取模块,用于根据所述P个最佳分裂点的信息,比较所述P个最佳分裂点的目标函数增益,并选取目标函数增益最大的最佳分裂点作为全局最佳分裂点。
  18. 一种基于参数服务器的梯度提升决策树的实现装置,包括:至少一个存储器;至少一个处理器;其中,所述至少一个存储器存储有至少一个指令模块,经配置由所述至少一个处理器执行;其中,所述至少一个指令模块包括:
    请求接收模块,用于通过预设的参数服务器中用于获取最佳分裂点的接口,接收计算节点发送的最佳分裂点获取请求;
    最佳分裂点计算模块,用于根据所述最佳分裂点获取请求,通过预设的梯度提升决策树GBDT优化算法从存储的M/P个特征中计算出最佳分裂点;所述M为训练数据的特征数量,所述P为所述参数服务器包括的参数服务器节点的数量;M和P均为大于1的自然数;
    信息发送模块,用于将所述最佳分裂点的信息发送给所述计算节点。
  19. 一种计算机可读存储介质,其上存储有计算机程序,在处理器执行所述计算机程序时可实现如权利要求1~6和16中任一项所述的方法。
PCT/CN2018/081900 2017-05-10 2018-04-04 基于参数服务器的梯度提升决策树的实现方法及相关设备 WO2018205776A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710326930.X 2017-05-10
CN201710326930.XA CN108875955B (zh) 2017-05-10 2017-05-10 基于参数服务器的梯度提升决策树的实现方法及相关设备

Publications (1)

Publication Number Publication Date
WO2018205776A1 true WO2018205776A1 (zh) 2018-11-15

Family

ID=64104317

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/081900 WO2018205776A1 (zh) 2017-05-10 2018-04-04 基于参数服务器的梯度提升决策树的实现方法及相关设备

Country Status (2)

Country Link
CN (1) CN108875955B (zh)
WO (1) WO2018205776A1 (zh)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109826626B (zh) * 2019-01-08 2020-10-20 浙江大学 一种智能的采煤机切割模式识别系统
CN112052954B (zh) * 2019-06-06 2024-05-31 北京百度网讯科技有限公司 梯度提升树建模方法、装置以及终端
CN110728317A (zh) * 2019-09-30 2020-01-24 腾讯科技(深圳)有限公司 决策树模型的训练方法、系统、存储介质及预测方法
CN113497785B (zh) * 2020-03-20 2023-05-12 深信服科技股份有限公司 恶意加密流量检测方法、系统、存储介质和云端服务器
CN111680799B (zh) * 2020-04-08 2024-02-20 北京字节跳动网络技术有限公司 用于处理模型参数的方法和装置
CN111860831B (zh) * 2020-06-19 2023-01-10 苏州浪潮智能科技有限公司 一种基于PyTorch框架的自动重计算方法、装置
CN111738534B (zh) * 2020-08-21 2020-12-04 支付宝(杭州)信息技术有限公司 多任务预测模型的训练、事件类型的预测方法及装置
CN112948608B (zh) * 2021-02-01 2023-08-22 北京百度网讯科技有限公司 图片查找方法、装置、电子设备及计算机可读存储介质
CN114118641B (zh) * 2022-01-29 2022-04-19 华控清交信息科技(北京)有限公司 风电场功率预测方法、gbdt模型纵向训练方法及装置
CN114529108B (zh) * 2022-04-22 2022-07-22 北京百度网讯科技有限公司 基于树模型的预测方法、装置、设备、介质及程序产品


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140351196A1 (en) * 2013-05-21 2014-11-27 Sas Institute Inc. Methods and systems for using clustering for splitting tree nodes in classification decision trees
CN105808582A (zh) * 2014-12-30 2016-07-27 华为技术有限公司 基于分层策略的决策树并行生成方法和装置

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718493A (zh) * 2014-12-05 2016-06-29 阿里巴巴集团控股有限公司 基于决策树的搜索结果排序方法及其装置
CN105809176A (zh) * 2014-12-30 2016-07-27 华为技术有限公司 基于区间策略的最佳分裂点生成方法和装置
US20170076198A1 (en) * 2015-09-11 2017-03-16 Facebook, Inc. High-capacity machine learning system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YANG, ZICAN: "A parallel Regularized Greedy Forest implementation based on Message Passing Interface.", ELECTRONIC JOURNAL, CHINESE MASTER'S THESES FULL-TEXT DATABASE, 15 January 2015 (2015-01-15), ISSN: 1674-0246 *
YE, JERRY ET AL.: "Stochastic Gradient Boosted Distributed Decision Trees", PROCEEDINGS OF THE 18TH ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 6 November 2009 (2009-11-06), pages 2061 - 2064, XP055238102 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3709229A1 (en) * 2019-03-13 2020-09-16 Ricoh Company, Ltd. Learning device and learning method
US11475314B2 (en) 2019-03-13 2022-10-18 Ricoh Company, Ltd. Learning device and learning method
US20210133677A1 (en) * 2019-10-31 2021-05-06 Walmart Apollo, Llc Apparatus and methods for determining delivery routes and times based on generated machine learning models
CN111475988A (zh) * 2020-04-03 2020-07-31 浙江工业大学之江学院 基于梯度提升决策树和遗传算法的印染定型机能耗优化方法
CN111475988B (zh) * 2020-04-03 2024-02-23 浙江工业大学之江学院 基于梯度提升决策树和遗传算法的印染定型机能耗优化方法
CN111488942A (zh) * 2020-04-15 2020-08-04 深圳前海微众银行股份有限公司 数据处理方法、设备及计算机可读存储介质
CN113824677A (zh) * 2020-12-28 2021-12-21 京东科技控股股份有限公司 联邦学习模型的训练方法、装置、电子设备和存储介质
CN113824677B (zh) * 2020-12-28 2023-09-05 京东科技控股股份有限公司 联邦学习模型的训练方法、装置、电子设备和存储介质
CN113722739A (zh) * 2021-09-06 2021-11-30 京东科技控股股份有限公司 梯度提升树模型的生成方法、装置、电子设备和存储介质
CN113722739B (zh) * 2021-09-06 2024-04-09 京东科技控股股份有限公司 梯度提升树模型的生成方法、装置、电子设备和存储介质

Also Published As

Publication number Publication date
CN108875955A (zh) 2018-11-23
CN108875955B (zh) 2023-04-18

Similar Documents

Publication Publication Date Title
WO2018205776A1 (zh) 基于参数服务器的梯度提升决策树的实现方法及相关设备
US11610115B2 (en) Learning to generate synthetic datasets for training neural networks
US10504009B2 (en) Image hash codes generated by a neural network
US10643124B2 (en) Method and device for quantizing complex artificial neural network
US20190279088A1 (en) Training method, apparatus, chip, and system for neural network model
KR102469261B1 (ko) 적응적 인공 신경 네트워크 선택 기법들
US11755367B2 (en) Scheduling operations on a computation graph
US11645585B2 (en) Method for approximate k-nearest-neighbor search on parallel hardware accelerators
CA3094507A1 (en) Systems, devices and methods for transfer learning with a mixture of experts model
WO2022160442A1 (zh) 答案生成方法、装置、电子设备及可读存储介质
US11669768B2 (en) Utilizing relevant offline models to warm start an online bandit learner model
CN114329029B (zh) 对象检索方法、装置、设备及计算机存储介质
CN113806582B (zh) 图像检索方法、装置、电子设备和存储介质
CN109032630B (zh) 一种参数服务器中全局参数的更新方法
CN115131633A (zh) 一种模型迁移方法、装置及电子设备
CN113590898A (zh) 数据检索方法、装置、电子设备、存储介质及计算机产品
KR20160128869A (ko) 사전 정보를 이용한 영상 물체 탐색 방법 및 이를 수행하는 장치
WO2023160309A1 (zh) 一种联邦学习方法以及相关设备
CN116796038A (zh) 遥感数据检索方法、装置、边缘处理设备及存储介质
CN116776969A (zh) 联邦学习方法及装置、计算机可读存储介质
CN107944045B (zh) 基于t分布哈希的图像检索方法及系统
CN113240089B (zh) 基于图检索引擎的图神经网络模型训练方法和装置
US20220383073A1 (en) Domain adaptation using domain-adversarial learning in synthetic data systems and applications
US20220198334A1 (en) Method and system for active learning and for automatic analysis of documents
WO2024045188A1 (en) Loop transformation in tensor compilers of deep neural networks (dnns)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18797949; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 18797949; Country of ref document: EP; Kind code of ref document: A1)