CN105550161A - Parallel logistic regression method and system for heterogeneous systems - Google Patents

Publication number: CN105550161A (Pending)
Application number: CN201510945415.0A
Inventors: 王娅娟, 张广勇, 吴韶华, 卢晓伟, 张清
Assignee: Inspur Beijing Electronic Information Industry Co Ltd
Original language: Chinese (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Abstract

The invention discloses a parallel logistic regression method and system for heterogeneous systems. The method computes the gradient of the objective function of a logistic regression model in parallel: the feature vectors of the samples used in the gradient computation are formed into a sample matrix, and the classification labels into a label vector; the sample matrix, the label vector, and the feature weight vector are each partitioned and distributed to a batch of computing nodes, which compute separately and merge their results to obtain the gradient value over a large number of samples. The target feature weight vector is then determined from the gradient obtained by the parallel computation, completing the solution of the LR problem. With the disclosed method and system, a batch of computing nodes can be used to solve the LR problem for large-scale samples efficiently in parallel.

Description

Parallel logistic regression method and system for a heterogeneous system
Technical field
The present invention relates to the field of machine learning, and in particular to a parallel logistic regression method and system for a heterogeneous system.
Background art
Logistic regression (abbreviated LR) is one of the most commonly used classification algorithms in machine learning and is widely applied in the Internet field: it appears in click-through-rate (CTR) estimation in advertising systems, conversion-rate estimation in recommender systems, spam-content identification in anti-spam systems, and so on. LR is favored by many practitioners for the simplicity of its principle and the universality of its application.
In the LR model, the values on the different dimensions of a feature vector are weighted by a feature weight vector and compressed into the range 0 to 1 by the logistic function; the result is the probability that the sample is a positive sample. The curve of the logistic function is shown in Figure 1. Given M training samples (X_1, y_1), (X_2, y_2), ..., (X_M, y_M), where X_j = \{x_{ji} \mid i = 1, 2, \ldots, N\} is an N-dimensional feature vector and y_j is the classification label taking the value +1 or -1 (+1 means the sample is positive, -1 that it is negative), the probability that the j-th sample is positive under the LR model is:

P(y_j = 1 \mid W, X_j) = \frac{1}{1 + e^{-W^T X_j}}
where W is the N-dimensional feature weight vector, i.e. the model parameter to be solved for in the LR problem.
Solving the LR problem means finding a suitable feature weight vector W such that P(y_j = 1 | W, X_j) is as large as possible for the positive samples in the training set and as small as possible for the negative samples; equivalently, P(y_j = -1 | W, X_j) is as large as possible for the negative samples. Expressed as a joint probability, the problem is:

\max_W p(W) = \prod_{j=1}^{M} \frac{1}{1 + e^{-y_j W^T X_j}}

Taking the logarithm of the above and negating it gives the equivalent problem:

\min_W f(W) = \sum_{j=1}^{M} \log\left(1 + e^{-y_j W^T X_j}\right) \qquad (1)
Formula (1) is the objective function that LR solves. Finding a suitable W that minimizes the objective function f(W) is an unconstrained optimization problem. The common approach is to pick a random initial W_0 and iterate: in each iteration, compute the descent direction of the objective function and update W, until the objective function stabilizes at its minimum. The iterative process is shown in Figure 2.
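As a concrete illustration of this iteration, the sketch below implements formula (1) and plain gradient descent with a fixed step size in pure Python; the data, step size, and iteration count are arbitrary choices for the example, not values from the patent:

```python
import math

def lr_objective(W, X, y):
    # f(W) = sum_j log(1 + exp(-y_j * W.X_j)), i.e. formula (1).
    return sum(math.log(1.0 + math.exp(-yj * sum(w * x for w, x in zip(W, Xj))))
               for Xj, yj in zip(X, y))

def lr_gradient(W, X, y):
    # G = sum_j [sigma(y_j W.X_j) - 1] y_j X_j.
    G = [0.0] * len(W)
    for Xj, yj in zip(X, y):
        z = yj * sum(w * x for w, x in zip(W, Xj))
        coeff = (1.0 / (1.0 + math.exp(-z)) - 1.0) * yj   # [sigma(z) - 1] y_j
        for i, xi in enumerate(Xj):
            G[i] += coeff * xi
    return G

# Tiny made-up 2-D data set: two positive and two negative samples.
X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, -1, -1]
W = [0.0, 0.0]                        # initial weight vector W_0
for _ in range(200):                  # iterate: W <- W - eta * G_t
    G = lr_gradient(W, X, y)
    W = [w - 0.1 * g for w, g in zip(W, G)]
```

After the loop the objective has dropped well below its value at W_0 and the gradient is close to zero, matching the behavior sketched in Figure 2.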
Different optimization algorithms differ only in how the descent direction D_t of the objective function is computed. In practice, however, training must use large-scale sample data. Solving for D_t over such data involves a huge volume of processing, and computing D_t for every sample directly on a single machine is inefficient.
Summary of the invention
In view of this, the main object of the present invention is to provide a parallel logistic regression method and system for a heterogeneous system that can solve the LR problem for large-scale samples efficiently.
To achieve the above object, the invention provides a parallel logistic regression method for a heterogeneous system, comprising:
obtaining the objective function of the logistic regression model;
computing the gradient of the objective function in parallel;
determining the target feature weight vector according to the computation result.
Computing the gradient of the objective function in parallel comprises:
forming the classification labels of the M samples in the training set into an M-dimensional label vector, forming the M N-dimensional feature vectors into an M×N sample matrix, obtaining a grid of computing nodes with m rows and n columns, dividing the label vector and the sample matrix by rows so that each computing node is assigned M/m feature vectors and classification labels, and dividing the sample matrix and the N-dimensional current feature weight vector by columns so that each computing node is assigned an N/n-dimensional slice of the feature vectors and of the current feature weight vector;
making each computing node compute the dot product of its column slice of the feature weight vector with its column slice of each feature vector, merging the results of the computing nodes with the same row number so that each row obtains the dot products of the current feature weight vector with the corresponding feature vectors, and returning each dot-product result to the computing nodes of the corresponding row;
making each computing node compute the intermediate scalar of the objective-function gradient from each dot-product result and its row slice of the label vector, multiply each intermediate scalar by its row slice of the feature vectors, and merging the results of the computing nodes with the same column number to obtain the component of the gradient vector for each column;
merging the per-column components of the gradient vector to obtain the gradient of the objective function.
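The partition-and-merge scheme above can be imitated sequentially to check that it reproduces the serial gradient. The sketch below simulates a hypothetical m=2 by n=2 node grid in a single process (the real system would run the node loops concurrently), with made-up sample values:

```python
import math

def sigma(z):
    # Logistic function sigma(z) = 1 / (1 + e^-z).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical sizes: M=4 samples of dimension N=4 on an m=2 x n=2 grid,
# so each node holds a 2-sample by 2-dimension block.
M, N, m, n = 4, 4, 2, 2
X = [[0.5, -1.0, 2.0, 0.1],
     [1.5, 0.3, -0.2, 1.0],
     [-0.7, 0.8, 0.4, -1.2],
     [0.2, -0.5, -1.0, 0.9]]
y = [1, -1, 1, -1]
W = [0.3, -0.1, 0.2, 0.4]
rows, cols = M // m, N // n

# Dot-product phase: each node computes a partial dot product on its block;
# the row-wise merge (the sum over c) gives the full W.X_j for each sample.
dots = {}
for r in range(m):
    for j in range(rows):
        Xj = X[r * rows + j]
        dots[(r, j)] = sum(
            sum(W[c * cols + i] * Xj[c * cols + i] for i in range(cols))
            for c in range(n))

# Scalar and merge phases: each node multiplies [sigma(y_j W.X_j) - 1] y_j
# by its block of X_j; the column-wise merge plus the root's concatenation
# of the column components yields the gradient G.
G = [0.0] * N
for r in range(m):
    for j in range(rows):
        yj = y[r * rows + j]
        coeff = (sigma(yj * dots[(r, j)]) - 1.0) * yj
        for c in range(n):
            for i in range(cols):
                G[c * cols + i] += coeff * X[r * rows + j][c * cols + i]

# Serial reference gradient for comparison.
G_ref = [0.0] * N
for Xj, yj in zip(X, y):
    coeff = (sigma(yj * sum(w * x for w, x in zip(W, Xj))) - 1.0) * yj
    for i in range(N):
        G_ref[i] += coeff * Xj[i]
```

The partitioned result G agrees with the serial gradient G_ref up to floating-point rounding, since the scheme only regroups the terms of the same sums.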
Preferably, the objective function of the logistic regression model is:

\min_W f(W) = \sum_{j=1}^{M} \log\left(1 + e^{-y_j W^T X_j}\right)

where W is the N-dimensional current feature weight vector, X_j is the N-dimensional sample feature vector, and y_j is the classification label.
Preferably, the gradient of the objective function is G_t:

G_t = \sum_{j=1}^{M} \left[\sigma\left(y_j W^T X_j\right) - 1\right] y_j X_j
Preferably, determining the target feature weight vector according to the computation result comprises:
Step A: setting the iteration count to 0 and determining the initial weight feature vector W_0;
Step B: incrementing the iteration count by 1, computing the gradient of the objective function in parallel from the current weight feature vector, computing the search-direction value from the gradient, and updating the current weight feature vector according to the search-direction value;
Step C: judging whether the gradient value satisfies a preset iteration-stopping condition; if so, proceeding to step D, otherwise returning to step B;
Step D: determining the current feature weight vector as the target feature weight vector.
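Steps A through D amount to the loop sketched below; the step size, stopping threshold, and iteration cap are illustrative assumptions, and the gradient is computed serially here where the patent computes it in parallel:

```python
import math

def gradient(W, X, y):
    # G_t = sum_j [sigma(y_j W.X_j) - 1] y_j X_j.
    G = [0.0] * len(W)
    for Xj, yj in zip(X, y):
        z = yj * sum(w * x for w, x in zip(W, Xj))
        coeff = (1.0 / (1.0 + math.exp(-z)) - 1.0) * yj
        for i, xi in enumerate(Xj):
            G[i] += coeff * xi
    return G

def solve_lr(X, y, step=0.5, eps=1e-3, max_iter=10000):
    W = [0.0] * len(X[0])                 # Step A: iteration count 0, initial W_0
    for t in range(1, max_iter + 1):      # Step B: increment the count
        G = gradient(W, X, y)             # parallel in the patent, serial here
        D = [-g for g in G]               # search direction D_t = -G_t
        W = [w + step * d for w, d in zip(W, D)]
        if math.sqrt(sum(g * g for g in G)) < eps:
            break                         # Step C: stopping condition met
    return W                              # Step D: target feature weight vector

# Made-up non-separable data, so the optimum is finite.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]]
y = [1, 1, -1, -1, -1, -1]
W = solve_lr(X, y)
```

For this data each coordinate of the optimum solves sigma(w) = 2/3, i.e. w = log 2, which the loop reaches to within the stopping tolerance.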
The invention also provides a parallel logistic regression system for a heterogeneous system, comprising:
an objective function determination module, for obtaining the objective function of the logistic regression model;
a parallel computation module, for computing the gradient of the objective function in parallel;
a target feature weight vector determination module, for determining the target feature weight vector according to the computation result.
The parallel computation module comprises:
a computing node distribution submodule, for forming the classification labels of the M samples in the training set into an M-dimensional label vector, forming the M N-dimensional feature vectors into an M×N sample matrix, obtaining a grid of computing nodes with m rows and n columns, dividing the label vector and the sample matrix by rows so that each computing node is assigned M/m feature vectors and classification labels, and dividing the sample matrix and the N-dimensional current feature weight vector by columns so that each computing node is assigned an N/n-dimensional slice of the feature vectors and of the current feature weight vector;
a row parallel computation submodule, for making each computing node compute the dot product of its column slice of the feature weight vector with its column slice of each feature vector, merging the results of the computing nodes with the same row number so that each row obtains the dot products of the current feature weight vector with the corresponding feature vectors, and returning each dot-product result to the computing nodes of the corresponding row;
a column parallel computation submodule, for making each computing node compute the intermediate scalar of the objective-function gradient from each dot-product result and its row slice of the label vector, multiply each intermediate scalar by its row slice of the feature vectors, and merging the results of the computing nodes with the same column number to obtain the component of the gradient vector for each column;
a merge submodule, for merging the per-column components of the gradient vector to obtain the gradient of the objective function.
Preferably, the objective function of the logistic regression model is:

\min_W f(W) = \sum_{j=1}^{M} \log\left(1 + e^{-y_j W^T X_j}\right)

where W is the N-dimensional current feature weight vector, X_j is the N-dimensional sample feature vector, and y_j is the classification label.
Preferably, the gradient of the objective function is G_t:

G_t = \sum_{j=1}^{M} \left[\sigma\left(y_j W^T X_j\right) - 1\right] y_j X_j
With the parallel logistic regression method and system for a heterogeneous system provided by the invention, the gradient of the objective function of the logistic regression model is computed in a parallelized manner: the feature vectors of the samples used in the gradient computation are formed into a sample matrix, the classification labels into a label vector, and the sample matrix, the label vector, and the feature weight vector are each partitioned and distributed to a batch of computing nodes, which compute separately and merge their results to obtain the gradient value over a large number of samples. The target feature weight vector is then determined from the gradient obtained by the parallel computation, completing the solution of the LR problem; thus a batch of computing nodes can be used to solve the LR problem for large-scale samples efficiently in parallel.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the invention; for a person of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is the curve of the logistic function in the LR model;
Fig. 2 is the flow diagram of the iterative method of the LR model;
Fig. 3 is the flowchart of embodiment one of the parallel logistic regression method for a heterogeneous system of the present invention;
Fig. 4 is the detailed flowchart of embodiment one of the parallel logistic regression method for a heterogeneous system of the present invention;
Fig. 5 is the detailed schematic flowchart of embodiment one of the parallel logistic regression method for a heterogeneous system of the present invention;
Fig. 6 is the structural diagram of embodiment two of the parallel logistic regression system for a heterogeneous system of the present invention;
Fig. 7 is the detailed structural diagram of embodiment two of the parallel logistic regression system for a heterogeneous system of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. Based on the embodiments of the invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the invention.
The invention provides a parallel logistic regression method for a heterogeneous system. Fig. 3 shows the flowchart of embodiment one of the method, comprising:
Step S101: obtaining the objective function of the logistic regression model.
Given M training samples (X_1, y_1), (X_2, y_2), ..., (X_M, y_M), where X_j = \{x_{ji} \mid i = 1, 2, \ldots, N\} is an N-dimensional feature vector and y_j is the classification label taking the value +1 or -1 (+1 means the sample is positive, -1 that it is negative), the probability that the j-th sample is positive under the LR model is:

P(y_j = 1 \mid W, X_j) = \frac{1}{1 + e^{-W^T X_j}}

where W is the N-dimensional feature weight vector, i.e. the model parameter to be solved for in the LR problem.
Solving the LR problem means finding a suitable feature weight vector W such that P(y_j = 1 | W, X_j) is as large as possible for the positive samples in the training set and as small as possible for the negative samples; equivalently, P(y_j = -1 | W, X_j) is as large as possible for the negative samples. Expressed as a joint probability, the problem is:

\max_W p(W) = \prod_{j=1}^{M} \frac{1}{1 + e^{-y_j W^T X_j}}

Taking the logarithm of the above and negating it gives the equivalent problem, so the objective function of the logistic regression model is:

\min_W f(W) = \sum_{j=1}^{M} \log\left(1 + e^{-y_j W^T X_j}\right)

where W is the N-dimensional current feature weight vector, X_j is the N-dimensional sample feature vector, and y_j is the classification label.
Step S102: computing the gradient of the objective function in parallel.
The descent direction of the above objective function is D_t, with D_t = -G_t, where G_t is the gradient of the objective function:

G_t = \nabla_W f(W) = \sum_{j=1}^{M} \left[\sigma\left(y_j W_t^T X_j\right) - 1\right] y_j X_j
As shown in Fig. 4, step S102 specifically comprises:
Step S201: forming the classification labels of the M samples in the training set into an M-dimensional label vector, forming the M N-dimensional feature vectors into an M×N sample matrix, obtaining a grid of computing nodes with m rows and n columns, dividing the label vector and the sample matrix by rows so that each computing node is assigned M/m feature vectors and classification labels, and dividing the sample matrix and the N-dimensional current feature weight vector by columns so that each computing node is assigned an N/n-dimensional slice of the feature vectors and of the current feature weight vector.
The total parameter server (node0) divides the data by task and builds the architecture in which the server node and the other child nodes (node1, node2, ...) compute cooperatively, i.e. the computing-node architecture. The whole training data set is divided horizontally, in units of instances, across the machines for distributed computation, and instances of very high dimensionality are divided longitudinally into multiple sub-segments for distributed computation; the partitioned data fragments are then distributed to all the child-node servers by broadcast. Each computing node builds an architecture in which the CPU and the Many Integrated Core (MIC) coprocessors compute cooperatively: the CPU and the multiple MIC coprocessors attached to the same single-node server form one coordinated computing architecture, in which the total number of devices is the number of CPUs plus the number of MIC coprocessors.
After distribution, the computing nodes holding features of the same sample share a row number, and the computing nodes holding the same dimension of different samples share a column number; the feature vector of one sample is split across the nodes of one row but different columns, that is:

X_{r,k} = \langle X_{(r,1),k}, \ldots, X_{(r,c),k}, \ldots, X_{(r,n),k} \rangle

where X_{r,k} denotes the k-th feature vector of row r and X_{(r,c),k} denotes the component of X_{r,k} held by the node in column c. Similarly, W_c denotes the component of the feature weight vector W held by the nodes of column c, that is: W = \langle W_1, \ldots, W_c, \ldots, W_n \rangle. The gradient formula of the objective function depends on two computations: the dot product of the feature weight vector W_t with the feature vector X_j, and the multiplication of the scalar \left[\sigma\left(y_j W_t^T X_j\right) - 1\right] y_j with the feature vector X_j.
Step S202: making each computing node compute the dot product of its column slice of the feature weight vector with its column slice of each feature vector, merging the results of the computing nodes with the same row number so that each row obtains the dot products of the current feature weight vector with the corresponding feature vectors, and returning each dot-product result to the computing nodes of the corresponding row.
Each computing node computes its partial dot products in parallel, and the partial results of the nodes with the same row number are merged; the merged dot-product results are then returned to all the computing nodes of that row.
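The split dot product of step S202 can be checked with a small sketch, using hypothetical sizes and values; the n partial sums would be computed concurrently by the column nodes, here they are simply listed:

```python
# One sample X_j of dimension N = 6 split over n = 3 node columns;
# each column holds its slice of W and of X_j and computes a partial sum.
n = 3
W = [0.2, -0.4, 0.1, 0.7, -0.3, 0.5]
Xj = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]
cols = len(W) // n

partials = [sum(W[c * cols + i] * Xj[c * cols + i] for i in range(cols))
            for c in range(n)]        # one partial dot product per column
dot = sum(partials)                   # row-wise merge of the n partials
```

Because the merge only regroups the terms of one sum, the merged value equals the ordinary dot product W^T X_j.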
Step S203: making each computing node compute the intermediate scalar of the objective-function gradient from each dot-product result and its row slice of the label vector, multiply each intermediate scalar by its row slice of the feature vectors, and merging the results of the computing nodes with the same column number to obtain the component of the gradient vector for each column.
Each computing node independently multiplies the row component of the scalar \left[\sigma\left(y_j W_t^T X_j\right) - 1\right] y_j by the row component of the feature vector X_j, obtaining G_{(r,c),t}; merging the nodes with the same column number then yields the component of the gradient vector for each column:

G_{c,t} = \sum_{r=1}^{m} G_{(r,c),t}
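The column merge can be illustrated with made-up per-node gradient blocks G_{(r,c),t}; the grid size and values below are hypothetical:

```python
# Per-node gradient blocks on an m=2 x n=2 grid; each block holds that
# node's column-slice components of the gradient.
m, n = 2, 2
G_block = {(0, 0): [0.1, -0.2], (0, 1): [0.3, 0.0],
           (1, 0): [-0.4, 0.5], (1, 1): [0.2, -0.1]}

# Column-wise merge: G_c = sum over rows r of G_(r,c).
G_col = {c: [sum(G_block[(r, c)][i] for r in range(m))
             for i in range(len(G_block[(0, c)]))]
         for c in range(n)}

# Root merge: concatenate the column components into G_t.
G_t = [g for c in range(n) for g in G_col[c]]
```

Each column's component is the sum down that column of the grid, and the root simply concatenates the columns, so G_t here is [-0.3, 0.3, 0.5, -0.1] up to rounding.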
Step S204: merging the per-column components of the gradient vector to obtain the gradient of the objective function.
The root node receives the gradient components and merges them to obtain the gradient value of the objective function: G_t = \langle G_{1,t}, \ldots, G_{n,t} \rangle.
Step S103: determining the target feature weight vector according to the computation result.
Step S103 specifically comprises:
Step A: setting the iteration count to 0 and determining the initial weight feature vector W_0;
Step B: incrementing the iteration count by 1, computing the gradient of the objective function in parallel from the current weight feature vector, computing the search-direction value from the gradient, and updating the current weight feature vector according to the search-direction value;
Step C: judging whether the gradient value satisfies a preset iteration-stopping condition; if so, proceeding to step D, otherwise returning to step B;
Step D: determining the current feature weight vector as the target feature weight vector.
The detailed flowchart of this embodiment is shown in Fig. 5. The central processing unit (CPU) of each child node allocates M threads and assigns each thread a 1/M data fragment, i.e. a new data set M'; the dot-product results for these different data sets of M' are computed on different MIC cards. That is, for the thread corresponding to the first data set M1' of M', M1' is sent to the corresponding MIC card and the dot-product result is computed for M1'; the same operation is performed in parallel for the other sub-data sets M2', M3', ... of the child node. There is thus one data set on each MIC coprocessor, and after all data sets have been computed, the dot-product results of each child node are aggregated to the parameter-server node.
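This fan-out (one data fragment per thread, one fragment per MIC card) can be imitated on an ordinary CPU with a thread pool; `offload_to_mic` below is a hypothetical stand-in for the real coprocessor offload, which the patent does not give in code:

```python
from concurrent.futures import ThreadPoolExecutor

def offload_to_mic(fragment, W):
    # Stand-in for the MIC-card offload: compute the dot products
    # W.X_j for every sample X_j in the fragment.
    return [sum(w * x for w, x in zip(W, Xj)) for Xj in fragment]

def child_node_dots(samples, W, num_threads):
    # Split this child node's samples into one fragment per thread/card.
    size = max(1, len(samples) // num_threads)
    fragments = [samples[i:i + size] for i in range(0, len(samples), size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = pool.map(offload_to_mic, fragments, [W] * len(fragments))
    # Aggregate the per-fragment results, preserving the sample order.
    return [d for part in results for d in part]
```

In the real system the aggregated per-child-node results would then be sent on to the parameter-server node; here the function simply returns them.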
With the parallel logistic regression method for a heterogeneous system provided by this embodiment, the gradient of the objective function of the logistic regression model is computed in a parallelized manner: the feature vectors of the samples used in the gradient computation are formed into a sample matrix, the classification labels into a label vector, and the sample matrix, the label vector, and the feature weight vector are each partitioned and distributed to a batch of computing nodes, which compute separately and merge their results to obtain the gradient value over a large number of samples. The target feature weight vector is then determined from the gradient obtained by the parallel computation, completing the solution of the LR problem; thus a batch of computing nodes can be used to solve the LR problem for large-scale samples efficiently in parallel.
The invention also provides a parallel logistic regression system for a heterogeneous system. Fig. 6 shows the structural diagram of embodiment two of the system, comprising:
an objective function determination module 101, for obtaining the objective function of the logistic regression model;
a parallel computation module 102, for computing the gradient of the objective function in parallel;
a target feature weight vector determination module 103, for determining the target feature weight vector according to the computation result.
As shown in Fig. 7, the parallel computation module 102 specifically comprises:
a computing node distribution submodule 201, for forming the classification labels of the M samples in the training set into an M-dimensional label vector, forming the M N-dimensional feature vectors into an M×N sample matrix, obtaining a grid of computing nodes with m rows and n columns, dividing the label vector and the sample matrix by rows so that each computing node is assigned M/m feature vectors and classification labels, and dividing the sample matrix and the N-dimensional current feature weight vector by columns so that each computing node is assigned an N/n-dimensional slice of the feature vectors and of the current feature weight vector;
a row parallel computation submodule 202, for making each computing node compute the dot product of its column slice of the feature weight vector with its column slice of each feature vector, merging the results of the computing nodes with the same row number so that each row obtains the dot products of the current feature weight vector with the corresponding feature vectors, and returning each dot-product result to the computing nodes of the corresponding row;
a column parallel computation submodule 203, for making each computing node compute the intermediate scalar of the objective-function gradient from each dot-product result and its row slice of the label vector, multiply each intermediate scalar by its row slice of the feature vectors, and merging the results of the computing nodes with the same column number to obtain the component of the gradient vector for each column;
a merge submodule 204, for merging the per-column components of the gradient vector to obtain the gradient of the objective function.
The objective function of the logistic regression model in this embodiment is:

\min_W f(W) = \sum_{j=1}^{M} \log\left(1 + e^{-y_j W^T X_j}\right)

where W is the N-dimensional current feature weight vector, X_j is the N-dimensional sample feature vector, and y_j is the classification label. The gradient of the objective function is G_t:

G_t = \sum_{j=1}^{M} \left[\sigma\left(y_j W^T X_j\right) - 1\right] y_j X_j
With the parallel logistic regression system for a heterogeneous system provided by this embodiment, the gradient of the objective function of the logistic regression model is computed in a parallelized manner: the feature vectors of the samples used in the gradient computation are formed into a sample matrix, the classification labels into a label vector, and the sample matrix, the label vector, and the feature weight vector are each partitioned and distributed to a batch of computing nodes, which compute separately and merge their results to obtain the gradient value over a large number of samples. The target feature weight vector is then determined from the gradient obtained by the parallel computation, completing the solution of the LR problem; thus a batch of computing nodes can be used to solve the LR problem for large-scale samples efficiently in parallel.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts the embodiments may refer to one another. Since the system embodiment is basically similar to the method embodiment, its description is relatively brief; for the relevant parts, refer to the description of the method embodiment.
Finally, it should also be noted that in this document the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or device that comprises that element.
The method and system provided by the present invention have been described in detail above. Specific examples are applied herein to set forth the principles and embodiments of the invention, and the description of the above embodiments is only intended to help understand the method of the invention and its core idea. Meanwhile, for a person of ordinary skill in the art, there will be changes in the specific embodiments and the scope of application according to the idea of the invention. In summary, the contents of this description should not be construed as limiting the invention.

Claims (7)

1. A parallel logistic regression method for a heterogeneous system, characterized by comprising:
obtaining the objective function of the logistic regression model;
computing the gradient of the objective function in parallel;
determining the target feature weight vector according to the computation result;
wherein computing the gradient of the objective function in parallel comprises:
forming the classification labels of the M samples in the training set into an M-dimensional label vector, forming the M N-dimensional feature vectors into an M×N sample matrix, obtaining a grid of computing nodes with m rows and n columns, dividing the label vector and the sample matrix by rows so that each computing node is assigned M/m feature vectors and classification labels, and dividing the sample matrix and the N-dimensional current feature weight vector by columns so that each computing node is assigned an N/n-dimensional slice of the feature vectors and of the current feature weight vector;
making each computing node compute the dot product of its column slice of the feature weight vector with its column slice of each feature vector, merging the results of the computing nodes with the same row number so that each row obtains the dot products of the current feature weight vector with the corresponding feature vectors, and returning each dot-product result to the computing nodes of the corresponding row;
making each computing node compute the intermediate scalar of the objective-function gradient from each dot-product result and its row slice of the label vector, multiply each intermediate scalar by its row slice of the feature vectors, and merging the results of the computing nodes with the same column number to obtain the component of the gradient vector for each column;
merging the per-column components of the gradient vector to obtain the gradient of the objective function.
2. The parallel logistic regression method for a heterogeneous system according to claim 1, characterized in that the objective function of the logistic regression model is:

\min_W f(W) = \sum_{j=1}^{M} \log\left(1 + e^{-y_j W^T X_j}\right)

where W is the N-dimensional current feature weight vector, X_j is the N-dimensional sample feature vector, and y_j is the classification label.
3. The parallel logistic regression method for a heterogeneous system according to claim 2, characterized in that the gradient of the objective function is G_t:

G_t = \sum_{j=1}^{M} \left[\sigma\left(y_j W^T X_j\right) - 1\right] y_j X_j
4. The parallel logistic regression method for a heterogeneous system according to claim 3, characterized in that determining the target feature weight vector according to the computation result comprises:
Step A: setting the iteration count to 0 and determining the initial weight feature vector W_0;
Step B: incrementing the iteration count by 1, computing the gradient of the objective function in parallel from the current weight feature vector, computing the search-direction value from the gradient, and updating the current weight feature vector according to the search-direction value;
Step C: judging whether the gradient value satisfies a preset iteration-stopping condition; if so, proceeding to step D, otherwise returning to step B;
Step D: determining the current feature weight vector as the target feature weight vector.
5. A parallel logistic regression system for a heterogeneous system, comprising:
an objective-function determination module, configured to obtain the objective function of a logistic regression model;
a parallel computation module, configured to compute the gradient of the objective function in parallel;
a target-feature-weight-vector determination module, configured to determine the target feature weight vector from the computation results;
wherein the parallel computation module comprises:
a computing-node distribution submodule, configured to form the classification labels of the M samples in a training set into an M-dimensional label vector, form the M N-dimensional feature vectors into an M×N sample matrix, obtain computing nodes arranged in m rows and n columns, partition the label vector and the sample matrix by row so that each computing node is assigned M/m feature vectors and classification labels, and partition the sample matrix and the N-dimensional current feature weight vector by column so that each computing node is assigned N/n-dimensional components of the feature vectors and of the current feature weight vector;
a row parallel computation submodule, configured to cause each computing node to compute the dot products of its column-partitioned components of the current feature weight vector with the corresponding column-partitioned components of the feature vectors, and to reduce the partial results of the computing nodes having the same row number, so as to obtain, for each row, the dot product of the current feature weight vector with the corresponding feature vector, each dot-product result being returned to the computing nodes of the corresponding row;
a column parallel computation submodule, configured to cause each computing node to compute intermediate scalars of the objective-function gradient from the dot-product results and its row-partitioned components of the label vector, to multiply each intermediate scalar by the corresponding row-partitioned components of the feature vectors, and to reduce the partial results of the computing nodes having the same column number, so as to obtain the components of the gradient vector for each column;
a merging submodule, configured to merge the per-column components of the gradient vector to obtain the gradient of the objective function.
6. The parallel logistic regression system for a heterogeneous system according to claim 5, wherein the objective function of the logistic regression model is

$f(W) = \sum_{j=1}^{M} \log\left(1 + \exp(-y_j W^T X_j)\right)$,

where $W$ is the N-dimensional current feature weight vector, $X_j$ is the j-th N-dimensional sample feature vector, and $y_j$ is the corresponding classification label.

7. The parallel logistic regression system for a heterogeneous system according to claim 6, wherein the gradient of the objective function is $G_t$, with

$G_t = \sum_{j=1}^{M} \left[\sigma(y_j W^T X_j) - 1\right] y_j X_j$.
CN201510945415.0A 2015-12-16 2015-12-16 Parallel logic regression method and system for heterogeneous systems Pending CN105550161A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510945415.0A CN105550161A (en) 2015-12-16 2015-12-16 Parallel logic regression method and system for heterogeneous systems

Publications (1)

Publication Number Publication Date
CN105550161A true CN105550161A (en) 2016-05-04

Family

ID=55829350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510945415.0A Pending CN105550161A (en) 2015-12-16 2015-12-16 Parallel logic regression method and system for heterogeneous systems

Country Status (1)

Country Link
CN (1) CN105550161A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407561A (en) * 2016-09-19 2017-02-15 复旦大学 A division method of the parallel GPDT algorithm on a multi-core SOC
CN106407561B (en) * 2016-09-19 2020-07-03 复旦大学 Method for dividing parallel GPDT algorithm on multi-core SOC
CN113240100A (en) * 2021-07-12 2021-08-10 深圳市永达电子信息股份有限公司 Parallel computing method and system based on discrete Hopfield neural network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20160504)