CN105630739A - Apparatus and method for executing stochastic gradient descent - Google Patents

Apparatus and method for executing stochastic gradient descent

Info

Publication number
CN105630739A
Authority
CN
China
Prior art keywords
value
function
iteration
solution
bregman
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410601799.XA
Other languages
Chinese (zh)
Inventor
Ziqiang Shi (石自强)
Rujie Liu (刘汝杰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201410601799.XA priority Critical patent/CN105630739A/en
Publication of CN105630739A publication Critical patent/CN105630739A/en
Pending legal-status Critical Current


Landscapes

  • Complex Calculations (AREA)

Abstract

The invention relates to an apparatus and a method for executing stochastic gradient descent. The apparatus comprises an initialization unit, an iteration unit and an output unit. The initialization unit is configured to initialize a universal constant, related to the smoothness information of the objective function, and a predetermined accuracy; the iteration unit is configured to randomly select a component loss function associated with a specific sample in the training set and to perform iteration, updating the intermediate solution of each iteration according to the universal constant and the predetermined accuracy so that the intermediate solution moves closer to the true solution; and the output unit is configured to output, after all iterations have been executed, a weighted average of all intermediate solutions as the final solution.

Description

Apparatus and method for performing stochastic gradient descent
Technical field
The present invention relates to the field of stochastic gradient descent and, more particularly, to an apparatus and method for performing universal gradient descent that does not depend on the smoothness of the objective function.
Background art
Stochastic gradient methods in machine learning randomly select a single sample to process in each iteration, instead of loading an entire batch of data into memory. This has been one of the most promising approaches to large-scale machine learning tasks in recent years. In well-known recent work on deep neural networks, the lasso problem, logistic regression, ridge regression, the continuous Steiner problem, support vector machines and so on, stochastic gradient methods have seen important breakthroughs and development.
Stochastic gradient methods have become a powerful tool for large-scale smooth or non-smooth convex optimization problems. Current methods, however, need to know the exact smoothness of the objective function, and when the objective function has a Hölder continuous gradient of intermediate degree, no existing stochastic gradient method can optimize it.
Accordingly, there is a need for an apparatus and method capable of performing universal stochastic gradient descent to optimize the objective function.
Summary of the invention
A brief summary of the present invention is given below in order to provide a basic understanding of certain aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description discussed later.
A main object of the present invention is to provide an apparatus for performing stochastic gradient descent, comprising: an initialization unit configured to initialize a universal constant, related to the smoothness information of the objective function, and a predetermined accuracy; an iteration unit configured to randomly select a component loss function associated with a specific sample in the training set and perform iteration, updating the intermediate solution of each iteration according to said universal constant and said predetermined accuracy so that said intermediate solution moves closer to the true solution; and an output unit configured to output, after all iterations have been performed, a weighted average of all intermediate solutions as the final solution.
According to an aspect of the invention, there is provided a method for performing a stochastic gradient process, comprising: an initialization step of initializing a universal constant, related to the smoothness information of the objective function, and a predetermined accuracy; an iteration step of randomly selecting a component loss function associated with a specific sample in the training set and performing iteration, updating the intermediate solution of each iteration according to the universal constant and the predetermined accuracy so that the intermediate solution moves closer to the true solution; and an output step of outputting, after all iterations have been performed, a weighted average of all intermediate solutions as the final solution.
In addition, embodiments of the invention provide a computer program for implementing the above method.
Furthermore, embodiments of the invention provide a computer program product in at least the form of a computer-readable medium, on which computer program code for implementing the above method is recorded.
These and other advantages of the present invention will become more apparent from the following detailed description of preferred embodiments of the invention in conjunction with the accompanying drawings.
Brief description of the drawings
The above and other objects, features and advantages of the present invention will be more readily understood by reference to the following description of embodiments of the invention in conjunction with the accompanying drawings. The components in the drawings are intended only to illustrate the principles of the invention. In the drawings, the same or similar technical features or components are denoted by the same or similar reference signs.
Fig. 1 is a block diagram showing an exemplary configuration of an apparatus 100 for performing stochastic gradient descent according to an embodiment of the present invention;
Fig. 2 is a block diagram showing an exemplary configuration of an apparatus 100' for performing fast stochastic gradient descent according to another embodiment of the present invention;
Fig. 3 is a flowchart of a method 300 for performing stochastic gradient descent according to an embodiment of the present invention;
Fig. 4 is a flowchart of a method 400 for performing fast stochastic gradient descent according to another embodiment of the present invention; and
Fig. 5 is an exemplary block diagram of a computing device that may be used to implement the apparatus and method for performing stochastic gradient descent of the present invention.
Detailed description of the invention
Embodiments of the invention are described with reference to the accompanying drawings. Elements and features described in one drawing or embodiment of the invention may be combined with elements and features shown in one or more other drawings or embodiments. It should be noted that, for clarity, representations and descriptions of components and processes that are unrelated to the invention and well known to those of ordinary skill in the art are omitted from the drawings and the description.
The present invention addresses smooth and non-smooth convex problems with Hölder continuous gradients of intermediate degree, and proposes a new universal stochastic gradient method (Universal Stochastic Gradient Method, USGM) to optimize the objective function. The universal stochastic gradient method does not need a priori knowledge of the exact smoothness of the objective function. At the same time, unlike traditional stochastic gradient methods, the present invention can presuppose a fixed accuracy for the method. It therefore extends the range of application of stochastic gradient descent methods and proposes a universal class of first-order algorithms. The present invention also proposes a fast universal stochastic gradient descent method. It can be proved that both the universal stochastic gradient method and the fast universal stochastic gradient method converge, and that the fast universal stochastic gradient method has a faster convergence rate.
The present invention solves the problem of minimizing the sum of two convex functions. One of the convex functions is called the loss function or cost function; it is the average of finitely many smooth component functions with Hölder continuous gradients, each component function being associated with a data sample in the training set. The other convex function is a regularization function or penalty function, whose role is to suppress overfitting.
In the method of the invention, some iteration constants are computed from the smoothness information required of the objective function. In each iteration, the current point is brought closer to the true solution by applying a Bregman mapping; this is the core of the stochastic gradient method of the invention. The Bregman mapping is used to update the intermediate solution of each iteration, denoted x_t, and the weighted average of these intermediate solutions is output as the final solution after all iterations end.
The universal stochastic gradient descent method and the fast universal stochastic gradient descent method according to embodiments of the invention are described in detail below with reference to the drawings.
Fig. 1 is a block diagram showing an exemplary configuration of an apparatus 100 for performing stochastic gradient descent according to an embodiment of the present invention.
As shown in Fig. 1, the apparatus 100 for performing stochastic gradient descent includes an initialization unit 102, an iteration unit 104 and an output unit 106.
The initialization unit 102 is configured to initialize a universal constant, related to the smoothness information of the objective function, and a predetermined accuracy.
For example, an initial universal Lipschitz constant (Universal Lipschitz Constant, ULC) may be selected, together with an accuracy ε greater than zero. In addition, the initialization unit 102 also initializes the intermediate solution x_t to obtain x_0, and initializes the Bregman distance function.
The iteration unit 104 is configured to randomly select a component loss function associated with a specific sample in the training set and perform iteration, updating the intermediate solution of each iteration according to the universal constant and the predetermined accuracy so that the intermediate solution moves closer to the true solution.
In one embodiment, the iteration unit 104 is configured to: construct a linearized approximation of the objective function (Linearized Approximation of the Objective Function, LAOF) from the Bregman distance function and the selected component loss function, and obtain the value v-LAOF of the linearized approximation at the Bregman mapping and the value v-OBJ, at the Bregman mapping, of the objective function associated with the component loss function; incrementally find the smallest coefficient associated with the Bregman mapping such that v-OBJ is less than the weighted sum of v-LAOF and the accuracy; update the intermediate solution x_t with the Bregman mapping, and update the universal constant with the smallest coefficient found; and judge whether the iteration has converged: if so, the iteration ends; otherwise it continues. A minimal sketch of this line search is given below.
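As an illustrative aid only, the following is a minimal Python sketch of this line search under simplifying assumptions chosen for the sketch: the prox-function d(x) = 0.5·||x||², so that the Bregman mapping reduces to a proximal gradient step, and a generic proximal operator h_prox for the penalty. The names g, grad_g and h_prox are assumptions of this sketch, not elements of the disclosure.

import numpy as np

def line_search_step(g, grad_g, h_prox, x, L, eps):
    # One trial of the iteration unit: double the coefficient M = 2**i * L
    # until the component loss at the Bregman map (v-OBJ) drops below its
    # linearized approximation (v-LAOF) plus eps/2.
    gx, gr = g(x), grad_g(x)
    i = 0
    while True:
        M = (2.0 ** i) * L
        x_new = h_prox(x - gr / M, 1.0 / M)       # Bregman mapping
        d = x_new - x
        v_laof = gx + gr @ d + 0.5 * M * (d @ d)  # linearized approximation
        if g(x_new) <= v_laof + eps / 2.0:        # acceptance test
            return x_new, (2.0 ** (i - 1)) * L    # new solution, updated ULC
        i += 1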
The output unit 106 is configured to output, after all iterations have been performed, a weighted average of all intermediate solutions x_t as the final solution.
Although the apparatus 100 for performing stochastic gradient descent described above is simple to operate, its convergence rate is relatively low in practical applications. Therefore, according to another embodiment of the present invention, an accelerated version of the apparatus for performing stochastic gradient descent is proposed.
Fig. 2 is a block diagram showing an exemplary configuration of an apparatus 100' for performing fast stochastic gradient descent according to another embodiment of the present invention.
As shown in Fig. 2, the apparatus 100' for performing fast stochastic gradient descent includes an initialization unit 102', an iteration unit 104' and an output unit 106'.
The initialization unit 102' is configured to initialize the ULC, initialize the accuracy ε (greater than zero), initialize the intermediate solution x_t (obtaining x_0), initialize the reference solution y_t (obtaining y_0), and initialize the auxiliary function based on the Bregman distance, where x_0 = y_0.
The iteration unit 104' is specifically configured to: find the minimizer of the auxiliary function, denoted v_t; randomly select a component loss function associated with a specific sample in the training set; construct the linearized approximation (LAOF) of the objective function associated with the selected component loss function; use v_t and y_t to build the respective weighted update functions of x_t and y_t; obtain the value v-LAOF of the linearized approximation of the objective function at y_t and the value v-OBJ, at x_t, of the objective function associated with the component loss function; incrementally find the smallest coefficient related to the weighted update functions such that v-OBJ is less than the weighted sum of v-LAOF and the adjusted accuracy; update the auxiliary function, x_t, y_t, the weights of the weighted update functions, v_t and the ULC; and judge whether the iteration has converged: if so, the iteration ends; otherwise it continues.
The output unit 106' has the same configuration as the output unit 106, namely it is configured to output, after all iterations have been performed, a weighted average of all intermediate solutions x_t as the final solution.
It can be proved that both the universal stochastic gradient method and the fast universal stochastic gradient method converge, and that the fast universal stochastic gradient method has a faster convergence rate.
In the above description of the apparatus for performing stochastic gradient descent according to embodiments of the invention, some processes or methods have evidently also been disclosed. An overview of these methods is given below without repeating the details already discussed above. It should be noted that, although these methods are disclosed in the course of describing the apparatus for performing stochastic gradient descent, they do not necessarily employ the components described, nor are they necessarily performed by those components. For example, embodiments of the apparatus for performing stochastic gradient descent may be partially or fully implemented in hardware and/or firmware, while the methods for performing stochastic gradient descent discussed below may be fully realized by computer-executable programs, although these methods may also employ the hardware and/or firmware of the apparatus for performing stochastic gradient descent.
It should be noted here that the structures of the apparatuses 100 and 100' for performing stochastic gradient descent and of their component units shown in Fig. 1 and Fig. 2 are merely exemplary, and those skilled in the art may modify the structural block diagrams shown in Fig. 1 and Fig. 2 as needed.
Fig. 3 is a flowchart of a method 300 for performing stochastic gradient descent according to an embodiment of the present invention.
In step S302, a universal constant ULC, related to the smoothness information of the objective function, and a predetermined accuracy ε are initialized.
In one embodiment, the intermediate solution x_t is also initialized to obtain x_0, and the Bregman distance function is initialized.
In step S304, a component loss function associated with a specific sample in the training set is randomly selected and iteration is performed, updating the intermediate solution of each iteration according to the universal constant and the predetermined accuracy so that the intermediate solution moves closer to the true solution.
In one embodiment, the iteration step further includes: constructing a linearized approximation of the objective function (Linearized Approximation of the Objective Function, LAOF) from the Bregman distance function and the selected component loss function, and obtaining the value v-LAOF of the linearized approximation at the Bregman mapping and the value v-OBJ, at the Bregman mapping, of the objective function associated with the component loss function; incrementally finding the smallest coefficient associated with the Bregman mapping such that v-OBJ is less than the weighted sum of v-LAOF and the accuracy; updating the intermediate solution x_t with the Bregman mapping, and updating the universal constant with the smallest coefficient found; and judging whether the iteration has converged: if so, the iteration ends; otherwise it continues.
In step S306, after all iterations have been performed, a weighted average of all intermediate solutions x_t is output as the final solution.
Fig. 4 is a flowchart of a method 400 for performing fast stochastic gradient descent according to another embodiment of the present invention.
In step S402, the ULC, the accuracy ε (greater than zero), the intermediate solution x_t (obtaining x_0), the reference solution y_t (obtaining y_0), and the auxiliary function based on the Bregman distance are initialized, where x_0 = y_0.
In step S404, the minimizer of the auxiliary function is found, denoted v_t.
In step S406, a component loss function associated with a specific sample in the training set is randomly selected.
In step S408, the linearized approximation (LAOF) of the objective function associated with the selected component loss function is constructed, v_t and y_t are used to build the respective weighted update functions of x_t and y_t, and the value v-LAOF of the linearized approximation of the objective function at y_t and the value v-OBJ, at x_t, of the objective function associated with the component loss function are obtained.
In step S410, the smallest coefficient related to the weighted update functions is incrementally found such that v-OBJ is less than the weighted sum of v-LAOF and the adjusted accuracy.
In step S412, the auxiliary function, x_t, y_t, the weights of the weighted update functions, v_t and the ULC are updated.
In step S414, it is judged whether the iteration has converged: if so, the iteration ends; otherwise it continues.
Finally, in step S416, after all iterations have been performed, a weighted average of all intermediate solutions x_t is output as the final solution.
A concrete example of performing stochastic gradient descent according to an embodiment of the invention is given below. The present invention can solve problems of the following form:
$\min_x \; f(x) := \frac{1}{T} \sum_{t=1}^{T} g_t(x) + h(x),$
where each $g_t(x)$ is a convex loss function with a Hölder continuous gradient, associated with a sample in the training set, and $h(x)$ is a convex penalty function (also called a regularization function).
A function $g$ has a Hölder continuous gradient of degree $\nu$ if
$\| \nabla g(x) - \nabla g(y) \|_* \le L_\nu \| x - y \|^\nu, \quad \nu \in [0, 1].$
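As a worked illustration added here (not taken from the original text): on the real line, the function $g(x) = \frac{1}{1+\nu}|x|^{1+\nu}$ has gradient $\nabla g(x) = |x|^{\nu}\operatorname{sign}(x)$, which satisfies $|\nabla g(x) - \nabla g(y)| \le 2^{1-\nu}|x - y|^{\nu}$, so $g$ has a Hölder continuous gradient of degree $\nu$. The case $\nu = 1$ recovers the familiar Lipschitz-gradient (smooth) setting, while $\nu = 0$ covers non-smooth functions whose subgradients differ by at most a bounded amount.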
Some definitions used in the example below are as follows.
Bregman distance: $\xi(x, y) := d(y) - d(x) - \langle \nabla d(x), y - x \rangle$,
where $d(x)$ is a prox-function, that is, a differentiable strongly convex function with convexity parameter equal to 1 whose minimum value is 0.
Bregman mapping: $B_{M,f}(x) := \arg\min_y \psi_{M,f}(x, y)$, where
$\psi_{M,f}(x, y) = f(x) + \langle \nabla f(x), y - x \rangle + M \, \xi(x, y) + h(y).$
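As a concrete sketch, under assumptions chosen purely for illustration (d(x) = 0.5·||x||², so that ξ(x, y) = 0.5·||y − x||², and h(y) = μ·||y||₁), the Bregman mapping reduces to a soft-thresholded gradient step; the helper names below are assumptions of this sketch:

import numpy as np

def bregman_distance(x, y):
    # xi(x, y) for the prox-function d(x) = 0.5 * ||x||^2
    return 0.5 * np.sum((y - x) ** 2)

def soft_threshold(z, tau):
    # proximal operator of tau * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def bregman_map(grad_f_x, x, M, mu):
    # argmin_y f(x) + <grad f(x), y - x> + M * xi(x, y) + mu * ||y||_1,
    # which for d = 0.5 * ||.||^2 is a proximal gradient step of length 1/M
    return soft_threshold(x - grad_f_x / M, mu / M)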
Based on the definitions above, the universal stochastic gradient method according to an embodiment of the invention can be expressed in pseudocode as follows:
Input: $L_0 > 0$ and $\epsilon > 0$ (where $L_0$ is the initial value of the ULC and $\epsilon$ is the accuracy)
1: for t = 0, 1, ..., T do:
2: randomly select one component loss function $g_{k_t}(x)$ associated with a specific sample in the training set, where $k_t \in \{0, 1, \ldots, T\}$
3: find the smallest $i_t \ge 0$ such that, for the Bregman mapping $x_{t+1,i_t} = \arg\min_y \psi_{2^{i_t} L_t,\, g_{k_t}}(x_t, y)$,
$g_{k_t}(x_{t+1,i_t}) \le g_{k_t}(x_t) + \langle \nabla g_{k_t}(x_t), x_{t+1,i_t} - x_t \rangle + 2^{i_t - 1} L_t \| x_{t+1,i_t} - x_t \|^2 + \frac{\epsilon}{2}$
4: set $x_{t+1} = x_{t+1,i_t}$ and $L_{t+1} = 2^{i_t - 1} L_t$
5: t = t + 1
6: end for
Output: $\bar{x} = \frac{1}{S_T} \sum_{t=1}^{T+1} \frac{1}{L_t} x_t$, where $S_T = \sum_{t=1}^{T+1} \frac{1}{L_t}$.
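To make the pseudocode concrete, here is a minimal self-contained Python sketch of the method, under assumptions made purely for illustration: prox-function d(x) = 0.5·||x||², penalty h(x) = μ·||x||₁, and least-squares component losses g_k(x) = (a_kᵀx − b_k)². It is a sketch of the technique, not a reference implementation of the patented apparatus.

import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def usgm(A, b, mu, L0=1.0, eps=1e-3, T=1000, seed=0):
    # Universal stochastic gradient method (sketch) for
    # min_x (1/n) * sum_k (a_k^T x - b_k)^2 + mu * ||x||_1
    rng = np.random.default_rng(seed)
    n_samples, dim = A.shape
    x = np.zeros(dim)
    L = L0
    xs, inv_Ls = [], []
    for t in range(T):
        k = rng.integers(n_samples)              # random component loss
        a, bk = A[k], b[k]
        g = lambda u: (a @ u - bk) ** 2
        grad = 2.0 * (a @ x - bk) * a            # gradient of g at x_t
        i = 0
        while True:                              # doubling line search
            M = (2.0 ** i) * L
            x_new = soft_threshold(x - grad / M, mu / M)  # Bregman mapping
            d = x_new - x
            v_laof = g(x) + grad @ d + 0.5 * M * (d @ d)
            if g(x_new) <= v_laof + eps / 2.0:
                break
            i += 1
        x = x_new
        L = (2.0 ** (i - 1)) * L                 # update the ULC
        xs.append(x)
        inv_Ls.append(1.0 / L)
    w = np.array(inv_Ls) / np.sum(inv_Ls)        # weights 1/L_t, normalized
    return np.sum(w[:, None] * np.array(xs), axis=0)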
The fast universal stochastic gradient method according to another embodiment of the invention can be expressed in pseudocode as follows:
Input: $L_0 > 0$, $\epsilon > 0$, $\phi_0(x) = \xi(x_0, x)$, $A_0 = 0$, $y_0 = x_0$ (where $L_0$ is the initial value of the ULC, $\epsilon$ is the accuracy, $\phi_0(x)$ is the auxiliary function based on the Bregman distance, $x_0$ is the initial intermediate solution, and $y_0$ is the initial reference solution)
1: for t = 0, 1, ..., T do:
2: randomly select one component loss function $g_{k_t}(x)$ associated with a specific sample in the training set, where $k_t \in \{0, 1, \ldots, T\}$
3: find $v_t = \arg\min_x \phi_t(x)$ ($v_t$ is the minimizer of the auxiliary function)
4: find the smallest $i_t \ge 0$ such that, with
$a_{t+1,i_t}^2 = \frac{1}{2^{i_t} L_t} (A_t + a_{t+1,i_t})$, $\quad A_{t+1,i_t} = A_t + a_{t+1,i_t}$, $\quad \tau_{t,i_t} = \frac{a_{t+1,i_t}}{A_{t+1,i_t}}$,
$x_{t+1,i_t} = \tau_{t,i_t} v_t + (1 - \tau_{t,i_t}) y_t$, $\quad y_{t+1,i_t} = \tau_{t,i_t} \hat{x}_{t+1,i_t} + (1 - \tau_{t,i_t}) y_t$,
the following relation holds:
$g_{k_t}(y_{t+1,i_t}) \le g_{k_t}(x_{t+1,i_t}) + \langle \nabla g_{k_t}(x_{t+1,i_t}), y_{t+1,i_t} - x_{t+1,i_t} \rangle + 2^{i_t - 1} L_t \| y_{t+1,i_t} - x_{t+1,i_t} \|^2 + \frac{\epsilon}{2} \tau_{t,i_t}$,
where $\hat{x}_{t+1,i_t} = \arg\min_y \{ \xi(v_t, y) + a_{t+1,i_t} [ \langle \nabla g_{k_t}(x_{t+1,i_t}), y \rangle + h(y) ] \}$
5: set $x_{t+1} = x_{t+1,i_t}$, $y_{t+1} = y_{t+1,i_t}$, $a_{t+1} = a_{t+1,i_t}$, and define $A_{t+1} = A_t + a_{t+1}$, $L_{t+1} = 2^{i_t - 1} L_t$, and
$\phi_{t+1}(x) = \phi_t(x) + a_{t+1} [ g_{k_t}(x_{t+1}) + \langle \nabla g_{k_t}(x_{t+1}), x - x_{t+1} \rangle + h(x) ]$
6: t = t + 1
7: end for
Output: $\bar{x} = \frac{1}{S_T} \sum_{t=1}^{T+1} \frac{1}{L_t} x_t$, where $S_T = \sum_{t=1}^{T+1} \frac{1}{L_t}$.
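Likewise, a minimal Python sketch of the fast method, with the additional simplifying assumption h = 0 so that both the minimizer v_t of the auxiliary function and the auxiliary point x̂ have closed forms; every identifier here is an assumption of this sketch:

import numpy as np

def fast_usgm(A_mat, b, L0=1.0, eps=1e-3, T=1000, seed=0):
    # Fast universal stochastic gradient method (sketch) with
    # d(x) = 0.5 * ||x||^2, h = 0, and g_k(x) = (a_k^T x - b_k)^2.
    rng = np.random.default_rng(seed)
    n_samples, dim = A_mat.shape
    x0 = np.zeros(dim)
    y = x0.copy()
    G = np.zeros(dim)     # accumulated weighted gradients defining phi_t
    A_acc, L = 0.0, L0
    xs, inv_Ls = [], []
    for t in range(T):
        k = rng.integers(n_samples)
        ak, bk = A_mat[k], b[k]
        g = lambda u: (ak @ u - bk) ** 2
        v = x0 - G        # v_t = argmin phi_t for this d and h
        i = 0
        while True:
            M = (2.0 ** i) * L
            # positive root of M * a^2 - a - A_acc = 0
            a = (1.0 + np.sqrt(1.0 + 4.0 * M * A_acc)) / (2.0 * M)
            tau = a / (A_acc + a)
            x_trial = tau * v + (1.0 - tau) * y
            grad = 2.0 * (ak @ x_trial - bk) * ak
            x_hat = v - a * grad                  # closed-form auxiliary step
            y_trial = tau * x_hat + (1.0 - tau) * y
            d = y_trial - x_trial
            accept = g(y_trial) <= (g(x_trial) + grad @ d
                                    + 0.5 * M * (d @ d) + 0.5 * eps * tau)
            if accept:
                break
            i += 1
        x, y = x_trial, y_trial
        G += a * grad                             # phi_{t+1} update
        A_acc += a
        L = (2.0 ** (i - 1)) * L
        xs.append(x)
        inv_Ls.append(1.0 / L)
    w = np.array(inv_Ls) / np.sum(inv_Ls)
    return np.sum(w[:, None] * np.array(xs), axis=0)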
It can be proved that the USGM method requires a bounded number of iterations to reach stochastic convergence ε, that is, $E[f_g(y_T)] - E[f_g(x^*)] \le \epsilon$.
To reach the same stochastic convergence ε, the fast USGM method requires fewer iterations; fast USGM therefore converges faster than USGM.
In particular, the apparatus and method for performing stochastic gradient descent according to embodiments of the invention may be used, for example, to solve the lasso problem and the Steiner problem.
The lasso problem can be described by the following formula:
$\min_{x \in \mathbb{R}^{n \times 1}} \; \frac{1}{T} \sum_{t=1}^{T} \| a_t^T x - b_t \|^2 + \mu \| x \|_1$
where $a_t, x \in \mathbb{R}^{n \times 1}$ and $b_t$ is a scalar. Writing $A = [a_1 \, a_2 \cdots a_T]$ and $b = [b_1 \, b_2 \cdots b_T]$, the problem above becomes minimizing $\| Ax - b \|^2 + \mu \| x \|_1$.
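Assuming the hypothetical usgm sketch given earlier, the lasso instance above could be exercised, for example, as follows:

import numpy as np

rng = np.random.default_rng(1)
T, n = 200, 50
A = rng.standard_normal((T, n))
x_true = np.zeros(n)
x_true[:5] = rng.standard_normal(5)              # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(T)

x_hat = usgm(A, b, mu=0.1, L0=1.0, eps=1e-3, T=2000)
print(np.linalg.norm(x_hat - x_true))            # should be small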
The lasso problem is applicable in many areas, such as face recognition and speaker recognition.
In face recognition based on sparse representation, the sparse representation of a face rests on an illumination model: a facial image can be represented as a linear combination of all the facial images of the same person in a database, while for the faces of other people in the database the coefficients of the linear combination are theoretically zero. Since a database generally contains multiple images of many different faces, if a given test face is represented as a linear combination of all the images in the database, its coefficient vector is sparse, because only the combination coefficients for images of the same person's face are nonzero and all other coefficients are zero. Expressed in the formula above, A denotes the matrix formed by the multiple images of the many different faces in the database, b denotes the unknown face to be recognized, and x is the decomposition coefficient of b over A. Solving the optimization problem above yields the sparse representation of the decomposition.
The case of speaker recognition is similar to the above and is not repeated here.
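As a hypothetical illustration of how such a sparse code is typically used for recognition (this classification step is not spelled out in the text): the test face is assigned to the class whose training columns, weighted by the recovered coefficients, reconstruct it with the smallest residual.

import numpy as np

def classify_by_residual(A, x, b, class_of_column):
    # Assign b to the class whose columns of A, weighted by the sparse
    # code x, leave the smallest reconstruction residual.
    residuals = {}
    for c in set(class_of_column):
        mask = np.array([cls == c for cls in class_of_column])
        residuals[c] = np.linalg.norm(b - A[:, mask] @ x[mask])
    return min(residuals, key=residuals.get)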
By using the apparatus and method for performing stochastic gradient descent according to embodiments of the invention to solve the problem above, the smoothness information of the objective function can be left out of consideration.
In the continuous Steiner problem, the centers $c_i \in \mathbb{R}^n$, $i = 1, \ldots, m$, are known. The task is to find the optimal position x of a service center, namely the x that achieves the minimum total distance to the other centers. It can therefore be described as the problem:
$\min_{x \in \mathbb{R}^n} \; f(x) := \frac{1}{m} \sum_{i=1}^{m} \| x - c_i \|$
where all norms in this problem are Euclidean. The apparatus and method for performing stochastic gradient descent according to the present invention solve this problem effectively. In practical applications, however, new locations may join the system, for example when a new store opens or a new warehouse is established, and an online learning algorithm is then required.
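A minimal sketch, assuming Euclidean distances and a plain stochastic subgradient update in place of the full universal method, to show the shape of the computation:

import numpy as np

def steiner_point(C, steps=5000, seed=0):
    # Stochastic subgradient descent for min_x (1/m) * sum_i ||x - c_i||
    # (the geometric median). C holds one known center per row.
    rng = np.random.default_rng(seed)
    x = C.mean(axis=0)                   # start from the centroid
    for t in range(1, steps + 1):
        c = C[rng.integers(len(C))]      # randomly selected center
        r = x - c
        norm = np.linalg.norm(r)
        if norm > 1e-12:                 # subgradient of ||x - c||
            x -= (1.0 / np.sqrt(t)) * (r / norm)
    return x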
The present invention proposes a method for performing a stochastic gradient process. It can be seen that the universal stochastic gradient method bridges the large gap between smooth and non-smooth convex problems; it is a general framework in which stochastic gradient methods handle objective functions with Hölder continuous gradients. Traditional methods need to know a priori the actual degree of smoothness of the objective function, whereas the present invention focuses on finding the smoothness-related variable by a line search method and accumulating all the necessary information in a certain constant, thereby realizing a universal stochastic gradient descent method.
The basic principle of the present invention has been described above in connection with specific embodiments. It should be noted, however, that those of ordinary skill in the art will appreciate that all or any of the steps or components of the method and apparatus of the present invention can be implemented in hardware, firmware, software or a combination thereof, in any computing device (including processors, storage media, etc.) or network of computing devices, which those of ordinary skill in the art can accomplish with their basic programming skills after having read the description of the present invention.
Therefore, the object of the present invention can also be achieved by running a program or a set of programs on any computing device. The computing device may be a well-known general-purpose device. The object of the present invention can thus also be achieved merely by providing a program product containing program code that implements the method or apparatus. That is to say, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. Obviously, the storage medium may be any known storage medium or any storage medium developed in the future.
When embodiments of the present invention are realized by software and/or firmware, a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure, for example the general-purpose computer 500 shown in Fig. 5, which, when various programs are installed, is capable of performing various functions and so on.
In Fig. 5, a central processing unit (CPU) 501 performs various processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. Data required when the CPU 501 performs various processes are also stored in the RAM 503 as needed. The CPU 501, the ROM 502 and the RAM 503 are linked to one another via a bus 504. An input/output interface 505 is also linked to the bus 504.
The following components are linked to the input/output interface 505: an input section 506 (including a keyboard, a mouse, etc.), an output section 507 (including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, etc.), a storage section 508 (including a hard disk, etc.), and a communication section 509 (including a network interface card such as a LAN card, a modem, etc.). The communication section 509 performs communication processing via a network such as the Internet. A drive 510 may also be linked to the input/output interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read therefrom is installed in the storage section 508 as needed.
When the above series of processes is realized by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 511.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 511 shown in Fig. 5, in which the program is stored and which is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 511 include a magnetic disk (including a floppy disk (registered trademark)), an optical disk (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a mini-disc (MD) (registered trademark)) and a semiconductor memory. Alternatively, the storage medium may be the ROM 502, a hard disk contained in the storage section 508, etc., in which the program is stored and which is distributed to the user together with the device containing it.
The present invention also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the above method according to embodiments of the present invention can be performed.
Correspondingly, a storage medium for carrying the program product storing the machine-readable instruction code is also included in the disclosure of the present invention. The storage medium includes, but is not limited to, a floppy disk, an optical disk, a magneto-optical disk, a memory card, a memory stick, and so on.
Those skilled in the art should understand that what is exemplified here is illustrative, and the present invention is not limited thereto.
In this specification, expressions such as "first", "second" and "n-th" are used to distinguish the described features verbally, so as to describe the present invention clearly. They should therefore not be regarded as having any limiting meaning.
As an example, each step of the above method and each module and/or unit of the above apparatus may be implemented as software, firmware, hardware or a combination thereof, and serve as part of the corresponding device. The specific means or manner in which the modules and units of the above apparatus can be configured by software, firmware, hardware or a combination thereof is well known to those skilled in the art and is not repeated here.
As an example, when realized by software or firmware, a program constituting the software may be installed from a storage medium or a network onto a computer having a dedicated hardware structure (for example the general-purpose computer 500 shown in Fig. 5), which, when various programs are installed, is capable of performing various functions and so on.
In the foregoing description of specific embodiments of the present invention, features described and/or illustrated for one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features in other embodiments, or substituted for features in other embodiments.
It should be emphasized that the term "comprise/include", when used herein, refers to the presence of a feature, element, step or component, but does not exclude the presence or addition of one or more other features, elements, steps or components.
Moreover, the method of the present invention is not limited to being performed in the chronological order described in the specification, and may also be performed in another chronological order, in parallel, or independently. Therefore, the execution order of the method described in this specification does not limit the technical scope of the present invention.
While the present invention has been disclosed above through descriptions of specific embodiments thereof, it should be understood that various changes, substitutions and alterations can be made without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present invention is not limited to the specific embodiments of the processes, devices, means, methods and steps described in the specification. One of ordinary skill in the art will readily appreciate from the disclosure that processes, devices, means, methods or steps that exist now or are to be developed in the future, and that perform substantially the same function or achieve substantially the same result as the corresponding embodiments herein, may be used according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, devices, means, methods or steps.
Based on the above description, it is known that at least the following technical solutions are disclosed:
Remark 1. An apparatus for performing stochastic gradient descent, comprising:
an initialization unit configured to initialize a universal constant, related to the smoothness information of the objective function, and a predetermined accuracy;
an iteration unit configured to randomly select a component loss function associated with a specific sample in the training set and perform iteration, updating the intermediate solution of each iteration according to the universal constant and the predetermined accuracy so that the intermediate solution moves closer to the true solution; and
an output unit configured to output, after all iterations have been performed, a weighted average of all intermediate solutions as the final solution.
Remark 2. The apparatus according to Remark 1, wherein the initialization unit is further configured to:
initialize an intermediate solution and a Bregman distance function.
Remark 3. The apparatus according to Remark 2, wherein the iteration unit is further configured to:
construct a linearized approximation of the objective function from the Bregman distance function and the component loss function, and obtain a first value and a second value, wherein the first value is the value of the linearized approximation of the objective function at the Bregman mapping, and the second value is the value, at the Bregman mapping, of the objective function associated with the component loss function;
incrementally find the smallest coefficient associated with the Bregman mapping such that the second value is less than the weighted sum of the first value and the accuracy;
update the intermediate solution with the Bregman mapping, and update the universal constant with the smallest coefficient found; and
judge whether the iteration has converged: if so, the iteration ends; otherwise the iteration continues.
Remark 4. The apparatus according to Remark 1, wherein the initialization unit is further configured to:
initialize an intermediate solution, a reference solution and an auxiliary function based on the Bregman distance, wherein the initialized reference solution is equal to the initialized intermediate solution.
Remark 5. The apparatus according to Remark 4, wherein the iteration unit is further configured to:
find the minimizer of the auxiliary function;
construct a linearized approximation of the objective function associated with the selected component loss function;
use the minimizer of the auxiliary function and the reference solution to build respective weighted update functions for the intermediate solution and the reference solution;
obtain a first value and a second value, wherein the first value is the value of the linearized approximation of the objective function at the reference solution, and the second value is the value, at the intermediate solution, of the objective function associated with the component loss function;
incrementally find the smallest coefficient related to the weighted update functions such that the second value is less than the weighted sum of the first value and the adjusted accuracy;
update the auxiliary function, the intermediate solution, the reference solution, the weights of the weighted update functions, the minimizer of the auxiliary function and the universal constant; and
judge whether the iteration has converged: if so, the iteration ends; otherwise the iteration continues.
Remark 6. A method for performing a stochastic gradient process, comprising:
an initialization step of initializing a universal constant, related to the smoothness information of the objective function, and a predetermined accuracy;
an iteration step of randomly selecting a component loss function associated with a specific sample in the training set and performing iteration, updating the intermediate solution of each iteration according to the universal constant and the predetermined accuracy so that the intermediate solution moves closer to the true solution; and
an output step of outputting, after all iterations have been performed, a weighted average of all intermediate solutions as the final solution.
Remark 7. The method according to Remark 6, wherein the initialization step further includes:
initializing an intermediate solution and a Bregman distance function.
Remark 8. The method according to Remark 7, wherein the iteration step further includes:
constructing a linearized approximation of the objective function from the Bregman distance function and the component loss function, and obtaining a first value and a second value, wherein the first value is the value of the linearized approximation of the objective function at the Bregman mapping, and the second value is the value, at the Bregman mapping, of the objective function associated with the component loss function;
incrementally finding the smallest coefficient associated with the Bregman mapping such that the second value is less than the weighted sum of the first value and the accuracy;
updating the intermediate solution with the Bregman mapping, and updating the universal constant with the smallest coefficient found; and
judging whether the iteration has converged: if so, the iteration ends; otherwise the iteration continues.
Remark 9. The method according to Remark 6, wherein the initialization step further includes:
initializing an intermediate solution, a reference solution and an auxiliary function based on the Bregman distance, wherein the initialized reference solution is equal to the initialized intermediate solution.
Remark 10. The method according to Remark 9, wherein the iteration step further includes:
finding the minimizer of the auxiliary function;
constructing a linearized approximation of the objective function associated with the selected component loss function;
using the minimizer of the auxiliary function and the reference solution to build respective weighted update functions for the intermediate solution and the reference solution;
obtaining a first value and a second value, wherein the first value is the value of the linearized approximation of the objective function at the reference solution, and the second value is the value, at the intermediate solution, of the objective function associated with the component loss function;
incrementally finding the smallest coefficient related to the weighted update functions such that the second value is less than the weighted sum of the first value and the adjusted accuracy;
updating the auxiliary function, the intermediate solution, the reference solution, the weights of the weighted update functions, the minimizer of the auxiliary function and the universal constant; and
judging whether the iteration has converged: if so, the iteration ends; otherwise the iteration continues.

Claims (10)

1. An apparatus for performing stochastic gradient descent, comprising:
an initialization unit configured to initialize a universal constant, related to the smoothness information of the objective function, and a predetermined accuracy;
an iteration unit configured to randomly select a component loss function associated with a specific sample in the training set and perform iteration, updating the intermediate solution of each iteration according to said universal constant and said predetermined accuracy so that said intermediate solution moves closer to the true solution; and
an output unit configured to output, after all iterations have been performed, a weighted average of all intermediate solutions as the final solution.
2. The apparatus according to claim 1, wherein said initialization unit is further configured to:
initialize an intermediate solution and a Bregman distance function.
3. The apparatus according to claim 2, wherein said iteration unit is further configured to:
construct a linearized approximation of the objective function from said Bregman distance function and said component loss function, and obtain a first value and a second value, wherein said first value is the value of the linearized approximation of the objective function at the Bregman mapping, and said second value is the value, at the Bregman mapping, of the objective function associated with said component loss function;
incrementally find the smallest coefficient associated with said Bregman mapping such that said second value is less than the weighted sum of said first value and said accuracy;
update said intermediate solution with the Bregman mapping, and update said universal constant with the smallest coefficient found; and
judge whether the iteration has converged: if so, the iteration ends; otherwise the iteration continues.
4. The apparatus according to claim 1, wherein said initialization unit is further configured to:
initialize an intermediate solution, a reference solution and an auxiliary function based on the Bregman distance, wherein the initialized reference solution is equal to the initialized intermediate solution.
5. The apparatus according to claim 4, wherein said iteration unit is further configured to:
find the minimizer of said auxiliary function;
construct a linearized approximation of the objective function associated with the selected component loss function;
use the minimizer of said auxiliary function and said reference solution to build respective weighted update functions for said intermediate solution and said reference solution;
obtain a first value and a second value, wherein said first value is the value of the linearized approximation of the objective function at the reference solution, and said second value is the value, at the intermediate solution, of the objective function associated with said component loss function;
incrementally find the smallest coefficient related to said weighted update functions such that said second value is less than the weighted sum of said first value and the adjusted accuracy;
update said auxiliary function, said intermediate solution, said reference solution, the weights of said weighted update functions, the minimizer of said auxiliary function and said universal constant; and
judge whether the iteration has converged: if so, the iteration ends; otherwise the iteration continues.
6. A method for performing a stochastic gradient process, comprising:
an initialization step of initializing a universal constant, related to the smoothness information of the objective function, and a predetermined accuracy;
an iteration step of randomly selecting a component loss function associated with a specific sample in the training set and performing iteration, updating the intermediate solution of each iteration according to said universal constant and said predetermined accuracy so that said intermediate solution moves closer to the true solution; and
an output step of outputting, after all iterations have been performed, a weighted average of all intermediate solutions as the final solution.
7. The method according to claim 6, wherein said initialization step further includes:
initializing an intermediate solution and a Bregman distance function.
8. The method according to claim 7, wherein said iteration step further includes:
constructing a linearized approximation of the objective function from said Bregman distance function and said component loss function, and obtaining a first value and a second value, wherein said first value is the value of the linearized approximation of the objective function at the Bregman mapping, and said second value is the value, at the Bregman mapping, of the objective function associated with said component loss function;
incrementally finding the smallest coefficient associated with said Bregman mapping such that said second value is less than the weighted sum of said first value and said accuracy;
updating said intermediate solution with the Bregman mapping, and updating said universal constant with the smallest coefficient found; and
judging whether the iteration has converged: if so, the iteration ends; otherwise the iteration continues.
9. The method according to claim 6, wherein said initialization step further includes:
initializing an intermediate solution, a reference solution and an auxiliary function based on the Bregman distance, wherein the initialized reference solution is equal to the initialized intermediate solution.
10. The method according to claim 9, wherein said iteration step further includes:
finding the minimizer of said auxiliary function;
constructing a linearized approximation of the objective function associated with the selected component loss function;
using the minimizer of said auxiliary function and said reference solution to build respective weighted update functions for said intermediate solution and said reference solution;
obtaining a first value and a second value, wherein said first value is the value of the linearized approximation of the objective function at the reference solution, and said second value is the value, at the intermediate solution, of the objective function associated with said component loss function;
incrementally finding the smallest coefficient related to said weighted update functions such that said second value is less than the weighted sum of said first value and the adjusted accuracy;
updating said auxiliary function, said intermediate solution, said reference solution, the weights of said weighted update functions, the minimizer of said auxiliary function and said universal constant; and
judging whether the iteration has converged: if so, the iteration ends; otherwise the iteration continues.
CN201410601799.XA 2014-10-31 2014-10-31 Apparatus and method for executing stochastic gradient descent Pending CN105630739A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410601799.XA CN105630739A (en) 2014-10-31 2014-10-31 Apparatus and method for executing stochastic gradient descent

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410601799.XA CN105630739A (en) 2014-10-31 2014-10-31 Apparatus and method for executing stochastic gradient descent

Publications (1)

Publication Number Publication Date
CN105630739A true CN105630739A (en) 2016-06-01

Family

ID=56045702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410601799.XA Pending CN105630739A (en) 2014-10-31 2014-10-31 Apparatus and method for executing stochastic gradient descent

Country Status (1)

Country Link
CN (1) CN105630739A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909150A (en) * 2017-11-29 2018-04-13 华中科技大学 Method and system based on block-by-block stochastic gradient descent method on-line training CNN
CN109895097A (en) * 2019-02-19 2019-06-18 宁波凯德科技服务有限公司 A kind of welding robot motion model subgradient method
CN110753926A (en) * 2017-06-12 2020-02-04 微软技术许可有限责任公司 Homomorphic data analysis
CN111104767A (en) * 2018-10-10 2020-05-05 北京大学 Variable-precision random gradient descending structure and design method for FPGA
CN113449433A (en) * 2021-07-16 2021-09-28 中存大数据科技有限公司 Constraint optimization method and device for objective function corresponding to cement production process model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yu. Nesterov, "Universal Gradient Methods for Convex Optimization Problems", CORE Discussion Paper *
Ziqiang Shi et al., "Online and Stochastic Universal Gradient Methods for Minimizing Regularized Hölder Continuous Finite Sums", https://arxiv.org/abs/1311.3832 *
Bo Chunjuan et al., "Face Recognition Based on Kernel Non-negative Sparse Representation", Computer Applications (计算机应用) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110753926A (en) * 2017-06-12 2020-02-04 微软技术许可有限责任公司 Homomorphic data analysis
CN110753926B (en) * 2017-06-12 2023-07-21 微软技术许可有限责任公司 Method, system and computer readable storage medium for data encryption
CN107909150A (en) * 2017-11-29 2018-04-13 华中科技大学 Method and system based on block-by-block stochastic gradient descent method on-line training CNN
CN107909150B (en) * 2017-11-29 2020-08-18 华中科技大学 Method and system for on-line training CNN based on block-by-block random gradient descent method
CN111104767A (en) * 2018-10-10 2020-05-05 北京大学 Variable-precision random gradient descending structure and design method for FPGA
CN111104767B (en) * 2018-10-10 2021-10-01 北京大学 Variable-precision random gradient descending structure and design method for FPGA
CN109895097A (en) * 2019-02-19 2019-06-18 宁波凯德科技服务有限公司 A kind of welding robot motion model subgradient method
CN109895097B (en) * 2019-02-19 2022-07-05 宁波凯德科技服务有限公司 Sub-gradient method of motion model of welding robot
CN113449433A (en) * 2021-07-16 2021-09-28 中存大数据科技有限公司 Constraint optimization method and device for objective function corresponding to cement production process model

Similar Documents

Publication Publication Date Title
EP3542319B1 (en) Training neural networks using a clustering loss
CN105630739A (en) Apparatus and method for executing stochastic gradient descent
CN107169534A (en) Model training method and device, storage medium, electronic equipment
CN111461168A (en) Training sample expansion method and device, electronic equipment and storage medium
CN113887643B (en) New dialogue intention recognition method based on pseudo tag self-training and source domain retraining
CN113240155A (en) Method and device for predicting carbon emission and terminal
CN103679190A (en) Classification device, classification method and electronic equipment
CN114841257A (en) Small sample target detection method based on self-supervision contrast constraint
CN113254716B (en) Video clip retrieval method and device, electronic equipment and readable storage medium
CN104765728A (en) Method and device for training neural network and method for determining sparse feature vector
CN112395487A (en) Information recommendation method and device, computer-readable storage medium and electronic equipment
JP2019067299A (en) Label estimating apparatus and label estimating program
CN114548192A (en) Sample data processing method and device, electronic equipment and medium
CN116432037A (en) Online migration learning method, device, equipment and storage medium
CN104679754B (en) Model selection apparatus and method for data prediction
CN112052865A (en) Method and apparatus for generating neural network model
CN114255381A (en) Training method of image recognition model, image recognition method, device and medium
CN114255353A (en) Page significance element extraction method and system based on weighted hypergraph model
Zhang et al. Efficient history matching with dimensionality reduction methods for reservoir simulations
CN117554814A (en) Battery state of charge prediction method and device, electronic equipment and storage medium
CN113159419A (en) Group feature portrait analysis method, device and equipment and readable storage medium
CN111260074B (en) Method for determining hyper-parameters, related device, equipment and storage medium
CN107316081A (en) A kind of uncertain data sorting technique based on extreme learning machine
JP2020060838A (en) Learning method and learning system
CN116092138A (en) K neighbor graph iterative vein recognition method and system based on deep learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160601