CN106649203A - Method for improving processing quality of big data - Google Patents

Method for improving processing quality of big data

Info

Publication number
CN106649203A
Authority
CN
China
Prior art keywords
function
powerball
value
big data
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611232063.5A
Other languages
Chinese (zh)
Inventor
袁烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201611232063.5A priority Critical patent/CN106649203A/en
Publication of CN106649203A publication Critical patent/CN106649203A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for improving the processing quality of big data. A big-data processing and analysis problem is transformed into a minimization problem and, based on the characteristics of big-data processing, an existing numerical optimization algorithm is improved to obtain a numerical optimization method with an accelerated convergence rate. On the basis of an existing iterative algorithm, the Powerball function is introduced; specifically, a nonlinear Powerball function of the gradient term of the objective function is inserted into the iterative formula, and this function contains a dynamic coefficient that improves the convergence rate of the algorithm. The method has the advantages of a fast convergence rate and a short running time, can effectively address the speed and quality of big-data analysis, and can significantly reduce computer running time.

Description

A method for improving the processing quality of big data
Technical field
The invention belongs to the field of big-data processing and data mining, and more particularly to a data processing technique that improves data analysis speed and data analysis quality.
Background technology
Present-day society develops rapidly: science and technology flourish, information flows freely, exchanges between people become ever closer, and life becomes ever more convenient. Big data is a product of this network age. The development of the computer field and the appearance of the Internet have made it convenient to solve optimization problems over large-scale data. Since the beginning of the 21st century, algorithm design and theoretical innovation for solving large-scale optimization problems have attracted wide attention from experts in many research fields. How to convert big-data analysis and processing problems into large-scale numerical optimization problems, and how to solve such large-scale optimization problems, is one of the focal points of current research. Previous work on solving optimization problems has focused mainly on the choice of parameters in iterative algorithms: by choosing different parameters one obtains the more classical numerical optimization algorithms, among which the well-known gradient descent method and Newton's method are still widely used. In numerical optimization, however, the convergence rate of an algorithm is an important measure of its quality, so accelerating convergence is particularly important. In essence, when solving large-scale optimization problems, the computational constraint on the initial iterations is critical.
Big data is characterized by a huge data volume, diverse data types, low value density, and the need for fast processing, so studying efficient algorithms for large-scale data analysis that are easy to compute and have small storage requirements is of great theoretical and practical significance. Obtaining valuable information quickly and efficiently from data of various types is one of the primary goals of big-data processing, and the essence of such data processing is equivalent to solving optimization problems both quickly and efficiently.
In fact, optimization is ubiquitous, so optimization methods, as means of attaining an optimum, are and should be extremely varied. The great majority of problems handled in operations research are optimization problems, and methods for solving them, such as mathematical programming, queueing theory, decision analysis, and simulation techniques, naturally also belong to the category of optimization methods. Optimization also includes engineering control, optimal control, systems science, and so on. Optimal control is mainly applied to the optimization of various control systems: for example, optimal control of a missile system can ensure that a mission is completed with minimal fuel and that the target is reached in the shortest time; other examples include optimal control of aircraft, ships, and power systems, and control of the optimal operating conditions of chemical and metallurgical plants. The continuous improvement of computer interfaces and of optimization methods also creates advantages for computer-based online production control. The objects of optimal control are also shifting from hardware systems such as mechanical, electrical, and chemical plants towards ecological, environmental, and even socio-economic systems.
Optimization methods study the characteristics of optimal choices in decision problems, construct computational methods for seeking optimal solutions, and investigate the theoretical properties and practical computational performance of these methods. With the rapid development of computers and the progress of optimization algorithms, fairly large optimization problems can now be solved. Because optimization problems appear widely in key areas such as economic planning, engineering design, production management, transportation, and national defence, they receive great attention from government departments, research institutions, and industry. In the face of data of huge scale, however, existing numerical algorithms for solving optimization problems cannot meet the requirements of big-data processing, whether in convergence rate, running time, or memory usage.
There are many existing algorithms for optimization problems, but the main ones are still the classical gradient descent method, Newton's method, and quasi-Newton methods. Below, numerical methods containing a gradient term are referred to as gradient-type methods.
i) Gradient descent method:
Gradient descent is an optimization algorithm and one of the simplest and oldest methods for solving unconstrained optimization problems. Although it no longer has strong practicality by itself, many efficient algorithms are obtained by improving and modifying it. The steepest descent method uses the negative gradient direction as the search direction; the closer it gets to the target value, the smaller the step length becomes and the slower the progress.
ii) Newton's method:
Newton's method is an old and effective method for solving the optimization problem (1.1). Compared with other methods for unconstrained problems, it needs fewer iterations and fewer function evaluations to find the optimal point. A significant advantage of the classical Newton method is its local quadratic convergence, but the key to its success is that it uses the curvature information provided by the Hessian matrix: Newton's method requires computing the second derivatives of the objective function, and when the iterate is far from the solution of the problem, the Hessian matrix of f may not be positive definite or may even be singular, in which case Newton's method fails.
iii) Quasi-Newton methods:
The workload of computing the Hessian matrix in Newton's method is large, and for some objective functions the Hessian is difficult or even impossible to obtain. Quasi-Newton methods construct an approximation of the curvature of the objective function without requiring the Hessian in explicit form, while retaining a fast convergence rate.
Consider the most widely used quasi-Newton method, the L-BFGS method.
First, assume that the objective function f(X) is twice continuously differentiable, so that a second-order Taylor expansion can be applied.
Here X(k+1) = X(k) + α_k d_k and d_k = -H_k ∇f(X(k)). Construct a suitable approximation B_k of ∇²f(X(k)) such that the following secant equation holds:
B_{k+1} S_k = Y_k,
where S_k = X(k+1) - X(k) and Y_k = ∇f(X(k+1)) - ∇f(X(k)). The matrix B_{k+1} for the next iteration is obtained from a correction (update) formula.
To avoid inverting B_k at every iteration, let H_k = B_k⁻¹; the H_{k+1} needed for the next iteration can then be computed from the corresponding correction formula,
where k takes the values 0, 1, 2, ...; when k = 0, the initial matrix H_0 is taken to be the identity matrix.
Quasi-Newton methods are optimization algorithms built on the basis of Newton's method. They iterate mainly using objective-function values and first-derivative information, converge quickly, and avoid computing the second derivatives of the objective function. When the dimension of the problem is very large, however, they require a very large amount of memory.
When processing problems related to big data, the algorithms above suffer from slow convergence, relatively low accuracy, a heavy computational load, and large memory requirements. They are therefore not well suited to solving optimization problems related to big data, nor to developing and exploiting the information contained in the data.
Summary of the invention
In view of the above problems, the invention provides a method for improving data processing quality, used to solve optimization problems related to big data or to find the minimum of an objective function. The method provided by the invention overcomes the problems of the prior art: slow convergence, relatively low accuracy, a heavy computational load, and large memory requirements.
The method proposed by the present invention comprises the following steps:
(1) Process the data according to the characteristics of the collected data: if the data processing problem is already a problem of finding the minimum of a function, go to step (2); otherwise, convert it into a minimization problem and go to step (2);
(2) Establish the minimization model min_{X ∈ R^n} f(X), where R^n is the n-dimensional real vector space, f(X) is the objective function, a twice continuously differentiable nonlinear function, X is an n-dimensional vector, and its initial value is X(0);
(3) Choose a gradient-type optimization method, where such methods include the gradient descent method, Newton's method, and the L-BFGS method; according to the chosen method, introduce the Powerball function, establish the Powerball iterative formula, and iterate. The Powerball function is σ_γ(z) = sign(z)|z|^γ, where γ ∈ (0, 1) is the Power coefficient and z ∈ R;
For the gradient descent method, the corresponding Powerball iterative formula is X(k+1) = X(k) - α_k σ_γ(∇f(X(k)));
For Newton's method, the corresponding Powerball iterative formula is:
X(k+1) = X(k) - (∇²f(X(k)))⁻¹ σ_γ(∇f(X(k)));
For the L-BFGS method, the corresponding Powerball iterative formula is X(k+1) = X(k) + α_k d_k with d_k = -H_k σ_γ(∇f(X(k)));
where H_k approximates the inverse of the Hessian matrix of the objective function; S_k = X(k+1) - X(k) is a vector of the same dimension as X(k); and Y_k = ∇f(X(k+1)) - ∇f(X(k)).
Here, B_k = H_k⁻¹ is an approximation of the Hessian matrix of the objective function and has the same dimension as the Hessian;
In these formulas, ∇f(X) is the gradient of the objective function f(X); ∇²f(X) is the Hessian matrix of f(X); k is the iteration number, taking the values 0, 1, 2, ...; α_k is the step length at the k-th iteration; and X(k) is the approximate value obtained at the k-th iteration. When k = 0, the initial value of B_k is taken to be the identity matrix, and the initial value of X(k) can be chosen arbitrarily. σ_γ(·): R → R is the Powerball function; applying σ_γ to the gradient of the objective function is the Powerball transformation: for any vector X = (x_1, ..., x_n)^T, the componentwise Powerball transformation gives σ_γ(X) = (σ_γ(x_1), ..., σ_γ(x_n))^T (an illustrative sketch of this iteration is given after step (4));
(4) Judge convergence; the concrete criteria are as follows:
When the objective function is strongly convex and its gradient satisfies the L-Lipschitz condition, that is, the gradient is Lipschitz continuous with Lipschitz constant L, check whether the number of iterations exceeds N; if so, the iteration ends and the optimal value X(k+1) is output; otherwise, continue iterating;
When the objective function is not strongly convex, or its gradient does not satisfy the L-Lipschitz condition, check whether ||X(k+1) - X(k)|| < ε holds; if so, the iteration ends and the optimal value X(k+1) is output; otherwise, continue iterating; ε is the error tolerance, chosen as a trade-off between the required precision and the amount of computation;
Here, N is the preset upper limit on the number of iterations.
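As an illustration only, the following Python sketch implements the Powerball gradient descent iteration of step (3) together with the stopping rules of step (4). The helper names, the fixed step length, and the toy quadratic objective are assumptions made for this sketch, not details taken from the invention.

import numpy as np

def powerball(z, gamma):
    # Powerball transform sigma_gamma(z) = sign(z) * |z|^gamma, applied elementwise
    return np.sign(z) * np.abs(z) ** gamma

def powerball_gradient_descent(grad_f, x0, gamma=0.7, alpha=0.1,
                               n_max=1000, eps=1e-3, strongly_convex=False):
    # Gradient descent with the Powerball-transformed gradient (step (3)).
    # Stopping follows step (4): for a strongly convex objective with Lipschitz
    # gradient, stop after n_max iterations; otherwise stop when
    # ||X(k+1) - X(k)|| < eps.
    x = np.asarray(x0, dtype=float)
    for k in range(n_max):
        x_new = x - alpha * powerball(grad_f(x), gamma)
        if not strongly_convex and np.linalg.norm(x_new - x) < eps:
            return x_new
        x = x_new
    return x

# Toy usage on a strongly convex quadratic f(X) = 0.5 * X^T A X - b^T X,
# whose gradient A X - b is Lipschitz, so the iteration-count criterion applies.
A = np.diag([1.0, 2.0, 4.0])
b = np.array([1.0, 1.0, 1.0])
x_opt = powerball_gradient_descent(lambda x: A @ x - b, np.zeros(3),
                                   gamma=0.7, alpha=0.1, n_max=500,
                                   strongly_convex=True)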
Further, the L-Lipschitz condition in step (4) is:
for any X, Y ∈ R^n, the function f(X) satisfies
||∇f(Y) - ∇f(X)|| ≤ L||Y - X||, where ||·|| denotes any vector norm and L > 0 is the Lipschitz constant, chosen as an upper bound on the norm of the gradient of the objective function;
The strongly convex condition means that for any X ∈ R^n, the function
f(X) - (m/2)||X||_2^2
is convex; here ||·||_2 denotes the 2-norm, ||X||_2^2 is the square of the 2-norm of the vector X, and m is a constant greater than zero.
Further, in step (4), N can be chosen so that the optimal value X(k) is obtained with the smallest number of iterations; the choice is made by means of a Lyapunov function of the iterates;
Further, in step (3), the value of the Power coefficient γ ∈ (0, 1) is determined by the error tolerance ε: the larger ε is, the smaller the value of γ, and the faster the convergence of the algorithm;
Further, γ is selected adaptively according to the iteration number,
where γ_0 and γ_1 are respectively the initial and final values of γ; γ_0 should be close to 0 and γ_1 close to 1, with the preferred values γ_0 = 0.1 and γ_1 = 0.9, and N is the upper limit on the number of iterations. In this way both the early and the late iterations converge quickly.
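The exact adaptive expression for γ appears in the original document as a formula that is not reproduced here. Purely as an illustration of the stated behaviour (γ close to γ_0 in the early iterations and close to γ_1 near the iteration limit N), the following sketch uses a simple linear interpolation; this schedule is an assumption of the sketch, not the patented formula.

def adaptive_gamma(k, n_max, gamma0=0.1, gamma1=0.9):
    # Assumed linear schedule: gamma grows from gamma0 towards gamma1 as the
    # iteration count k approaches the upper limit n_max.
    t = min(k, n_max) / float(n_max)
    return gamma0 + (gamma1 - gamma0) * t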
Further, alternative iteration schemes for step (3) are as follows:
(1) For a one-bit gradient descent method, take γ = 0 in the Powerball function σ_γ(z) = sign(z)|z|^γ; then σ_γ(z) = sign(z), z ∈ R, and the iterative formula is X(k+1) = X(k) - α_k sign(∇f(X(k)));
In this case only the sign of each element of the gradient is needed, which greatly reduces the communication-bandwidth requirement when data are transmitted and thus lowers the communication cost of optimizing a strongly convex function;
(2) When a big-data analysis problem is converted into an optimization problem, the objective function sometimes takes the form of a sum of functions, f(X) = Σ_{j=1}^{l} f_j(X),
where each f_j(X), j = 1, 2, ..., l, is a strongly convex, twice continuously differentiable function whose gradient satisfies the L-Lipschitz condition. In this case the Powerball method for stochastic variables is considered, and the Powerball iterative formula is established as X(k+1) = X(k) - α_k σ_γ(∇f_{i_k}(X(k))),
where the index i_k is selected at random at each iteration.
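A minimal sketch of the stochastic Powerball iteration for a finite-sum objective, assuming the sum form described above; the uniform sampling of the component index and the fixed step length are assumptions of this sketch.

import numpy as np

def stochastic_powerball(grad_fj, l, x0, gamma=0.5, alpha=0.05, n_iter=1000, seed=0):
    # Stochastic Powerball method for f(X) = sum_j f_j(X):
    # at each iteration one component index is drawn at random and the
    # Powerball transform of its gradient is used as the update direction.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        j = rng.integers(l)                      # randomly selected index i_k
        g = grad_fj(j, x)
        x = x - alpha * np.sign(g) * np.abs(g) ** gamma
    return x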
Further, in step (4), m is chosen as the absolute value of the smallest eigenvalue of the Hessian matrix of the objective function.
Regarding the Lipschitz condition: as long as a given function satisfies the Lipschitz condition, the Lipschitz constant L can be obtained. L > 0 is the Lipschitz constant and is generally chosen as an upper bound on the norm of the gradient of the objective function.
Compared with the prior art, the data processing technique of the present invention improves data analysis speed and data analysis quality in big-data analysis. It focuses on improving the convergence rate of algorithms used to solve optimization problems related to big data: the minimum of the objective function is sought by applying an iterative method to a given initial value, and a nonlinear Powerball function of the gradient term is introduced into the optimization algorithm to construct the Powerball iteration. The early iterations of the Powerball method converge quickly, and under big-data workloads, where effective computing resources are limited, the efficiency of the early iterations is particularly important; the Powerball method is therefore an effective way to improve the quality of big-data optimization. For traditional optimization algorithms, as the amount of data increases, the algorithm often cannot reach its final convergence. The present invention exploits the strategy of algorithm convergence within a finite number of iteration steps: a nonlinear function of the gradient term containing an influence coefficient is added to the original optimization procedure, and during iteration this coefficient can be chosen either as a fixed constant or, according to actual conditions, as a parameter that varies with the iteration number. This not only achieves fast convergence within a finite number of iterations but also gives a concrete upper limit on the number of iterations, improving the convergence rate and avoiding the unpredictable computation time and computational load of existing algorithms. Moreover, applying the same transformation to other standard methods yields the corresponding Powerball methods, such as the stochastic gradient Powerball method and the L-BFGS Powerball method. The method has the advantages of a fast convergence rate and a short running time and can effectively address the speed and quality of data analysis.
Description of the drawings
Fig. 1 is a flow chart of the optimization method of the present invention for improving the convergence rate;
Fig. 2 shows the optimal value of the Power coefficient γ for a given error tolerance ε;
Fig. 3 shows the results of solving the optimization problem on the three data sets of the example with the gradient Powerball method (γ < 1) and the gradient method (γ = 1);
Fig. 4 shows the results of solving the optimization problem on the three data sets of the example with the L-BFGS Powerball method (γ < 1) and L-BFGS (γ = 1).
Specific embodiment
The present invention is further elaborated below with reference to the accompanying drawings and specific embodiments.
The detailed description below is one instantiation of the present invention in big-data processing and does not limit the scope of use of the invention; any concrete case that can be converted into an optimization problem can adopt this method. Convergence is first verified as described in the summary of the invention above; if the iteration converges, the problem can be solved with this method.
(1) The example is a data-processing problem that is not directly an optimization problem. Following Fig. 1, the twice-differentiable ℓ2-regularized logistic regression function is taken as the objective function; given the data pairs, the goal of this problem is to solve the minimization problem (1.4) of minimizing the regularized logistic loss over the parameter vector w.
Write f(w) for this objective. Its gradient satisfies the Lipschitz condition and it is twice differentiable, so the problem to be solved is to minimize f(w), and it is clear that f(w) is a strongly convex function; here w is a parameter vector satisfying certain conditions, for example following a certain normal distribution.
Three data sets are chosen in Embodiment 1 and regularized using formula (1.4), choosing λ = 1 for the data sets KDD10 and CTR and λ = 0 for the data set RCV1; the standard classification data sets shown in Table 1 below are obtained:
Table 1
RCV1 is the Reuters news-classification data set. Reuters is one of the world's three largest multimedia news agencies and provides all kinds of news and financial data, so the data it collects can be analysed: for example, by analysing all kinds of news data, news can be classified and published accurately, timely, and efficiently, improving the timeliness and efficiency of news publication. KDD10 is the data set chosen from the 2010 KDD Cup of the international Knowledge Discovery and Data Mining competition, which is currently the most influential competition in the field of data mining; the competition is open to both industry and academia, brings together top experts, scholars, engineers, and students in data mining worldwide, and provides practitioners with an ideal venue for academic exchange and for presenting research results, with topics taken over the years from different mining applications with strong application backgrounds. CTR is a data set obtained by sampling advertisement click-through-rate data; click-through-rate estimation plays a very important role in targeted advertising, and the accuracy of the estimate has a great impact on the advertiser's revenue, on the publisher's income, and on the user experience, so it receives wide attention from Internet companies.
(2) Starting from a given initial value, here chosen at random, the classical gradient descent method gives the following iterative formula for solving the optimization problem in (1): w(k+1) = w(k) - α_k ∇f(w(k)).
Introducing the Powerball function σ_γ(z) = sign(z)|z|^γ into the above formula gives the gradient Powerball method, w(k+1) = w(k) - α_k σ_γ(∇f(w(k))).
The values γ = 1, 0.7, 0.4, 0.1 are taken in turn; when γ = 1, the original iterative formula itself is used to solve the optimization problem in (1). In the solution process, first, the step length is chosen by a standard backtracking line search; second, since w is a parameter vector, the initial weight vector w_0 at k = 0 is chosen at random, with the components of w_0 drawn as random variables from the normal distribution N(0, 0.01); finally, the experiment is repeated 10 times and the average of the results is taken as the final test result. The results are shown in Fig. 3, where the left panel corresponds to the data set RCV1, the middle panel to KDD10, and the right panel to CTR. For the given error tolerance ε = 10^-3, the value of γ is varied, taking γ = 1, 0.7, 0.4, 0.1, where γ = 1 is the standard gradient method. As Fig. 3 shows, the Powerball method with the nonlinear factor accelerates the convergence of the conventional method, and the smaller γ is, the faster the Powerball algorithm converges.
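To make the embodiment concrete, the following sketch applies the gradient Powerball method with a backtracking line search to an ℓ2-regularized logistic regression objective on synthetic data. The averaged logistic loss plus (λ/2)||w||² form of the objective, the backtracking parameters, and the synthetic data are assumptions of this sketch; the benchmark data sets themselves are not reproduced.

import numpy as np

def powerball(z, gamma):
    return np.sign(z) * np.abs(z) ** gamma

def objective(w, A, b, lam):
    # Assumed l2-regularized logistic loss: mean(log(1 + exp(-b_i * a_i^T w))) + lam/2 * ||w||^2
    margins = -b * (A @ w)
    return np.mean(np.logaddexp(0.0, margins)) + 0.5 * lam * np.dot(w, w)

def gradient(w, A, b, lam):
    margins = -b * (A @ w)
    sig = 1.0 / (1.0 + np.exp(-margins))         # logistic factor for each sample
    return -(A.T @ (b * sig)) / len(b) + lam * w

def backtracking(f, g, w, d, alpha0=1.0, rho=0.5, c=1e-4):
    # Standard backtracking (Armijo) line search along direction d
    alpha, fw, slope = alpha0, f(w), c * np.dot(g, d)
    while f(w + alpha * d) > fw + alpha * slope and alpha > 1e-12:
        alpha *= rho
    return alpha

def gradient_powerball(A, b, lam=1.0, gamma=0.7, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.01, size=A.shape[1])   # components of w0 from a small zero-mean normal
    f = lambda v: objective(v, A, b, lam)
    for _ in range(n_iter):
        g = gradient(w, A, b, lam)
        d = -powerball(g, gamma)                 # Powerball-transformed descent direction
        w = w + backtracking(f, g, w, d) * d
    return w

# Synthetic stand-in for the benchmark data sets
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 10))
b = np.sign(A @ rng.normal(size=10) + 0.1 * rng.normal(size=200))
w_hat = gradient_powerball(A, b, lam=1.0, gamma=0.7)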
(3) Similarly, starting from an arbitrarily given initial value, the following L-BFGS method is chosen: w(k+1) = w(k) + α_k d_k with d_k = -H_k ∇f(w(k)),
where B_{k+1} S_k = Y_k, S_k = w(k+1) - w(k), Y_k = ∇f(w(k+1)) - ∇f(w(k)), and H_{k+1} = H_k + ΔH_k. Introducing the Powerball function into the above formula gives the corresponding L-BFGS Powerball method,
in which the gradient in the search direction is replaced by its Powerball transform, d_k = -H_k σ_γ(∇f(w(k))).
The optimization problem in (1) is solved iteratively with this method. In the solution process, first, the step length is chosen by a standard backtracking line search; second, the weight vector w is initialized according to the normal distribution N(0, 0.01); finally, the experiment is repeated 10 times and the average of the results is taken as the final test result. The results are shown in Fig. 4, where the left panel corresponds to the data set RCV1, the middle panel to News20, and the right panel to CTR. For the given error tolerance ε = 10^-3, the value of γ is varied, taking γ = 1, 0.7, 0.4, 0.1, where γ = 1 is the standard L-BFGS method. As Fig. 4 shows, the Powerball method with the nonlinear factor accelerates the convergence of the conventional method, and the smaller γ is, the faster the Powerball algorithm converges.
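The following sketch shows one way a BFGS-type iteration can be combined with the Powerball transform: the search direction uses the Powerball-transformed gradient, and the inverse-Hessian approximation H_k is maintained with the standard BFGS update. This is an illustrative sketch under those assumptions; it is not the exact L-BFGS Powerball update of the embodiment.

import numpy as np

def powerball(z, gamma):
    return np.sign(z) * np.abs(z) ** gamma

def bfgs_powerball(f, grad_f, x0, gamma=0.7, n_iter=100, eps=1e-6):
    # BFGS-type iteration whose search direction uses the Powerball-transformed
    # gradient, d_k = -H_k * sigma_gamma(grad f(x_k)); H_k is the inverse-Hessian
    # approximation maintained by the standard BFGS update.
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                                  # H_0 = identity matrix
    g = grad_f(x)
    for _ in range(n_iter):
        d = -H @ powerball(g, gamma)               # Powerball search direction
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) > fx + 1e-4 * alpha * np.dot(g, d) and alpha > 1e-12:
            alpha *= 0.5                           # backtracking line search
        x_new = x + alpha * d
        g_new = grad_f(x_new)
        s, y = x_new - x, g_new - g
        if np.linalg.norm(s) < eps:
            return x_new
        sy = np.dot(s, y)
        if sy > 1e-12:                             # curvature condition keeps H positive definite
            rho = 1.0 / sy
            V = np.eye(n) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s) # standard BFGS inverse update
        x, g = x_new, g_new
    return x

# Toy usage on a strongly convex quadratic
A = np.diag([1.0, 5.0, 25.0])
b = np.array([1.0, -2.0, 3.0])
x_star = bfgs_powerball(lambda x: 0.5 * x @ A @ x - b @ x,
                        lambda x: A @ x - b, np.zeros(3), gamma=0.7)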
It can be seen that the method proposed by the present invention achieves faster convergence within a finite number of iterations, which is a strong advantage in big-data processing. On the other hand, the method can set in advance the number of iterations needed to meet the required precision, saving running time and memory.
Those skilled in the art will readily appreciate that the foregoing is merely a preferred embodiment of the present invention and does not limit the present invention; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (7)

1. A method for improving the processing quality of big data, characterized by comprising the following steps:
(1) Analyze and optimize the data according to the characteristics of the collected data: if the data processing problem is already a problem of finding the minimum of a function, go to step (2); otherwise, convert it into a minimization problem through regularization of the data and go to step (2);
(2) Establish the minimization model min_{X ∈ R^n} f(X), where R^n is the n-dimensional real vector space, f(X) is the objective function, a twice continuously differentiable nonlinear function, and X is an n-dimensional vector;
(3) Choose a gradient-type optimization method, where such methods include the gradient descent method, Newton's method, and the L-BFGS method; according to the chosen optimization method, introduce the Powerball function, establish the Powerball iterative formula, and iterate; the Powerball function is σ_γ(z) = sign(z)|z|^γ, where γ ∈ (0, 1) is the Power coefficient and z ∈ R;
For the gradient descent method, the corresponding Powerball iterative formula is X(k+1) = X(k) - α_k σ_γ(∇f(X(k)));
For Newton's method, the corresponding Powerball iterative formula is:
X(k+1) = X(k) - (∇²f(X(k)))⁻¹ σ_γ(∇f(X(k)));
For the L-BFGS method, the corresponding Powerball iterative formula is X(k+1) = X(k) + α_k d_k with d_k = -H_k σ_γ(∇f(X(k)));
where B_k is an approximation of the Hessian matrix of the objective function and has the same dimension as the Hessian; S_k = X(k+1) - X(k) is a vector of the same dimension as X(k); Y_k = ∇f(X(k+1)) - ∇f(X(k)); and H_k = B_k⁻¹;
In these formulas, ∇f(X) is the gradient of the objective function f(X); ∇²f(X) is the Hessian matrix of f(X); k is the iteration number, taking the values 0, 1, 2, ...; α_k is the step length at the k-th iteration; X(k) is the approximate value obtained at the k-th iteration; when k = 0, the initial value of B_k is taken to be the identity matrix, and the initial value X(0) of X(k) can be chosen arbitrarily; σ_γ(·): R → R is the Powerball function, and applying σ_γ to the gradient of the objective function is the Powerball transformation;
(4) Judge convergence; the concrete criteria are as follows:
When the objective function is strongly convex and its gradient satisfies the L-Lipschitz condition, check whether the number of iterations exceeds N; if so, the iteration ends and the optimal value X(k+1) is output; otherwise, continue iterating;
When the objective function is not strongly convex, or its gradient does not satisfy the L-Lipschitz condition, check whether ||X(k+1) - X(k)|| < ε holds; if so, the iteration ends and the optimal value X(k+1) is output; otherwise, continue iterating; ε is the error tolerance, chosen as a trade-off between the required precision and the amount of computation;
where N is the preset upper limit on the number of iterations.
2. The method for improving the processing quality of big data according to claim 1, characterized in that the L-Lipschitz condition in step (4) is:
for any X, Y ∈ R^n, the function f(X) satisfies
||∇f(Y) - ∇f(X)|| ≤ L||Y - X||, where ||·|| denotes any vector norm and L > 0 is the Lipschitz constant, chosen as an upper bound on the norm of the gradient of the objective function;
The strongly convex condition means that for any X ∈ R^n, the function
f(X) - (m/2)||X||_2^2
is convex; here ||·||_2 denotes the 2-norm, ||X||_2^2 is the square of the 2-norm of the vector X, and m is a constant greater than zero.
3. The method for improving the processing quality of big data according to claim 2, characterized in that in step (4) N can be chosen so that the optimal value X(k) is obtained with the smallest number of iterations, the choice being made by means of a Lyapunov function of the iterates.
4. The method for improving the processing quality of big data according to claim 1, characterized in that in step (3) the value of the Power coefficient γ ∈ (0, 1) is determined by the error tolerance ε: the larger ε is, the smaller the value of γ, and the faster the convergence of the algorithm.
5. The method for improving the processing quality of big data according to claim 1, characterized in that γ is selected adaptively according to the iteration number,
where γ_0 and γ_1 are respectively the initial and final values of γ, the preferred values are γ_0 = 0.1 and γ_1 = 0.9, and N is the upper limit on the number of iterations.
6. The method for improving the processing quality of big data according to claim 1, characterized in that the alternative iteration schemes of step (3) are as follows:
(1) For a one-bit gradient descent method, take γ = 0 in the Powerball function σ_γ(z) = sign(z)|z|^γ; then σ_γ(z) = sign(z), z ∈ R, and the iterative formula is X(k+1) = X(k) - α_k sign(∇f(X(k)));
(2) For the Powerball method for stochastic variables, the Powerball iterative formula is X(k+1) = X(k) - α_k σ_γ(∇f_{i_k}(X(k))),
where the objective function is a sum f(X) = Σ_{j=1}^{l} f_j(X), each f_j(X), j = 1, 2, ..., l, is a strongly convex, twice continuously differentiable function whose gradient satisfies the L-Lipschitz condition, and the index i_k is selected at random at each iteration.
7. The method for improving the processing quality of big data according to claim 2 or 3, characterized in that in step (4), m is chosen as the absolute value of the smallest eigenvalue of the Hessian matrix of the objective function.
CN201611232063.5A 2016-12-28 2016-12-28 Method for improving processing quality of big data Pending CN106649203A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611232063.5A CN106649203A (en) 2016-12-28 2016-12-28 Method for improving processing quality of big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611232063.5A CN106649203A (en) 2016-12-28 2016-12-28 Method for improving processing quality of big data

Publications (1)

Publication Number Publication Date
CN106649203A true CN106649203A (en) 2017-05-10

Family

ID=58831912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611232063.5A Pending CN106649203A (en) 2016-12-28 2016-12-28 Method for improving processing quality of big data

Country Status (1)

Country Link
CN (1) CN106649203A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510118A (en) * 2018-04-02 2018-09-07 张龙 A kind of building heating energy forecast analysis terminal based on Internet of Things
CN109033021A (en) * 2018-07-20 2018-12-18 华南理工大学 A kind of linear equation solver design method for joining convergence neural network based on change
CN109033021B (en) * 2018-07-20 2021-07-20 华南理工大学 Design method of linear equation solver based on variable parameter convergence neural network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170510