AU2020100840A4 - Efficient Distributed Methods for Sparse Solution to Composite Optimization Problem - Google Patents

Efficient Distributed Methods for Sparse Solution to Composite Optimization Problem

Info

Publication number
AU2020100840A4
Authority
AU
Australia
Prior art keywords
agent
gradient
local
global
variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2020100840A
Inventor
Huaqing Li
Zheng Wang
Lifeng Zheng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University
Original Assignee
Southwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University filed Critical Southwest University
Priority to AU2020100840A priority Critical patent/AU2020100840A4/en
Application granted granted Critical
Publication of AU2020100840A4 publication Critical patent/AU2020100840A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/11Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/13Differential equations
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/43Detecting, measuring or recording for evaluating the reproductive systems
    • A61B5/4306Detecting, measuring or recording for evaluating the reproductive systems for evaluating the female reproductive systems, e.g. gynaecological evaluations
    • A61B5/4312Breast evaluation or disorder diagnosis
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235Details of waveform analysis
    • A61B5/7264Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7271Specific aspects of physiological measurement analysis
    • A61B5/7275Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7271Specific aspects of physiological measurement analysis
    • A61B5/7282Event detection, e.g. detecting unique waveforms indicative of a medical condition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection

Abstract

This patent proposes a distributed algorithm for composite optimization problems defined over a networked multi-agent system, in which a group of networked agents collaboratively minimize the sum of all local objective functions, which are smooth (possibly non-convex), subject to a convex constraint. The algorithm mainly comprises five parts: parameter determination; variable initialization; variable updating; information exchange; and gradient tracking. The proposed algorithm integrates a successive convex approximation technique with a gradient tracking mechanism that aims to locally track the gradient of the smooth part of the global objective function, and employs a momentum term to regulate the update step at each time instant. The introduced momentum term accelerates the convergence rate to a certain extent when the update direction is the same as that of the last time instant, and otherwise adjusts the gradient update direction, which enhances the stability of the proposed algorithm. Numerical results are presented to demonstrate the practicability of the proposed algorithm and the correctness of the theoretical results.

[Figure 1 flowchart: Start; select a fixed step-size and a momentum parameter according to the network topology and the global objective function; each agent initializes local variables, sets k = 0 and the maximum number of iterations k_max; each agent calculates the local optimal solution and updates the local variable; each agent computes the average of the local optimal solutions received from neighbor agents, calculates the accelerating variable and tracks the global gradient; each agent sets k = k + 1; if k > k_max, end, otherwise repeat.]

Description

[Figure 1: flowchart of the proposed algorithm; its steps are summarized in the abstract and described in Section 6.]
1. Technical Field
[0001] The present invention relates to the field of large-scale optimization problems.
2. Background
[0002] Consider a connected network of $m$ agents that cooperatively solve an optimization problem of the form

$$\min_{z}\ F(z) = \sum_{i=1}^{m} f_i(z) + G(z) \quad \text{s.t.}\quad z \in X, \tag{1}$$

where $f_i : \mathbb{R}^n \to \mathbb{R}$ is the local objective function of agent $i$, assumed to be smooth and known only to agent $i$; $G : \mathbb{R}^n \to \mathbb{R}$ is a convex (possibly non-smooth) function; and $X$ represents the common constraint set, assumed to be closed and convex. The smooth+non-smooth structure of the objective function arises in various areas including decision making in sensor networks, statistical inference, networked multi-vehicle coordination, and machine learning. Some examples include distributed average consensus, distributed spectrum sensing, information control, power systems control, and statistical inference and learning. All agents are connected through a communication network, commonly modeled as a graph. Usually the non-smooth term is used to promote some extra structure in the solution; for instance, $G(x) = c\|x\|_1$ is widely used to impose sparsity on the optimal solution of problem (1). We thus develop an algorithm for agents to obtain an optimal solution to problem (1). In this setting, agents seek to cooperatively solve problem (1) by exchanging local information with their immediate neighbors.
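To make the sparsity mechanism concrete: whenever a quadratic model of the smooth part is minimized together with $G(x) = c\|x\|_1$, the minimizer is given by the proximal operator of the l1 norm (soft-thresholding). Below is a minimal Python sketch of this operator; the function name and NumPy implementation are our own illustration, not part of the patent.

```python
import numpy as np

def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_1, i.e.
    argmin_x 0.5 * ||x - v||^2 + t * ||x||_1, applied entrywise.
    Entries with |v_i| <= t are set exactly to zero, which is
    why the l1 term promotes sparse solutions."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

For example, soft_threshold(np.array([0.3, -2.0, 0.05]), 0.5) returns [0.0, -1.5, 0.0], zeroing the small entries while shrinking the large one.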
3. Notation
[0003] All vectors throughout the patent default to column vectors. The notations $x^T$ and $A^T$ indicate the transpose of a vector $x$ and a matrix $A$, respectively. For a matrix $A$, its $(i,j)$-th element is denoted by $A_{ij}$. We apply $\|\cdot\|$ to both vectors and matrices: in the former case $\|\cdot\|$ represents the Euclidean norm, whereas in the latter case it is the spectral norm. The notation $\mathbf{1}_n$ represents an $n$-dimensional vector of ones and $I$ represents the identity matrix of proper dimensions. A nonnegative vector is called stochastic if the sum of its elements equals one. A nonnegative square matrix is called row- (column-) stochastic if its rows (columns) are stochastic vectors, respectively.
4. Network Model
[0004] Define a network containing $m$ agents, all of which aim to cooperatively solve the optimization problem (1). Let the $m$ agents be connected over an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)$, where $\mathcal{V} = \{1, \ldots, m\}$ is the set of agents, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the collection of edges, and $W = [w_{ij}] \in \mathbb{R}^{m \times m}$ is the weighted adjacency matrix, whose weight $w_{ij}$ associated with edge $(i,j)$ satisfies: $w_{ij} > 0$ if $(i,j) \in \mathcal{E}$, and $w_{ij} = 0$ otherwise. We assume that $(i,i) \in \mathcal{E}$ and set $w_{ii} = 1 - \sum_{j \neq i} w_{ij} > 0$. Two agents $i$ and $j$ can communicate directly with each other only if the edge $(i,j) \in \mathcal{E}$.
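One standard way to construct a weight matrix $W$ satisfying these conditions is the Metropolis–Hastings rule, sketched below together with the contraction factor $\rho$ used in Step 1 of the algorithm. The patent only requires the stated row-sum conditions; the Metropolis rule is our illustrative choice, which additionally yields a symmetric, doubly stochastic matrix.

```python
import numpy as np

def metropolis_weights(adj: np.ndarray) -> np.ndarray:
    """Build a symmetric, doubly stochastic W from a 0/1 adjacency matrix.
    w_ij = 1/(1 + max(deg_i, deg_j)) on edges; w_ii absorbs the remainder
    so each row sums to one, matching w_ii = 1 - sum_{j != i} w_ij > 0."""
    m = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()
    return W

def contraction_factor(W: np.ndarray) -> float:
    """rho = ||W - (1/m) 1 1^T|| (spectral norm), which is < 1
    on a connected graph; this is the rho computed in Step 1."""
    m = W.shape[0]
    return np.linalg.norm(W - np.ones((m, m)) / m, 2)
```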
5. Problem Formulation
[0005] We equivalently reformulate the optimization problem (1) as follows:

$$\min_{\mathbf{x}}\ U(\mathbf{x}) = \sum_{i=1}^{m}\big(f_i(x_i) + G(x_i)\big) \quad \text{s.t.}\quad \big((I_m - W)\otimes I\big)\,\mathbf{x} = \mathbf{0},\ \ x_i \in X,\ i = 1, \ldots, m, \tag{2}$$

where $\mathbf{x} = ((x_1)^T, \ldots, (x_m)^T)^T$. In this setting, each agent $i \in \mathcal{V}$ possesses the subfunction $U_i(\cdot) = f_i(\cdot) + G(\cdot)$, which is composed of the private knowledge of $f_i(\cdot)$ and the public knowledge of $G(\cdot)$. We apply $x^* \in \mathbb{R}^n$ to denote the optimal solution to problem (1). The global optimal solution to problem (2) is thus $\mathbf{1}_m \otimes x^* \in \mathbb{R}^{mn}$.
6. Detailed Implementation Description
[0006] Figure 1 is an algorithm flowchart of the present invention. As shown in Figure 1, the distributed computational method for sparse solution to large-scale optimization problems comprises the following steps:
6.1. Initialization
[0007] Step 1: Calculate the variables $\rho = \|W - \frac{1}{m}\mathbf{1}_m\mathbf{1}_m^T\| < 1$, $h = \big(s_1 + s_2(1 + (1-\beta)\rho + \beta)^2\big)\big(1 + (1-\beta)\rho + \beta\big)^2$ and $s_0 = \frac{48M\rho^2}{(1-\rho)^2}$. Each agent $i \in \mathcal{V}$ selects a fixed step-size $\alpha$ and a momentum parameter $\beta$ according to $0 < \alpha < \frac{\sqrt{L^2 + s_0 h} - L}{\big((1-\beta)\rho + \beta\big)^2}$ and $0 < \beta < 1$, where $L$ is the Lipschitz parameter of $f$.
[0008] Step 2: Each agent $i \in \mathcal{V}$ initializes with arbitrary $s_i^0 = x_i^0 \in \mathbb{R}^n$ and $y_i^0 = \nabla f_i(s_i^0)$, where $\nabla f_i(s_i^0)$ is the gradient of the local objective function $f_i$ at $s_i^0$.
[0009] Step 3: Each agent $i \in \mathcal{V}$ sets $k = 0$ and the maximum number of iterations $k_{\max}$.
6.2. Calculating local variables
[0010] Step 4: Each agent $i \in \mathcal{V}$ calculates the local optimal solution $\tilde{x}_i^k = \arg\min_{x_i \in X} \tilde{f}_i(x_i; s_i^k) + (\pi_i^k)^T (x_i - s_i^k) + G(x_i)$, where $\pi_i^k = m y_i^k - \nabla f_i(s_i^k)$, and the local variable $x_i^{k+\frac{1}{2}} = s_i^k + \alpha(\tilde{x}_i^k - s_i^k)$.
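As a concrete instance of Step 4, suppose $X = \mathbb{R}^n$, $G(x) = c\|x\|_1$, and the surrogate is the proximal-linear choice $\tilde{f}_i(x_i; s_i^k) = \nabla f_i(s_i^k)^T(x_i - s_i^k) + \frac{\tau}{2}\|x_i - s_i^k\|^2$. These choices are assumptions for illustration, since the patent leaves the surrogate generic; under them, the subproblem has the closed-form soft-thresholding solution sketched below (reusing soft_threshold from the Background section).

```python
import numpy as np

def local_step(s_i, y_i, grad_f_i, m, alpha, c, tau=1.0):
    """One agent's Step 4 under the proximal-linear surrogate (illustrative).
    pi_i = m * y_i - grad_f_i(s_i) estimates the gradient of the rest of the
    network's smooth terms; the surrogate-plus-l1 subproblem then reduces to
    a single soft-thresholding. Returns the damped iterate x_i^{k+1/2}."""
    g = grad_f_i(s_i)
    pi_i = m * y_i - g
    # argmin_x  g^T (x - s) + (tau/2) ||x - s||^2 + pi^T (x - s) + c ||x||_1
    x_tilde = soft_threshold(s_i - (g + pi_i) / tau, c / tau)
    return s_i + alpha * (x_tilde - s_i)
```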
6.3. Exchanging information
[0011] Step 5: Each agent $i \in \mathcal{V}$ sends the variable $x_i^{k+\frac{1}{2}}$ to its neighbor agent $j$ if $(i,j) \in \mathcal{E}$.
[0012] Step 6: Each agent $i \in \mathcal{V}$ calculates the weighted average of the received variables, $x_i^{k+1} = \sum_{j=1}^{m} w_{ij}\, x_j^{k+\frac{1}{2}}$.
[0013] Step 7: Each agent $i \in \mathcal{V}$ updates the variable $s_i^{k+1}$ as $s_i^{k+1} = x_i^{k+1} + \beta\,(x_i^{k+1} - s_i^{k})$ and sends it to its neighbor agent $j$ if $(i,j) \in \mathcal{E}$.
6.4. Gradient Tracking
[0014] Step 8: Each agent $i \in \mathcal{V}$ tracks the global gradient via $y_i^{k+1} = \sum_{j=1}^{m} w_{ij}\big(y_j^k + \nabla f_j(s_j^{k+1}) - \nabla f_j(s_j^k)\big)$.
[0015] Step 9: Each agent $i \in \mathcal{V}$ sets $k = k + 1$ and returns to Step 4 until a stopping criterion is satisfied, e.g., $k > k_{\max}$, where $k_{\max}$ is the maximum number of iterations.
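Assembling Steps 4–9, a compact single-machine simulation of the loop could look like the sketch below. It is illustrative only: it inherits the proximal-linear surrogate and l1 regularizer assumed in local_step above, uses synchronous rounds, and stores all agents' variables in (m, n) arrays; run_algorithm and its parameter names are our own.

```python
import numpy as np

def run_algorithm(W, grads, S0, alpha, beta, c, tau=1.0, k_max=1000):
    """Single-machine simulation of Steps 4-9 (illustrative sketch).
    W: (m, m) weight matrix; grads: list of m local gradient callables;
    S0: (m, n) array of initial points s_i^0. Returns the stacked x_i^k."""
    m = W.shape[0]
    S = S0.copy()
    G = np.stack([grads[i](S[i]) for i in range(m)])   # local gradients at s^0
    Y = G.copy()                                       # y_i^0 = grad f_i(s_i^0)
    X = S.copy()
    for k in range(k_max):
        # Step 4: surrogate solve + damped update -> x_i^{k+1/2}
        X_half = np.stack([local_step(S[i], Y[i], grads[i], m, alpha, c, tau)
                           for i in range(m)])
        # Steps 5-6: exchange with neighbors and average -> x_i^{k+1}
        X = W @ X_half
        # Step 7: momentum (the accelerating variable) -> s_i^{k+1}
        S_new = X + beta * (X - S)
        # Step 8: gradient tracking -> y_i^{k+1}
        G_new = np.stack([grads[i](S_new[i]) for i in range(m)])
        Y = W @ (Y + G_new - G)
        S, G = S_new, G_new
    return X
```

Here grads[i] is agent i's private gradient oracle, and the matrix product W @ X_half implements the neighbor averaging of Steps 5–6 in one shot.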
7. Innovation
[0016] 1) An extra momentum term is used to accelerate the convergence rate, yielding a new accelerated algorithm.
[0017] 2) An explicit range of uncoordinated step-sizes is given under which the algorithm converges to the optimal solution of problem (1).
[0018] 3) The sparse solution to the non-convex optimization problem can be obtained by the proposed method.
8. Simulations
[0019] In this section, we solve the logistic regression problem for the Breast Cancer Wisconsin (Diagnostic) data set from the UCI Machine Learning Repository to show the effectiveness of the proposed algorithm. Features of the data set, including the mean of distances from the center to points on the perimeter, the severity of concave portions of the contour, and so on, are computed from a digitized image of a fine needle aspirate of a breast mass. Based on the sample values given in the data set, the experiment predicts whether a patient's condition is malignant. The probability can be computed as $P(l = 1 \mid c) = 1/(1 + \exp(-l\,c^T z))$, where $c$ and $l$ are the data and label of a sample, respectively. We choose a subset of 683 samples from the data set, of which $N = 200$ samples are assigned to the $m$ networked agents and used to train the discriminator $z$. The other 483 samples are used for testing. The data and label of the $h$-th sample of the $i$-th agent are represented by a vector $c_{i,h} \in \mathbb{R}^9$ and a constant $l_{i,h} \in \{-1, 1\}$, where
$\sum_{i=1}^{m} q_i = N$, $i \in \mathcal{V}$, $h = 1, \ldots, q_i$. It follows from this model that the regularized maximum log-likelihood estimate of the classifier $z$, given the training samples $(c_{i,h}, l_{i,h})$ for $h = 1, \ldots, q_i$ and $i = 1, \ldots, m$, is the optimal solution of the optimization problem

$$z^* = \arg\min_{z}\ \frac{\lambda}{2}\|z\|^2 + \sum_{i=1}^{m}\sum_{h=1}^{q_i} \ln\big(1 + \exp(-l_{i,h}\, c_{i,h}^T z)\big) + c\|z\|_1,$$
where the regularization term $\frac{\lambda}{2}\|z\|^2$ is added to avoid over-fitting and $c\|z\|_1$ is used to impose sparsity on the optimal solution. In the following simulations, the residual is defined as $\log_{10}\big((1/m)\sum_{i=1}^{m}\|x_i^k - z^*\|\big)$.
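For this experiment, the smooth local term each agent would hand to the algorithm is its share of the log-likelihood plus a split l2 regularizer, while $c\|z\|_1$ stays in the non-smooth part $G$. A minimal sketch of the corresponding gradient oracle follows; the even split of the l2 term across agents and the name lam for the regularization weight are our assumptions.

```python
import numpy as np

def make_local_logistic(C_i, l_i, lam, m):
    """Gradient oracle for agent i's smooth part (illustrative):
    f_i(z) = (lam/(2m)) * ||z||^2 + sum_h ln(1 + exp(-l_{i,h} c_{i,h}^T z)).
    C_i: (q_i, 9) feature rows; l_i: (q_i,) labels in {-1, +1}."""
    def grad(z):
        margins = l_i * (C_i @ z)              # l_{i,h} * c_{i,h}^T z
        sigma = 1.0 / (1.0 + np.exp(margins))  # derivative factor of the log-loss
        return (lam / m) * z - C_i.T @ (l_i * sigma)
    return grad
```

Each agent's oracle can then be passed to the loop above as grads[i] = make_local_logistic(C_i, l_i, lam, m).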
8.1. Comparison with Distributed Algorithms
[0020] We now compare the performance of NIDS, NEXT, PG-EXTRA and the proposed algorithm. In this case, we initialize the variables with $s_i^0 = x_i^0 = -5 \times \mathbf{1}_9$ and $y_i^0 = \nabla f_i(x_i^0)$, $\forall i \in \mathcal{V}$. The momentum parameter of the proposed algorithm is selected as $\beta = 0.5$. The network topology with $m = 10$ agents is randomly generated, with a connectivity probability of 0.7 for each edge. Fig. 2 shows the evolution of the residuals for the different algorithms, while Fig. 3 shows the testing accuracy when the step-size is chosen as $\alpha = 0.01$. From Fig. 2, it can be seen that the proposed algorithm converges faster than NIDS and PG-EXTRA when $\alpha = 0.01$, and the residual of the proposed algorithm decreases somewhat faster than that of NEXT.
8.2. Effects of Network Sparsity
[0021] This subsection investigates the effect of network sparsity on the performance of the proposed algorithm. We consider four categories of network with decreasing sparsity, shown in Fig. 4: a star network, a ring network, a tree network, and a fully connected network. We again initialize the variables with $s_i^0 = x_i^0 = -5 \times \mathbf{1}_9$ and $y_i^0 = \nabla f_i(x_i^0)$, $\forall i \in \mathcal{V}$, and set the step-size $\alpha = 0.01$ and the momentum parameter $\beta = 0.5$. The performance of the proposed algorithm under each category of network is shown in Fig. 5. It is verified that the proposed algorithm converges faster as the network becomes denser.
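This behavior is consistent with the contraction factor $\rho$ from Step 1: denser topologies yield a smaller $\rho$ and hence faster mixing. The quick check below, reusing metropolis_weights and contraction_factor from the Network Model section, is our own illustration rather than the patent's exact experiment.

```python
import numpy as np

def ring(m):
    A = np.zeros((m, m), dtype=int)
    for i in range(m):
        A[i, (i + 1) % m] = A[(i + 1) % m, i] = 1
    return A

def star(m):
    A = np.zeros((m, m), dtype=int)
    A[0, 1:] = 1
    A[1:, 0] = 1
    return A

m = 10
complete = np.ones((m, m), dtype=int) - np.eye(m, dtype=int)
for name, A in [("star", star(m)), ("ring", ring(m)), ("complete", complete)]:
    rho = contraction_factor(metropolis_weights(A))
    print(f"{name:8s} rho = {rho:.3f}")  # smaller rho -> denser graph, faster consensus
```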
9. Brief Description of The Drawings
[0022] Figure 1 is a flowchart of the fast distributed strategy.
[0023] Figure 2 shows the comparison across different algorithms.
[0024] Figure 3 shows the testing accuracy across different algorithms.
[0025] Figure 4 shows the different network topologies.
[0026] Figure 5 shows the evolutions over different network topologies.

Claims (4)

The claims defining the invention are as follows:
1. Distributed algorithm
1.1. Initialization
Step 1: Calculate the variables $\rho = \|W - \frac{1}{m}\mathbf{1}_m\mathbf{1}_m^T\| < 1$, $h = \big(s_1 + s_2(1 + (1-\beta)\rho + \beta)^2\big)\big(1 + (1-\beta)\rho + \beta\big)^2$ and $s_0 = \frac{48M\rho^2}{(1-\rho)^2}$. Each agent $i \in \mathcal{V}$ selects a fixed step-size $\alpha$ and a momentum parameter $\beta$ according to $0 < \alpha < \frac{\sqrt{L^2 + s_0 h} - L}{\big((1-\beta)\rho + \beta\big)^2}$ and $0 < \beta < 1$, where $L$ is the Lipschitz parameter of $f$. Step 2: Each agent $i \in \mathcal{V}$ initializes with arbitrary $s_i^0 = x_i^0 \in \mathbb{R}^n$ and $y_i^0 = \nabla f_i(s_i^0)$, where $\nabla f_i(s_i^0)$ is the gradient of the local objective function $f_i$ at $s_i^0$. Step 3: Each agent $i \in \mathcal{V}$ sets $k = 0$ and the maximum number of iterations $k_{\max}$.
1.2. Calculating local variables
Step 4: Each agent $i \in \mathcal{V}$ calculates the local optimal solution $\tilde{x}_i^k = \arg\min_{x_i \in X} \tilde{f}_i(x_i; s_i^k) + (\pi_i^k)^T (x_i - s_i^k) + G(x_i)$, where $\pi_i^k = m y_i^k - \nabla f_i(s_i^k)$, and the local variable $x_i^{k+\frac{1}{2}} = s_i^k + \alpha(\tilde{x}_i^k - s_i^k)$.
1.3. Exchanging information
Step 5: Each agent $i \in \mathcal{V}$ sends the variable $x_i^{k+\frac{1}{2}}$ to its neighbor agent $j$ if $(i,j) \in \mathcal{E}$.
Step 6: Each agent $i \in \mathcal{V}$ calculates the weighted average of the received variables, $x_i^{k+1} = \sum_{j=1}^{m} w_{ij}\, x_j^{k+\frac{1}{2}}$.
Step 7: Each agent $i \in \mathcal{V}$ updates the variable $s_i^{k+1}$ as $s_i^{k+1} = x_i^{k+1} + \beta\,(x_i^{k+1} - s_i^{k})$ and sends it to its neighbor agent $j$ if $(i,j) \in \mathcal{E}$.
1.4. Gradient Tracking
Step 8: Each agent $i \in \mathcal{V}$ tracks the global gradient via $y_i^{k+1} = \sum_{j=1}^{m} w_{ij}\big(y_j^k + \nabla f_j(s_j^{k+1}) - \nabla f_j(s_j^k)\big)$.
Step 9: Each agent $i \in \mathcal{V}$ sets $k = k + 1$ and returns to Step 4 until a stopping criterion is satisfied, e.g., $k > k_{\max}$, where $k_{\max}$ is the maximum number of iterations.
AU2020100840A 2020-05-26 2020-05-26 Efficient Distributed Methods for Sparse Solution to Composite Optimization Problem Ceased AU2020100840A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2020100840A AU2020100840A4 (en) 2020-05-26 2020-05-26 Efficient Distributed Methods for Sparse Solution to Composite Optimization Problem

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2020100840A AU2020100840A4 (en) 2020-05-26 2020-05-26 Efficient Distributed Methods for Sparse Solution to Composite Optimization Problem

Publications (1)

Publication Number Publication Date
AU2020100840A4 true AU2020100840A4 (en) 2020-07-02

Family

ID=71132172

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2020100840A Ceased AU2020100840A4 (en) 2020-05-26 2020-05-26 Efficient Distributed Methods for Sparse Solution to Composite Optimization Problem

Country Status (1)

Country Link
AU (1) AU2020100840A4 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084721A (en) * 2020-09-23 2020-12-15 浙江大学 Reward function modeling method under multi-agent reinforcement learning cooperative task
CN112902894A (en) * 2021-01-26 2021-06-04 西安精雕软件科技有限公司 Circular notch optimal approximation method and system based on iteration method
CN115691675A (en) * 2022-11-10 2023-02-03 西南大学 Efficient mushroom toxicity identification method based on asynchronous distributed optimization algorithm



Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry