CN112069631A - Distributed projection method considering communication time delay and based on variance reduction technology - Google Patents

Distributed projection method considering communication time delay and based on variance reduction technology

Info

Publication number
CN112069631A
CN112069631A (application CN202010614853.XA)
Authority
CN
China
Prior art keywords
local
optimization problem
agent
follows
variance reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010614853.XA
Other languages
Chinese (zh)
Other versions
CN112069631B (en)
Inventor
李华青
胡锦辉
夏大文
陈欣
王政
吕庆国
黄廷文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University
Original Assignee
Southwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University
Priority to CN202010614853.XA
Publication of CN112069631A
Application granted
Publication of CN112069631B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00: Computer-aided design [CAD]
    • G06F 30/10: Geometric CAD
    • G06F 30/18: Network design, e.g. design based on topological or interconnect aspects of utility systems, piping, heating ventilation air conditioning [HVAC] or cabling
    • G06F 30/20: Design optimisation, verification or simulation
    • G06F 2111/00: Details relating to CAD techniques
    • G06F 2111/04: Constraint-based CAD

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a distributed projection method that accounts for communication time delay and is based on a variance reduction technique, comprising the following steps: step 1, formulating an original optimization problem model (1) for a multi-agent system subject to both local set constraints and local equality constraints; step 2, equivalently converting the original optimization problem model (1) obtained in step 1 into a convex optimization problem model (2) convenient for distributed processing; step 3, proposing a distributed projection algorithm (3) based on a variance reduction technique to solve the constrained convex optimization problem model (2), in which a local stochastic average gradient provides an unbiased estimate of the local full gradient, thereby relieving the heavy computational burden of evaluating the full gradients of all local objective functions at every iteration; step 4, carrying out convergence analysis. The invention can greatly reduce the computation cost of all agents in the network, thereby reducing the communication and computation pressure of the whole multi-agent system, and has high practicability.

Description

Distributed projection method considering communication time delay and based on variance reduction technology
Technical Field
The invention relates to the technical field of intelligent communication, in particular to a distributed projection method considering communication time delay and based on a variance reduction technology.
Background
In recent years, with the rapid development of high technology, emerging fields such as cloud computing and big data have appeared. Distributed optimization theory and its applications have received growing attention and have gradually permeated many aspects of scientific research, engineering application, and social life. Distributed optimization accomplishes an optimization task through cooperative coordination among multiple agents, and can solve large-scale, complex optimization problems that many centralized algorithms cannot handle. However, when existing distributed optimization algorithms face a large-scale convex optimization problem with relatively complex local constraints, the gradient computation is heavy and the computational burden on the agents in the network is large, so the computation and communication efficiency of the multi-agent system is low; such algorithms therefore cannot meet practical requirements.
Disclosure of Invention
The invention provides a distributed projection algorithm based on a variance reduction technique, which can greatly reduce the computation cost of the agents in the network, thereby reducing the communication and computation pressure of the whole multi-agent system.
The invention adopts the following technical scheme:
a distributed projection method based on variance reduction technology and considering communication delay comprises the following steps:
step 1, providing an original optimization problem model (1) for a multi-intelligent system simultaneously provided with local set constraint and local equality constraint;
step 2, equivalently converting the original optimization problem model (1) obtained in the step 1 into a convex optimization problem model (2) convenient for distribution processing;
step 3, a distributed projection algorithm (3) based on a variance reduction technology is provided to solve a convex optimization problem model (2) with constraints, namely, a local random average gradient is adopted to estimate a local full gradient unbiased, so that heavy calculation burden caused by calculation of full gradients of all local objective functions in each iteration is relieved;
step 4, carrying out convergence analysis on the distributed projection algorithm (3) based on the variance reduction technology, which is provided in the step 3;
as a preferred technical scheme of the invention, the specific construction process and form of the original optimization problem model (1) in the step 1 are as follows:
firstly: defining an agent cluster V ═ {1, …, m }, communication network edge set
Figure RE-GDA0002752674210000021
And a contiguous matrix
Figure RE-GDA0002752674210000022
Directed communication network
Figure RE-GDA0002752674210000023
And simple network G has no self-loops; when agent (i, j) is E, aij=aji> 0, otherwise aij=aji0; degree of agent i is represented as
Figure RE-GDA0002752674210000024
For diagonal matrix D ═ diag { D1,d2,...,dmThe Laplacian matrix of the undirected network G is defined as
Figure RE-GDA0002752674210000025
If the undirected network G is connected, then the Laplace matrix
Figure RE-GDA0002752674210000026
Are symmetrical and semi-positive;
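The graph quantities above can be checked numerically. The following sketch (the 5-agent ring topology is an illustrative choice, not fixed by the patent) builds $A$, $D$, and $L = D - A$ and verifies that the Laplacian of a connected undirected network is symmetric and positive semi-definite with a single zero eigenvalue:

```python
import numpy as np

# Hypothetical 5-agent ring network, used only for illustration
m = 5
A = np.zeros((m, m))
for i in range(m):
    A[i, (i + 1) % m] = A[(i + 1) % m, i] = 1.0  # a_ij = a_ji > 0 iff (i, j) is an edge

D = np.diag(A.sum(axis=1))       # degree matrix D = diag{d_1, ..., d_m}
L = D - A                        # Laplacian of the undirected network G

eigvals = np.linalg.eigvalsh(L)  # L is symmetric, so eigvalsh applies
assert np.allclose(L, L.T)                         # symmetric
assert eigvals.min() > -1e-10                      # positive semi-definite
assert abs(eigvals[0]) < 1e-10 and eigvals[1] > 1e-10  # connected: one zero eigenvalue
```

The connectivity check (second-smallest eigenvalue strictly positive) is exactly the condition used in Assumption 2 below.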
Secondly, the original optimization problem model (1) takes the form

$$\min_{\tilde{x} \in \mathbb{R}^n} \; f(\tilde{x}) = \sum_{i=1}^{m} f_i(\tilde{x}), \quad \text{s.t. } \tilde{x} \in X = \bigcap_{i=1}^{m} X_i, \;\; B_i \tilde{x} = b_i, \; i \in V, \tag{1}$$

where the objective function $f_i$ represents the samples of the real problem to be processed by agent $i$, $\tilde{x} \in \mathbb{R}^n$ is the decision vector, and $q_i$ is the total number of local samples assigned to agent $i$. The local objective function is further decomposed as $f_i(\tilde{x}) = \sum_{h=1}^{q_i} f_i^h(\tilde{x})$, where $f_i^h$, $h \in \{1, \ldots, q_i\}$, is the $h$-th sub-function of the local objective function. Based on the above, the sets $X_i \subseteq \mathbb{R}^n$ are closed and convex with non-empty intersection $X$; the matrices $B_i \in \mathbb{R}^{p_i \times n}$ have full column rank, and $b_i \in \mathbb{R}^{p_i}$. The optimal solution of the constrained convex optimization problem (1) is denoted $\tilde{x}^*$.
As a preferred technical solution of the present invention, the convex optimization problem model (2) in step 2 has the following specific form:

$$\min_{x} \; \sum_{i=1}^{m} f_i(x_i), \quad \text{s.t. } \mathbf{L} x = 0, \;\; B x = b, \;\; x \in \mathcal{X}, \tag{2}$$

where $x_i$ is agent $i$'s estimate of the decision vector $\tilde{x}$. The matrix $B = \mathrm{diag}\{B_1, \ldots, B_m\}$ is block diagonal with full column rank, $b = [b_1^T, \ldots, b_m^T]^T$ is the stacked vector, $\mathcal{X} = X_1 \times \cdots \times X_m$ is the Cartesian product, and $\mathbf{L} = L \otimes I_n$, where $\otimes$ denotes the Kronecker product. Let $q_{\max}$ and $q_{\min}$ denote the maximum and minimum of the $q_i$ (where $q_{\min} \geq 1$, namely: each agent processes at least one sample). From the above statements, $\lambda_{\min}(B^T B)\, q_{\min} > 0$ can be obtained. Based on the convex optimization problem model (2), the following assumptions and definitions are made:
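The stacked quantities can be illustrated concretely. In the sketch below, the sizes ($m = 3$, $n = 2$) and the path-graph Laplacian are assumptions for illustration only; it builds the block-diagonal $B$, the stacked $b$, the Kronecker lift $\mathbf{L} = L \otimes I_n$, and checks the stated condition $\lambda_{\min}(B^T B)\, q_{\min} > 0$:

```python
import numpy as np

# Illustrative sizes (not fixed by the patent): m = 3 agents, n = 2
m, n = 3, 2
rng = np.random.default_rng(0)

# Column-full-rank local matrices B_i (here square and well-conditioned for simplicity)
B_blocks = [rng.standard_normal((n, n)) + 2 * np.eye(n) for _ in range(m)]

# B = diag{B_1, ..., B_m}: block-diagonal stacking
B = np.zeros((m * n, m * n))
for i, Bi in enumerate(B_blocks):
    B[i * n:(i + 1) * n, i * n:(i + 1) * n] = Bi

b = np.concatenate([rng.standard_normal(n) for _ in range(m)])  # stacked vector b

# Kronecker lift of the graph Laplacian: bold L = L (x) I_n
L = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])  # path graph, illustrative
L_bold = np.kron(L, np.eye(n))

q = np.array([4, 1, 3])                      # q_i samples per agent; q_min >= 1
lam_min = np.linalg.eigvalsh(B.T @ B).min()
assert lam_min * q.min() > 0                 # lambda_min(B^T B) * q_min > 0, as stated
```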
Assumption 1: each local sub-objective function $f_i^h$ is strongly convex and has a Lipschitz continuous gradient; namely, for all $i \in V$, $h \in \{1, \ldots, q_i\}$, and all $x, y \in \mathbb{R}^n$, the following hold:

$$f_i^h(y) \geq f_i^h(x) + \nabla f_i^h(x)^T (y - x) + \frac{\mu}{2} \|y - x\|^2,$$
$$\|\nabla f_i^h(x) - \nabla f_i^h(y)\| \leq L_f \|x - y\|,$$

where $0 < \mu \leq L_f$. Then, under Assumption 1, the globally optimal solution of the constrained convex optimization problem (2) is unique and is denoted $x^*$.
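Assumption 1 can be verified directly for quadratic sub-functions: for $f(x) = \frac{1}{2}x^T Q x + c^T x$ with $Q \succ 0$, the constants are $\mu = \lambda_{\min}(Q)$ and $L_f = \lambda_{\max}(Q)$. A numerical check (the quadratic form is an illustrative stand-in, not the patent's objective):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
Q = M.T @ M + np.eye(n)            # symmetric positive definite, so f is strongly convex
c = rng.standard_normal(n)

grad = lambda v: Q @ v + c          # gradient of f(x) = 0.5 x^T Q x + c^T x
f = lambda v: 0.5 * v @ Q @ v + c @ v
mu, Lf = np.linalg.eigvalsh(Q).min(), np.linalg.eigvalsh(Q).max()

x, y = rng.standard_normal(n), rng.standard_normal(n)
# Lipschitz gradient: ||grad f(x) - grad f(y)|| <= L_f ||x - y||
assert np.linalg.norm(grad(x) - grad(y)) <= Lf * np.linalg.norm(x - y) + 1e-12
# Strong convexity: f(y) >= f(x) + grad f(x)^T (y - x) + (mu/2) ||y - x||^2
assert f(y) >= f(x) + grad(x) @ (y - x) + 0.5 * mu * np.linalg.norm(y - x) ** 2 - 1e-12
assert 0 < mu <= Lf
```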
Assumption 2: the undirected network $G$ is connected.
Assumption 3: the communication delays are uniformly bounded; that is, for all $i, j \in V$ and all $k \geq 0$, the delay satisfies $0 \leq \tau_{ij}^k \leq B_0$, where $B_0$ is a positive integer.
Definition 1: define global vectors that collect the local variables $x_{i,k}$, $y_{i,k}$, $w_{i,k}$, $g_{i,k}$, and $\nabla f_i(x_{i,k})$ as follows:

$$x_k = [x_{1,k}^T, \ldots, x_{m,k}^T]^T, \quad y_k = [y_{1,k}^T, \ldots, y_{m,k}^T]^T, \quad w_k = [w_{1,k}^T, \ldots, w_{m,k}^T]^T,$$
$$g_k = [g_{1,k}^T, \ldots, g_{m,k}^T]^T, \quad \nabla f(x_k) = [\nabla f_1(x_{1,k})^T, \ldots, \nabla f_m(x_{m,k})^T]^T,$$

together with the locally delayed versions of the global vectors $x_k$ and $w_k$:

$$x_k[i] = \big[x_{1,k-\tau_{i1}^k}^T, \ldots, x_{m,k-\tau_{im}^k}^T\big]^T, \qquad w_k[i] = \big[w_{1,k-\tau_{i1}^k}^T, \ldots, w_{m,k-\tau_{im}^k}^T\big]^T.$$

Then, at the $k$-th iteration, the communication delay $\tau_{ij}^k$, $i, j \in V$, is determined jointly by agent $i$ and agent $j$, so the globally delayed vectors $x_k[i]$ and $w_k[i]$ are held only by agent $i$.
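The bounded-delay model of Assumption 3 can be realized with a short per-neighbor buffer: each agent keeps only the last $B_0 + 1$ states of each neighbor and reads a value at most $B_0$ iterations old. A minimal sketch (the buffer layout and the random delay draw are illustrative assumptions):

```python
import random

B0 = 3                                  # delay bound from Assumption 3
history = {j: [] for j in range(5)}     # per-neighbour buffers of past states x_{j,t}

def record(j, x):
    """Store neighbour j's newest state, keeping only the last B0 + 1 values."""
    history[j].append(x)
    del history[j][:-(B0 + 1)]

def delayed_read(j, k):
    """Return x_{j, k - tau} for a random delay 0 <= tau <= min(B0, k)."""
    tau = random.randint(0, min(B0, k))
    return history[j][-(tau + 1)]

for k in range(10):                     # agent j's scalar state at iteration k is just k
    for j in range(5):
        record(j, k)

reads = [delayed_read(j, 9) for j in range(5)]
assert all(9 - B0 <= v <= 9 for v in reads)  # every read is at most B0 iterations stale
```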
As a preferred technical solution of the present invention, the specific iterative process of the distributed projection algorithm (3) based on the variance reduction technique in step 3 is as follows:
Initialization: for all agents $i \in V$, initialize $x_{i,0}$, the auxiliary variables $y_{i,0}$ and $w_{i,0}$, and the anchor points $\phi_{i,0}^h = x_{i,0}$, $h \in \{1, \ldots, q_i\}$.
Set $k = 0$.
For each agent $i = 1, \ldots, m$:
1: arbitrarily select a sample index $s_{i,k}$ from the set $\{1, \ldots, q_i\}$;
2: compute the local stochastic average gradient $g_{i,k}$ as defined below;
3: set $\phi_{i,k+1}^{s_{i,k}} = x_{i,k}$ and store the gradient $\nabla f_i^{s_{i,k}}\big(\phi_{i,k+1}^{s_{i,k}}\big)$;
4: update the variable $x_{i,k+1}$ by a projected primal step onto the local constraint set $X_i$ (compact form (9a));
5: update the variable $y_{i,k+1}$ as $y_{i,k+1} = y_{i,k} + B_i x_{i,k+1} - b_i$;
6: update the variable $w_{i,k+1}$ as $w_{i,k+1} = w_{i,k} + \beta x_{i,k+1}$.
End of loop.
Set $k = k + 1$ and repeat the loop until a stopping condition is met.
Here $\phi_{i,k}^h \in \mathbb{R}^n$ denotes the stored iterate (anchor point) of the sub-function $f_i^h$, $h \in \{1, \ldots, q_i\}$, of the local objective function at the $k$-th iteration.
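The per-agent loop above can be sketched for scalar least-squares samples. The exact primal update of step 4 is rendered as an image in the original text, so the projected-gradient form below (and the sample objective) is an assumption for illustration only; steps 5 and 6 follow the stated updates verbatim:

```python
import numpy as np

# One agent's iteration (steps 1-6) for scalar samples f_i^h(x) = 0.5*(a_h*x - c_h)^2.
rng = np.random.default_rng(2)
qi, alpha, beta = 8, 0.05, 0.5
a, c = rng.standard_normal(qi), rng.standard_normal(qi)
grad_h = lambda h, v: a[h] * (a[h] * v - c[h])

x, y, w = 0.0, 0.0, 0.0
Bi, bi = 1.0, 0.0                       # local equality constraint B_i x = b_i
phi = np.zeros(qi)                      # anchor points phi_i^h
table = np.array([grad_h(h, 0.0) for h in range(qi)])   # stored sub-gradients

for k in range(200):
    s = rng.integers(qi)                                # step 1: pick a sample
    g = grad_h(s, x) - table[s] + table.mean()          # step 2: variance-reduced gradient
    phi[s], table[s] = x, grad_h(s, x)                  # step 3: refresh the table
    v = x - alpha * (g + Bi * y + beta * w)             # step 4 (assumed projected form)
    x = min(max(v, -1.0), 1.0)                          # projection onto X_i = [-1, 1]
    y = y + Bi * x - bi                                 # step 5: equality-constraint dual
    w = w + beta * x                                    # step 6: auxiliary update
```

Only one new sub-gradient is evaluated per iteration, which is the source of the computational saving claimed in step 3.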
As a preferable technical means of the present invention, the anchor points $\phi_{i,k}^h$ obey the iteration rule

$$\phi_{i,k+1}^h = \begin{cases} x_{i,k}, & h = s_{i,k}, \\ \phi_{i,k}^h, & \text{otherwise}. \end{cases}$$

At iteration $k$, for agent $i$, the local stochastic average gradient is defined as

$$g_{i,k} = \nabla f_i^{s_{i,k}}(x_{i,k}) - \nabla f_i^{s_{i,k}}\big(\phi_{i,k}^{s_{i,k}}\big) + \frac{1}{q_i}\sum_{h=1}^{q_i} \nabla f_i^h\big(\phi_{i,k}^h\big),$$

where the stored sum $\sum_{h=1}^{q_i} \nabla f_i^h(\phi_{i,k}^h)$ can be maintained with the recursion

$$\sum_{h=1}^{q_i}\nabla f_i^h\big(\phi_{i,k+1}^h\big) = \sum_{h=1}^{q_i}\nabla f_i^h\big(\phi_{i,k}^h\big) + \nabla f_i^{s_{i,k}}(x_{i,k}) - \nabla f_i^{s_{i,k}}\big(\phi_{i,k}^{s_{i,k}}\big),$$

so that only one new sub-gradient is evaluated per iteration. Let $\mathcal{F}_k$ denote the $\sigma$-algebra generated by the random sample selections up to iteration $k$; then

$$\mathbb{E}\left[g_{i,k} \mid \mathcal{F}_k\right] = \frac{1}{q_i}\nabla f_i(x_{i,k}), \tag{8}$$

i.e., $g_{i,k}$ is an unbiased estimate of the scaled local full gradient.
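The unbiasedness relation (8) can be checked empirically for a SAGA-style estimator of this form (the scalar sample gradients below are illustrative assumptions): averaging the estimator over all possible sample choices recovers the local average gradient exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
q = 6
a, c = rng.standard_normal(q), rng.standard_normal(q)
grad_h = lambda h, v: a[h] * (a[h] * v - c[h])   # sub-function gradients

x = 0.7
phi = rng.standard_normal(q)                     # arbitrary anchor points phi^h
table = np.array([grad_h(h, phi[h]) for h in range(q)])
full = np.mean([grad_h(h, x) for h in range(q)]) # (1/q) * sum_h grad f^h(x)

# Exact conditional expectation over the uniformly chosen sample s:
expected = np.mean([grad_h(s, x) - table[s] + table.mean() for s in range(q)])
assert abs(expected - full) < 1e-12              # the estimator is unbiased
```

The anchor terms cancel in expectation, which is why the estimator stays unbiased no matter how stale the table entries are.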
As a preferred technical solution of the present invention, the convergence analysis process in step 4 is as follows.
The following definition is first made:
Definition 2: for $0 < \alpha < 1/\lambda_{\max}(L)$, a positive semi-definite weighting matrix $P$ is defined in terms of $W = I - \alpha\mathbf{L}$ (which is positive definite), and the distance of the iterates to the optimum is measured by $\|U_k - U^*\|_P$, where the vectors $U_k = [x_k^T, y_k^T, w_k^T]^T$ and $U^* = [(x^*)^T, (y^*)^T, (w^*)^T]^T$.
Then, combining Assumptions 1-3 and Definitions 1-2 yields the following result: consider the distributed projection algorithm (3) based on the variance reduction technique under Assumptions 1-3, with $U_k$ and $U^*$ as in Definition 2. If the parameters $\eta$, $\phi$, and $\xi$ satisfy

$$\eta > \frac{2L_f\left[L_f q_{\max} + q_{\min}(L_f - \mu)\right]}{\lambda_{\min}(B^T B)\, q_{\min}}, \tag{21a}$$
$$0 < \phi < 2\mu, \tag{21b}$$
$$0 < \xi < \frac{2\mu - \phi}{1 + \beta}, \tag{21c}$$

and the constant step size $\alpha$ and the algorithm parameter $\beta$ satisfy

$$0 < \alpha < \frac{1}{\lambda_{\max}(L)}, \qquad \frac{4\alpha q_{\max} L_f}{\eta} \leq c, \tag{22a}$$
$$0 < \beta < 1, \tag{22b}$$

then the sequence $\{U_k\}_{k \geq 0}$ is bounded and convergent, and the sequence $\{x_k\}_{k \geq 0}$ converges to the unique optimal solution $x^*$.
The invention has the following beneficial effects:
1. The proposed algorithm forms an unbiased estimate of the local full gradient by means of the local stochastic average gradient, so the computation cost of the agents in the network can be greatly reduced, the communication and computation pressure of the whole multi-agent system is reduced, less gradient computation is spent to reach the same convergence accuracy, and fewer communication rounds are required;
2. compared with existing distributed stochastic gradient optimization algorithms, the proposed algorithm can solve a more complex optimization problem, namely a convex optimization problem with both local set constraints and local equality constraints;
3. compared with most existing optimization algorithms that consider communication delay, the proposed algorithm also preserves the privacy of each agent's local information while allowing communication delay, and therefore has high practical value.
Drawings
FIG. 1 is an undirected network connectivity diagram;
FIG. 2 is a graph comparing the performance of the algorithm of the present invention with that of the prior art;
FIG. 3 shows the instantaneous behavior of the agents without communication delay according to the present invention;
FIG. 4 shows the instantaneous behavior of the agents in the presence of communication delay according to the present invention.
Detailed Description
The invention will now be described in further detail with reference to the drawings and examples.
First, the symbols used in the following formulas are defined:
$\mathbb{R}$ denotes the set of real numbers, $\mathbb{R}^n$ the set of $n$-dimensional real column vectors, and $\mathbb{R}^{m \times n}$ the set of $m \times n$ real matrices;
the identity matrix is denoted by $I$, with dimensions determined by context;
for a real symmetric matrix $A$, $\lambda_2(A)$ denotes the minimum non-zero eigenvalue of a positive semi-definite matrix, and $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ denote the maximum and minimum eigenvalues, respectively;
$[A]_{i,:}$ and $[A]_{:,i}$ denote the $i$-th row and the $i$-th column of the matrix $A$;
$\otimes$ is the Kronecker product notation;
$x^T$ and $A^T$ denote the transpose of the vector $x$ and of the matrix $A$;
the Euclidean norm of a vector and the spectral norm of a matrix are both denoted by $\|\cdot\|$;
for a positive semi-definite matrix $A \in \mathbb{R}^{n \times n}$ and vectors $x, y \in \mathbb{R}^n$, the scalar product $\langle x, y \rangle_A = \langle x, Ay \rangle$ is defined, and $\|x\|_A = \sqrt{x^T A x}$ denotes the $A$-weighted norm of the vector $x$;
$\mathbb{E}[x]$ denotes the expectation of a random variable $x$;
the projection of a vector $x \in \mathbb{R}^n$ onto a closed convex set $X \subseteq \mathbb{R}^n$ is denoted $P_X[x]$, namely: $P_X[x] = \arg\min_{v \in X} \|v - x\|$.
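For a box constraint such as the $X_i = [-1_n, 1_n]$ used in the embodiment, the projection $P_X[x] = \arg\min_{v \in X}\|v - x\|$ reduces to componentwise clipping; a minimal sketch:

```python
import numpy as np

def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection of x onto the box X = [lo, hi]^n."""
    return np.clip(x, lo, hi)

x = np.array([0.3, -2.5, 1.7])
p = project_box(x)
assert np.allclose(p, [0.3, -1.0, 1.0])   # only out-of-range entries move

# Nonexpansiveness of the projection: ||P_X[x] - P_X[y]|| <= ||x - y||
y = np.array([-0.4, 0.2, 3.0])
assert np.linalg.norm(project_box(x) - project_box(y)) <= np.linalg.norm(x - y) + 1e-12
```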
In the embodiment described below, steps 1 to 3 of the method are carried out exactly as specified above.
The convergence analysis process in step 4 is as follows. In practical application, the convergence analysis of this embodiment adopts the following seven lemmas.
Lemma 1: for any non-empty closed convex set $X \subseteq \mathbb{R}^n$, the following two inequalities hold:

$$(P_X[x] - x)^T (P_X[x] - y) \leq 0, \quad \forall x \in \mathbb{R}^n, \; y \in X,$$
$$\|P_X[x] - P_X[y]\| \leq \|x - y\|, \quad \forall x, y \in \mathbb{R}^n,$$

where $P_X[\cdot]$ is the projection operator.
Lemma 2: if there exist multiplier vectors $y^*$ and $w^*$, then under Assumption 1 the globally optimal solution $x^*$ of the constrained convex optimization problem (2) exists uniquely and satisfies the corresponding projection fixed-point equation, where the constant step size $\alpha > 0$ and the parameter $\beta > 0$.
Lemma 3: under Assumptions 1-2, consider the sequences $\{x_k\}_{k \geq 0}$ and $\{g_k\}_{k \geq 0}$ generated by the distributed projection algorithm (3) based on the variance reduction technique; for every $k \geq 0$, the conditional second moment of the gradient estimate is bounded in terms of the auxiliary sequence $\{p_k\}_{k \geq 0}$ defined in (13), and the sequence $\{p_k\}_{k \geq 0}$ is non-negative under Assumption 1.
Lemma 4: considering the distributed projection algorithm (3) based on the variance reduction technique and the sequence (13), under Assumption 1 the drift $\mathbb{E}[p_{k+1} \mid \mathcal{F}_k] - p_k$ of the auxiliary sequence is bounded for every $k \geq 0$.
And (5) introduction: consider a global vector v under the condition that assumption 3 holdsk=[(v1,k)T,...,(vm,k)T]TAnd its delayed version vk[i]The method comprises the following steps:
Figure BDA0002563395080000121
wherein
Figure BDA0002563395080000122
For a given sequence vt}t≥0We give
Figure BDA0002563395080000123
Where l and d are two non-negative scalars; then, will
Figure BDA0002563395080000124
The superposition with respect to k from 0 to n can be obtained
Figure BDA0002563395080000125
And (6) introduction: considering a distributed projection algorithm (3) based on variance reduction technique under the condition that 1-3 are assumed to be true, the following inequality is true
Figure BDA0002563395080000126
Wherein
Figure BDA0002563395080000127
And W ═ I- α L, Φ, η are positive constants;
The specific demonstration process of the above conclusions is as follows. According to Definition 1, the distributed projection algorithm (3) based on the variance reduction technique can be written in the compact form

$$x_{k+1} = P_{\mathcal{X}}[v_k], \tag{9a}$$
$$y_{k+1} = y_k + B x_{k+1} - b, \tag{9b}$$
$$w_{k+1} = w_k + \beta x_{k+1}, \tag{9c}$$

where $v_k = [(v_{1,k})^T, \ldots, (v_{m,k})^T]^T$ and each $v_{i,k}$, defined in (10), is formed from agent $i$'s own state, its local stochastic average gradient $g_{i,k}$, and the delayed neighbor information $x_k[i]$ and $w_k[i]$.
Starting from (9a), the distance $\|x_{k+1} - x^*\|^2$ is expanded, and the resulting inequality uses the following facts: (i) since $x_{k+1} = P_{\mathcal{X}}[v_k]$, the projection inequalities of Lemma 1 apply; (ii) similar to reference [12], a bound on the delayed terms holds. Continuing the analysis, the first inequality applies the Young inequality with the positive constants $\eta$ and $\phi$, and the second applies the strong convexity of $f$ and the Lipschitz continuity of its gradient; substituting (27) into (24) gives an intermediate estimate. The inner product $2\alpha (x_{k+1} - x^*)^T B^T B (x_{k+1} - x_k)$ is then processed, and substituting the result of (29) into (28) yields (30). From formula (8), the conditional mean of $g_k$ is the scaled full gradient; the conditional variance of $g_k$ is therefore handled using the standard variance decomposition $\mathbb{E}[\|a - \mathbb{E}[a \mid \mathcal{F}_k]\|^2 \mid \mathcal{F}_k] = \mathbb{E}[\|a\|^2 \mid \mathcal{F}_k] - \|\mathbb{E}[a \mid \mathcal{F}_k]\|^2$, together with the strong convexity of $f$ and the Lipschitz continuity of the sub-gradients $\nabla f_i^h$; here $p_k$ is the auxiliary sequence defined in (13). Substituting the conclusion of (31) into (30) yields (32). Finally, the important relation

$$2\langle a - b,\, a - c \rangle_V = \|a - b\|_V^2 + \|a - c\|_V^2 - \|b - c\|_V^2,$$

valid for any positive semi-definite matrix $V$, yields three equations, and substituting the result of (33) into (32) completes the proof of Lemma 6.
And (3) introduction 7: under the condition that the assumption 3 is established, the following two inequalities are established
Figure BDA0002563395080000172
Figure BDA0002563395080000173
In which ξ1,ξ2Are two arbitrary positive constants; it is noted that when there is no network determination,
Figure BDA0002563395080000174
and is thus determined.
The above conclusion is specifically demonstrated as follows:
we first demonstrated (19a) in lemma 7
Figure BDA0002563395080000175
The second inequality uses the lemma 5, the last inequality uses the young inequality, and xi1Is a positive constant; (19b) the certification process of (19a) is similar to that of (19a), and thus will not be described in detail;
Next, for the convenience of analysis, the following definition is made:
Definition 2: for $0 < \alpha < 1/\lambda_{\max}(L)$, a positive semi-definite weighting matrix $P$ is defined in terms of $W = I - \alpha\mathbf{L}$ (which is positive definite), and the distance of the iterates to the optimum is measured by $\|U_k - U^*\|_P$, where the vectors $U_k = [x_k^T, y_k^T, w_k^T]^T$ and $U^* = [(x^*)^T, (y^*)^T, (w^*)^T]^T$.
Then, combining Assumptions 1-3 and Definitions 1-2 yields the following conclusion: consider the distributed projection algorithm (3) based on the variance reduction technique under Assumptions 1-3, with $U_k$ and $U^*$ as in Definition 2. If the parameters $\eta$, $\phi$, and $\xi$ satisfy

$$\eta > \frac{2L_f\left[L_f q_{\max} + q_{\min}(L_f - \mu)\right]}{\lambda_{\min}(B^T B)\, q_{\min}}, \tag{21a}$$
$$0 < \phi < 2\mu, \tag{21b}$$
$$0 < \xi < \frac{2\mu - \phi}{1 + \beta}, \tag{21c}$$

and the constant step size $\alpha$ and the algorithm parameter $\beta$ satisfy

$$0 < \alpha < \frac{1}{\lambda_{\max}(L)}, \qquad \frac{4\alpha q_{\max} L_f}{\eta} \leq c, \tag{22a}$$
$$0 < \beta < 1, \tag{22b}$$

then the sequence $\{U_k\}_{k \geq 0}$ is bounded and convergent, and the sequence $\{x_k\}_{k \geq 0}$ converges to the unique optimal solution $x^*$.
The specific demonstration process is as follows. For $\alpha > 0$ and $\beta > 0$, substituting the result of Lemma 7 into Lemma 6 yields inequality (35), in which the constant $c$ is defined in Lemma 6. Next, according to Lemma 4, the term $c\left(\mathbb{E}[p_{k+1} \mid \mathcal{F}_k] - p_k\right)$ is added to both ends of (35), giving (36). According to Lemma 3, the sequence $p_k \geq 0$; hence, if $\eta > 2L_f\left[L_f q_{\max} + q_{\min}(L_f - \mu)\right]/\left(\lambda_{\min}(B^T B)\, q_{\min}\right)$ and $4\alpha q_{\max} L_f/\eta \leq c$, then inequality (36) can be rewritten as (37). According to Definition 2, if $0 < \alpha < 1/\lambda_{\max}(L)$ and $0 < \beta < 1$, inequality (38) follows. To handle the first term on the right-hand side of inequality (38), set $\xi_1 = \xi_2 = \xi$ with $0 < \xi < 2\mu$ and $0 < \xi < (2\mu - \phi)/(1 + \beta)$, and define a corresponding non-negative constant; based on this definition, (38) can be rewritten as (39). Summing (39) over $k$ from $0$ to $n$ yields (40). Under conditions (21) and (22), a positive semi-definite matrix can be defined so that inequality (40) can be rewritten as (41). Letting $n$ tend to infinity shows that the right-hand side of (39) is summable. Therefore the sequence $\{U_k\}_{k \geq 0}$ is Fejér monotone with respect to the inner product $\langle \cdot, \cdot \rangle_P$; it follows directly that the sequence $\{\|U_k - U^*\|_P\}_{k \geq 0}$ is bounded and convergent, hence the sequence $\{U_k\}_{k \geq 0}$ is bounded and convergent, and finally the sequence $\{x_k\}_{k \geq 0}$ converges to $x^*$. Under Assumption 1, the globally optimal solution $x^*$ is unique.
Detailed description of the preferred embodiments example 1
To demonstrate the effectiveness of the proposed algorithm, we consider a multi-agent network with m = 10 agents solving the following least-squares optimization problem:
Figure BDA0002563395080000214
wherein
Figure BDA0002563395080000215
And is
Figure BDA0002563395080000216
The abscissa represents one full pass of computation over all samples. We set n = 10, pi = 1, and the total number of samples to Q = 1000; the samples are randomly and evenly distributed among the agents in the network, so each agent i ∈ V processes qi = Q/m samples. The local parameters
Figure BDA0002563395080000217
And
Figure BDA0002563395080000218
are randomly selected from [−1, 1] and [−n, n], respectively; the equality constraint is defined as
Figure BDA0002563395080000219
with the j-th entry of Bi equal to 1 when j = i and 0 otherwise; bi is always 1. The local set constraint of agent i is defined as Xi = [−1n, 1n], where 1n denotes the n-dimensional column vector of all ones.
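The experimental setup above can be sketched in a few lines of NumPy. This is only a sketch: the variable names, the random seed, and the data-generation details are assumptions, since the patent does not give any code.

```python
import numpy as np

# Sketch of the experimental setup described above; names and the seed are
# illustrative assumptions, not taken from the patent.
rng = np.random.default_rng(0)
m, n, Q = 10, 10, 1000            # agents, decision dimension, total samples
q = Q // m                        # q_i = Q/m = 100 samples per agent

# Local least-squares data: u_{i,h} drawn from [-1, 1]^n, v_{i,h} from [-n, n]
U = rng.uniform(-1.0, 1.0, size=(m, q, n))
V = rng.uniform(-float(n), float(n), size=(m, q))

# Local equality constraint B_i x = b_i with p_i = 1: B_i is a 1 x n row
# vector whose j-th entry is 1 when j = i and 0 otherwise; b_i = 1.
B = np.zeros((m, 1, n))
for i in range(m):
    B[i, 0, i] = 1.0
b = np.ones((m, 1))

# Local set constraint X_i = [-1_n, 1_n]: the projection is an entrywise clip.
def project_box(x):
    return np.clip(x, -1.0, 1.0)
```

With this data in hand, each agent i holds (U[i], V[i], B[i], b[i]) and its projection operator, matching the problem dimensions stated above.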
The experimental network and the results of this embodiment are shown in Figs. 1 to 4, specifically:
Fig. 1 shows the experimental communication network, in which the communication rate of the network is 0.5;
Fig. 2 compares the performance of the algorithm of the present invention with a prior-art algorithm, namely the one disclosed in Q. Liu, S. Yang, and Y. Hong, "Constrained consensus algorithms with fixed step size for distributed convex optimization over multiagent networks," IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 4259-4265, 2017. Fig. 2 makes apparent that the algorithm proposed by the present invention performs best, that is, it converges fastest;
Fig. 3 shows the transient behavior of agents 2, 4, 6, 8, and 10 without communication delay;
Fig. 4 shows the transient behavior of agents 2, 4, 6, 8, and 10 under communication delay, with a maximum delay of 10 per iteration;
Comparing Figs. 3 and 4 shows that communication delay has a significant influence on the transient behavior of the agents.
Finally, it should be noted that these embodiments merely illustrate the present invention and do not limit its scope. It will be apparent to those skilled in the art that various other changes and modifications can be made based on the above description; it is neither necessary nor possible to list all embodiments exhaustively. Obvious variations or modifications derived therefrom remain within the scope of the invention.

Claims (6)

1. A distributed projection method based on a variance reduction technique and considering communication delay, comprising the following steps:
step 1, formulating an original optimization problem model (1) for a multi-agent system subject to both local set constraints and local equality constraints;
step 2, equivalently converting the original optimization problem model (1) obtained in step 1 into a convex optimization problem model (2) suitable for distributed processing;
step 3, proposing a distributed projection algorithm (3) based on a variance reduction technique to solve the constrained convex optimization problem model (2); namely, a local stochastic average gradient serves as an unbiased estimate of the local full gradient, relieving the heavy computational burden of evaluating the full gradients of all local objective functions at every iteration;
and step 4, performing convergence analysis on the distributed projection algorithm (3) based on the variance reduction technique proposed in step 3.
2. The distributed projection method based on variance reduction technology considering communication delay as claimed in claim 1, wherein:
the specific construction process and form of the original optimization problem model (1) in the step 1 are as follows:
First, define the agent set V = {1, …, m}, the communication network edge set
Figure RE-FDA0002752674200000011
And adjacency matrix
Figure RE-FDA0002752674200000012
the undirected communication network
Figure RE-FDA0002752674200000013
and the simple network G has no self-loops;
when edge (i, j) ∈ E, aij = aji > 0; otherwise aij = aji = 0;
The degree of agent i is denoted as
Figure RE-FDA0002752674200000014
Given the diagonal degree matrix D = diag{d1, d2, …, dm}, the Laplacian matrix of the undirected network G is defined as L = D − A;
if the undirected network G is connected, then the Laplacian matrix L is symmetric and positive semidefinite;
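The Laplacian construction and the properties just stated can be checked numerically. The following is an illustrative sketch on a small hand-picked graph (the adjacency matrix is an assumption chosen for the example, not taken from the patent):

```python
import numpy as np

# Build the Laplacian L = D - A of a small undirected network and verify
# the properties stated above (symmetry, positive semidefiniteness, and a
# simple zero eigenvalue when the graph is connected).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # symmetric adjacency, no self-loops
D = np.diag(A.sum(axis=1))                  # diagonal degree matrix
L = D - A                                   # graph Laplacian

assert np.allclose(L, L.T)                  # symmetric
eigs = np.linalg.eigvalsh(L)
assert eigs.min() > -1e-10                  # positive semidefinite
assert np.sum(eigs < 1e-10) == 1            # connected graph: simple zero eigenvalue
```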
Second, the original optimization problem model (1) takes the following specific form:
Figure RE-FDA0002752674200000017
In the above formula, the objective function
Figure RE-FDA0002752674200000018
represents the samples of the real problem to be processed, the
Figure RE-FDA0002752674200000019
represents the decision vector, and qi represents the total number of local samples assigned to agent i;
while in the above equation the local objective function is further decomposed into
Figure RE-FDA0002752674200000021
Wherein
Figure RE-FDA0002752674200000022
is the h-th sub-function of the local objective function;
Based on the above formula, define
Figure RE-FDA0002752674200000023
as closed convex sets whose intersection X is non-empty, and define the column-full-rank matrix
Figure RE-FDA0002752674200000024
And
Figure RE-FDA0002752674200000025
and define the optimal solution of the constrained convex optimization problem (1) as
Figure RE-FDA0002752674200000026
3. The distributed projection method based on variance reduction technology considering communication delay as claimed in claim 2, wherein:
the concrete form of the convex optimization problem model (2) in the step 2 is as follows:
Figure FDA0002563395070000027
wherein xi is agent i's estimate of the decision vector
Figure FDA0002563395070000028
;
define the matrix B as a column-full-rank block-diagonal matrix with diagonal blocks {B1, …, Bm}, i.e.
Figure FDA0002563395070000029
Stacked vector
Figure FDA00025633950700000210
Let
Figure FDA00025633950700000211
be the Cartesian product; denote
Figure FDA00025633950700000212
wherein
Figure FDA00025633950700000213
denotes the Kronecker product; the maximum and minimum values of qi are denoted qmax and qmin, respectively (where qmin ≥ 1, i.e., each agent processes at least one sample); from the above, it follows that λmin(BTB)qmin > 0;
Based on the convex optimization problem model (2), the following assumptions and definitions are made:
Assumption 1: each local sub-objective function fih is strongly convex and has a Lipschitz continuous gradient; namely, for all i ∈ V, h ∈ {1, …, qi}, and
Figure FDA00025633950700000214
the following formulas hold:
Figure FDA00025633950700000215
Figure FDA00025633950700000216
where 0 < μ ≤ Lf;
Under Assumption 1, the global optimal solution of the constrained convex optimization problem (2) is unique and is denoted
Figure FDA00025633950700000217
Assume 2: the undirected network G is connected;
Assumption 3: for
Figure FDA0002563395070000031
And
Figure FDA0002563395070000032
there exists
Figure FDA0002563395070000033
where B0 is a positive integer.
Definition 1: define global vectors that collect the local variables xi,k, yi,k, wi,k, gi,k, and
Figure FDA0002563395070000034
as follows:
Figure FDA0002563395070000035
Figure FDA0002563395070000036
Figure FDA0002563395070000037
Figure FDA0002563395070000038
Figure FDA0002563395070000039
and the locally delayed versions of the global vectors xk and wk:
Figure FDA00025633950700000310
Figure FDA00025633950700000311
Then, at the k-th iteration, the communication delay
Figure FDA00025633950700000312
is determined jointly by agent i and agent j; thus, the delayed global vectors xk[i] and wk[i] are held only by agent i.
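The bounded-delay reads described above (delays of at most B0 iterations, per Assumption 3) can be mimicked with a small per-link history buffer. This is an illustrative sketch only: the class name, buffering scheme, and the uniform delay distribution are assumptions, not taken from the patent.

```python
import numpy as np
from collections import deque

# Illustrative sketch of bounded communication delay: agent i reads a past
# state of neighbor j that is at most B0 iterations old.
B0 = 10                           # maximum delay bound (a positive integer)
rng = np.random.default_rng(3)

class DelayedLink:
    """Holds the last B0+1 states sent over a link; reads return a delayed copy."""
    def __init__(self, x0):
        self.history = deque([x0.copy()], maxlen=B0 + 1)

    def send(self, x):
        self.history.append(x.copy())

    def recv(self):
        # Delay tau_{ij}^k is modeled here as uniform over the stored history.
        tau = rng.integers(0, len(self.history))
        return self.history[-1 - tau]

link = DelayedLink(np.zeros(3))
for k in range(20):
    link.send(np.full(3, float(k)))
delayed = link.recv()
# The received state is one of the last B0+1 states (values 9.0 .. 19.0).
assert 9.0 <= delayed[0] <= 19.0
```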
4. The distributed projection method based on variance reduction technology considering communication delay as claimed in claim 3, wherein:
the specific iterative process of the distributed projection algorithm (3) based on the variance reduction technology in the step 3 is as follows:
Initialization: for all agents i ∈ V, initialize
Figure FDA00025633950700000313
Set k = 0
For each agent i = 1, …, m:
1: from the set {1, …, qi}, randomly select a sample
Figure FDA00025633950700000314
2: compute the local stochastic average gradient as follows
Figure FDA00025633950700000315
3: set
Figure FDA00025633950700000316
And store
Figure FDA00025633950700000317
4: update the variable xi,k+1 as follows
Figure FDA0002563395070000041
5: update the variable yi,k+1 as follows
yi,k+1=yi,k+Bixi,k+1-bi
6: update the variable wi,k+1 as follows
wi,k+1=wi,k+βxi,k+1
End loop
Set k = k + 1 and repeat the loop until a stopping condition is met;
wherein
Figure FDA0002563395070000042
is the iterate, at the k-th iteration, associated with the sub-function
Figure FDA0002563395070000043
of the local objective function, and
Figure FDA0002563395070000044
denotes an n-dimensional real column vector.
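The iteration above (steps 1-6) can be sketched for a single agent on the least-squares example of the embodiment. Note the caveat: the exact x-update of step 4 appears only as an equation image in the patent, so a generic projected-gradient form is used here as a stand-in, and all variable names and parameter values are assumptions.

```python
import numpy as np

# Single-agent sketch of steps 1-6: SAGA-style gradient (steps 1-3),
# projected primal update (step 4, stand-in form), and the y/w updates
# exactly as printed in steps 5-6.
rng = np.random.default_rng(1)
n, q = 10, 100
Ui = rng.uniform(-1, 1, size=(q, n))       # local samples u_{i,h}
Vi = rng.uniform(-n, n, size=q)            # local targets v_{i,h}
Bi = np.zeros((1, n)); Bi[0, 0] = 1.0      # local equality constraint B_i x = b_i
bi = np.array([1.0])
alpha, beta = 0.01, 0.5                    # step size / parameter (assumed values)

def grad_h(x, h):                          # gradient of f_i^h(x) = 0.5*(u_h^T x - v_h)^2
    u, v = Ui[h], Vi[h]
    return u * (u @ x - v)

x = np.zeros(n); y = np.zeros(1); w = np.zeros(n)
table = np.array([grad_h(x, h) for h in range(q)])   # stored gradients

for k in range(200):
    h = rng.integers(q)                              # step 1: pick a sample
    g = grad_h(x, h) - table[h] + table.mean(axis=0) # step 2: SAGA-style gradient
    table[h] = grad_h(x, h)                          # step 3: refresh stored gradient
    # step 4 (stand-in): projected primal update onto X_i = [-1, 1]^n
    x = np.clip(x - alpha * (g + Bi.T @ y + w), -1.0, 1.0)
    y = y + Bi @ x - bi                              # step 5
    w = w + beta * x                                 # step 6 (as printed above)

assert np.all(np.abs(x) <= 1.0)                      # iterate stays in X_i
```

Only one gradient of one sub-function is evaluated per iteration, which is the computational saving claimed in step 3 of claim 1.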
5. The distributed projection method based on variance reduction technology considering communication delay as claimed in claim 4, wherein:
the above-mentioned
Figure FDA0002563395070000045
The iteration rule of (1) is as follows:
Figure FDA0002563395070000046
At iteration k, for agent i, the local stochastic average gradient is defined as:
Figure FDA0002563395070000047
wherein
Figure FDA0002563395070000048
can be computed by the following iteration:
Figure FDA0002563395070000049
Let Fk denote the σ-algebra generated by the local stochastic average gradients up to iteration k; then the following holds:
Figure FDA00025633950700000410
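The conditional-expectation property just stated, that the stochastic average gradient is an unbiased estimate of the local full gradient, can be verified numerically for a SAGA-style estimator. The sketch below uses synthetic least-squares data; all names are assumptions.

```python
import numpy as np

# Numerical check that the SAGA-style estimator is conditionally unbiased:
# E[g | F_k] = grad f_i(x), where the expectation is over the uniformly
# chosen sample index h.
rng = np.random.default_rng(2)
n, q = 5, 50
U = rng.normal(size=(q, n)); V = rng.normal(size=q)

def grad_h(x, h):
    return U[h] * (U[h] @ x - V[h])

x = rng.normal(size=n)
# Stored gradients are deliberately stale (evaluated at random past points).
table = np.array([grad_h(rng.normal(size=n), h) for h in range(q)])

full_grad = np.mean([grad_h(x, h) for h in range(q)], axis=0)
# Average the estimator over all q equally likely sample choices:
expected_g = np.mean([grad_h(x, h) - table[h] for h in range(q)], axis=0) \
             + table.mean(axis=0)
assert np.allclose(expected_g, full_grad)   # unbiased regardless of staleness
```

The stale terms cancel in expectation, which is exactly why the estimator is unbiased however old the stored gradients are; variance reduction comes from the stored gradients tracking the iterate over time.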
6. the distributed projection method based on variance reduction technology considering communication delay as claimed in claim 5, wherein:
the convergence analysis process in step 4 is as follows:
the following definitions are first made:
Definition 2: for 0 < α < 1/λmax(L), define the positive semidefinite matrix P as:
Figure FDA0002563395070000051
where W = I − αL is a positive definite matrix; then:
Figure FDA0002563395070000052
wherein the vector
Figure FDA0002563395070000053
and U* = [(x*)T, (y*)T, (w*)T]T;
Then, combining Assumptions 1-3 and Definitions 1-2 yields the following:
Consider the distributed projection algorithm (3) based on the variance reduction technique, with Uk and U* as in Definition 2, under Assumptions 1-3. If the parameters η, φ, and ξ satisfy:
Figure FDA0002563395070000054
0 < φ < 2μ (21b)
Figure FDA0002563395070000055
and the constant step size α and the algorithm parameter β satisfy:
Figure FDA0002563395070000056
Figure FDA0002563395070000057
then the sequence {Uk}k≥0 is bounded and convergent, and the sequence {xk}k≥0 converges to the unique optimal solution x*.
CN202010614853.XA 2020-06-30 2020-06-30 Distributed projection method based on variance reduction technology and considering communication time delay Active CN112069631B (en)

Publications (2)

Publication Number Publication Date
CN112069631A true CN112069631A (en) 2020-12-11
CN112069631B CN112069631B (en) 2024-05-24

