AU2020100180A4 - Effective Doubly-Accelerated Distributed Asynchronous Strategy for General Convex Optimization Problem
- Publication number
- AU2020100180A4
- Authority
- AU
- Australia
- Prior art keywords
- agent
- agents
- algorithm
- rpk
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Mathematical Analysis (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Computer Hardware Design (AREA)
- Operations Research (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
With the advent of the large-scale network and data era, traditional synchronous algorithms, due to their requirement of clock synchronization, are not suitable for handling large-scale network tasks. In view of this, this patent presents an effective doubly-accelerated distributed asynchronous algorithm, based on the heavy-ball method and the Nesterov gradient method, for solving general convex optimization problems defined over a fixed directed multi-node network system. The algorithm mainly comprises six stages: variable initialization; picking the delay value and the activated node; eliminating outdated information; computing the gradient; exchanging information; and updating the variables. The algorithm set forth in the present invention adopts a general asynchronous scheme, in which agents can communicate with their in-neighbors at any time without any coordination or scheduling and perform their local computations using outdated information from their in-neighbors. Therefore, the algorithm greatly reduces the idle time of communication links, mitigates congestion of communication and memory access, saves power, and is more fault-tolerant and robust. The present invention has broad application in large-scale machine learning and network information processing.

[Fig. 4: algorithm flowchart — select the global objective function; each node initializes its local variables, sets k = 0, and sets the maximum number of iterations k_max; compute system parameters; pick the delay value and the activated node; eliminate the old variables; select a step-size and a momentum parameter according to the computed parameters; each activated node updates its variables and computes its gradient; each activated node sends variables to its out-neighbor nodes and sets k = k + 1; stop once k > k_max.]
Description
1. Technical Field
The present invention relates to the field of large-scale machine learning and network information processing.
2. Background and Purpose
Early on, centralized algorithms attracted wide attention from researchers because of the excellent performance of master agents. In these algorithms, the master agents run the optimization algorithm by gathering the needed information from slave agents, which only compute their local tasks. One obvious flaw of this kind of algorithm is that once the master agent is damaged, the whole network stops working. With the growth of large-scale networks and data, traditional centralized algorithms are no longer capable of solving large-scale computing problems. On the contrary, distributed optimization algorithms show great potential in large-scale network data computing and have been widely used in network system control, machine learning, network information processing, and resource allocation. Because of this, researchers' interest has shifted to the design of distributed optimization algorithms. In this class of problems, each agent optimizes the global objective function by operating on its local objective function and communicating only with its in-neighbors. Specifically, consider a variable x ∈ R^n and a strongly connected network of m agents which cooperatively solve the following optimization problem:
$$\min_{x\in\mathbb{R}^n} f(x) = \sum_{i=1}^{m} f_i(x), \qquad (1)$$

where each agent i only has access to a local objective function f_i : R^n → R. To solve problem (1), there are two types of distributed algorithms: synchronous and asynchronous algorithms. The distinguishing feature between them is that agents in an asynchronous algorithm do not wait for updates from other agents but simply compute updates using their currently available information. We further use Fig. 2 to elaborate the striking differences between synchronous and asynchronous algorithms, taking the directed graph in Fig. 1 as an example. Obviously, all the agents in a synchronous algorithm need to agree on the update time t(k), where k denotes the number of updates of the network, which usually requires a global clock or synchronization of all nodes. It is worth mentioning that clock synchronization is not easy for a large-scale system and has been studied for quite a long time.
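As a concrete instance of problem (1), the following minimal Python sketch sets up hypothetical quadratic local objectives; the quadratics, `grad_fi`, and `x_star` are illustrative assumptions, not part of the invention:

```python
import numpy as np

# A minimal sketch of problem (1), assuming quadratic local objectives
# f_i(x) = 0.5 x^T Q_i x + b_i^T x (a hypothetical choice for illustration).
rng = np.random.default_rng(0)
m, n = 5, 3                                   # number of agents, dimension
Q = [np.diag(rng.uniform(1.0, 2.0, n)) for _ in range(m)]  # strongly convex
b = [rng.standard_normal(n) for _ in range(m)]

def grad_fi(i, x):
    """Gradient of the i-th local objective: Q_i x + b_i."""
    return Q[i] @ x + b[i]

# The minimizer of f(x) = sum_i f_i(x) solves sum_i (Q_i x + b_i) = 0.
x_star = np.linalg.solve(sum(Q), -sum(b))
```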
However, many of the aforementioned applications give rise to extremely large-scale networks or data. From this point of view, we naturally call for asynchronous solution methods. In fact, compared with synchronous communication, asynchronous communication has the following advantages: it reduces the idle time of communication links, mitigates congestion of communication and memory access, saves power, and makes the algorithm more fault-tolerant and robust. In addition, communication devices that support asynchronous communication are relatively simple and inexpensive. Thus, we develop an effective doubly-accelerated distributed asynchronous algorithm by combining gradient-tracking technologies with sum-push (not push-sum) technologies. In each iteration, activated agents update their variables by communicating with their in-neighbors, and the other agents keep their variables unchanged until the next iteration.
3. Notation
We use lowercase italics to denote column vectors and uppercase italics to denote matrices. Let 1, e_i, I, and O denote the column vector of all ones, the i-th canonical vector, the identity matrix, and the zero matrix, respectively, whose dimensions can be deduced from the context. For two arbitrary matrices X and Y, we use X ⊗ Y to denote their Kronecker product. For an arbitrary set V, let |V| represent its cardinality. Given an arbitrary vector x, let x̄ and x̲ indicate the largest and the smallest element of x, respectively, and let diag(x) denote the diagonal matrix whose diagonal elements equal the entries of x. The spectral radius of a matrix T is represented by ρ(T). For a primitive, row-stochastic matrix A, we denote its left and right eigenvectors corresponding to the eigenvalue 1 by π_r and 1, respectively, such that π_r^⊤ 1 = 1; similarly, for a primitive, column-stochastic matrix B, we denote its left and right eigenvectors corresponding to the eigenvalue 1 by 1 and π_c, respectively, such that π_c^⊤ 1 = 1. For a matrix X, we denote X_∞ = lim_{k→∞} X^k as its infinite power. According to the Perron–Frobenius theorem, we have A_∞ = 1 π_r^⊤ and B_∞ = π_c 1^⊤. We use ‖·‖ for both vectors and matrices; in the former case it represents the Euclidean norm, and in the latter case it is the spectral norm. The set of nonnegative (resp. positive) integers is denoted by N_0 (resp. N).
4. Communication Network Model
Consider a strongly connected directed original graph G = (V, E) of m agents, where V = {1, 2, …, m} is the set of agents and E is the set of directed edges (i, j), i, j ∈ V, such that agent j can receive information from agent i. Let N_i^in = {j | (j, i) ∈ E} denote the set of in-neighbors of agent i and N_i^out = {j | (i, j) ∈ E} denote the set of out-neighbors of agent i. Then, we construct the augmented graph by adding virtual agents to the original graph G = (V, E). Specifically, we add an ordered set of virtual agents, denoted by va_(j,i)^0, va_(j,i)^1, …, va_(j,i)^D, associated to each edge (j, i) ∈ E, where each virtual agent corresponds to a possible delay value. That is to say, these virtual agents store information according to the associated delay value, which implies that the information has been generated by agent j for agent i but not yet used by i. We further use a simple example in Fig. 3 to illustrate this augmented graph. Agents in the original graph G are called computing agents, while the virtual agents are called noncomputing agents. The set of computing and noncomputing agents is defined as V̄ = V ∪ {va_(j,i)^d | (j, i) ∈ E, d = 0, 1, …, D}, and its cardinality is denoted by S = |V̄| = m + (D + 1)|E|. Reconsidering this augmented directed graph, each computing agent j only sends information to the noncomputing agent va_(j,i)^0 with (j, i) ∈ E, and each noncomputing agent can send information to the next noncomputing agent or to the computing agent.
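A small sketch of this augmented-graph construction, assuming a maximum delay D; the tuple encoding of agents and virtual agents is a hypothetical representation chosen for illustration:

```python
# Each edge (j, i) gets an ordered chain of D+1 virtual (noncomputing) agents
# va^0_(j,i), ..., va^D_(j,i) buffering in-flight, possibly delayed messages.
def augment(V, E, D):
    V_aug = [("agent", i) for i in V]
    E_aug = []
    for (j, i) in E:
        chain = [("va", j, i, d) for d in range(D + 1)]
        V_aug += chain
        E_aug.append((("agent", j), chain[0]))   # j feeds the head of the chain
        for d in range(D):                       # va^d -> va^(d+1)
            E_aug.append((chain[d], chain[d + 1]))
        for d in range(D + 1):                   # any va can deliver to agent i
            E_aug.append((chain[d], ("agent", i)))
    return V_aug, E_aug

V, E = [0, 1, 2], [(0, 1), (1, 2), (2, 0)]
V_aug, E_aug = augment(V, E, D=2)
assert len(V_aug) == len(V) + (2 + 1) * len(E)   # |V̄| = m + (D+1)|E|
```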
5. Problem Formulation
We rewrite problem (1) in the following form:

$$\min_{\mathbf{x}} F(\mathbf{x}) = \frac{1}{m}\sum_{i=1}^{m} f_i(x^i), \quad \text{subject to } x^i = x^j,\ \forall\, i, j \in V, \qquad (2)$$

where 𝐱 collects the local copies x^1, …, x^m, and each local function f_i : R^n → R is only known to agent i. Each local objective function f_i, i = 1, 2, …, m, is μ-strongly convex with L_f-Lipschitz continuous gradient. That is, for any i and x_1, x_2 ∈ R^n, f_i(x_1) − f_i(x_2) ≤ ∇f_i(x_1)^⊤(x_1 − x_2) − (μ/2)‖x_1 − x_2‖² and ‖∇f_i(x_1) − ∇f_i(x_2)‖ ≤ L_f ‖x_1 − x_2‖, where L_f ≥ μ > 0. The global optimal solutions to problems (1) and (2) are denoted by x* and x̄*, respectively, where x̄* = 1 ⊗ x*.
6. Detailed Implementation Description
Fig. 4 is an algorithm flowchart of the present invention. As shown in Fig. 4, the distributed asynchronous optimization algorithm comprises the following steps:
6.1. Initializing Variables
Step 1: Each agent i G V sets k = 0 and sets a stopping criterion.
Step 2: Each agent i ∈ V initializes with x_{−1}^i = 0, x_0^i ∈ R^n, s_0^i ∈ R^n, y_0^i = ∇f_i(s_0^i), v_0^i = 0, for all i ∈ V; ρ_0^{i,j} = 0 and ρ̃_0^{i,j} = 0, for all j ∈ N_i^in and i ∈ V; and ρ_t^{i,j} = 0, for all t = −D, …, 0.
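A sketch of this initialization, reusing the hypothetical `grad_fi` from the earlier sketch; the dictionary layout (`x_prev`, `rho_buf`, `x_hist`, …) is an illustrative assumption about how an implementation might store the per-agent state:

```python
import numpy as np

# A sketch of Steps 1-2: one state container per agent.
def init_agent(i, n, D, in_neighbors, grad_fi):
    s0 = np.zeros(n)
    return {
        "x_prev": np.zeros(n),                # x_{-1}^i = 0
        "x": np.zeros(n),                     # x_0^i (any point in R^n)
        "s": s0,                              # s_0^i
        "y": grad_fi(i, s0),                  # y_0^i = grad f_i(s_0^i)
        "v": np.zeros(n),                     # v_0^i = 0
        "rho": {j: np.zeros(n) for j in in_neighbors},      # cumulative masses
        "rho_buf": {j: np.zeros(n) for j in in_neighbors},  # mass buffers
        "x_hist": {t: np.zeros(n) for t in range(-D, 1)},   # delayed copies
    }
```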
6.2. Constructing Augmented Weight Matrices
Step 3: According to the original graph, first construct a row-stochastic matrix A and a column-stochastic matrix B. Meanwhile, introduce the matrix W = {w^{ij}} to denote either A or B. There exists w̲ > 0 such that w^{ii} ≥ w̲ and w^{ij} ≥ w̲, for all i ∈ V and for all (j, i) ∈ E, respectively; otherwise, we set w^{ij} = 0.
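One common way to satisfy Step 3 is uniform weighting; the sketch below is an assumption for illustration (the invention only requires row/column stochasticity with weights bounded below by w̲ > 0):

```python
import numpy as np

# A averages over in-neighbors plus self (row-stochastic); B splits mass over
# out-neighbors plus self (column-stochastic).
def build_weights(m, E):
    A, B = np.zeros((m, m)), np.zeros((m, m))
    in_nb = {i: [j for (j, t) in E if t == i] for i in range(m)}
    out_nb = {j: [t for (s, t) in E if s == j] for j in range(m)}
    for i in range(m):
        w = 1.0 / (len(in_nb[i]) + 1)
        A[i, i] = w
        for j in in_nb[i]:
            A[i, j] = w                       # row i of A sums to one
    for j in range(m):
        w = 1.0 / (len(out_nb[j]) + 1)
        B[j, j] = w
        for i in out_nb[j]:
            B[i, j] = w                       # column j of B sums to one
    return A, B

A, B = build_weights(3, [(0, 1), (1, 2), (2, 0)])
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=0), 1)
```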
Step 4: Based on the original graph, construct an augmented row-stochastic matrix Ā_k as follows:

$$\bar{a}_k^{pr} = \begin{cases} a^{i_k i_k}, & \text{if } p = r = i_k;\\ a^{i_k j}, & \text{if } p = i_k,\ r = j + (d_k^j + 1)m;\\ 1, & \text{if } p = r \in \{1, 2, \ldots, 2m\} \setminus \{i_k, i_k + m\};\\ 1, & \text{if } p \in \{2m+1, 2m+2, \ldots, (D+2)m\} \cup \{i_k + m\} \text{ and } r = p - m;\\ 0, & \text{otherwise.} \end{cases}$$
Step 5: Construct an augmented column-stochastic matrix B̄_k in two steps. The first establishes the transition matrix of the sum step as follows:

$$\bar{s}_k^{pr} = \begin{cases} 1, & \text{if } r \in \{va_{(j,i_k)}^{d} \mid k - \tau_{k+1}^{i_k,j} \le d \le D\} \text{ and } p = i_k;\\ 1, & \text{if } p \in \bar{V} \setminus \{va_{(j,i_k)}^{d} \mid k - \tau_{k+1}^{i_k,j} \le d \le D\} \text{ and } r = p;\\ 0, & \text{otherwise.} \end{cases}$$

The second establishes the transition matrix of the push step as follows:

$$\bar{p}_k^{pr} = \begin{cases} b^{j i_k}, & \text{if } r = i_k \text{ and } p = va_{(i_k,j)}^{0},\ j \in N_{i_k}^{\mathrm{out}};\\ b^{i_k i_k}, & \text{if } r = p = i_k;\\ 1, & \text{if } r = p \in \bar{V} \setminus \{i_k\};\\ 1, & \text{if } r = va_{(i,j)}^{d},\ p = va_{(i,j)}^{d+1},\ (i,j) \in E,\ 0 \le d \le D-1;\\ 1, & \text{if } r = p = va_{(i,j)}^{D},\ (i,j) \in E;\\ 0, & \text{otherwise.} \end{cases}$$

Thus, B̄_k = P̄_k S̄_k holds by combining the sum step with the push step.
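Whatever the activated agent i_k and delays are, the augmented matrices must stay stochastic; the following small sanity-check sketch (an illustrative aid, not part of the invention) makes that property testable:

```python
import numpy as np

# Rows of the augmented A_k sum to one; columns of B_k = P_k S_k sum to one,
# since products of column-stochastic matrices are column-stochastic.
def is_row_stochastic(M, tol=1e-12):
    return bool(np.all(M >= -tol) and np.allclose(M.sum(axis=1), 1.0))

def is_column_stochastic(M, tol=1e-12):
    return bool(np.all(M >= -tol) and np.allclose(M.sum(axis=0), 1.0))
```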
6.3. Selecting Parameters
Step 6: According to the graph and the properties of the row- and column-stochastic matrices, select 0 < T < ∞ and 0 ≤ D < ∞. Note that within T iterations all agents update at least once, and D represents the maximum delay value. Then, compute the parameters ρ = (1 − w̲^{K_1})^{1/K_1} ∈ (0, 1) with K_1 = (2m − 1)T + mD, ρ̄ = w̲^{K_1} ∈ (0, 1), and positive constants C_1, C_2, C_3, and C_4 = √2 C_3 determined by ρ, ρ̄, and m.
Step 7: Based on the strongly convex coefficient μ and the Lipschitz-continuous coefficient L_f, and using the small-gain theorem, compute the parameters a_i, i = 1, 2, …, 16, as

| a_1 = C_1 m√m L_f | a_5 = √3 + 1 | a_9 = C_4 m L_f λ | a_13 = 2m√m L_f |
| a_2 = (1 + ρ̄) C_1 | a_6 = √3 m√m L_f | a_10 = C_4 m L_f | a_14 = m√m L_f |
| a_3 = C_1 m | a_7 = √3 m | a_11 = μ m² | a_15 = 2m |
| a_4 = C_1 m L_f | a_8 = √3 m L_f | a_12 = m L_f | a_16 = m |

and λ ∈ (max{ρ̄, ρ + C_1 L_f m√m α, (√3 + 1)β, 1 − μ m² α + m L_f Δα}, 1). In addition, define ξ(α, Δα) = 1 − a_11 α + a_12 Δα.
6.4. Computing Step-Size and Momentum Parameter
Step 8: According to the strongly convex coefficient μ and the Lipschitz-continuous coefficient L_f, select the largest step-size α, the gap Δα between the largest and the smallest step-sizes, and the largest momentum parameter β as follows:

$$0 < \alpha \le \min\left\{\frac{\rho\omega_1 - \rho^2\omega_1}{a_1\rho\omega_1 + a_3\rho\omega_3 + a_4\rho\omega_4},\ \frac{\omega_2 - a_5\omega_1}{a_6\omega_1 + a_7\omega_2 + a_8\omega_4},\ \frac{1 - \bar{\rho}}{\rho^2 L_f}\right\},$$

$$0 < \Delta\alpha \le \frac{a_{11}\omega_4\alpha - a_9\omega_1\alpha - a_{10}\omega_3\alpha}{a_{12}\omega_4 + a_{13}\omega_1 + a_{15}\omega_3},$$

$$0 < \beta \le \min\left\{\frac{\rho\omega_1 - \rho^2\omega_1 - a_1\rho\omega_1\alpha - a_3\rho\omega_3\alpha - a_4\rho\omega_4\alpha}{2\rho\omega_1},\ \frac{\omega_2\big(\omega_4(1 - \xi(\alpha,\Delta\alpha)) - a_9\omega_1\alpha\big) - \omega_3 a_{16}\alpha + \omega_3 a_{13}\Delta\alpha - \omega_1 a_{13}\Delta\alpha}{2\omega_2},\ \frac{(1-\bar{\rho})\omega_3 - a_9\omega_2}{a_9\omega_2 + a_{10}\omega_2},\ \frac{\omega_2 - a_5\omega_1 - a_6\omega_1\alpha - a_7\omega_2\alpha - a_8\omega_4\alpha}{a_5\omega_2}\right\},$$

where ω_1, ω_2, ω_3, ω_4 are arbitrary constants satisfying 0 < ω_1 < ω_2/a_5, 0 < ω_2 < (1 − ρ̄)ω_3/a_9, and ω_4 > (a_13 ω_1 + a_15 ω_3)/a_10.
6.5. Selecting Activated Agents and Delay
Step 9: Pick (i_k, d_k) with d_k = (d_k^j)_{j∈N_{i_k}^in}, where i_k indicates that agent i is activated at time k, and d_k^j denotes the delay value, which satisfies 0 ≤ d_k^j ≤ D < ∞.
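A sketch of Step 9 under a purely random activation rule (one of several admissible rules; the uniform draws are an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

def pick_activation(m, in_neighbors, D):
    i_k = int(rng.integers(m))                        # activated agent
    d_k = {j: int(rng.integers(D + 1)) for j in in_neighbors[i_k]}
    return i_k, d_k                                   # one bounded delay per in-edge
```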
6.6. Eliminating Outdated Information
Step 10: Set τ_{k+1}^{i_k,j} = max(τ_k^{i_k,j}, k − d_k^j) to eliminate the outdated information. Note that τ_k^{i_k,j} records the generation time of the cumulative-mass variable ρ^{i_k,j}, where ρ^{i_k,j} with j ∈ N_{i_k}^in captures the cumulative information generated by agent j up to the current time and sent to agent i_k.
6.7. Exchanging Information
Step 11: Each agent i_k updates the variable v_{k+1}^{i_k} according to

$$v_{k+1}^{i_k} = x_k^{i_k} - \alpha^{i_k} y_k^{i_k} + \beta^{i_k}\big(x_k^{i_k} - x_{k-1}^{i_k}\big).$$

Step 12: Each agent i_k updates the variable x_{k+1}^{i_k} according to

$$x_{k+1}^{i_k} = a^{i_k i_k} v_{k+1}^{i_k} + \sum_{j\in N_{i_k}^{\mathrm{in}}} a^{i_k j} x_{k-d_k^j}^{j} + \beta^{i_k}\big(x_k^{i_k} - x_{k-1}^{i_k}\big).$$

Step 13: To accelerate the algorithm, introduce the auxiliary variable s_k, which, for each agent i_k, is updated by

$$s_{k+1}^{i_k} = x_{k+1}^{i_k} + \beta^{i_k}\big(x_{k+1}^{i_k} - x_k^{i_k}\big).$$
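A sketch of Steps 11–13 for the activated agent, assuming `agent` is the state dictionary from the initialization sketch, `delayed_x[j]` holds the possibly outdated state x^j_{k−d_k^j}, and `a_row` maps agent indices to row i_k of A:

```python
def local_update(agent, delayed_x, a_row, alpha, beta, i_k):
    x, x_prev, y = agent["x"], agent["x_prev"], agent["y"]
    # Step 11: heavy-ball step on the local variable.
    v_new = x - alpha * y + beta * (x - x_prev)
    # Step 12: consensus mixing with (possibly outdated) in-neighbor states.
    x_new = (a_row[i_k] * v_new
             + sum(a_row[j] * xj for j, xj in delayed_x.items())
             + beta * (x - x_prev))
    # Step 13: Nesterov-style extrapolation for the gradient-tracking point.
    s_new = x_new + beta * (x_new - x)
    return v_new, x_new, s_new
```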
Step 14: Introduce two auxiliary variables to prevent packet loss. One is the cumulative-mass variable ρ^{i_k,j} with j ∈ N_{i_k}^in, which captures the cumulative information generated by agent j up to the current time and sent to agent i_k. The other is the buffer variable ρ̃^{i_k,j} with j ∈ N_{i_k}^in, which stores the information sent from agent j to agent i and used by agent i in its last update. Then, adopt the sum-push scheme to achieve gradient tracking.

Step 14.1 Sum step:

$$\tilde{y}_k^{i_k} = y_k^{i_k} + \sum_{j\in N_{i_k}^{\mathrm{in}}} \Big(\rho_{\tau_{k+1}^{i_k,j}}^{i_k,j} - \tilde{\rho}_k^{i_k,j}\Big) + \nabla f_{i_k}\big(s_{k+1}^{i_k}\big) - \nabla f_{i_k}\big(s_k^{i_k}\big).$$

Step 14.2 Push step:

$$y_{k+1}^{i_k} = b^{i_k i_k}\, \tilde{y}_k^{i_k}, \qquad \rho_{k+1}^{j,i_k} = \rho_k^{j,i_k} + b^{j i_k}\, \tilde{y}_k^{i_k}, \quad j \in N_{i_k}^{\mathrm{out}}.$$

Step 15: Update the mass buffer as follows:

$$\tilde{\rho}_{k+1}^{i_k,j} = \rho_{\tau_{k+1}^{i_k,j}}^{i_k,j}.$$
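A sketch of Steps 14–15 for agent i_k, assuming `rho_in[j]` holds the freshest received cumulative mass ρ^{i_k,j} and `b_col` maps agent indices to the column-stochastic weights b^{· i_k}; the container names are illustrative:

```python
def sum_push(agent, rho_in, b_col, grad_fi, i_k, s_new):
    # Sum step: fold in newly arrived mass plus the local gradient increment.
    y_tilde = (agent["y"]
               + sum(rho_in[j] - agent["rho_buf"][j] for j in rho_in)
               + grad_fi(i_k, s_new) - grad_fi(i_k, agent["s"]))
    # Push step: keep own share, push weighted shares onto out-edges.
    agent["y"] = b_col[i_k] * y_tilde
    pushed = {j: b_col[j] * y_tilde for j in b_col if j != i_k}  # adds to rho^{j,i_k}
    # Step 15: refresh the mass buffer with the values just consumed.
    for j in rho_in:
        agent["rho_buf"][j] = rho_in[j]
    return pushed
```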
Step 16: The remaining inactive agents keep their values from the last moment. Set k = k + 1 and go to Step 6 until a predefined stopping criterion is satisfied, e.g., k ≥ k_max, where k_max is the maximum number of iterations.
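Stitching the sketches above together, the outer loop might look as follows; `agents`, `in_neighbors`, `tau`, `m`, and `D` are the hypothetical containers introduced earlier, and only the activated agent's state changes per iteration:

```python
# A sketch of the loop over Steps 9-16: one agent wakes per iteration, stale
# mass is discarded, and Steps 11-15 touch agents[i_k] only.
k, k_max = 0, 50_000                 # hypothetical iteration budget
while k < k_max:
    i_k, d_k = pick_activation(m, in_neighbors, D)       # Step 9
    tau = refresh_timestamps(tau, i_k, d_k, k)           # Step 10
    # ... Steps 11-15: local_update(...) and sum_push(...) on agents[i_k] ...
    k += 1                           # Step 16: everyone else keeps its variables
```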
7. Innovation
The innovative points of the present invention are as follows:
1) This patent proposes an effective asynchronous scheme to execute the distributed asynchronous optimization algorithm, which greatly reduces the idle time of communication links, mitigates congestion of communication and memory access, saves power, and makes the algorithm more fault-tolerant and robust.
2) The proposed asynchronous algorithm prevents packet loss and is easily applied to large-scale machine learning and network information processing.
3) The proposed asynchronous algorithm employs uncoordinated constant step-sizes, which increases its flexibility.
4) The proposed asynchronous algorithm converges linearly to the global optimal solution when the step-size and the momentum parameter are positive and do not exceed explicit upper bounds.
8. Simulations
8.1. Binary Classification
To verify the effectiveness of the algorithm, we test it on a robust classification problem of the form

$$\min_{x\in\mathbb{R}^n} \frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{|D_i|} V\big(y^{ij} h_x(u^{ij})\big) + \lambda\,\|\nabla h_x(\cdot)\|^2,$$

where D = ∪_{i=1}^m D_i is the set of the data distributed across the agents, with each agent i owning D_i and satisfying D_i ∩ D_l = ∅ for arbitrary i ≠ l. In addition, the training data u^{ij} and y^{ij} ∈ {−1, 1} are the feature vector and the associated label of the j-th sample in D_i, respectively. In the last term, we set λ = 1, and h_x(·) is a linear function with parameter x. Note that V is the loss function, which reads as follows:
$$V(r) = \begin{cases} 0, & \text{if } r \ge 1;\\[2pt] \tfrac{1}{4}r^3 - \tfrac{3}{4}r + \tfrac{1}{2}, & \text{if } -1 < r < 1;\\[2pt] 1, & \text{if } r \le -1. \end{cases}$$
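The piecewise loss above and its derivative, written out directly in Python; note that the cubic piece meets the flat pieces with matching values and zero slope at r = ±1, which is what the Lipschitz-gradient assumption requires:

```python
import numpy as np

def V(r):
    r = np.asarray(r, dtype=float)
    mid = 0.25 * r**3 - 0.75 * r + 0.5      # smooth bridge on (-1, 1)
    return np.where(r >= 1.0, 0.0, np.where(r <= -1.0, 1.0, mid))

def V_prime(r):
    r = np.asarray(r, dtype=float)
    mid = 0.75 * r**2 - 0.75                # vanishes at r = +/-1
    return np.where(np.abs(r) >= 1.0, 0.0, mid)
```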
Data: We use the Cleveland Heart Disease data set with 14 features, preprocessing it by deleting observations with missing entries, scaling features between 0 and 1, and distributing the data evenly across the agents.
Graph: We consider a directed graph with m = 30 agents, as shown in Fig. 5, where each agent has 7 out-neighbors. One out-neighbor links all the agents into a directed cycle, while the others are chosen uniformly at random.
Asynchronous model: We mainly consider three activation rules: I) agents are awakened according to a cyclic rule in which the order is randomly permuted at the beginning of each round; II) activation lists are generated by concatenating random rounds — we first let each agent appear exactly once and sample agents uniformly for the remaining spots within a round, and then randomly shuffle the agent order of each round; III) agents are activated by a purely random strategy in all iterations. To generate one round, we first sample its length uniformly from the interval [m, T] with T = 90, and each transmitted message has a traveling time sampled uniformly from the interval [0, D_t] with D_t = 90. A sketch of the three rules appears below.
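A sketch of the three activation rules; the round-length draw from [m, T] follows the text, while the generator seeding is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(3)

def rule_I(m):
    # Cyclic: every agent exactly once per round, order reshuffled each round.
    return list(rng.permutation(m))

def rule_II(m, T=90):
    # Concatenated random rounds: each agent appears exactly once, the
    # remaining slots are filled uniformly, then the whole round is shuffled.
    L = int(rng.integers(m, T + 1))
    slots = np.concatenate([np.arange(m), rng.integers(m, size=L - m)])
    return list(rng.permutation(slots))

def rule_III(m, L):
    # Purely random: every activation is an independent uniform draw.
    return list(rng.integers(m, size=L))
```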
Fig. 6 depicts the evolution of the residual $\sqrt{\sum_{i=1}^{m}\|x_k^i - x^*\|^2}$, while Fig. 7 depicts the effects of the three activation rules. Furthermore, Fig. 8 shows the effects of different step-sizes, and Fig. 9 shows the effects of different momentum parameters. According to Fig. 8 and Fig. 9, the practical upper bounds of the constant step-size and the momentum parameter are around α = 0.865 and β = 0.087, respectively. Moreover, the best performance of the proposed algorithm is achieved when α = 0.70 and β = 0.08.
9. Brief Description of the Drawings
Fig. 1 depicts a simple directed graph with 3 computing agents.
Fig. 2 shows the difference between the synchronous algorithm and the asynchronous algorithm.
Fig. 3 shows a simple generating process of augmented graph.
Fig. 4 is a flowchart of the distributed asynchronous optimization algorithm.
Fig. 5 depicts a directed strongly-connected network with 30 agents.
Fig. 6 depicts the evolution of the residual with running times.
Fig. 7 shows the effects of the three activation rules on the proposed algorithm.
Fig. 8 depicts the evolution of residuals at the 45000th iteration with different constant step-sizes.
Fig. 9 depicts the evolution of residuals at the 40000th iteration with different momentum parameter values.
Claims (2)
- The claims defining the invention are as follows:

1. An effective doubly-accelerated distributed asynchronous optimization algorithm, comprising:

1.1. Initializing Variables

Step 1: Each agent i ∈ V sets k = 0 and sets a stopping criterion.

Step 2: Each agent i ∈ V initializes with x_{−1}^i = 0, x_0^i ∈ R^n, s_0^i ∈ R^n, y_0^i = ∇f_i(s_0^i), v_0^i = 0, for all i ∈ V; ρ_0^{i,j} = 0 and ρ̃_0^{i,j} = 0, for all j ∈ N_i^in and i ∈ V; and ρ_t^{i,j} = 0, for all t = −D, …, 0.

1.2. Constructing Augmented Weight Matrices

Step 3: According to the original graph, first construct a row-stochastic matrix A and a column-stochastic matrix B. Meanwhile, introduce the matrix W = {w^{ij}} to denote either A or B. There exists w̲ > 0 such that w^{ii} ≥ w̲ and w^{ij} ≥ w̲, for all i ∈ V and for all (j, i) ∈ E, respectively; otherwise, we set w^{ij} = 0.

Step 4: Based on the original graph, construct an augmented row-stochastic matrix Ā_k as follows:

$$\bar{a}_k^{pr} = \begin{cases} a^{i_k i_k}, & \text{if } p = r = i_k;\\ a^{i_k j}, & \text{if } p = i_k,\ r = j + (d_k^j + 1)m;\\ 1, & \text{if } p = r \in \{1, 2, \ldots, 2m\} \setminus \{i_k, i_k + m\};\\ 1, & \text{if } p \in \{2m+1, 2m+2, \ldots, (D+2)m\} \cup \{i_k + m\} \text{ and } r = p - m;\\ 0, & \text{otherwise.} \end{cases}$$

Step 5: Construct an augmented column-stochastic matrix B̄_k in two steps. The first establishes the transition matrix of the sum step as follows:

$$\bar{s}_k^{pr} = \begin{cases} 1, & \text{if } r \in \{va_{(j,i_k)}^{d} \mid k - \tau_{k+1}^{i_k,j} \le d \le D\} \text{ and } p = i_k;\\ 1, & \text{if } p \in \bar{V} \setminus \{va_{(j,i_k)}^{d} \mid k - \tau_{k+1}^{i_k,j} \le d \le D\} \text{ and } r = p;\\ 0, & \text{otherwise.} \end{cases}$$

The second establishes the transition matrix of the push step as follows:

$$\bar{p}_k^{pr} = \begin{cases} b^{j i_k}, & \text{if } r = i_k \text{ and } p = va_{(i_k,j)}^{0},\ j \in N_{i_k}^{\mathrm{out}};\\ b^{i_k i_k}, & \text{if } r = p = i_k;\\ 1, & \text{if } r = p \in \bar{V} \setminus \{i_k\};\\ 1, & \text{if } r = va_{(i,j)}^{d},\ p = va_{(i,j)}^{d+1},\ (i,j) \in E,\ 0 \le d \le D-1;\\ 1, & \text{if } r = p = va_{(i,j)}^{D},\ (i,j) \in E;\\ 0, & \text{otherwise.} \end{cases}$$

Thus, B̄_k = P̄_k S̄_k holds by combining the sum step with the push step.

1.3. Selecting Parameters

Step 6: According to the graph and the properties of the row- and column-stochastic matrices, select 0 < T < ∞ and 0 ≤ D < ∞. Note that within T iterations all agents update at least once, and D represents the maximum delay value. Then, compute the parameters ρ = (1 − w̲^{K_1})^{1/K_1} ∈ (0, 1) with K_1 = (2m − 1)T + mD, ρ̄ = w̲^{K_1} ∈ (0, 1), and positive constants C_1, C_2, C_3, and C_4 = √2 C_3 determined by ρ, ρ̄, and m.

Step 7: Based on the strongly convex coefficient μ and the Lipschitz-continuous coefficient L_f, and using the small-gain theorem, compute the parameters a_i, i = 1, 2, …, 16, as

| a_1 = C_1 m√m L_f | a_5 = √3 + 1 | a_9 = C_4 m L_f λ | a_13 = 2m√m L_f |
| a_2 = (1 + ρ̄) C_1 | a_6 = √3 m√m L_f | a_10 = C_4 m L_f | a_14 = m√m L_f |
| a_3 = C_1 m | a_7 = √3 m | a_11 = μ m² | a_15 = 2m |
| a_4 = C_1 m L_f | a_8 = √3 m L_f | a_12 = m L_f | a_16 = m |

and λ ∈ (max{ρ̄, ρ + C_1 L_f m√m α, (√3 + 1)β, 1 − μ m² α + m L_f Δα}, 1). In addition, define ξ(α, Δα) = 1 − a_11 α + a_12 Δα.

1.4. Computing Step-Size and Momentum Parameter

Step 8: According to the strongly convex coefficient μ and the Lipschitz-continuous coefficient L_f, select the largest step-size α, the gap Δα between the largest and the smallest step-sizes, and the largest momentum parameter β as follows:

$$0 < \alpha \le \min\left\{\frac{\rho\omega_1 - \rho^2\omega_1}{a_1\rho\omega_1 + a_3\rho\omega_3 + a_4\rho\omega_4},\ \frac{\omega_2 - a_5\omega_1}{a_6\omega_1 + a_7\omega_2 + a_8\omega_4},\ \frac{1 - \bar{\rho}}{\rho^2 L_f}\right\},$$

$$0 < \Delta\alpha \le \frac{a_{11}\omega_4\alpha - a_9\omega_1\alpha - a_{10}\omega_3\alpha}{a_{12}\omega_4 + a_{13}\omega_1 + a_{15}\omega_3},$$

$$0 < \beta \le \min\left\{\frac{\rho\omega_1 - \rho^2\omega_1 - a_1\rho\omega_1\alpha - a_3\rho\omega_3\alpha - a_4\rho\omega_4\alpha}{2\rho\omega_1},\ \frac{\omega_2\big(\omega_4(1 - \xi(\alpha,\Delta\alpha)) - a_9\omega_1\alpha\big) - \omega_3 a_{16}\alpha + \omega_3 a_{13}\Delta\alpha - \omega_1 a_{13}\Delta\alpha}{2\omega_2},\ \frac{(1-\bar{\rho})\omega_3 - a_9\omega_2}{a_9\omega_2 + a_{10}\omega_2},\ \frac{\omega_2 - a_5\omega_1 - a_6\omega_1\alpha - a_7\omega_2\alpha - a_8\omega_4\alpha}{a_5\omega_2}\right\},$$

where ω_1, ω_2, ω_3, ω_4 are arbitrary constants satisfying 0 < ω_1 < ω_2/a_5, 0 < ω_2 < (1 − ρ̄)ω_3/a_9, and ω_4 > (a_13 ω_1 + a_15 ω_3)/a_10.

1.5. Selecting Activated Agents and Delay

Step 9: Pick (i_k, d_k) with d_k = (d_k^j)_{j∈N_{i_k}^in}, where i_k indicates that agent i is activated at time k, and d_k^j denotes the delay value, which satisfies 0 ≤ d_k^j ≤ D < ∞.

1.6. Eliminating Outdated Information

Step 10: Set τ_{k+1}^{i_k,j} = max(τ_k^{i_k,j}, k − d_k^j) to eliminate the outdated information. Note that τ_k^{i_k,j} records the generation time of the cumulative-mass variable ρ^{i_k,j}, where ρ^{i_k,j} with j ∈ N_{i_k}^in captures the cumulative information generated by agent j up to the current time and sent to agent i_k.

1.7. Exchanging Information

Step 11: Each agent i_k updates the variable v_{k+1}^{i_k} according to

$$v_{k+1}^{i_k} = x_k^{i_k} - \alpha^{i_k} y_k^{i_k} + \beta^{i_k}\big(x_k^{i_k} - x_{k-1}^{i_k}\big).$$

Step 12: Each agent i_k updates the variable x_{k+1}^{i_k} according to

$$x_{k+1}^{i_k} = a^{i_k i_k} v_{k+1}^{i_k} + \sum_{j\in N_{i_k}^{\mathrm{in}}} a^{i_k j} x_{k-d_k^j}^{j} + \beta^{i_k}\big(x_k^{i_k} - x_{k-1}^{i_k}\big).$$

Step 13: To accelerate the algorithm, introduce the auxiliary variable s_k, which, for each agent i_k, is updated by

$$s_{k+1}^{i_k} = x_{k+1}^{i_k} + \beta^{i_k}\big(x_{k+1}^{i_k} - x_k^{i_k}\big).$$

Step 14: Introduce two auxiliary variables to prevent packet loss. One is the cumulative-mass variable ρ^{i_k,j} with j ∈ N_{i_k}^in, which captures the cumulative information generated by agent j up to the current time and sent to agent i_k. The other is the buffer variable ρ̃^{i_k,j} with j ∈ N_{i_k}^in, which stores the information sent from agent j to agent i and used by agent i in its last update. Then, adopt the sum-push scheme to achieve gradient tracking.

Step 14.1 Sum step:

$$\tilde{y}_k^{i_k} = y_k^{i_k} + \sum_{j\in N_{i_k}^{\mathrm{in}}} \Big(\rho_{\tau_{k+1}^{i_k,j}}^{i_k,j} - \tilde{\rho}_k^{i_k,j}\Big) + \nabla f_{i_k}\big(s_{k+1}^{i_k}\big) - \nabla f_{i_k}\big(s_k^{i_k}\big).$$

Step 14.2 Push step:

$$y_{k+1}^{i_k} = b^{i_k i_k}\, \tilde{y}_k^{i_k}, \qquad \rho_{k+1}^{j,i_k} = \rho_k^{j,i_k} + b^{j i_k}\, \tilde{y}_k^{i_k}, \quad j \in N_{i_k}^{\mathrm{out}}.$$

Step 15: Update the mass buffer as follows:

$$\tilde{\rho}_{k+1}^{i_k,j} = \rho_{\tau_{k+1}^{i_k,j}}^{i_k,j}.$$

Step 16: The remaining inactive agents keep their values from the last moment. Set k = k + 1 and go to Step 6 until a predefined stopping criterion is satisfied, e.g., k ≥ k_max, where k_max is the maximum number of iterations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2020100180A AU2020100180A4 (en) | 2020-02-05 | 2020-02-05 | Effective Doubly-Accelerated Distributed Asynchronous Strategy for General Convex Optimization Problem |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2020100180A AU2020100180A4 (en) | 2020-02-05 | 2020-02-05 | Effective Doubly-Accelerated Distributed Asynchronous Strategy for General Convex Optimization Problem |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2020100180A4 true AU2020100180A4 (en) | 2020-03-12 |
Family
ID=69724764
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2020100180A Ceased AU2020100180A4 (en) | 2020-02-05 | 2020-02-05 | Effective Doubly-Accelerated Distributed Asynchronous Strategy for General Convex Optimization Problem |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2020100180A4 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111582494A (en) * | 2020-04-17 | 2020-08-25 | 浙江大学 | Hybrid distributed machine learning updating method based on delay processing |
CN111582494B (en) * | 2020-04-17 | 2023-07-07 | 浙江大学 | Mixed distributed machine learning updating method based on delay processing |
Similar Documents
Publication | Title |
---|---|
WO2017124809A1 (en) | Particle swarm optimization method and system based on gpu operation of mobile terminal | |
Li et al. | Distributed average consensus control in networks of agents using outdated states | |
Li et al. | A novel intrusion detection method for internet of things | |
AU2020100180A4 (en) | Effective Doubly-Accelerated Distributed Asynchronous Strategy for General Convex Optimization Problem | |
CN111953515B (en) | Double-acceleration distributed asynchronous optimization method based on Nesterov gradient method and gravity method | |
CN104615703A (en) | RDF data distributed parallel inference method combined with Rete algorithm | |
CN112529195A (en) | Quantum entanglement detection method and device, electronic device and storage medium | |
CN116260130A (en) | Micro-grid group power cooperative scheduling method and device | |
Gandhi et al. | Performance comparison of parallel graph coloring algorithms on bsp model using hadoop | |
CN116760762B (en) | Decentralised ad hoc network method and device | |
US20130067113A1 (en) | Method of optimizing routing in a cluster comprising static communication links and computer program implementing that method | |
Klasing et al. | Taking advantage of symmetries: gathering of asynchronous oblivious robots on a ring | |
CN116702925A (en) | Distributed random gradient optimization method and system based on event triggering mechanism | |
Bashir et al. | Minimal supervisory structure for flexible manufacturing systems using Petri nets | |
CN115456184B (en) | Quantum circuit processing method, quantum state preparation device, quantum state preparation equipment and quantum state preparation medium | |
Li et al. | Implementing an attack graph generator in CUDA | |
WO2022057459A1 (en) | Tensorcore-based int4 data type processing method and system, device, and medium | |
CN105550319B (en) | The optimization method of persistence under a kind of cluster Consistency service high concurrent | |
US10824482B1 (en) | Remote operations application programming interface | |
Gassen et al. | Graph color minimization using neural networks | |
Li et al. | Row-Stochastic Matrices Based Distributed Optimization Algorithm With Uncoordinated Step-Sizes | |
KR20190143115A (en) | Method for managing data based on blockchain | |
CN117938543B (en) | Network dynamic defense method and system based on topology difference measurement | |
CN116362341B (en) | Quantum device unitary transformation degree determining method and device, electronic device and medium | |
Shang-Guan et al. | A Fast Distributed Principal Component Analysis with Variance Reduction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |