AU2020101959A4 - Decentralized optimization algorithm for machine learning tasks in networks: Resource efficient - Google Patents

Decentralized optimization algorithm for machine learning tasks in networks: Resource efficient

Info

Publication number
AU2020101959A4
Authority
AU
Australia
Prior art keywords
node
triggering
event
estimator
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2020101959A
Inventor
Zhengran Cao
Qingguo Lü
Keke ZHANG
Lifeng Zheng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University
Original Assignee
Southwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University filed Critical Southwest University
Priority to AU2020101959A priority Critical patent/AU2020101959A4/en
Application granted granted Critical
Publication of AU2020101959A4 publication Critical patent/AU2020101959A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Through utilizing the momentum acceleration mechanism, a novel decentralized accelerated stochastic double-efficient algorithm based on the stochastic gradient-tracking technique is proposed to solve the problem of decentralized optimization, i.e., to minimize a finite sum of convex cost functions over the nodes of a network, where each cost function is further considered as the average of several constituent functions. The algorithm mainly includes five parts: variable initialization, parameter selection, information exchange, an event-triggering communication strategy, and variable updates. By adopting the event-triggering strategy and the variance-reduction technique, the proposed algorithm set forth in the present invention realizes better communication efficiency and achieves higher computation efficiency in comparison with related works. Presuming smooth and strongly convex cost functions, the proposed algorithm with a well-selected constant step-size converges in mean to the exact optimal solution. At the same time, a linear convergence rate is achieved if each constituent function is strongly convex and smooth. Furthermore, under certain conditions, the time interval between two successive triggering instants for each node is proved to be larger than the iteration interval. The present invention plays an important role in many large-scale machine learning tasks where the samples of a training dataset are randomly decentralized across multiple computing nodes.

Description

[Figure 1 (flowchart of the proposed algorithm): Start; each node sets t = 0 and the maximum number of iterations t_max; each node initializes local variables and selects the fixed step-size as well as the momentum coefficient according to the network and the problem; each node updates the decision estimator and the accelerated estimator according to the event-triggering strategy and the momentum acceleration mechanism; each node computes the stochastic gradient according to the variance-reduction technique; each node updates the gradient auxiliary estimator according to the stochastic gradient-tracking method; each node tests the triggering condition, broadcasts the decision estimator and the gradient auxiliary estimator, and updates the latest triggering time; each node sets t = t + 1 and repeats until t > t_max.]
1. Technical Field
[0001] The present invention relates to a field of machine learning.
2. Background
[0002] The Internet of Things (IoT) and artificial intelligence have promoted the emergence of networked control systems, which urgently need efficient communication and computation. Due to the limited computational and storage capacity of individual nodes, centralized processing of large-scale tasks on a single computing node becomes unrealistic. Decentralized optimization addresses problems in which multiple nodes interact over a network, and it is of significance in many areas such as machine learning, resource allocation, data analysis, privacy masking, and signal processing, owing to its ability to parallelize computation and to prevent agents from sharing information considered private. A decentralized algorithm usually follows an iterative process in which nodes preserve certain estimators of the decision vectors, exchange this information with neighboring nodes over a communication network, and update their estimators according to the received information. Designing efficient decentralized algorithms for broad classes of optimization problems is in general challenging.

Some of the literature on such decentralized schemes includes the early work on decentralized gradient descent (DGD) and its various extensions for achieving efficiency, handling constraints, applying to complex networks, or performing acceleration. These methods successfully demonstrated their effectiveness for solving optimization problems in a decentralized manner over networks. Nonetheless, although these methods were intuitive and flexible with respect to cost functions and networks, their computing speeds were particularly slow in comparison with those of their centralized counterparts. Besides, DGD-based methods with constant step-sizes can only achieve linear convergence rates up to a sub-optimality gap. Therefore, from an optimization point of view, it is always a priority to propose and analyze methods that are comparable in performance to centralized counterparts in terms of convergence rate. In a recent stream of literature, decentralized gradient methods that resolve this exactness-rate dilemma have been proposed, which achieve an exact linear convergence rate for smooth and strongly convex cost functions. Instances of such methods, including methods based on gradient tracking, methods based on Lagrangian multipliers, and methods based on dual decomposition, are characterized by various mechanisms.

Towards practical optimization models, momentum acceleration approaches have been successfully and widely applied as optimization techniques that facilitate the convergence of gradient-based methods. First-order optimization methods based on momentum acceleration have been of significance in the machine learning community due to their scalability to large-scale problems (including deep learning, federated learning, etc.) and good performance in practice. To improve communication efficiency while maintaining the desired performance of the network, various types of strategies have recently been proposed and gained popularity in existing works. The emergence of the event-triggering strategy provides a new perspective for collecting and transmitting information. The main idea behind the event-triggering strategy is that nodes only take actions when necessary, that is, only when a measurement of the local node's state error reaches a specified threshold. Its superiority is that some desired properties of the network can still be maintained efficiently.
With the advent of the big-data era, the amount of data that the nodes in the network need to process is becoming larger and more complicated. Consequently, general full-gradient methods can be computationally very expensive, because the nodes in the network need to cope with large amounts of heterogeneous data. Based on this, we assume that at each iteration the proposed algorithm only evaluates the gradient of one randomly selected constituent function, and employs the unbiased stochastic average gradient (obtained as the average of the most recent stochastic gradients) to estimate the local gradient. Moreover, the proposed algorithm integrates the event-triggering strategy and the variance-reduction technique with the distributed accelerated gradient-tracking method to converge linearly in mean to the exact optimal solution.
3. Notation
[0003] In this invention, all vectors are column vectors by default. Let $\mathbb{R}$, $\mathbb{R}^p$ and $\mathbb{R}^{p \times q}$ denote the set of real numbers, $p$-dimensional real column vectors, and $p \times q$ real matrices, respectively. The inner product of vectors $c$ and $d$ is represented by $\langle c, d \rangle$. Given a random estimator $x$, $\mathbb{E}[x]$ denotes its expectation. The spectral radius of a matrix $A$ is denoted by $\rho(A)$. For two matrices $A, B \in \mathbb{R}^{p \times p}$, $A \otimes B$ represents their Kronecker product. The transposes of a vector $x$ and a matrix $A$ are represented by the symbols $x^T$ and $A^T$. We denote $\|\cdot\|$ as the Euclidean norm of a vector. The notation $\nabla f(y)$ denotes the gradient of the function $f$ at $y$. The matrix $I_p$ is the $p \times p$ identity matrix, whereas $\mathbf{1}$ and $\mathbf{0}$ (of appropriate dimension) are column vectors of all ones and all zeros, respectively. A quantity (possibly a vector) allocated to node $i$ is indexed by a superscript $i$; e.g., $x_t^i$ is node $i$'s estimator at time $t$.
4. Network Model
[0004] Throughout this invention, the intrinsic interconnections among nodes in the network are modeled as an undirected graph $\mathcal{G} = \{\mathcal{V}, \mathcal{S}, A\}$ involving the node set $\mathcal{V} = \{1, 2, \ldots, m\}$, the edge set $\mathcal{S} \subseteq \mathcal{V} \times \mathcal{V}$, and the weight matrix $A = [a_{ij}] \in \mathbb{R}^{m \times m}$. An edge $(i,j) \in \mathcal{S}$ implies that node $i$ can directly exchange data with node $j$. The connection weight between nodes $i$ and $j$ in graph $\mathcal{G}$ satisfies $a_{ij} = a_{ji} > 0$ if $(i,j) \in \mathcal{S}$, and otherwise $a_{ij} = a_{ji} = 0$. The neighbor set of node $i$ is denoted by $\mathcal{V}^i = \{j \mid a_{ij} > 0\}$. A path is a series of consecutive edges. If there is at least one path between any pair of distinct nodes in the graph $\mathcal{G}$, then the graph $\mathcal{G}$ is connected. Without loss of generality, $a_{ii} = 0$, i.e., there are no self-connections in the graph. The degree of node $i \in \mathcal{V}$ is represented by $d^i = \sum_{j=1}^m a_{ij}$, and the degree matrix $D_{\mathcal{G}} = \mathrm{diag}\{d^1, d^2, \ldots, d^m\}$ is a diagonal matrix. The Laplacian matrix of graph $\mathcal{G}$ is defined as $L_{\mathcal{G}} = D_{\mathcal{G}} - A$, which is symmetric and positive semi-definite since the graph $\mathcal{G}$ is undirected.
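To make the weight assumptions concrete, the following Python sketch (numpy only; the ring graph and the helper name are illustrative, not part of the invention) builds a symmetric, doubly stochastic mixing matrix with the Metropolis-Hastings rule and checks that its second largest singular value is below 1. Note that, unlike the adjacency weights above, such a mixing matrix carries positive self-weights so that its rows and columns sum to one.

import numpy as np

def metropolis_weights(edges, m):
    # Build a symmetric, doubly stochastic weight matrix A for an
    # undirected graph with nodes {0, ..., m-1} and the given edge list.
    A = np.zeros((m, m))
    deg = np.zeros(m, dtype=int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    for i, j in edges:
        # Metropolis-Hastings rule: a_ij = 1 / (1 + max(d_i, d_j)).
        A[i, j] = A[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
    # Self-weights make every row (and, by symmetry, column) sum to one.
    np.fill_diagonal(A, 1.0 - A.sum(axis=1))
    return A

# Example: a ring of m = 5 nodes.
m = 5
edges = [(i, (i + 1) % m) for i in range(m)]
A = metropolis_weights(edges, m)
assert np.allclose(A.sum(axis=0), 1) and np.allclose(A.sum(axis=1), 1)
# kappa_3: second largest singular value, i.e. ||A - (1/m) 1 1^T||.
kappa3 = np.linalg.norm(A - np.ones((m, m)) / m, ord=2)
print(kappa3 < 1)  # True for a connected graph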
5. Problem Formulation
[0005] This invention focuses on optimizing a finite sum of cost functions, commonly encountered in machine learning, which can be formulated as:
$$\min_{x \in \mathbb{R}^p} f(x) = \frac{1}{m}\sum_{i=1}^{m} f^i(x), \qquad f^i(x) = \frac{1}{n^i}\sum_{j=1}^{n^i} f^{i,j}(x),$$
where $x \in \mathbb{R}^p$ is the optimization estimator (decision vector) and $f^i : \mathbb{R}^p \to \mathbb{R}$ is a convex function viewed as the private cost of node $i$, which is represented as the average of $n^i$ constituent functions $f^{i,j}$. Assume that $x^*$ is an optimal solution to the above problem. In addition, we make the following assumptions on the constituent functions: 1) Each local constituent function $f^{i,j}$, $i \in \{1,\ldots,m\}$, $j \in \{1,\ldots,n^i\}$, is $\kappa_1$-strongly convex and $\kappa_2$-smooth, i.e., for any $a, b \in \mathbb{R}^p$: (i) $f^{i,j}(a+b) \ge f^{i,j}(a) + \nabla f^{i,j}(a)^T b + (\kappa_1/2)\|b\|^2$; (ii) $\|\nabla f^{i,j}(a+b) - \nabla f^{i,j}(a)\| \le \kappa_2 \|b\|$. 2) The undirected communication network $\mathcal{G}$ is connected, and the weight matrix $A = [a_{ij}] \in \mathbb{R}^{m \times m}$ corresponding to the network $\mathcal{G}$ is primitive and doubly stochastic, which implies that the second largest singular value $\kappa_3$ of $A$ is less than 1, i.e., $\kappa_3 = \|A - (1/m)\mathbf{1}_m \mathbf{1}_m^T\| < 1$.

The above formulated problem can be frequently found in machine learning models, such as empirical risk minimization, logistic regression, support vector machines, deep neural networks, etc. The machine learning model contains large-scale training samples that are randomly scattered over multiple computing nodes. These computing nodes focus on collectively training a model $x \in \mathbb{R}^p$ by utilizing the neighboring nodes' data. Although the accuracy of the machine learning model can be improved when the local data batch at a single computing node is very large, i.e., $n^i \gg 1$, the limited memory of the computing node causes a significant increase in training time as well as in the amount of communication and computation. However, it is expensive to improve the computing and communication capabilities of a single piece of hardware. Hence, designing a novel decentralized accelerated stochastic double-efficient algorithm will have far-reaching implications.
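As a minimal illustration of this nested finite-sum structure (all names, sizes, and the quadratic constituents below are hypothetical choices, used only because the optimum can be checked by hand), one may write:

import numpy as np

rng = np.random.default_rng(0)
m, p = 4, 3                      # m nodes, decision dimension p
n = [5, 7, 6, 8]                 # n^i: local batch size of node i
# Hypothetical quadratic constituents f^{i,j}(x) = 0.5 ||x - c^{i,j}||^2.
C = [rng.standard_normal((n[i], p)) for i in range(m)]

def f_ij(i, j, x):               # constituent function f^{i,j}
    return 0.5 * np.sum((x - C[i][j]) ** 2)

def f_i(i, x):                   # local cost: average of n^i constituents
    return np.mean([f_ij(i, j, x) for j in range(n[i])])

def f(x):                        # global cost: average over the m nodes
    return np.mean([f_i(i, x) for i in range(m)])

# For these constituents, the optimum is the average of the per-node
# center means, so each node contributes equally regardless of n^i.
x_star = np.mean([C[i].mean(axis=0) for i in range(m)], axis=0)
# Sanity check: the global cost is minimized at x_star.
eps = 1e-4 * rng.standard_normal(p)
assert f(x_star) <= f(x_star + eps)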
6. Detailed Implementation Description
[0006] Figure 1 is the flowchart of the proposed algorithm in the present invention. As shown in Figure 1, the decentralized accelerated stochastic double-efficient algorithm comprises the following steps:
6.1. Variable initialization
[0007] Step 1: Each node $i \in \mathcal{V}$ sets $t = 0$ and the maximum number of iterations, $t_{\max}$.
[0008] Step 2: Each node $i$ starts with $\tilde{x}_0^i = x_0^i \in \mathbb{R}^p$, $s_0^i \in \mathbb{R}^p$, $e_0^{i,j} = s_0^i$, $\forall j \in \{1,\ldots,n^i\}$, and $\tilde{y}_0^i = y_0^i = g_0^i = \nabla f^i(s_0^i) \in \mathbb{R}^p$.
6.2. Parameters selection
[0009] Step 3: According to the definition of the constituent functions, we denote $\bar{n} = \max_{i \in \mathcal{V}}\{n^i\}$ and $\underline{n} = \min_{i \in \mathcal{V}}\{n^i\}$. Moreover, we select an arbitrary parameter $\omega_1 > 0$, select $\omega_3$ according to $\omega_3 > 8\omega_1$, choose $\omega_2$ following from $2\kappa_1\omega_2 > 8\omega_1 - 8\kappa_1\alpha^2\omega_3$ with $0 < \alpha < \sqrt{\omega_1/\omega_3}$, pick $\omega_4$ such that $2\bar{n}\omega_1 + 2\bar{n}\omega_2 < \omega_4\underline{n}$, and opt for $\omega_5$ satisfying $4960\omega_1 + 1064\omega_2 + 752\omega_3 + 168\omega_4 < \omega_5(1-\kappa_3)^2$.
[0010] Step 4: According to the parameters denoted in Step 3, the momentum coefficient $\alpha$ is selected from the interval $0 < \alpha < \sqrt{\omega_1/\omega_3}$, and the constant step-size $\eta$ is selected from the interval

$$0 < \eta < \min\left\{\frac{\underline{n}\,\omega_2}{2\sqrt{2}\,\kappa_2\omega_4},\ \frac{\underline{n}\,\omega_1}{8\,\kappa_2\omega_4},\ \frac{\underline{n}\,\alpha^2\omega_3}{\kappa_1\omega_4},\ \frac{1-\kappa_3}{99\,\kappa_2},\ \frac{(1-\kappa_3)\sqrt{\omega_3 - 8\omega_1}}{\kappa_2\sqrt{160\,\omega_3 + 96\,\omega_1 + 32\,\omega_4 + 16\,\omega_5}}\right\}.$$
6.3. Information exchange
[0011] Step 5: According to the weights of the communication network, each node $i \in \mathcal{V}$ exchanges the variables (information) $x_t^i$ and $y_t^i$ with its neighboring nodes $j \in \mathcal{V}^i$. Then, each node $i \in \mathcal{V}$ computes the weighted sums $\sum_{j \in \mathcal{V}^i} a_{ij}(x_t^i - \tilde{x}_t^j)$ and $\sum_{j \in \mathcal{V}^i} a_{ij}(y_t^i - \tilde{y}_t^j)$ for $t \ge 0$, where $a_{ij} > 0$ is the weight between node $j$ and node $i$.
6.4. Event-triggering communication strategy
[0012] Step 6: The event-triggering strategy provides a new perspective for information sampling and transmission. Before introducing it, we first define $t_k^i$ as the $k$-th triggering time of node $i$, where $i \in \mathcal{V}$. In methods based on the event-triggering strategy, the local estimators of node $i$ at time $t$ are determined by its own estimators and the latest information sent from its neighbors $j \in \mathcal{V}^i$ (at the latest triggering time of node $j$ before $t$). Assume that $\tilde{x}_t^i$ and $\tilde{y}_t^i$ are the information that node $i$ transmitted to its neighbors at its latest triggering time before time $t$, i.e.,

$$\tilde{x}_t^i = x_{t_{k(i,t)}^i}^i, \quad \tilde{y}_t^i = y_{t_{k(i,t)}^i}^i, \quad \text{for } t_{k(i,t)}^i \le t < t_{k(i,t)+1}^i, \qquad (1)$$

where $x_t^i$ and $y_t^i$ are the two estimators of node $i$. Moreover, we suppose that all nodes broadcast their estimators $x_0^i$ and $y_0^i$ at the initial time, i.e., $\tilde{x}_0^i = x_0^i$ and $\tilde{y}_0^i = y_0^i$ for all $i \in \mathcal{V}$. In addition, the next triggering time $t_{k(i,t)+1}^i$ after $t$ for node $i \in \mathcal{V}$ is decided by

$$t_{k(i,t)+1}^i = \inf\{t' > t_{k(i,t)}^i : \|E_{t'}^{i,x}\|^2 + \|E_{t'}^{i,y}\|^2 > C\kappa_4^{t'}\}, \qquad (2)$$

where $C\kappa_4^t$ is the event-triggering threshold with parameters $C > 0$ and $0 < \kappa_4 < 1$, and $E_t^{i,x}$, $E_t^{i,y}$ are the measurement errors defined by

$$E_t^{i,x} = \tilde{x}_t^i - x_t^i, \qquad E_t^{i,y} = \tilde{y}_t^i - y_t^i. \qquad (3)$$
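A minimal Python sketch of the triggering test (2) with the measurement errors (3) is given below; the function and variable names (should_trigger, x_broadcast, and so on) are hypothetical, and the threshold follows the definition $C\kappa_4^t$ above.

import numpy as np

def should_trigger(x_t, y_t, x_broadcast, y_broadcast, C, kappa4, t):
    # Measurement errors (3): gap between the last broadcast values
    # and the current local estimators.
    E_x = x_broadcast - x_t
    E_y = y_broadcast - y_t
    # Triggering condition (2): fire when the squared error exceeds
    # the decaying threshold C * kappa4**t (C > 0, 0 < kappa4 < 1).
    return np.sum(E_x ** 2) + np.sum(E_y ** 2) > C * kappa4 ** t

# When the condition holds, node i broadcasts (x_t, y_t) to its
# neighbors and resets its broadcast copies:
# x_broadcast, y_broadcast = x_t.copy(), y_t.copy()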
6.5. Variables update
[0013] Step 7: According to the event-triggering strategy and the momentum acceleration mechanism, each node $i$ first updates the local decision estimator $x_{t+1}^i$ and the local accelerated estimator $s_{t+1}^i$, i.e.,

$$x_{t+1}^i = s_t^i - \sum_{j \in \mathcal{V}^i} a_{ij}(x_t^i - \tilde{x}_t^j) - \eta\,y_t^i, \qquad (4)$$
$$s_{t+1}^i = x_{t+1}^i + \alpha\,(x_{t+1}^i - x_t^i). \qquad (5)$$
[0014] Step 8: Subsequently, each node $i$ maintains a gradient table that stores all local constituent gradients $\nabla f^{i,j}(e_t^{i,j})$, where $e_t^{i,j}$ is the most recent estimator at which the constituent gradient $\nabla f^{i,j}$ was evaluated. At each iteration $t+1$, each node $i$ uniformly and randomly selects one constituent function indexed by $\chi_{t+1}^i \in \{1,\ldots,n^i\}$ from its own local data batch, and then generates the local stochastic gradient $g_{t+1}^i$ as

$$g_{t+1}^i = \nabla f^{i,\chi_{t+1}^i}(s_{t+1}^i) - \nabla f^{i,\chi_{t+1}^i}(e_t^{i,\chi_{t+1}^i}) + \frac{1}{n^i}\sum_{j=1}^{n^i} \nabla f^{i,j}(e_t^{i,j}). \qquad (6)$$

After generating $g_{t+1}^i$, the entry $\nabla f^{i,\chi_{t+1}^i}(e_t^{i,\chi_{t+1}^i})$ is replaced by the newly computed constituent gradient $\nabla f^{i,\chi_{t+1}^i}(s_{t+1}^i)$, while the other entries remain the same. That is to say, if $j = \chi_{t+1}^i$, then store $\nabla f^{i,j}(e_{t+1}^{i,j}) = \nabla f^{i,j}(s_{t+1}^i)$; else $\nabla f^{i,j}(e_{t+1}^{i,j}) = \nabla f^{i,j}(e_t^{i,j})$.
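The bookkeeping in Step 8 is a SAGA-style gradient table. A hedged Python sketch (grad_fn, grad_table, and the function name are hypothetical) of generating (6) and refreshing the table:

import numpy as np

def saga_gradient(grad_fn, s_next, grad_table, rng):
    # grad_fn(j, x): gradient of constituent f^{i,j} at x.
    # grad_table[j]: stored gradient evaluated at the point e^{i,j}.
    n_i = len(grad_table)
    chi = rng.integers(n_i)                 # uniformly random index chi
    g_new = grad_fn(chi, s_next)            # fresh constituent gradient
    # Unbiased stochastic average gradient, eq. (6):
    g = g_new - grad_table[chi] + np.mean(grad_table, axis=0)
    grad_table[chi] = g_new                 # replace only entry chi
    return g

Taking the expectation over the uniformly random index chi gives $\mathbb{E}[g] = \nabla f^i(s_{t+1}^i)$, matching the unbiasedness of the stochastic average gradient claimed in the Background.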
[0015] Step 9: Based on Step 8, each node $i \in \mathcal{V}$ updates the local gradient auxiliary estimator $y_{t+1}^i$ according to

$$y_{t+1}^i = y_t^i - \sum_{j \in \mathcal{V}^i} a_{ij}(y_t^i - \tilde{y}_t^j) + g_{t+1}^i - g_t^i. \qquad (7)$$
[0016] Step 10: Based on the above updates, each node $i \in \mathcal{V}$ calculates the measurement errors $E_t^{i,x}$, $E_t^{i,y}$ in (3), and then tests the triggering condition in (2). If the triggering condition in (2) is satisfied, then node $i \in \mathcal{V}$ broadcasts $x_{t+1}^i$ and $y_{t+1}^i$ to its neighbors $j \in \mathcal{V}^i$ and updates the latest triggering time.
[0017] Step 11: Each node $i \in \mathcal{V}$ sets $t = t + 1$ and goes to Step 7 until a certain stopping criterion is satisfied, e.g., $t > t_{\max}$, where $t_{\max}$ is the maximum number of iterations.
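Putting Steps 7-11 together, the sketch below shows one iteration from the viewpoint of a single node. It is a simplified illustration under the reconstructed updates (4)-(7) and reuses the saga_gradient sketch above; the node dictionary, message passing, and field names are hypothetical, not the definitive implementation.

import numpy as np

def node_iteration(node, A, eta, alpha, C, kappa4, t, rng):
    i = node["id"]
    # Step 5: weighted disagreement with the latest broadcast values
    # x~_t^j, y~_t^j received from the neighbors.
    cons_x = sum(A[i, j] * (node["x"] - node["x_tilde"][j])
                 for j in node["neighbors"])
    cons_y = sum(A[i, j] * (node["y"] - node["y_tilde"][j])
                 for j in node["neighbors"])
    # Step 7: decision estimator (4) and accelerated estimator (5).
    x_new = node["s"] - cons_x - eta * node["y"]
    s_new = x_new + alpha * (x_new - node["x"])
    # Step 8: SAGA-style stochastic gradient (6), reusing the sketch above.
    g_new = saga_gradient(node["grad_fn"], s_new, node["table"], rng)
    # Step 9: gradient-tracking update (7).
    y_new = node["y"] - cons_y + g_new - node["g"]
    # Step 10: event-triggering test (2)-(3); broadcast only if it fires.
    err = (np.sum((node["x_bcast"] - x_new) ** 2)
           + np.sum((node["y_bcast"] - y_new) ** 2))
    if err > C * kappa4 ** t:
        node["x_bcast"], node["y_bcast"] = x_new.copy(), y_new.copy()
        # ... here the node would send (x_new, y_new) to its neighbors ...
    # Step 11: commit the local state for iteration t + 1.
    node["x"], node["s"], node["y"], node["g"] = x_new, s_new, y_new, g_new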
7. Innovation
[0018] 1) Leverage the event-triggering strategy to realize better communication efficiency and the variance-reduction technique to achieve higher computation efficiency, which may reduce energy consumption and extend the useful life of the network.
[0019] 2) Incorporate the momentum acceleration mechanism into the event-triggered decentralized stochastic gradient-tracking method with a well-selected constant step-size to converge linearly in mean to the exact optimal solution.
[0020] 3) Propose a novel decentralized accelerated stochastic double-efficient algorithm for solving the problem of decentralized optimization, which has broad application in many large-scale machine learning tasks where the samples of a training dataset are randomly decentralized across multiple computing nodes.
8. Simulations
8.1. Logistic regression
[0021] First, the proposed algorithm, named DE-SDAA, is leveraged to deal with a binary classification problem via logistic regression using two real datasets from the UCI Machine Learning Repository. In this example, we use the Breast Cancer Wisconsin (Diagnostic) dataset to examine the performance of DE-SDAA, and use the Mushroom dataset as well as the Breast Cancer Wisconsin (Diagnostic) dataset for the comparison with other related decentralized methods. In the Breast Cancer Wisconsin (Diagnostic) dataset, we adopt n = 200 samples as training data, where each training sample has dimension p = 9. In the Mushroom dataset, we employ n = 6000 samples as training data, where each training sample has dimension p = 112. All the features have been preprocessed and normalized to the unit vector for each dataset. For the network, we generate a randomly connected network with m = 10 nodes utilizing an Erdos-Renyi model with connection probability 0.4. The decentralized logistic regression problem can be formally described as
$$\min_{x \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{n^i} \ln\!\left(1 + \exp\!\left(-b^{i,j}(c^{i,j})^T x\right)\right) + \frac{\pi}{2}\|x\|^2, \qquad (8)$$
with the local objective function $f^i(x)$ being

$$f^i(x) = \frac{1}{n^i}\sum_{j=1}^{n^i} \ln\!\left(1 + \exp\!\left(-b^{i,j}(c^{i,j})^T x\right)\right) + \frac{\pi}{2}\|x\|^2, \qquad (9)$$
where $b^{i,j} \in \{-1, 1\}$ and $c^{i,j} \in \mathbb{R}^p$ are local data kept by node $i$ for $j \in \{1,\ldots,n^i\}$. The regularization term $(\pi/2)\|x\|^2$ is added to avoid overfitting. In the simulation, we assign the data randomly to each local node, i.e., $\sum_{i=1}^m n^i = n$. We set the regularization parameter $\pi = 40$. Moreover, the step-size $\eta$ of each algorithm is selected to ensure the best possible convergence. The Breast Cancer Wisconsin (Diagnostic) dataset is applied to examine the performance of DE-SDAA. Firstly, the transient behaviors of three dimensions (randomly selected) of the state estimator $x$ are shown in Figure 2, which illustrates that the state estimator $x$ in DE-SDAA can achieve consensus in mean at the global optimal solution (the test accuracy is 97.72%). Secondly, the triggering times of five nodes (randomly selected) for their neighbors under DE-SDAA are shown in Figure 3, which, combined with Figure 2, implies that DE-SDAA with the event-triggering communication strategy can achieve the expected results with fewer communications compared to a time-triggered algorithm. Thirdly, comparing DE-SDAA with its variants under other specific scenarios, the appealing features of DE-SDAA are shown in Figure 4, where the residual $(1/m)\log_{10}(\sum_{i=1}^m \|x_t^i - x^*\|)$ is treated as the comparison metric. Figure 4 shows that DE-SDAA can achieve accelerated linear convergence compared to the algorithms without the momentum acceleration mechanism.
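For reference, a hedged Python sketch of the local objective (9) and of the gradient of one constituent term, i.e., what node i would feed into the stochastic gradient (6); the names local_loss, constituent_grad, and reg (standing in for the parameter pi) are hypothetical:

import numpy as np

def local_loss(x, b, c, reg):
    # f^i(x) = (1/n^i) sum_j ln(1 + exp(-b_j c_j^T x)) + (reg/2)||x||^2,
    # with b in {-1, +1}^{n_i} and c of shape (n_i, p).
    margins = -b * (c @ x)
    return np.mean(np.log1p(np.exp(margins))) + 0.5 * reg * x @ x

def constituent_grad(x, b_j, c_j, reg):
    # Gradient of one constituent f^{i,j}: the term fed into eq. (6).
    sigma = 1.0 / (1.0 + np.exp(b_j * (c_j @ x)))   # logistic factor
    return -b_j * sigma * c_j + reg * x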
8.2. Energy-based source localization
[0022] Second, we further verify the application behavior of the proposed algorithm with numerical simulations for the energy-based source localization problem over a network of m sensors. Estimating the location of an energy-based source is an important issue in the military field. Assume that there is a stationary acoustic source $x^* \in \mathbb{R}^2$ located at an unknown position that we aim to estimate within the sensor network. The source emits an isotropic signal, and we want to use the energy measurements of the signal received by each sensor to estimate the location of the source. In this example, we suppose that each sensor is placed at a random spatial location denoted $a^i \in \mathbb{R}^2$, $i = 1,\ldots,m$, which is known privately to itself, and each sensor collects $n^i$ measurements. Then, an isotropic energy propagation model is applied to measure the $j$-th received signal strength at sensor $i$, which is represented by $s^{i,j} = c/\|x^* - a^i\|^d + b^{i,j}$, where $c > 0$ is a constant and $d > 1$ is an attenuation characteristic; $\|x^* - a^i\| > 1$, and $b^{i,j}$ is independent and identically distributed sample noise following the zero-mean Gaussian distribution with variance $\sigma^2$. The maximum-likelihood estimator for the source's location is found by solving the following problem:
$$\min_{x \in \mathbb{R}^2} \frac{1}{m}\sum_{i=1}^{m} \frac{1}{n^i}\sum_{j=1}^{n^i}\left(s^{i,j} - \frac{c}{\|x - a^i\|^d}\right)^2, \qquad (10)$$
with the local objective function $f^i(x)$ being

$$f^i(x) = \frac{1}{n^i}\sum_{j=1}^{n^i}\left(s^{i,j} - \frac{c}{\|x - a^i\|^d}\right)^2. \qquad (11)$$
According to the analysis, it suffices to verify that solving the nonlinear least-squares problem (11) is equivalent to finding the optimal estimator $x$ of the following transformed problem:

$$\min_{x \in \mathbb{R}^2} \frac{1}{m}\sum_{i=1}^{m} \frac{1}{n^i}\sum_{j=1}^{n^i} \|x - P_{Q^{i,j}}(x)\|^2, \qquad (12)$$

where $Q^{i,j} = \{x \in \mathbb{R}^2 \mid \|x - a^i\| \le c/s^{i,j}\}$ and $P_{Q^{i,j}}(x)$ is the orthogonal projection of $x$ onto $Q^{i,j}$. Specifically, we consider $m = 50$ sensors uniformly distributed in a $100 \times 100$ square, with the source location randomly chosen from the square. The source emits a signal with strength $c = 100$, and each sensor has $n^i = 100$ measurements. Based on the above, seven randomly selected paths taken by DE-SDAA are shown in Figure 5, plotted on top of the contours of the log-likelihood. Figure 5 illustrates that DE-SDAA can successfully find the exact source location like other verified effective algorithms, which makes it suitable for the practical energy-based source localization problem.
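A minimal sketch of the projection $P_{Q^{i,j}}$ used in (12), taking the ball radius $c/s^{i,j}$ literally from the definition above (function and variable names are hypothetical):

import numpy as np

def project_onto_ball(x, center, radius):
    # Orthogonal projection of x onto Q = {z : ||z - center|| <= radius}.
    v = x - center
    dist = np.linalg.norm(v)
    if dist <= radius:
        return x                       # x already lies in the ball
    return center + radius * v / dist  # scale back onto the boundary

def constituent_residual(x, a_i, s_ij, c):
    # One term of (12): squared distance from x to Q^{i,j}.
    p = project_onto_ball(x, a_i, c / s_ij)
    return np.sum((x - p) ** 2)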
9. Brief Description of The Drawings
[0023] Figure 1 is a flowchart of the decentralized accelerated stochastic double-efficient algorithm.
[0024] Figure 2 shows the transient behaviors of three dimensions (randomly selected) of state estimator x in the proposed algorithm.
[0025] Figure 3 shows the triggering times of five nodes (randomly selected) for their neighbors under the proposed algorithm.
[0026] Figure 4 shows the comparisons between DE-SDAA and its variants under other specific scenarios.
[0027] Figure 5 shows seven randomly selected paths displayed on top of the contours of the log-likelihood function.

Claims (5)

The claims defining the invention are as follows:
1. A decentralized accelerated stochastic double-efficient algorithm, comprising:
1.1. Variable initialization
Step 1: Each node $i \in \mathcal{V}$ sets $t = 0$ and the maximum number of iterations, $t_{\max}$. Step 2: Each node $i$ starts with $\tilde{x}_0^i = x_0^i \in \mathbb{R}^p$, $s_0^i \in \mathbb{R}^p$, $e_0^{i,j} = s_0^i$, $\forall j \in \{1,\ldots,n^i\}$, and $\tilde{y}_0^i = y_0^i = g_0^i = \nabla f^i(s_0^i) \in \mathbb{R}^p$.
1.2. Parameters selection
Step 3: According to the definition of the constituent functions, we denote $\bar{n} = \max_{i \in \mathcal{V}}\{n^i\}$ and $\underline{n} = \min_{i \in \mathcal{V}}\{n^i\}$. Moreover, we select an arbitrary parameter $\omega_1 > 0$, select $\omega_3$ according to $\omega_3 > 8\omega_1$, choose $\omega_2$ following from $2\kappa_1\omega_2 > 8\omega_1 - 8\kappa_1\alpha^2\omega_3$ with $0 < \alpha < \sqrt{\omega_1/\omega_3}$, pick $\omega_4$ such that $2\bar{n}\omega_1 + 2\bar{n}\omega_2 < \omega_4\underline{n}$, and opt for $\omega_5$ satisfying $4960\omega_1 + 1064\omega_2 + 752\omega_3 + 168\omega_4 < \omega_5(1-\kappa_3)^2$. Step 4: According to the parameters denoted in Step 3, the momentum coefficient $\alpha$ is selected from the interval $0 < \alpha < \sqrt{\omega_1/\omega_3}$, and the constant step-size $\eta$ is selected from the interval

$$0 < \eta < \min\left\{\frac{\underline{n}\,\omega_2}{2\sqrt{2}\,\kappa_2\omega_4},\ \frac{\underline{n}\,\omega_1}{8\,\kappa_2\omega_4},\ \frac{\underline{n}\,\alpha^2\omega_3}{\kappa_1\omega_4},\ \frac{1-\kappa_3}{99\,\kappa_2},\ \frac{(1-\kappa_3)\sqrt{\omega_3 - 8\omega_1}}{\kappa_2\sqrt{160\,\omega_3 + 96\,\omega_1 + 32\,\omega_4 + 16\,\omega_5}}\right\}.$$
1.3. Information exchange
Step 5: According to the weights of the communication network, each node $i \in \mathcal{V}$ exchanges the variables (information) $x_t^i$ and $y_t^i$ with its neighboring nodes $j \in \mathcal{V}^i$. Then, each node $i \in \mathcal{V}$ computes the weighted sums $\sum_{j \in \mathcal{V}^i} a_{ij}(x_t^i - \tilde{x}_t^j)$ and $\sum_{j \in \mathcal{V}^i} a_{ij}(y_t^i - \tilde{y}_t^j)$ for $t \ge 0$, where $a_{ij} > 0$ is the weight between node $j$ and node $i$.
1.4. Event-triggering communication strategy
Step 6: The event-triggering strategy provides a new perspective for information sampling and transmission. Before introducing it, we first define $t_k^i$ as the $k$-th triggering time of node $i$, where $i \in \mathcal{V}$. In methods based on the event-triggering strategy, the local estimators of node $i$ at time $t$ are determined by its own estimators and the latest information sent from its neighbors $j \in \mathcal{V}^i$ (at the latest triggering time of node $j$ before $t$). Assume that $\tilde{x}_t^i$ and $\tilde{y}_t^i$ are the information that node $i$ transmitted to its neighbors at its latest triggering time before time $t$, i.e.,

$$\tilde{x}_t^i = x_{t_{k(i,t)}^i}^i, \quad \tilde{y}_t^i = y_{t_{k(i,t)}^i}^i, \quad \text{for } t_{k(i,t)}^i \le t < t_{k(i,t)+1}^i, \qquad (1)$$

where $x_t^i$ and $y_t^i$ are the two estimators of node $i$. Moreover, we suppose that all nodes broadcast their estimators $x_0^i$ and $y_0^i$ at the initial time, i.e., $\tilde{x}_0^i = x_0^i$ and $\tilde{y}_0^i = y_0^i$ for all $i \in \mathcal{V}$. In addition, the next triggering time $t_{k(i,t)+1}^i$ after $t$ for node $i \in \mathcal{V}$ is decided by

$$t_{k(i,t)+1}^i = \inf\{t' > t_{k(i,t)}^i : \|E_{t'}^{i,x}\|^2 + \|E_{t'}^{i,y}\|^2 > C\kappa_4^{t'}\}, \qquad (2)$$

where $C\kappa_4^t$ is the event-triggering threshold with parameters $C > 0$ and $0 < \kappa_4 < 1$, and $E_t^{i,x}$, $E_t^{i,y}$ are the measurement errors defined by

$$E_t^{i,x} = \tilde{x}_t^i - x_t^i, \qquad E_t^{i,y} = \tilde{y}_t^i - y_t^i. \qquad (3)$$
1.5. Variables update
Step 7: According to the event-triggering strategy and the momentum acceleration mechanism, each node $i$ first updates the local decision estimator $x_{t+1}^i$ and the local accelerated estimator $s_{t+1}^i$, i.e.,

$$x_{t+1}^i = s_t^i - \sum_{j \in \mathcal{V}^i} a_{ij}(x_t^i - \tilde{x}_t^j) - \eta\,y_t^i, \qquad (4)$$
$$s_{t+1}^i = x_{t+1}^i + \alpha\,(x_{t+1}^i - x_t^i). \qquad (5)$$

Step 8: Subsequently, each node $i$ maintains a gradient table that stores all local constituent gradients $\nabla f^{i,j}(e_t^{i,j})$, where $e_t^{i,j}$ is the most recent estimator at which the constituent gradient $\nabla f^{i,j}$ was evaluated. At each iteration $t+1$, each node $i$ uniformly and randomly selects one constituent function indexed by $\chi_{t+1}^i \in \{1,\ldots,n^i\}$ from its own local data batch, and then generates the local stochastic gradient $g_{t+1}^i$ as

$$g_{t+1}^i = \nabla f^{i,\chi_{t+1}^i}(s_{t+1}^i) - \nabla f^{i,\chi_{t+1}^i}(e_t^{i,\chi_{t+1}^i}) + \frac{1}{n^i}\sum_{j=1}^{n^i} \nabla f^{i,j}(e_t^{i,j}). \qquad (6)$$

After generating $g_{t+1}^i$, the entry $\nabla f^{i,\chi_{t+1}^i}(e_t^{i,\chi_{t+1}^i})$ is replaced by the newly computed constituent gradient $\nabla f^{i,\chi_{t+1}^i}(s_{t+1}^i)$, while the other entries remain the same. That is to say, if $j = \chi_{t+1}^i$, then store $\nabla f^{i,j}(e_{t+1}^{i,j}) = \nabla f^{i,j}(s_{t+1}^i)$; else $\nabla f^{i,j}(e_{t+1}^{i,j}) = \nabla f^{i,j}(e_t^{i,j})$.

Step 9: Based on Step 8, each node $i \in \mathcal{V}$ updates the local gradient auxiliary estimator $y_{t+1}^i$ according to

$$y_{t+1}^i = y_t^i - \sum_{j \in \mathcal{V}^i} a_{ij}(y_t^i - \tilde{y}_t^j) + g_{t+1}^i - g_t^i. \qquad (7)$$

Step 10: Based on the above updates, each node $i \in \mathcal{V}$ calculates the measurement errors $E_t^{i,x}$, $E_t^{i,y}$ in (3), and then tests the triggering condition in (2). If the triggering condition in (2) is satisfied, then node $i \in \mathcal{V}$ broadcasts $x_{t+1}^i$ and $y_{t+1}^i$ to its neighbors $j \in \mathcal{V}^i$ and updates the latest triggering time. Step 11: Each node $i \in \mathcal{V}$ sets $t = t + 1$ and goes to Step 7 until a certain stopping criterion is satisfied, e.g., $t > t_{\max}$, where $t_{\max}$ is the maximum number of iterations.
AU2020101959A 2020-08-24 2020-08-24 Decentralized optimization algorithm for machine learning tasks in networks: Resource efficient Ceased AU2020101959A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2020101959A AU2020101959A4 (en) 2020-08-24 2020-08-24 Decentralized optimization algorithm for machine learning tasks in networks: Resource efficient

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2020101959A AU2020101959A4 (en) 2020-08-24 2020-08-24 Decentralized optimization algorithm for machine learning tasks in networks: Resource efficient

Publications (1)

Publication Number Publication Date
AU2020101959A4 true AU2020101959A4 (en) 2020-10-01

Family

ID=72608234

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2020101959A Ceased AU2020101959A4 (en) 2020-08-24 2020-08-24 Decentralized optimization algorithm for machine learning tasks in networks: Resource efficient

Country Status (1)

Country Link
AU (1) AU2020101959A4 (en)


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508199A (en) * 2020-11-30 2021-03-16 同盾控股有限公司 Feature selection method, device and related equipment for cross-feature federated learning
CN113065252A (en) * 2021-04-01 2021-07-02 南京航空航天大学 Method for establishing likelihood function of cutting stability experimental data about model parameters
CN113211446A (en) * 2021-05-20 2021-08-06 长春工业大学 Event trigger-neural dynamic programming mechanical arm decentralized tracking control method
CN113211446B (en) * 2021-05-20 2023-12-08 长春工业大学 Mechanical arm decentralized tracking control method for event triggering-nerve dynamic programming
CN113344214A (en) * 2021-05-31 2021-09-03 北京百度网讯科技有限公司 Training method and device of data processing model, electronic equipment and storage medium
CN113344214B (en) * 2021-05-31 2022-06-14 北京百度网讯科技有限公司 Training method and device of data processing model, electronic equipment and storage medium
CN114123173A (en) * 2021-11-15 2022-03-01 南京邮电大学 Micro-grid elastic energy management method based on event trigger mechanism under network attack
CN114123173B (en) * 2021-11-15 2024-05-14 南京邮电大学 Micro-grid elastic energy management method based on event triggering mechanism under network attack
CN114040425A (en) * 2021-11-17 2022-02-11 中国电信集团系统集成有限责任公司 Resource allocation method based on global resource availability optimization
CN114040425B (en) * 2021-11-17 2024-03-15 中电信数智科技有限公司 Resource allocation method based on global resource utility rate optimization

Similar Documents

Publication Publication Date Title
AU2020101959A4 (en) Decentralized optimization algorithm for machine learning tasks in networks: Resource efficient
Liang et al. Deep-learning-based wireless resource allocation with application to vehicular networks
Lee et al. Stochastic dual averaging for decentralized online optimization on time-varying communication graphs
Srivastava et al. Distributed min-max optimization in networks
Xie et al. A novel relay node placement and energy efficient routing method for heterogeneous wireless sensor networks
CN104079576A (en) Dynamic cooperation alliance structure forming method based on Bayes alliance game
Bouton et al. Coordinated reinforcement learning for optimizing mobile networks
Thien et al. A transfer games actor–critic learning framework for anti-jamming in multi-channel cognitive radio networks
Wang et al. Decentralized multi-agent power control in wireless networks with frequency reuse
Song et al. Optimizing DoS attack energy with imperfect acknowledgments and energy harvesting constraints in cyber-physical systems
CN103916969A (en) Combined authorized user perception and link state estimation method and device
Tuyishimire et al. Modelling and analysis of interference diffusion in the internet of things: An epidemic model
CN109412661A (en) A kind of user cluster-dividing method under extensive mimo system
CN116862021A (en) anti-Bayesian-busy attack decentralization learning method and system based on reputation evaluation
Huang et al. Stochastic approximation based consensus dynamics over Markovian networks
Alpcan Noncooperative games for control of networked systems
CN109548032B (en) Distributed cooperative spectrum cognition method for dense network full-band detection
Bianchi et al. Performance analysis of a distributed Robbins-Monro algorithm for sensor networks
Kirti et al. Scalable distributed Kalman filtering through consensus
Ge et al. Networked Kalman filtering with combined constraints of bandwidth and random delay
Wang et al. An adaptive location estimator based on alpha-beta filtering for wireless sensor networks
Håkansson et al. Optimal scheduling policy for spatio-temporally dependent observations using age-of-information
CN108736991B (en) Group intelligent frequency spectrum switching method based on classification
Banerjee Resource allocation and optimization in cognitive radio using cascaded machine learning algorithm
Wang et al. Convergence Time Minimization for Federated Reinforcement Learning over Wireless Networks

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry