AU2020101237A4 - A computation-efficient distributed algorithm for convex constrained optimization problem - Google Patents

A computation-efficient distributed algorithm for convex constrained optimization problem Download PDF

Info

Publication number
AU2020101237A4
Authority
AU
Australia
Prior art keywords
node
variable
gradient
computation
vfi
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2020101237A
Inventor
Zhengran Cao
Qingguo Lü
Keke ZHANG
Yunhang Zhu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest University
Original Assignee
Southwest University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest University filed Critical Southwest University
Priority to AU2020101237A priority Critical patent/AU2020101237A4/en
Application granted granted Critical
Publication of AU2020101237A4 publication Critical patent/AU2020101237A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

A computation-efficient distributed algorithm based on the stochastic gradient projection method is proposed over an undirected network for solving convex constrained optimization problems with a sum of smooth convex functions and non-smooth regularization terms subject to locally general constraints. The algorithm mainly includes five parts: variable initialization, parameter definition and selection, information exchange, stochastic gradient computation, and variable update. By adopting the variance reduction technique, the proposed algorithm set forth in the present invention reduces the amount of computation in comparison with related works. Assuming smooth and strongly convex cost functions, the proposed algorithm finds the exact optimal solution in expectation provided that the constant step-size is less than an explicitly estimated upper bound. Compared with existing distributed schemes, the proposed algorithm is more convenient for general constrained optimization problems and possesses low computation complexity in terms of the total number of local gradient evaluations. The present invention has broad application in modern large-scale information processing problems in machine learning, where the samples of a training dataset are randomly distributed across multiple computing nodes and each smooth objective function is further considered as the average of several constituent functions.

Description

[Figure 1, sheet 1/2] Flowchart of the proposed algorithm: each node sets k = 0 and the maximum number of iterations k_max; initializes its local variables and selects the fixed step-size as well as the fixed tunable parameter according to the network and the problem; computes the stochastic gradient according to the variance reduction technique; updates the main variable according to the stochastic gradient projection method; updates the Lagrangian multipliers according to the main variable and the projection operator; then sets k = k + 1 and repeats until k > k_max.
Figure 1
1. Technical Field
[0001] The present invention relates to the field of machine learning.
2. Background
[0002] Due to the limited computational and storage capacity of individual nodes, centralized processing of large-scale tasks on a single computing node becomes unrealistic. Distributed optimization has long been a classical topic, yet it has recently sparked considerable interest in many emerging applications (large-scale tasks), including but not limited to parameter estimation, network attack, machine learning, and IoT networks. This resurgence of interest is facilitated by at least two facts: a) the latest development of high-performance computing platforms enables us to adopt distributed resources to significantly promote computation efficiency; b) the size of datasets often far exceeds the storage capacity of one machine and requires coordination among multiple machines. In distributed optimization (no centralized coordination), each node is only allowed to interact with its neighbors over a locally connected network. Designing computation-efficient distributed algorithms for broad classes of optimization problems is in general more challenging. Distributed optimization methods that depend only on gradient information have become a core interest in processing large-scale tasks due to their excellent scalability. Many known methods, including distributed gradient descent (DGD), dual averaging, EXTRA, ADMM, adaptive diffusion, and gradient tracking, have been studied in the literature. Moreover, quite a few efficient methods for dealing with various practical problems, such as privacy security, machine learning, and online distributed learning, have emerged. Simultaneously, much attention has been paid to distributed continuous-time optimization methods, mainly due to their flexible application in continuous-time physical systems and hardware implementation, and because the well-developed continuous-time control technology is helpful for analysis. During this period, a number of distributed continuous-time optimization methods that adopt first-order gradient information or second-order Hessian information have been investigated for various kinds of problems. With the advent of the big-data era, the amount of data that nodes in the network need to process is getting larger and more complicated. Therefore, the above methods can be computationally very demanding because each iteration needs a full gradient evaluation of the local objective functions. This may make these methods practically infeasible when dealing with large-scale tasks, mainly because the nodes in the network need to cope with large amounts of various data. Based on this, we assume that at each iteration the proposed algorithm only evaluates the gradient of one randomly selected constituent function, and employs the unbiased stochastic average gradient (obtained as the average of all most recent stochastic gradients) to estimate the local gradients. Moreover, we integrate the variance reduction technique with the distributed stochastic gradient projection method with constant step-size to achieve the exact optimal solution in expectation.
3. Notation
[0003] In this invention, all vectors default to column vectors. Let R, R^n, and R^{m×n} be the set of real numbers, n-dimensional real column vectors, and m×n real matrices, respectively. The matrix I_n is the n×n identity matrix, whereas 1 and 0 (of appropriate dimensions) are the column vectors of all ones and all zeros, respectively. A quantity (possibly a vector) allocated to node i is indexed by a subscript i; e.g., x_i^k is node i's estimate at time k. For a real symmetric matrix A, we use λ_max(A) and λ_min(A) to represent the largest and the smallest eigenvalues of A, respectively. The transposes of a vector x and a matrix A are represented by x^T and A^T. We denote ||·|| and ||·||_1 as the Euclidean norm (for vectors) and the 1-norm, respectively. For a positive semi-definite matrix A ∈ R^{n×n}, we use ||x||_A = sqrt(x^T A x). The symbols ⊗ and ∏ denote the Kronecker product and the Cartesian product, respectively. Given a random estimator x, E[x] denotes its expectation. For a vector x = [x_1, x_2, ..., x_n]^T, we use Z = diag{x} to represent the diagonal matrix satisfying Z_ii = x_i for all i = 1, ..., n and Z_ij = 0 for all i ≠ j. Denote (·)_+ = max{0, ·}. Let P_Ω : R^{nd} → Ω and P_[-1,1] : R^m → [-1,1]^m be two projection operators, onto the constraint set Ω and onto the box [-1,1]^m, respectively.
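Because the operators (·)_+ and P_[-1,1] appear repeatedly in the update rules below, a minimal numpy sketch of these elementwise operations is given here for illustration only; the function names are assumptions, not part of the invention.

```python
import numpy as np

def plus(z):
    # (z)_+ = max{0, z}, applied elementwise
    return np.maximum(z, 0.0)

def proj_box(z, lo, hi):
    # Projection of z onto the box [lo, hi]^d, applied elementwise;
    # P_[-1,1] is the special case lo = -1, hi = 1.
    return np.clip(z, lo, hi)

# Example: apply (.)_+ and P_[-1,1] to a vector
z = np.array([-2.3, 0.4, 1.7])
print(plus(z))              # [0.  0.4 1.7]
print(proj_box(z, -1, 1))   # [-1.  0.4 1. ]
```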
4. Network Model
[0004] The intrinsic interconnections among the nodes in the network are modeled as an undirected graph G = {V, E, A} involving the node set V = {1, 2, ..., n}, the edge set E ⊆ V × V, and the adjacency matrix A = [a_ij] ∈ R^{n×n}. An edge (i, j) ∈ E implies that node i can directly exchange data with node j. The connection weight between nodes i and j in graph G satisfies a_ij = a_ji > 0 if (i, j) ∈ E and a_ij = a_ji = 0 otherwise. Without loss of generality, a_ii = 0, i.e., there are no self-connections in the graph. The degree of node i ∈ V is d_i = Σ_{j=1}^{n} a_ij, and the degree matrix D_G = diag{d_1, d_2, ..., d_n} is a diagonal matrix. The Laplacian matrix of graph G is defined as L_G = D_G - A, which is symmetric and positive semi-definite since the graph G is undirected. A path is a series of consecutive edges. If there is at least one path between any pair of distinct nodes in the graph G, then the graph G is connected.
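For illustration only (not part of the claimed subject matter), the following numpy sketch builds the adjacency, degree, and Laplacian matrices of a small ring network with unit weights and checks the stated properties of L_G:

```python
import numpy as np

n = 5
A = np.zeros((n, n))
for i in range(n):                     # ring: node i connected to i-1 and i+1
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

D = np.diag(A.sum(axis=1))             # degree matrix D_G
L = D - A                              # Laplacian L_G = D_G - A

eigvals = np.linalg.eigvalsh(L)
print(np.allclose(L, L.T))             # True: symmetric
print(np.all(eigvals >= -1e-12))       # True: positive semi-definite
print(np.sum(eigvals < 1e-9) == 1)     # True: one zero eigenvalue, so the graph is connected
```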
5. Problem Formulation
[0005] Consider the convex constrained optimization problem of the following form:

\min_{\tilde{x} \in R^{d}} J(\tilde{x}) = \sum_{i=1}^{n}\Big(f_{i}(\tilde{x}) + \|P_{i}\tilde{x} - q_{i}\|_{1}\Big), \quad \text{s.t.}\ B_{i}\tilde{x} = c_{i},\ D_{i}\tilde{x} \le s_{i},\ \tilde{x} \in \Omega_{i},\ \forall i \in V \qquad (1)

where \tilde{x} ∈ R^d is the optimization variable, f_i(\tilde{x}) = (1/e_i) Σ_{j=1}^{e_i} f_{i,j}(\tilde{x}) is the local objective function of node i, and ||P_i \tilde{x} - q_i||_1 is a non-smooth L1-regularization term of node i; P_i ∈ R^{m_i×d} (m_i > 0), q_i ∈ R^{m_i}, B_i ∈ R^{w_i×d} (0 < w_i < d) is full row-rank, c_i ∈ R^{w_i}, D_i ∈ R^{v_i×d} (v_i > 0), s_i ∈ R^{v_i}, and Ω_i ⊆ R^d is a non-empty and closed convex set. Here, we consider that the invention is based on the following assumptions: 1) The network G is undirected and connected; 2) Each local constituent function f_{i,j}, i ∈ V, j ∈ {1, ..., e_i}, is σ-smooth and μ-strongly convex, where σ ≥ μ > 0; 3) The feasible set of (1) is nonempty, i.e., the optimization problem (1) is solvable. Under the above assumptions, problem (1) can be equivalently reformulated as the following form:

\min_{x} J(x) = \sum_{i=1}^{n} f_{i}(x_{i}) + \|Px - q\|_{1}, \quad \text{s.t.}\ Bx = c,\ Dx \le s,\ Lx = 0,\ x \in \Omega \qquad (2)

where f_i(x_i) = (1/e_i) Σ_{j=1}^{e_i} f_{i,j}(x_i) and the other stacked parameters are defined below (see Step 3). The formulated problem is frequently found in machine learning (such as modern large-scale information processing problems, reinforcement learning problems, etc.) with large-scale training samples randomly decentralized across multiple computing nodes which focus on collectively training a model utilizing the neighboring nodes' data.
6. Detailed Implementation Description
[0006] Figure 1 is the flowchart of the proposed algorithm in the present invention. As shown in Figure 1, the computation-efficient distributed algorithm comprises the following steps:
6.1. Variable initialization
[0007] Step 1: Each node i ∈ V sets k = 0 and the maximum number of iterations k_max.
[0008] Step 2: Each node i starts with x_i^0 ∈ R^d, α_i^0 ∈ R^{m_i}, β_i^0 ∈ R^{w_i}, λ_i^0 ∈ R^{v_i}, and y_i^0 ∈ R^d.
6.2. Parameters definition and selection
[0009] Step 3: According to the convex constrained optimization problem (1) and the reformulated optimization problem (2), we define x as the vector that stacks all the local estimators x_i, i ∈ V (i.e., x = vec[x_1, ..., x_n] ∈ R^{nd}). Let P, B and D be the block diagonal matrices of P_1 to P_n (i.e., P = blkdiag{P_1, ..., P_n} ∈ R^{m×nd}), B_1 to B_n (i.e., B = blkdiag{B_1, ..., B_n} ∈ R^{w×nd}), and D_1 to D_n (i.e., D = blkdiag{D_1, ..., D_n} ∈ R^{v×nd}), respectively, where m = Σ_{i=1}^{n} m_i, w = Σ_{i=1}^{n} w_i, and v = Σ_{i=1}^{n} v_i. Denote q = [q_1^T, ..., q_n^T]^T ∈ R^m, c = [c_1^T, ..., c_n^T]^T ∈ R^w, s = [s_1^T, ..., s_n^T]^T ∈ R^v, Ω = ∏_{i=1}^{n} Ω_i, and L = L_G ⊗ I_d.
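The stacking in Step 3 can be illustrated with a short numpy/scipy sketch; the dimensions below are arbitrary examples chosen for the illustration, not values prescribed by the invention.

```python
import numpy as np
from scipy.linalg import block_diag

# Illustrative per-node data: n = 3 nodes, local dimension d = 2
n, d = 3, 2
P_i = [np.random.randn(2, d) for _ in range(n)]   # m_i = 2
B_i = [np.random.randn(1, d) for _ in range(n)]   # w_i = 1
D_i = [np.random.randn(2, d) for _ in range(n)]   # v_i = 2

P = block_diag(*P_i)      # P = blkdiag{P_1, ..., P_n} in R^{m x nd}, m = sum of m_i
B = block_diag(*B_i)      # B = blkdiag{B_1, ..., B_n} in R^{w x nd}
D = block_diag(*D_i)      # D = blkdiag{D_1, ..., D_n} in R^{v x nd}

# Laplacian of a ring graph and its lifted version L = L_G kron I_d
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L_G = np.diag(A.sum(axis=1)) - A
L = np.kron(L_G, np.eye(d))

print(P.shape, B.shape, D.shape, L.shape)   # (6, 6) (3, 6) (6, 6) (6, 6)
```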
[0010] Step 4: According to the definition of the constituent functions, we denote \bar{e} = \max_{i \in V}\{e_i\} and \underline{e} = \min_{i \in V}\{e_i\}.
[0011] Step 5: According to the parameters denoted in Step 3 and Step 4, we select the constant step-size η as follows

0 < \eta < \frac{1}{\lambda_{\max}\big(a I_{nd} + 2\sigma I_{nd} + B^{T}B + L + 2D^{T}D + 2P^{T}P\big)} \qquad (3)

where a is a tunable parameter.
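For concreteness, the right-hand side of (3) can be evaluated numerically once the stacked matrices of Step 3 are formed; the numpy sketch below is illustrative only, and the inputs a and sigma are placeholders rather than values prescribed by the invention.

```python
import numpy as np

def step_size_upper_bound(a, sigma, B, L, D, P):
    """Evaluate the right-hand side of (3):
    1 / lambda_max(a*I + 2*sigma*I + B^T B + L + 2 D^T D + 2 P^T P)."""
    nd = L.shape[0]
    M = (a + 2.0 * sigma) * np.eye(nd) + B.T @ B + L + 2.0 * D.T @ D + 2.0 * P.T @ P
    return 1.0 / np.linalg.eigvalsh(M).max()   # M is symmetric, so eigvalsh applies

# Any constant step-size eta with 0 < eta < step_size_upper_bound(...) satisfies (3).
```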
Then, we select the tunable parameter b as follows

\frac{4\eta\sigma}{a} < b < \frac{2\eta\lambda_{\min}(B^{T}B)}{\sigma} - \frac{4\sigma\eta(\sigma - \mu)}{a\mu} \qquad (4)

6.3. Information exchange
[0012] Step 6: According to the weights of the communication network, each node i ∈ V exchanges its variable (information) x_i^k with its neighboring nodes j ∈ V. Then, each node i ∈ V computes the weighted sum Σ_{j=1, j≠i}^{n} a_ij(x_i^k - x_j^k) for k ≥ 0, where a_ij ≥ 0 is the weight between node j and node i.
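A minimal numpy sketch of this weighted sum, assuming each node has already received its neighbors' current estimates (the array layout is an illustrative assumption):

```python
import numpy as np

def weighted_disagreement(i, X, A):
    """Compute sum_j a_ij * (x_i - x_j) for node i.
    X: (n, d) array of current estimates x_j^k; A: (n, n) weight matrix with a_ii = 0."""
    diffs = X[i] - X            # row j holds x_i - x_j
    return A[i] @ diffs         # weights a_ij applied row-wise and summed
```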
6.4. Stochastic gradient computation
[0013] Step 7: Each node i maintains a gradient table that stores all local constituent gradients ∇f_{i,j}(t_{i,j}^k), where t_{i,j}^k is the most recent estimate at which the constituent gradient ∇f_{i,j} was evaluated. At each iteration k ≥ 0, each node i uniformly at random selects one constituent function, indexed by χ_i^k ∈ {1, ..., e_i}, from its own local data batch, and then generates the local stochastic gradient g_i^k as

g_{i}^{k} = \nabla f_{i,\chi_{i}^{k}}(x_{i}^{k}) - \nabla f_{i,\chi_{i}^{k}}(t_{i,\chi_{i}^{k}}^{k}) + \frac{1}{e_{i}}\sum_{j=1}^{e_{i}} \nabla f_{i,j}(t_{i,j}^{k}) \qquad (5)

After generating g_i^k, the table entry ∇f_{i,χ_i^k}(t_{i,χ_i^k}^k) is replaced by the newly computed constituent gradient ∇f_{i,χ_i^k}(x_i^k), while the other entries remain the same. That is to say, if j = χ_i^k, then the gradient table position is updated with t_{i,j}^{k+1} = x_i^k, i.e., ∇f_{i,j}(t_{i,j}^{k+1}) = ∇f_{i,j}(x_i^k); else ∇f_{i,j}(t_{i,j}^{k+1}) = ∇f_{i,j}(t_{i,j}^k).
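The gradient table and the update rule (5) can be illustrated with a short single-node numpy sketch; the class name, the callable-based interface, and the running-sum shortcut are illustrative assumptions rather than the claimed implementation.

```python
import numpy as np

class GradientTable:
    """Stores the most recent constituent gradients of one node, as in Step 7."""
    def __init__(self, grad_fns, x0):
        self.grad_fns = grad_fns                       # list of callables: grad of f_{i,j}
        self.table = [g(x0) for g in grad_fns]         # gradients at the initial point
        self.table_sum = np.sum(self.table, axis=0)    # running sum for the average term

    def stochastic_gradient(self, x, rng):
        e_i = len(self.grad_fns)
        j = rng.integers(e_i)                          # uniformly pick chi_i^k
        g_new = self.grad_fns[j](x)
        # Equation (5): new gradient - stored gradient + average of stored gradients
        g = g_new - self.table[j] + self.table_sum / e_i
        # Table update: replace only entry j, keep the other entries unchanged
        self.table_sum += g_new - self.table[j]
        self.table[j] = g_new
        return g
```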
6.5. Variables update
[0014] Step 8: Each node i ∈ V implements the projection step of the estimator x_i^k along the local stochastic gradient g_i^k, i.e.,

x_{i}^{k+1} = P_{\Omega_{i}}\Big[x_{i}^{k} - \eta\Big(g_{i}^{k} + P_{i}^{T} P_{[-1,1]}(\alpha_{i}^{k} + P_{i}x_{i}^{k} - q_{i}) + B_{i}^{T}(\beta_{i}^{k} + B_{i}x_{i}^{k} - c_{i}) + D_{i}^{T}(\lambda_{i}^{k} + D_{i}x_{i}^{k} - s_{i})_{+} + y_{i}^{k} + \sum_{j=1, j \ne i}^{n} a_{ij}(x_{i}^{k} - x_{j}^{k})\Big)\Big] \qquad (6)
[0015] Step 9: Based on the projection operator and Step 8, each node i ∈ V updates the variable α_i^{k+1} according to

\alpha_{i}^{k+1} = P_{[-1,1]}(\alpha_{i}^{k} + P_{i}x_{i}^{k+1} - q_{i}) \qquad (7)
[0016] Step 10: Based on Step 8, each node i ∈ V updates the variable β_i^{k+1} according to

\beta_{i}^{k+1} = \beta_{i}^{k} + B_{i}x_{i}^{k+1} - c_{i} \qquad (8)
[0017] Step 11: Based on the definition of (·)_+ and Step 8, each node i ∈ V updates the variable λ_i^{k+1} according to

\lambda_{i}^{k+1} = (\lambda_{i}^{k} + D_{i}x_{i}^{k+1} - s_{i})_{+} \qquad (9)
[0018] Step 12: Based on Step 8, each node i ∈ V updates the variable y_i^{k+1} according to

y_{i}^{k+1} = y_{i}^{k} + \sum_{j=1, j \ne i}^{n} a_{ij}(x_{i}^{k+1} - x_{j}^{k+1}) \qquad (10)
[0019] Step 13: Each node i ∈ V sets k = k + 1 and goes back to Step 7 until a certain stopping criterion is satisfied, e.g., k > k_max, where k_max is the maximum number of iterations.
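Putting Steps 6-12 together, one iteration of the per-node update can be sketched as follows. This is a minimal illustration that assumes Ω_i is a box [lo, hi]^d (as in the simulation of Section 8.1), reuses the GradientTable helper sketched after Step 7, and uses field and function names that are assumptions rather than values prescribed by the invention.

```python
import numpy as np

def node_iteration(i, X, state, data, eta, A, rng):
    """One iteration of Steps 6-12 for node i (illustrative sketch).
    X: (n, d) array of current estimates; state: dict with this node's
    alpha, beta, lam, y and its GradientTable; data: dict with P, q, B, c,
    D, s and the box bounds (lo, hi) describing Omega_i."""
    x = X[i]
    P_i, q_i = data["P"], data["q"]
    B_i, c_i = data["B"], data["c"]
    D_i, s_i = data["D"], data["s"]

    # Step 6: weighted disagreement with neighbors
    consensus = A[i] @ (x - X)

    # Step 7: variance-reduced stochastic gradient, equation (5)
    g = state["table"].stochastic_gradient(x, rng)

    # Step 8: projected primal update, equation (6)
    grad = (g
            + P_i.T @ np.clip(state["alpha"] + P_i @ x - q_i, -1.0, 1.0)
            + B_i.T @ (state["beta"] + B_i @ x - c_i)
            + D_i.T @ np.maximum(state["lam"] + D_i @ x - s_i, 0.0)
            + state["y"] + consensus)
    x_new = np.clip(x - eta * grad, data["lo"], data["hi"])

    # Steps 9-11: dual variable updates, equations (7)-(9)
    state["alpha"] = np.clip(state["alpha"] + P_i @ x_new - q_i, -1.0, 1.0)
    state["beta"] = state["beta"] + B_i @ x_new - c_i
    state["lam"] = np.maximum(state["lam"] + D_i @ x_new - s_i, 0.0)
    # Step 12: the y-update (10) is applied after the nodes exchange x_new
    return x_new
```

After all nodes have computed x_i^{k+1} and exchanged them with their neighbors, each node finishes the iteration by applying the y-update (10), incrementing k, and checking the stopping criterion of Step 13.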
7. Innovation
[0020] 1) Using the unbiased stochastic average gradient greatly reduces the expense of full gradient evaluations, which may lower energy consumption and extend the useful life of the network.
[0021] 2) Integrating the variance reduction technique into the distributed stochastic gradient projection method with a well-selected constant step-size achieves the exact optimal solution in expectation.
[0022] 3) A computation-efficient distributed algorithm is proposed for solving the convex constrained optimization problem, which has broad application in modern large-scale information processing problems in machine learning.
8. Simulations
8.1. Performance examination
[0023] First, the proposed algorithm is applied to solve a general distributed minimization problem which is described as follows:

\min_{\tilde{x} \in R^{4}} \sum_{i=1}^{n}\Big(\frac{1}{e_{i}}\sum_{j=1}^{e_{i}}\|C_{i,j}\tilde{x} - b_{i,j}\|^{2} + \|P_{i}\tilde{x} - q_{i}\|_{1}\Big)
\text{s.t.}\ \tilde{x}_{1} + \tilde{x}_{2} + \tilde{x}_{3} + \tilde{x}_{4} = 3,
\tilde{x}_{1} - \tilde{x}_{2} + \tilde{x}_{3} - \tilde{x}_{4} \le 2,
-2 \le \tilde{x}_{l} \le 2,\ l = 1, \ldots, 4, \qquad (11)

where \tilde{x} ∈ R^4, C_{i,j} ∈ R^{1×4}, P_i ∈ R^{1×4}, b_{i,j} ∈ R, and q_i ∈ R for all i, j. Let n = 10 and e_i = 10 for all i. The components of C_{i,j}, b_{i,j}, P_i, and q_i are randomly selected in [0, 2], [-4, 4], [0, 2], and [-4, 4], respectively. The communication among the 10 nodes is modeled as a ring network. Node i is assigned the i-th objective function f_i(\tilde{x}) + ||P_i \tilde{x} - q_i||_1, where f_i(\tilde{x}) = (1/e_i) Σ_{j=1}^{e_i} ||C_{i,j}\tilde{x} - b_{i,j}||^2 with the constituent functions f_{i,j}(\tilde{x}) = ||C_{i,j}\tilde{x} - b_{i,j}||^2, j = 1, ..., e_i. In the simulation, the constant step-size η is set to 0.04 and the initial conditions (x_i^0, α_i^0, β_i^0, λ_i^0, and y_i^0) are randomly generated for the proposed algorithm. Figure 2 depicts the transient behaviors of all dimensions of the state estimate x_i^k. Figure 2 indicates that the state estimates in the proposed algorithm successfully achieve consensus at the global optimal solution in expectation. Figure 3 verifies the privacy-masking properties of a generalized version of the proposed algorithm that uses the differential privacy strategy. Suppose that two datasets Z and Z' differ in exactly one element while all other elements are identical. Figure 3 shows that the two outputs (for one randomly displayed node) x_i and x_i' are almost identical, so that an adversary is unable to obtain personal sensitive information.
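For illustration, the synthetic data of this example can be generated as follows; the random seed, array layout, and the helper grad_f_ij are assumptions made for the sketch, not part of the invention.

```python
import numpy as np

rng = np.random.default_rng(0)
n, e_i, d = 10, 10, 4

# Constituent data f_{i,j}(x) = ||C_ij x - b_ij||^2 and L1 term ||P_i x - q_i||_1
C = rng.uniform(0.0, 2.0, size=(n, e_i, 1, d))    # C_ij components in [0, 2]
b = rng.uniform(-4.0, 4.0, size=(n, e_i, 1))      # b_ij in [-4, 4]
P = rng.uniform(0.0, 2.0, size=(n, 1, d))         # P_i components in [0, 2]
q = rng.uniform(-4.0, 4.0, size=(n, 1))           # q_i in [-4, 4]

# Local constraints common to all nodes in (11)
B_i = np.ones((1, d));                c_i = np.array([3.0])   # x1+x2+x3+x4 = 3
D_i = np.array([[1., -1., 1., -1.]]); s_i = np.array([2.0])   # x1-x2+x3-x4 <= 2
box_lo, box_hi = -2.0, 2.0                                    # Omega_i = [-2, 2]^4

def grad_f_ij(i, j, x):
    # Gradient of ||C_ij x - b_ij||^2 with respect to x
    return 2.0 * C[i, j].T @ (C[i, j] @ x - b[i, j])
```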
8.2. Application behavior
[0024] Second, we further verify the application behavior of the proposed algorithm with numerical simulations on real datasets. We consider the distributed sparse logistic regression problem using the Breast Cancer Wisconsin (Diagnostic) dataset and the Mushroom dataset provided in the UCI Machine Learning Repository. In the Breast Cancer Wisconsin (Diagnostic) dataset, we adopt N = 200 samples as training data, where each training sample has dimension d = 9. In the Mushroom dataset, we employ N = 6000 samples as training data, where each training sample has dimension d = 112. All the features have been preprocessed and normalized to unit vectors for each dataset. For the network, we generate a randomly connected network with n = 10 nodes utilizing an Erdos-Renyi model with connection probability p = 0.4. The distributed sparse logistic regression problem can be formally described as

\min_{\tilde{x} \in R^{d}} \sum_{i=1}^{n} f_{i}(\tilde{x}) + \kappa_{1}\|\tilde{x}\|_{1}, \qquad (12)

with the local objective function f_i(\tilde{x}) being

f_{i}(\tilde{x}) = \frac{1}{e_{i}}\sum_{j=1}^{e_{i}} \ln\big(1 + \exp(-b_{i,j} c_{i,j}^{T}\tilde{x})\big) + \frac{\kappa_{2}}{2}\|\tilde{x}\|^{2},

where b_{i,j} ∈ {-1, 1} and c_{i,j} ∈ R^d are the local data kept by node i for j ∈ {1, ..., e_i}. The regularization term κ_1||\tilde{x}||_1 is applied to impose sparsity of the optimal solution, and (κ_2/2)||\tilde{x}||^2 is added to avoid overfitting. In the simulation, we assign the data randomly to the local nodes, i.e., Σ_{i=1}^{n} e_i = N. We set the regularization parameters κ_1 = 0.1 and κ_2 = 5, respectively. Moreover, the step-size η of each algorithm is selected to ensure the best possible convergence. Figure 4 depicts the evolution of the residuals log_10(||x_i^k - \tilde{x}^*||) for the proposed algorithm and for a distributed method that can handle non-smooth regularization terms, on the two training datasets. From Figure 4, we can find that the proposed algorithm exhibits a linear convergence rate on both training sets.
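For reference, the constituent gradient that would populate each node's gradient table in this experiment can be sketched as follows, under the assumption that every constituent f_{i,j} carries the full ridge term (κ_2/2)||x||^2 so that the average of the e_i constituents recovers f_i; this is an illustrative assumption rather than a detail disclosed above.

```python
import numpy as np

def grad_logistic_constituent(x, c_ij, b_ij, kappa2):
    """Gradient of the constituent f_{i,j}(x) = ln(1 + exp(-b_ij * c_ij^T x))
    + (kappa2 / 2) * ||x||^2, so that (1/e_i) * sum_j f_{i,j} recovers f_i."""
    margin = b_ij * (c_ij @ x)
    sigma = 1.0 / (1.0 + np.exp(margin))     # sigmoid of the negative margin
    return -b_ij * sigma * c_ij + kappa2 * x
```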
9. Brief Description of The Drawings
[0025] Figure 1 is a flowchart of the computation-efficient distributed algorithm.
[0026] Figure 2 shows the convergence of x_i^k for solving the general minimization problem.
[0027] Figure 3 shows the outputs (one randomly displayed node) x_i and x_i' corresponding to the adjacent datasets Z and Z'.
[0028] Figure 4 shows the comparisons between the proposed algorithm and the applicable method on the two real datasets.

Claims (5)

The claims defining the invention are as follows:
1. A computation-efficient distributed algorithm
1.1. Variable initialization
Step 1: Each node i ∈ V sets k = 0 and the maximum number of iterations k_max.
Step 2: Each node i starts with x_i^0 ∈ R^d, α_i^0 ∈ R^{m_i}, β_i^0 ∈ R^{w_i}, λ_i^0 ∈ R^{v_i}, and y_i^0 ∈ R^d.
1.2. Parameters definition and selection
Step 3: According to the convex constrained optimization problem (1) and the reformulated optimization problem (2), we define x as the vector that stacks all the local estimators x_i, i ∈ V (i.e., x = vec[x_1, ..., x_n] ∈ R^{nd}). Let P, B and D be the block diagonal matrices of P_1 to P_n (i.e., P = blkdiag{P_1, ..., P_n} ∈ R^{m×nd}), B_1 to B_n (i.e., B = blkdiag{B_1, ..., B_n} ∈ R^{w×nd}), and D_1 to D_n (i.e., D = blkdiag{D_1, ..., D_n} ∈ R^{v×nd}), respectively, where m = Σ_{i=1}^{n} m_i, w = Σ_{i=1}^{n} w_i, and v = Σ_{i=1}^{n} v_i. Denote q = [q_1^T, ..., q_n^T]^T ∈ R^m, c = [c_1^T, ..., c_n^T]^T ∈ R^w, s = [s_1^T, ..., s_n^T]^T ∈ R^v, Ω = ∏_{i=1}^{n} Ω_i, and L = L_G ⊗ I_d.
Step 4: According to the definition of the constituent functions in (2), we denote \bar{e} = \max_{i \in V}\{e_i\} and \underline{e} = \min_{i \in V}\{e_i\}.
Step 5: According to the parameters denoted in Step 3 and Step 4, we select the constant step-size η as follows

0 < \eta < \frac{1}{\lambda_{\max}\big(a I_{nd} + 2\sigma I_{nd} + B^{T}B + L + 2D^{T}D + 2P^{T}P\big)} \qquad (3)
where a is a tunable parameter. Then, we select the tunable parameter b as follows
\frac{4\eta\sigma}{a} < b < \frac{2\eta\lambda_{\min}(B^{T}B)}{\sigma} - \frac{4\sigma\eta(\sigma - \mu)}{a\mu} \qquad (4)
1.3. Information exchange
Step 6: According to the weights of the communication network, each node i ∈ V exchanges its variable (information) x_i^k with its neighboring nodes j ∈ V. Then, each node i ∈ V computes the weighted sum Σ_{j=1, j≠i}^{n} a_ij(x_i^k - x_j^k) for k ≥ 0, where a_ij ≥ 0 is the weight between node j and node i.
1.4. Stochastic gradient computation
Step 7: Each node i maintains a gradient table that stores all local constituent gradients ∇f_{i,j}(t_{i,j}^k), where t_{i,j}^k is the most recent estimate at which the constituent gradient ∇f_{i,j} was evaluated. At each iteration k ≥ 0, each node i uniformly at random selects one constituent function, indexed by χ_i^k ∈ {1, ..., e_i}, from its own local data batch, and then generates the local stochastic gradient g_i^k as

g_{i}^{k} = \nabla f_{i,\chi_{i}^{k}}(x_{i}^{k}) - \nabla f_{i,\chi_{i}^{k}}(t_{i,\chi_{i}^{k}}^{k}) + \frac{1}{e_{i}}\sum_{j=1}^{e_{i}} \nabla f_{i,j}(t_{i,j}^{k}) \qquad (5)

After generating g_i^k, the table entry ∇f_{i,χ_i^k}(t_{i,χ_i^k}^k) is replaced by the newly computed constituent gradient ∇f_{i,χ_i^k}(x_i^k), while the other entries remain the same. That is to say, if j = χ_i^k, then the gradient table position is updated with t_{i,j}^{k+1} = x_i^k, i.e., ∇f_{i,j}(t_{i,j}^{k+1}) = ∇f_{i,j}(x_i^k); else ∇f_{i,j}(t_{i,j}^{k+1}) = ∇f_{i,j}(t_{i,j}^k).
1.5. Variables update
Step 8: Each node i ∈ V implements the projection step of the estimator x_i^k along the local stochastic gradient g_i^k, i.e.,

x_{i}^{k+1} = P_{\Omega_{i}}\Big[x_{i}^{k} - \eta\Big(g_{i}^{k} + P_{i}^{T} P_{[-1,1]}(\alpha_{i}^{k} + P_{i}x_{i}^{k} - q_{i}) + B_{i}^{T}(\beta_{i}^{k} + B_{i}x_{i}^{k} - c_{i}) + D_{i}^{T}(\lambda_{i}^{k} + D_{i}x_{i}^{k} - s_{i})_{+} + y_{i}^{k} + \sum_{j=1, j \ne i}^{n} a_{ij}(x_{i}^{k} - x_{j}^{k})\Big)\Big] \qquad (6)
Step 9: Based on the projection operator and Step 8, each node i ∈ V updates the variable α_i^{k+1} according to

\alpha_{i}^{k+1} = P_{[-1,1]}(\alpha_{i}^{k} + P_{i}x_{i}^{k+1} - q_{i}) \qquad (7)
Step 10: Based on Step 8, each node i ∈ V updates the variable β_i^{k+1} according to

\beta_{i}^{k+1} = \beta_{i}^{k} + B_{i}x_{i}^{k+1} - c_{i} \qquad (8)
Step 11: Based on the definition of (·)_+ and Step 8, each node i ∈ V updates the variable λ_i^{k+1} according to

\lambda_{i}^{k+1} = (\lambda_{i}^{k} + D_{i}x_{i}^{k+1} - s_{i})_{+} \qquad (9)
Step 12: Based on Step 8, each node i ∈ V updates the variable y_i^{k+1} according to

y_{i}^{k+1} = y_{i}^{k} + \sum_{j=1, j \ne i}^{n} a_{ij}(x_{i}^{k+1} - x_{j}^{k+1}) \qquad (10)
Step 13: Each node i ∈ V sets k = k + 1 and goes back to Step 7 until a certain stopping criterion is satisfied, e.g., k > k_max, where k_max is the maximum number of iterations.
AU2020101237A 2020-07-03 2020-07-03 A computation-efficient distributed algorithm for convex constrained optimization problem Ceased AU2020101237A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2020101237A AU2020101237A4 (en) 2020-07-03 2020-07-03 A computation-efficient distributed algorithm for convex constrained optimization problem

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2020101237A AU2020101237A4 (en) 2020-07-03 2020-07-03 A computation-efficient distributed algorithm for convex constrained optimization problem

Publications (1)

Publication Number Publication Date
AU2020101237A4 true AU2020101237A4 (en) 2020-08-06

Family

ID=71833569

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2020101237A Ceased AU2020101237A4 (en) 2020-07-03 2020-07-03 A computation-efficient distributed algorithm for convex constrained optimization problem

Country Status (1)

Country Link
AU (1) AU2020101237A4 (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113064726A (en) * 2021-04-01 2021-07-02 北京理工大学 Distributed image segmentation method based on sparsity and Burer-Monteiro decomposition
CN113064726B (en) * 2021-04-01 2022-07-29 北京理工大学 Distributed image segmentation method based on sparsity and Burer-Monteiro decomposition
CN113268862A (en) * 2021-05-01 2021-08-17 群智未来人工智能科技研究院(无锡)有限公司 Distributed discrete time algorithm for local fully-constrained optimization problem on directed non-equilibrium graph
CN113591290A (en) * 2021-07-20 2021-11-02 上海华虹宏力半导体制造有限公司 OPC model simulation method
CN113591290B (en) * 2021-07-20 2024-02-06 上海华虹宏力半导体制造有限公司 OPC model simulation method

Similar Documents

Publication Publication Date Title
AU2020101237A4 (en) A computation-efficient distributed algorithm for convex constrained optimization problem
Garriga-Alonso et al. Deep convolutional networks as shallow gaussian processes
Latafat et al. A new randomized block-coordinate primal-dual proximal algorithm for distributed optimization
Ghosh et al. On convergence of differential evolution over a class of continuous functions with unique global optimum
Pratama et al. Automatic construction of multi-layer perceptron network from streaming examples
Ishii et al. The PageRank problem, multiagent consensus, and web aggregation: A systems and control viewpoint
CN116034382A (en) Privacy preserving asynchronous federal learning of vertically partitioned data
AU2020101959A4 (en) Decentralized optimization algorithm for machine learning tasks in networks: Resource efficient
Gratton et al. Distributed ridge regression with feature partitioning
Xiong et al. Straggler-resilient distributed machine learning with dynamic backup workers
Wang et al. Unsupervised learning for asynchronous resource allocation in ad-hoc wireless networks
Zhao et al. Spatiotemporal graph convolutional recurrent networks for traffic matrix prediction
CN113228059A (en) Cross-network-oriented representation learning algorithm
Hu et al. The Barzilai–Borwein Method for distributed optimization over unbalanced directed networks
Zhang et al. Gt-storm: Taming sample, communication, and memory complexities in decentralized non-convex learning
Zhang et al. Net-fleet: Achieving linear convergence speedup for fully decentralized federated learning with heterogeneous data
Tu et al. Byzantine-robust distributed sparse learning for M-estimation
Mitropoulou et al. Anomaly Detection in Cloud Computing using Knowledge Graph Embedding and Machine Learning Mechanisms
Lyu et al. Personalized federated learning with multiple known clusters
Devert et al. Robust multi-cellular developmental design
Camisa et al. Distributed constraint-coupled optimization over random time-varying graphs via primal decomposition and block subgradient approaches
Ay Information geometry on complexity and stochastic interaction
He et al. Byzantine-robust stochastic gradient descent for distributed low-rank matrix completion
Olshevsky et al. Asymptotic network independence in distributed optimization for machine learning
Rastegarnia et al. An incremental LMS network with reduced communication delay

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry