CN114826929B - Centreless random gradient descent method based on multi-entity machine environment

Info

Publication number
CN114826929B
Authority
CN
China
Prior art keywords
node
information
neighbor
communication
nodes
Legal status
Active
Application number
CN202210228961.2A
Other languages
Chinese (zh)
Other versions
CN114826929A (en)
Inventor
Qing Ling (凌青)
Duoyu Zhu (朱多煜)
Current Assignee
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat-sen University
Priority to CN202210228961.2A
Publication of CN114826929A
Application granted
Publication of CN114826929B
Status: Active (current)


Classifications

    • H04L 41/12 - Discovery or management of network topologies
    • H04L 41/142 - Network analysis or design using statistical or mathematical methods
    • H04L 41/145 - Network analysis or design involving simulating, designing, planning or modelling of a network
    • Y02D 30/70 - Reducing energy consumption in communication networks: in wireless communication networks


Abstract

The invention discloses a centerless random gradient descent method based on a multi-entity machine environment, comprising the following steps: S1: determine the network structure according to the collected information of the entity machines that are to participate; S2: once the network structure is determined, establish communication between each node in the network and its neighbor nodes, thereby building the network topology; S3: after the network topology is built, the participating nodes run the random gradient descent algorithm for iterative optimization. The invention removes the high hardware-performance barrier of existing centerless distributed algorithm implementations: multiple ordinary entity machines can cooperate across networks and jointly complete large-scale machine learning tasks.

Description

Centreless random gradient descent method based on multi-entity machine environment
Technical Field
The invention relates to the technical field of distributed random optimization, in particular to a centerless random gradient descent method based on a multi-entity machine environment.
Background
With the arrival of the big data era and the development of deep neural networks, a single machine can no longer bear the huge computational load, and distributed optimization algorithms have become the method of choice for addressing the large data volumes and long training times of deep neural networks. Distributing a large amount of data over multiple nodes for parallel computation and then aggregating the results can greatly shorten the computation time and also reduce the hardware requirements on each node.
Distributed random optimization algorithms originally used a centered network structure, as shown in Fig. 1: the data are distributed over the child nodes, and aggregation is performed by the central node. A centered distributed random optimization algorithm has a simple structure and is easy to implement, and because a central node exists, regulating and supervising each child node is also very convenient. However, since all child nodes must send their computation data to the central node, the communication and computation capacity of the central node becomes the bottleneck limiting the expansion of the network as the number of nodes grows. Once the central node is attacked, the whole network is paralyzed and cannot run normally. In addition, because the central node can access all the data, it is difficult to meet increasingly stringent data security and privacy requirements.
To overcome this series of problems caused by the central node, centerless network architectures have gradually begun to replace centered ones. A centerless network has no central node, and each node communicates only with its own neighbors, so it avoids both the central bottleneck and the data security problem. It has also been theoretically demonstrated that, for the same algorithm, centered and centerless networks have the same convergence speed and accuracy. However, existing centerless algorithm implementations are based on a single-machine multi-process mode or on multi-GPU clusters. These implementations demand expensive hardware and require the data to be concentrated in one node or one cluster, so they cannot be applied in practice to the multi-party collaboration problems encountered in daily life.
Disclosure of Invention
The invention provides a centerless random gradient descent method based on a multi-entity machine environment that realizes multi-end, cross-network cooperation, overcoming the defects that existing centerless distributed optimization algorithms place high requirements on the hardware environment and cannot be applied to everyday scenarios such as multi-institution collaboration.
To achieve the above purpose, the present invention adopts the following technical scheme:
A centerless random gradient descent method based on a multi-entity machine environment comprises the following steps:
S1: determining the network structure according to the collected information of the entity machines to participate;
S2: after the network structure is determined, establishing communication between each node in the network and its neighbor nodes, thereby building the network topology;
S3: after the network topology is built, the participating nodes running the random gradient descent algorithm for iterative optimization.
Preferably, the multi-entity machine information includes the trust degree, geographic position, and communication conditions among the different entity machines.
Further, when determining the network structure, nodes with high mutual trust are preferentially connected as neighbor nodes; nodes that are close to each other are preferentially connected as neighbor nodes; and nodes with good communication conditions are preferentially connected as neighbor nodes.
Preferably, in step S2, after the network structure is determined, the neighbor node information is distributed to each node; after a node receives its own neighbor node information, it establishes communication with its neighbor nodes using a communication transmission protocol, and feeds back a success message after communication with all of its neighbor nodes has been established, thereby completing the construction of the network topology.
Further, step S2 specifically includes the following steps:
S201: the pseudo master node responsible for supervision sorts out the determined network topology information and then distributes to each node the information of its neighbor nodes;
S202: after each node obtains the IP information of its neighbor nodes, it sends a query request to all of its neighbor nodes and enters a waiting state; a waiting node performs two operations: first, it responds when it receives a query request; second, it records all query responses it receives;
S203: after receiving the query responses of all its neighbor nodes, each node sends a confirm request to its neighbor nodes and again enters a waiting state; likewise, a node responds when it receives a confirm request and records all confirm responses it receives;
S204: after collecting all the confirm feedback, the pseudo master node sends a start instruction to each node, indicating that the construction of the network topology is complete, and the method proceeds to the next step.
Still further, a node has two states during iteration: a computing state and a communication state;
a node in the computing state first draws samples from its local data set and computes a gradient, then aggregates the information sent by other nodes, and enters the communication state after the gradient update;
a node in the communication state sends the updated variable to its neighbor nodes through a communication protocol, waits to collect the variable information sent by its neighbor nodes, and enters the next iteration once collection is complete.
Still further, a node in the computing state performs no communication or response operations; only after computation is complete and the node is in the communication state can it receive and send information.
Still further, in step S3, the random gradient descent algorithm specifically includes the following steps:
S301: parameter initialization: set the update step size λ, the batch size batch_size, the maximum number of training rounds max_epoch, the mixing matrix W, the node waiting time t, the maximum error count max_error, and randomly initialize the variable x;
S302: extract batch_size training samples from the local data set and compute the stochastic gradient from the extracted samples:

$$g_i^{(k)} = \frac{1}{\text{batch\_size}} \sum_{\xi \in \mathcal{B}_i^{(k)}} \nabla f\big(x_i^{(k)}; \xi\big)$$

where $\mathcal{B}_i^{(k)}$ denotes the mini-batch drawn by node $i$ at iteration $k$;
S303: use the mixing matrix W to weight and aggregate the variables sent by all neighbor nodes together with the node's own variable:

$$\hat{x}_i^{(k)} = \sum_{j \in \mathcal{N}_i \cup \{i\}} W_{ij}\, x_j^{(k)}$$

where $\mathcal{N}_i$ denotes the set of neighbors of node $i$;
S304: perform gradient descent using the locally computed stochastic gradient and the aggregated variable, obtaining this iteration's result and ending the computing state:

$$x_i^{(k+1)} = \hat{x}_i^{(k)} - \lambda\, g_i^{(k)}$$

S305: the node enters the communication state, sends this iteration's result to its neighbors, collects the results sent by its neighbors, and ends the communication state after waiting for the time t.
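For concreteness, one computing-state iteration (S302 to S304) can be sketched as below. This is a minimal numpy illustration under the standard decentralized-SGD reading of the updates above; the gradient oracle grad_fn, the flat parameter vector, and the dictionary bookkeeping are assumptions for illustration, and the communication of step S305 is stubbed out.

```python
# A minimal sketch of steps S302-S304 at node i. grad_fn, the flat parameter
# vector, and the sampling scheme are illustrative assumptions, not the
# patent's implementation.
import numpy as np

def local_iteration(x_i, neighbor_vars, w_row, data, labels, grad_fn,
                    batch_size=64, lam=0.01, rng=None):
    if rng is None:
        rng = np.random.default_rng()

    # S302: draw a mini-batch and compute the stochastic gradient.
    idx = rng.choice(len(data), size=batch_size, replace=False)
    g_i = grad_fn(x_i, data[idx], labels[idx])

    # S303: weighted aggregation of the node's own variable and its neighbors'
    # variables with row i of W. neighbor_vars maps node id j -> x_j;
    # w_row maps node id -> W_ij, with w_row["self"] = W_ii.
    x_hat = w_row["self"] * x_i
    for j, x_j in neighbor_vars.items():
        x_hat = x_hat + w_row[j] * x_j

    # S304: gradient step from the aggregated point.
    return x_hat - lam * g_i
```

Here w_row is assumed row-stochastic (its weights sum to 1) so that the aggregation is an average, matching the equal-neighbor-weight choice used in the embodiment below.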
A computer system comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the centerless random gradient descent method based on a multi-entity machine environment when executing the computer program.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the centerless random gradient descent method based on a multi-entity machine environment.
The beneficial effects of the invention are as follows:
the invention breaks through the high hardware performance limit realized by the centerless distributed algorithm. The cross-network cooperation of a plurality of common entity machines can be realized, and large-scale machine learning tasks can be completed together.
The invention reduces the hardware-performance requirements for implementing the algorithm. Taking image classification on the CIFAR10 dataset as an example: in existing implementations of centerless distributed optimization algorithms, a single-machine multi-process implementation cannot run a moderately complex neural network structure, because when many processes run in parallel, the resources that can be allocated to each process are limited even on a very powerful machine. Multi-GPU clusters, in turn, require equipment costs that an ordinary company or individual cannot afford. With the centerless random gradient descent method provided by the invention, any laptop or desktop host that can run the method normally can cooperate to complete the optimization task.
The invention realizes cross-network communication and therefore has better applicability. Existing centerless distributed optimization algorithm implementations require the data to be concentrated in one node or one cluster because they cannot communicate across networks. As the requirements on data privacy and security rise, data are increasingly scattered over nodes in different network environments, where the existing implementation methods cannot be used. The centerless random gradient descent method of the invention achieves cross-network communication among the entity machines and thus fits the intended application environments of centerless distributed algorithms.
The invention has a degree of robustness to harsh network environments. Existing centerless distributed algorithm implementations do not consider the errors introduced during communication: inter-process communication does not traverse a physical communication link, so packet loss and transmission errors do not occur, and the links between nodes in a GPU cluster offer high bandwidth, low delay, and stable, reliable transmission. The communication environments of daily life are harsh, however, and packet loss and transmission errors occur frequently, which is an important reason for the weak applicability of existing centerless implementations. The centerless random gradient descent method of the invention includes robustness measures designed for unstable network environments and still converges to good accuracy when the network environment deteriorates.
Drawings
Fig. 1 is a prior-art network topology with a center.
Fig. 2 is a flow chart of the centerless random gradient descent method described in Example 1.
Fig. 3 is a centerless network topology.
Fig. 4 is the communication construction flow chart of Example 1.
Fig. 5 is the pseudo code of the random gradient descent algorithm.
Fig. 6 is a comparison of experimental results.
Detailed Description
The invention is described in detail below with reference to the drawings and the detailed description.
Example 1
As shown in Fig. 2, a centerless random gradient descent method based on a multi-entity machine environment includes the following steps:
S1: determine the network structure according to the collected information of the entity machines to participate;
in yet another embodiment, information of multiple entity machines to be participated is collected, wherein the information of multiple entity machines includes trust degree, geographic position and communication condition among different entity machines. For five existing hosts with different specifications, a network structure shown in fig. 3 can be designed, wherein nodes 1 and 2 are in dormitory, 3 and 4 are in laboratory, 5 are in home, and an ali cloud server with a public network ip is applied for serving as a pseudo master node and a communication bridge for realizing cross-network communication.
When determining the network structure, preferentially connecting nodes with high mutual trust as neighbor nodes better protects data privacy; preferentially connecting nodes that are close to each other reduces the time required for cross-network communication and improves the efficiency of the algorithm; and preferentially connecting nodes with good communication conditions reduces communication errors and similar problems and improves the accuracy of the algorithm. In addition, the number of edges should be kept as small as possible while still ensuring that the whole network is connected.
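As a toy illustration of these three selection rules, candidate edges could be ranked by a combined score before the topology is fixed. The sketch below, including the 0.5/0.3/0.2 weights and the dictionary inputs, is an assumption and not a formula prescribed by the patent.

```python
# A minimal sketch of ranking candidate edges by trust, proximity, and link
# quality. All weights and data structures here are illustrative assumptions.
from itertools import combinations

def score_pair(a, b, trust, distance, link_quality):
    """Higher score = better candidate neighbor pair."""
    proximity = 1.0 / (1.0 + distance[(a, b)])
    return 0.5 * trust[(a, b)] + 0.3 * proximity + 0.2 * link_quality[(a, b)]

def rank_edges(nodes, trust, distance, link_quality):
    # Rank every possible edge; the designer would then keep the best-scoring
    # edges, using as few as possible while keeping the whole network connected.
    pairs = combinations(sorted(nodes), 2)
    return sorted(pairs,
                  key=lambda p: score_pair(p[0], p[1], trust, distance, link_quality),
                  reverse=True)
```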
S2: after the network structure is determined, establish communication between each node in the network and its neighbor nodes, thereby building the network topology;
In a specific embodiment, the Alibaba Cloud server distributes to each node the IP information of its neighbor nodes. After a node receives its neighbor information, it uses a communication transmission protocol (TCP) to send one query request and one confirm request, thereby establishing communication with each of its neighbor nodes; after communication with all neighbor nodes has been successfully established, the node feeds back a success message to the cloud server, completing the construction of the network topology. The detailed flow is shown in Fig. 4.
In a specific embodiment, step S2 specifically includes the following steps:
S201: the cloud server, acting as the pseudo master node responsible for supervision, sorts out the determined network topology information and then distributes to each node the information of its neighbor nodes;
S202: after each node obtains the IP information of its neighbor nodes, it sends a query request to all of its neighbor nodes and enters a waiting state; a waiting node performs two operations: first, it responds when it receives a query request; second, it records all query responses it receives;
S203: after receiving the query responses of all its neighbor nodes, each node sends a confirm request to its neighbor nodes and again enters a waiting state; likewise, a node responds when it receives a confirm request and records all confirm responses it receives;
S204: after collecting all the confirm feedback, the pseudo master node sends a start instruction to each node, indicating that the construction of the network topology is complete, and the method proceeds to the next step.
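A minimal sketch of the query/confirm handshake of steps S201 to S204 follows, written with TCP sockets. The JSON message format, the responder thread, and the assumption that every node's server is already listening (a real implementation would retry refused connections) are illustrative choices; the patent fixes only the query, confirm, and start sequence.

```python
# A minimal sketch of the S202-S203 handshake at one node over TCP. Message
# framing, ports, and the responder thread are illustrative assumptions.
import json
import socket
import threading

def serve_requests(server: socket.socket) -> None:
    # Responder: answer every incoming query/confirm request (the two
    # operations of a waiting node in S202/S203).
    while True:
        conn, _ = server.accept()
        with conn:
            msg = json.loads(conn.recv(4096).decode())
            conn.sendall(json.dumps({"type": msg["type"] + "_response"}).encode())

def handshake(my_port: int, neighbor_addrs: list[tuple[str, int]]) -> None:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", my_port))
    server.listen()
    threading.Thread(target=serve_requests, args=(server,), daemon=True).start()

    for phase in ("query", "confirm"):  # S202, then S203
        for addr in neighbor_addrs:
            with socket.create_connection(addr, timeout=30.0) as s:
                s.sendall(json.dumps({"type": phase}).encode())
                reply = json.loads(s.recv(4096).decode())
                assert reply["type"] == phase + "_response"
    # The node would now report success to the pseudo master node and wait
    # for its "start" instruction (S204), omitted here.
```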
S3: after the network topology is built, the participating nodes run the random gradient descent algorithm for iterative optimization. In this embodiment, once the network topology is built, each node starts to train a Resnet18 model. A node has two states during iteration: a computing state and a communication state. A node in the computing state performs no communication or response operations; only after computation is complete and the node enters the communication state can it receive and send information. A node in the computing state first draws samples from its local data set and computes the gradient, then aggregates the information sent by other nodes, performs the gradient update, and enters the communication state. A node in the communication state sends the updated variable to its neighbors through the unreliable communication protocol UDP, waits to collect the variable information sent by its neighbors, and enters the next iteration after collection is complete. Pseudo code of the algorithm is shown in Fig. 5.
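The communication state itself might look like the following sketch over UDP. Serialization with pickle, the port layout, and the single-datagram payload are assumptions for illustration; a real Resnet18 variable exceeds the UDP datagram limit and would have to be chunked.

```python
# A minimal sketch of the communication state over UDP. pickle serialization,
# the port scheme, and single-datagram payloads are illustrative assumptions;
# a real Resnet18 variable would need to be split across many datagrams.
import pickle
import socket
import time

def exchange(x_i, node_id, neighbor_addrs, my_port, wait_t=0.05):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", my_port))
    sock.settimeout(0.01)

    payload = pickle.dumps((node_id, x_i))
    for addr in neighbor_addrs:           # send this iteration's result
        sock.sendto(payload, addr)

    received = {}
    deadline = time.monotonic() + wait_t  # collect until the waiting time t ends
    while time.monotonic() < deadline:
        try:
            data, _ = sock.recvfrom(65507)      # max UDP payload size
            sender, x_j = pickle.loads(data)
            received[sender] = x_j
        except (socket.timeout, pickle.PickleError, EOFError):
            continue  # lost or corrupted datagram: keep waiting, per the design
    sock.close()
    return received   # missing neighbors are handled by the S305 rules below
```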
In a specific embodiment, the random gradient descent algorithm of step S3 specifically includes the following steps:
S301: parameter initialization: set the update step size λ = 0.01, the batch size batch_size = 64, the maximum number of training rounds max_epoch = 400, the mixing matrix W with equal weights for each neighbor, the node waiting time t = 50 ms, the maximum error count max_error = 5, and randomly initialize the variable x.
S302: the CIFAR10 data set is randomly and uniformly distributed over the five nodes; each node extracts batch_size training samples from its local data set and computes the stochastic gradient from the extracted samples:

$$g_i^{(k)} = \frac{1}{\text{batch\_size}} \sum_{\xi \in \mathcal{B}_i^{(k)}} \nabla f\big(x_i^{(k)}; \xi\big)$$

S303: use the mixing matrix W to weight and aggregate the variables sent by all neighbor nodes together with the node's own variable:

$$\hat{x}_i^{(k)} = \sum_{j \in \mathcal{N}_i \cup \{i\}} W_{ij}\, x_j^{(k)}$$

S304: perform gradient descent using the locally computed stochastic gradient and the aggregated variable, obtaining this iteration's result and ending the computing state:

$$x_i^{(k+1)} = \hat{x}_i^{(k)} - \lambda\, g_i^{(k)}$$

S305: the node enters the communication state, sends this iteration's result to its neighbors, collects the results sent by its neighbors, and ends the communication state after waiting for the time t.
Depending on the situation, step S305 involves different operations, as follows (a sketch of this fault handling appears after the list):
1. if the information sent by all neighbors is correctly received within the waiting time t, the communication state ends after the full time t has elapsed;
2. if, within the waiting time t, the information from some node is not received or the received information is erroneous, but the error count does not exceed max_error, the value used in the last iteration is substituted for it after the waiting state ends;
3. if, within the waiting time t, the information from some node is not received or the received information is erroneous and the error count exceeds max_error, the pseudo master node is notified before the waiting state ends; the pseudo master node suspends the iteration, discards that node, modifies the network topology, redistributes the information, and then resumes the iteration;
4. one complete pass of each node's local data set through the computation constitutes one epoch; steps S302 to S305 are repeated until the max_epoch rounds are completed, and the algorithm then ends.
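A minimal sketch of this per-neighbor fault handling follows. The error-count bookkeeping (including resetting a neighbor's count after a successful round) and the returned dropped list are assumptions; the patent specifies only the three cases themselves.

```python
# A minimal sketch of cases 1-3 above. The reset-on-success policy and the
# dropped list (to be reported to the pseudo master node) are assumptions.
def fill_missing(received, neighbors, last_vars, error_counts, max_error=5):
    """Return a complete neighbor -> variable map, reusing stale values on loss."""
    dropped = []
    for j in neighbors:
        if j in received:
            error_counts[j] = 0          # assumed: a good round clears the count
            last_vars[j] = received[j]
        else:
            error_counts[j] += 1
            if error_counts[j] > max_error:
                dropped.append(j)        # case 3: report j so the pseudo master
                                         # node can rebuild the topology without it
            else:
                received[j] = last_vars[j]  # case 2: reuse last iteration's value
    return received, dropped
```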
The results of each epoch are recorded, and the loss is calculated using the cross entropy:

$$\mathcal{L} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} y_{n,c} \log \hat{y}_{n,c}$$

where $N$ is the number of recorded samples, $C$ the number of classes, $y_{n,c}$ the one-hot label, and $\hat{y}_{n,c}$ the predicted class probability.
The comparison shown in Fig. 6 is obtained, where the solid line is the existing centerless distributed algorithm and the dotted line is our multi-entity-machine centerless distributed algorithm; both the stability and the convergence accuracy of our algorithm are better than those of the existing algorithm.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the above examples.
Example 2
A computer system comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the centerless random gradient descent method based on a multi-entity machine environment as follows:
S1: determining the network structure according to the collected information of the entity machines to participate;
S2: after the network structure is determined, establishing communication between each node in the network and its neighbor nodes, thereby building the network topology;
S3: after the network topology is built, the participating nodes running the random gradient descent algorithm for iterative optimization.
Example 3
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the centerless random gradient descent method based on a multi-entity machine environment, comprising:
S1: determining the network structure according to the collected information of the entity machines to participate;
S2: after the network structure is determined, establishing communication between each node in the network and its neighbor nodes, thereby building the network topology;
S3: after the network topology is built, the participating nodes running the random gradient descent algorithm for iterative optimization.
It is to be understood that the above examples of the present invention are provided by way of illustration only and not as limitations on the embodiments of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention is intended to be protected by the following claims.

Claims (7)

1. A centerless random gradient descent method based on a multi-entity machine environment, characterized in that the method comprises the following steps:
S1: determining the network structure according to the collected information of the entity machines to participate;
S2: after the network structure is determined, establishing communication between each node in the network and its neighbor nodes, thereby building the network topology;
S3: after the network topology is built, the participating nodes running the random gradient descent algorithm for iterative optimization;
the step S3 includes: a node has two states during iteration: a computing state and a communication state;
a node in the computing state first draws samples from its local data set and computes a gradient, then aggregates the information sent by other nodes, and enters the communication state after the gradient update;
a node in the communication state sends the updated variable to its neighbor nodes through a communication protocol, waits to collect the variable information sent by its neighbor nodes, and enters the next iteration once collection is complete;
a node in the computing state performs no communication or response operations; only after computation is complete and the node is in the communication state can it receive and send information;
in step S3, the random gradient descent algorithm specifically comprises the following steps:
S301: parameter initialization: setting the update step size λ, the batch size batch_size, the maximum number of training rounds max_epoch, the mixing matrix W, the node waiting time t, the maximum error count max_error, and randomly initializing the variable x;
S302: extracting batch_size training samples from the local data set and computing the stochastic gradient from the extracted samples:

$$g_i^{(k)} = \frac{1}{\text{batch\_size}} \sum_{\xi \in \mathcal{B}_i^{(k)}} \nabla f\big(x_i^{(k)}; \xi\big)$$

S303: using the mixing matrix W to weight and aggregate the variables sent by all neighbor nodes together with the node's own variable:

$$\hat{x}_i^{(k)} = \sum_{j \in \mathcal{N}_i \cup \{i\}} W_{ij}\, x_j^{(k)}$$

S304: performing gradient descent using the locally computed stochastic gradient and the aggregated variable, obtaining this iteration's result and ending the computing state:

$$x_i^{(k+1)} = \hat{x}_i^{(k)} - \lambda\, g_i^{(k)}$$

S305: the node entering the communication state, sending this iteration's result to its neighbors, collecting the calculation results sent by its neighbors, and ending the communication state after waiting for the time t;
depending on the situation, step S305 involves different operations, as follows:
(1) if the information sent by all neighbors is correctly received within the waiting time t, the communication state ends after the full time t has elapsed;
(2) if, within the waiting time t, the information from some node is not received or the received information is erroneous, but the error count does not exceed max_error, the value used in the last iteration is substituted for it after the waiting state ends;
(3) if, within the waiting time t, the information from some node is not received or the received information is erroneous and the error count exceeds max_error, the pseudo master node is notified before the waiting state ends; the pseudo master node suspends the iteration, discards that node, modifies the network topology, redistributes the information, and then resumes the iteration;
(4) one complete pass of each node's local data set through the computation constitutes one epoch; steps S302 to S305 are repeated until the max_epoch rounds are completed, whereupon the algorithm ends.
2. The centerless random gradient descent method based on a multi-entity machine environment according to claim 1, characterized in that: the multi-entity machine information includes the trust degree, geographic position, and communication conditions among the different entity machines.
3. The centerless random gradient descent method based on a multi-entity machine environment according to claim 2, characterized in that: when determining the network structure, nodes with high mutual trust are preferentially connected as neighbor nodes; nodes that are close to each other are preferentially connected as neighbor nodes; and nodes with good communication conditions are preferentially connected as neighbor nodes.
4. The centerless random gradient descent method based on a multi-entity machine environment according to claim 1, characterized in that: in step S2, after the network structure is determined, the neighbor node information is distributed to each node; after a node receives its own neighbor node information, it establishes communication with its neighbor nodes using a communication transmission protocol, and feeds back a success message after communication with all of its neighbor nodes has been established, thereby completing the construction of the network topology.
5. The centerless random gradient descent method based on a multi-entity machine environment according to claim 4, characterized in that step S2 specifically comprises the following steps:
S201: the pseudo master node responsible for supervision sorts out the determined network topology information and then distributes to each node the information of its neighbor nodes;
S202: after each node obtains the IP information of its neighbor nodes, it sends a query request to all of its neighbor nodes and enters a waiting state; a waiting node performs two operations: first, it responds when it receives a query request; second, it records all query responses it receives;
S203: after receiving the query responses of all its neighbor nodes, each node sends a confirm request to its neighbor nodes and again enters a waiting state; likewise, a node responds when it receives a confirm request and records all confirm responses it receives;
S204: after collecting all the confirm feedback, the pseudo master node sends a start instruction to each node, indicating that the construction of the network topology is complete and the next step begins.
6. A computer system comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that: the processor, when executing the computer program, performs the steps of the method according to any one of claims 1 to 5.
7. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN202210228961.2A (priority date 2022-03-08, filing date 2022-03-08): Centreless random gradient descent method based on multi-entity machine environment. Status: Active. Granted as CN114826929B.

Priority Applications (1)

Application Number: CN202210228961.2A
Priority Date: 2022-03-08
Filing Date: 2022-03-08
Title: Centreless random gradient descent method based on multi-entity machine environment

Publications (2)

CN114826929A, published 2022-07-29
CN114826929B, published 2023-06-16

Family

ID: 82528846

Family Applications (1)

CN202210228961.2A (Active): Centreless random gradient descent method based on multi-entity machine environment; priority date 2022-03-08, filing date 2022-03-08; granted as CN114826929B

Country Status (1)

CN: CN114826929B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9633315B2 * 2012-04-27 2017-04-25 Excalibur IP, LLC Method and system for distributed machine learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111953515A * 2020-07-07 2020-11-17 Southwest University (西南大学) Double-acceleration distributed asynchronous optimization method based on Nesterov gradient method and gravity method
CN113660668A * 2021-05-15 2021-11-16 Xidian University (西安电子科技大学) Seamless credible cross-domain routing system of heterogeneous converged network and control method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Byzantine-Robust Stochastic Gradient Descent for Distributed Low-Rank Matrix Completion. 2019 IEEE Data Science Workshop (DSW), 2019, full text. *
Zhang Han. Research on Stochastic Optimization Algorithms in Distributed Networks. China Master's Theses Full-text Database, Basic Sciences, 2018, chapters 4-5. *

Also Published As

Publication Number: CN114826929A (en), Publication Date: 2022-07-29


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant