CN114186671A — Large-batch decentralized distributed image classifier training method and system (Google Patents)
Publication number: CN114186671A (application number CN202111516644.2A)
Authority: CN (China)
Prior art keywords: node; neural network model; parameters; batch
Prior art date: 2021-12-07
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

 G06N3/045—

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
 G06N3/00—Computing arrangements based on biological models
 G06N3/02—Computing arrangements based on biological models using neural network models
 G06N3/08—Learning methods
 G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
 G06N3/00—Computing arrangements based on biological models
 G06N3/02—Computing arrangements based on biological models using neural network models
 G06N3/08—Learning methods
 G06N3/084—Backpropagation
Abstract
The invention discloses a large-batch decentralized distributed image classifier training method and system. Each node communicates with its neighbor nodes to obtain their latest image classifier parameters and forms a weighted average of these with its own local parameters, which serve as the new local parameters in the next round of updating. These training steps are repeated until the stopping condition is reached, at which point every node stops and the average of the parameters across the nodes is taken as the final output. The method removes the central node, so congestion at a central node cannot occur. It is also suited to large-batch image classifier training: large batches reduce the number of parameter updates and communication rounds, so computing resources such as GPUs can be utilized more fully and training efficiency is greatly improved.
Description
Technical Field
The invention relates to a large-batch decentralized distributed image classifier training method and system, and belongs to the technical fields of image classification and machine learning.
Background
Training of many image classifiers can be formalized as solving a finite-sum optimization problem of the form

    min_{x ∈ R^d} f(x) = (1/n) Σ_{i=1}^{n} f(x; ξ_i),

where x is the model parameter vector, d is the dimension of the model parameters, n is the total number of training samples, ξ_i denotes the i-th sample, and f(x; ξ_i) is the loss function corresponding to the i-th sample.
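As a concrete illustration, the finite-sum objective above can be evaluated directly. The sketch below uses a least-squares per-sample loss as a stand-in, since the text leaves the per-sample loss f(x; ξ_i) generic:

```python
import numpy as np

def finite_sum_loss(x, samples, labels):
    """Finite-sum objective f(x) = (1/n) * sum_i f(x; xi_i).

    Illustrative only: f(x; xi_i) is taken to be a squared error,
    whereas the patent leaves the per-sample loss unspecified.
    """
    n = len(samples)
    per_sample = [(samples[i] @ x - labels[i]) ** 2 for i in range(n)]
    return sum(per_sample) / n

# Toy check: two samples in R^2, a parameter vector that fits exactly
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
x = np.array([1.0, 2.0])
print(finite_sum_loss(x, X, y))  # → 0.0
```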
In recent years, deep learning has developed rapidly, and datasets and models keep growing, so the computing power of a single machine can no longer meet the demand. Distributed machine learning, in which several machines cooperate to complete a training task, has become an important way to address this problem. Moreover, in scenarios such as federated learning and edge computing, training data must stay on each terminal device for reasons such as privacy protection, and distributed machine learning techniques are likewise required there.
The parameter server architecture is the most commonly used architecture in distributed machine learning. It comprises a server node (or cluster) and several working nodes (or clusters). The server node maintains the globally shared parameters, while each working node computes local quantities, such as gradients, from its locally stored training samples. Working nodes cannot communicate with each other directly; they can only read and update the parameters by communicating with the server node. Because all working nodes communicate only with the server node, the server node is usually called the central node, and the parameter server architecture is a typical centralized architecture. Such architectures place high demands on the underlying communication hardware: under high latency or low bandwidth, communication congestion at the central node can slow down the entire training process.
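The centralized data flow described above can be sketched in a few lines. The function name and the synchronous-round framing are illustrative, not from the patent; the point is that every gradient passes through the one central node:

```python
import numpy as np

def parameter_server_round(x, worker_grads, lr):
    """One synchronous round in a parameter-server architecture:
    every worker sends its gradient to the central server node, which
    averages them, updates the single shared parameter vector x, and
    broadcasts it back. All traffic flows through this one node."""
    avg_grad = np.mean(worker_grads, axis=0)  # aggregation at the central node
    return x - lr * avg_grad                  # updated shared parameters

x = np.array([1.0, 1.0])
grads = np.array([[1.0, 0.0], [0.0, 1.0]])    # gradients from two workers
print(parameter_server_round(x, grads, lr=1.0))  # → [0.5 0.5]
```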
A decentralized scenario removes the central server from the distributed architecture; every working node communicates peer-to-peer with its neighbor nodes on an "equal" footing. The most classical decentralized method is Gossip SGD: the connection weights between nodes are modeled as a weight matrix; in each iteration, every node randomly samples a mini-batch from its local data, computes a gradient with respect to its local parameters, and updates the local parameters with that gradient; it then communicates with its neighbor nodes, obtains their parameters, and forms a weighted average according to the weight matrix, which becomes the new local parameters for the next iteration. When the stopping condition is met, the iterative updates stop, and the average of the parameters across the nodes is output as the final model parameters.
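A minimal sketch of one Gossip SGD iteration follows. The ring topology with uniform 1/3 weights is an assumed example; any doubly-stochastic weight matrix with zeros for non-neighbour pairs would serve:

```python
import numpy as np

def ring_weight_matrix(K):
    """Symmetric doubly-stochastic weight matrix for a ring of K >= 3
    nodes: each node averages itself and its two neighbours with
    weight 1/3; W[i, j] = 0 for every non-neighbour pair."""
    W = np.zeros((K, K))
    for i in range(K):
        for j in (i - 1, i, i + 1):
            W[i, j % K] = 1.0 / 3.0
    return W

def gossip_sgd_step(X, grads, W, lr):
    """One Gossip SGD iteration. X and grads are (K, d) arrays whose
    k-th rows hold node k's parameters and its sampled gradient."""
    local = X - lr * grads   # each node's local gradient step
    return W @ local         # weighted averaging with neighbours

K, d = 5, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(K, d))
W = ring_weight_matrix(K)
X_next = gossip_sgd_step(X, np.zeros((K, d)), W, lr=0.1)
# With zero gradients the step is pure averaging, so the mean over
# nodes is preserved (W is doubly stochastic):
print(np.allclose(X_next.mean(axis=0), X.mean(axis=0)))  # → True
```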
Current image classifier training usually uses a fairly small batch size. Increasing the batch size can exploit the computing power of multi-core GPU systems more fully and speed up training, and in a distributed environment it also reduces the number of parameter updates and communication rounds. However, blindly increasing the batch size degrades the generalization performance of the final trained image classifier, so a training method specifically suited to large-batch training needs to be designed.
Disclosure of Invention
The purpose of the invention is as follows: in current image classification training tasks, stochastic gradient descent in a decentralized setting usually uses a small batch size, that is, only a small number of image samples are drawn each time, which makes it hard to fully utilize the computing power of the GPU. In a distributed environment, small-batch stochastic gradient descent requires a large number of parameter updates, which brings frequent communication and large communication overhead. Yet blindly increasing the batch size of the sampled images reduces the prediction accuracy of the finally trained image classifier on the test image set. Aiming at these problems and shortcomings of the prior art, the invention provides a large-batch decentralized distributed image classifier training method and system. The method is simple to implement, incurs no extra overhead, can reduce the number of communications and parameter updates in decentralized distributed training, and improves the efficiency of distributed image classifier training.
The technical scheme is as follows: a large-batch decentralized distributed image classifier training method, in which each working node uses its local image classifier parameters to compute a stochastic gradient from locally stored image samples, normalizes the gradient, and updates the momentum and the local parameters with the normalized gradient; each node communicates with its neighbor nodes to obtain their latest image classifier parameters and forms a weighted average of these with its own local parameters, which serve as the new local parameters in the next round of updating; these training steps are repeated until the stopping condition is reached, at which point every node stops and the average of the parameters across the nodes is taken as the final output.
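The per-node update described above might look as follows. Since the text does not spell out the exact momentum recursion, the form m ← βm + g/‖g‖ used here is an assumption (a commonly used normalized-gradient momentum form), as is the small epsilon guarding against a zero gradient:

```python
import numpy as np

def normalized_momentum_step(x, m, grad, lr, beta, eps=1e-8):
    """One local update on a working node: normalize the stochastic
    gradient, fold it into the momentum buffer, then step the
    parameters. The momentum recursion m <- beta*m + g/||g|| is an
    assumed form, not spelled out in the source text."""
    g_hat = grad / (np.linalg.norm(grad) + eps)  # gradient normalization
    m = beta * m + g_hat                          # momentum update
    x = x - lr * m                                # parameter update
    return x, m

x, m = np.zeros(2), np.zeros(2)
x, m = normalized_momentum_step(x, m, np.array([3.0, 4.0]), lr=0.1, beta=0.9)
print(np.round(x, 3))  # → [-0.06 -0.08]
```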
The training process comprises the following specific steps:

Step 100: input the neural network model and randomly initialize the global neural network model parameters x^0; store a subset of the image samples on each node (the complete image sample set is the union of the per-node subsets); set the batch size b of each random draw of image samples, the learning rate η, the weight matrix W, the total number of iteration rounds T, and the momentum coefficient β. Here K denotes the number of nodes, and W = (w_{i,j})_{K×K} models both which nodes communicate and the weights of the weighted average at each communication; every element lies between 0 and 1, where w_{i,j} = 0 means nodes i and j do not communicate and w_{i,j} > 0 is the weight with which their communicated parameters are averaged.

Step 101: initialize the neural network model parameters of each node to x^0;

Step 102: initialize a counter t = 0;

Step 103: initialize the momentum to zero;

Step 104: randomly select a mini-batch of b image samples from the image set saved on the node;

Step 105: input the randomly selected mini-batch of image data into the neural network model, execute back propagation, and compute the stochastic gradient as the average of the per-sample gradients, each term being the gradient of the loss of the i-th image ξ_i under the current node's neural network model parameters;

Step 106: normalize the stochastic gradient from the previous step and use it to update the momentum;

Step 107: update the local neural network model parameters of the node with the momentum and the learning rate η;

Step 108: the current node communicates with its neighbor nodes to obtain their neural network model parameters; the relevant set consists of the neighbor nodes of the current node k together with node k itself;

Step 109: form the weighted average of these parameters according to the weight matrix W to obtain the new neural network model parameters of the current node k;

Step 110: update the counter t = t + 1;

Step 111: judge whether the number t of completed iteration rounds has reached the total number of iteration rounds T; if so, end the training process, otherwise return to step 104 and continue training;

Step 112: average the neural network model parameters over the nodes to obtain the average neural network model parameters, which are the output parameters and give the trained image classifier.

A large-batch decentralized distributed image classifier training system comprises:
An initialization module: inputs the neural network model and randomly initializes the global neural network model parameters x^0; stores a subset of the image samples on each node (the complete image sample set is the union of the per-node subsets, and K denotes the number of nodes); sets the batch size b of each random draw of image samples, the learning rate η, the weight matrix W, the total number of iteration rounds T, and the momentum coefficient β; initializes the neural network model parameters of each node to x^0; initializes a counter t = 0; and initializes the momentum to zero.

A gradient calculation module: computes a stochastic gradient from the image samples stored on the node and then normalizes the gradient.

A training process module: updates the momentum and the node's neural network model parameters with the normalized gradient; each node communicates with its neighbor nodes to obtain their latest neural network model parameters and forms a weighted average of these with its own parameters, which serve as the node's new parameters in the next round of updating; the training is repeated until the stopping condition is reached, at which point every node stops and the average of the parameters across the nodes is taken as the final output, namely the parameters of the image classifier.
A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the large-batch decentralized distributed image classifier training method described above when executing the computer program.
A computer-readable storage medium storing a computer program for performing the large-batch decentralized distributed image classifier training method described above.
Advantageous effects: compared with the prior art, the large-batch decentralized distributed image classifier training method provided by the invention is simple to implement, incurs no extra overhead, and removes the central node, so communication congestion at a central node cannot occur.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples. These examples are purely illustrative and do not limit the scope of the invention; various equivalent modifications that occur to those skilled in the art upon reading this disclosure fall within the scope of the appended claims.
A large-batch decentralized distributed image classifier training method is suited to scenarios with a large number of image samples to process and a large model. Taking distributed training of a neural-network image classifier as an example, the specific workflow of this embodiment is as follows:
The workflow of the large-batch decentralized distributed image classifier training method on the k-th working node is shown in FIG. 1. First, input the neural network model and randomly initialize the global neural network model parameters x^0; store a subset of the image samples on each node (the complete image sample set is the union of the per-node subsets); and set the batch size b of each random draw of image samples, the learning rate η, the weight matrix W, the total number of iteration rounds T, and the momentum coefficient β (step 100). Initialize the node's neural network model parameters to x^0 (step 101), initialize the counter t = 0 (step 102), and initialize the momentum to zero (step 103). Next, randomly select a mini-batch of image data from the image set saved locally on the node (step 104); input the selected mini-batch into the neural network model, execute back propagation, and compute the stochastic gradient as the average of the per-sample gradients, each being the gradient of the loss of the i-th image ξ_i under the current local neural network model parameters (step 105). Then normalize the stochastic gradient from the previous step and use it to update the momentum (step 106), and update the local neural network model parameters with the momentum (step 107). Communicate with the neighbor nodes to obtain their neural network model parameters, the relevant set consisting of the neighbor nodes of node k together with node k itself (step 108), and form a weighted average according to the weight matrix W to obtain the new local neural network model parameters (step 109). Update the counter t = t + 1 (step 110) and judge whether the number t of completed iteration rounds has reached the total T; if so, end the training process, otherwise return to step 104 and continue training (step 111). Finally, take the average of the neural network model parameters over all nodes as the output parameters (step 112).
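The workflow above (steps 100-112) can be sketched end to end. The linear least-squares model, the data split, and the momentum form below are illustrative assumptions standing in for the neural-network classifier in the text:

```python
import numpy as np

def train_decentralized(node_data, node_labels, W, T, b, lr, beta, eps=1e-8):
    """End-to-end sketch of steps 100-112 on a toy least-squares
    problem; node k holds node_data[k], node_labels[k]. The linear
    model and the momentum recursion are assumed stand-ins."""
    K = W.shape[0]
    d = node_data[0].shape[1]
    rng = np.random.default_rng(0)
    x = np.tile(rng.normal(size=d), (K, 1))      # steps 100-101: shared random init
    m = np.zeros((K, d))                          # step 103: zero momentum
    for t in range(T):                            # steps 102, 110, 111
        half_step = np.empty_like(x)
        for k in range(K):
            idx = rng.choice(len(node_data[k]), size=b)         # step 104: mini-batch
            A, y = node_data[k][idx], node_labels[k][idx]
            g = 2.0 * A.T @ (A @ x[k] - y) / b                  # step 105: gradient
            m[k] = beta * m[k] + g / (np.linalg.norm(g) + eps)  # step 106: normalize + momentum
            half_step[k] = x[k] - lr * m[k]                     # step 107: local update
        x = W @ half_step                         # steps 108-109: gossip averaging
    return x.mean(axis=0)                         # step 112: average over nodes

# Toy run: three fully connected nodes jointly recover x* = (1, -1)
rng = np.random.default_rng(1)
x_star = np.array([1.0, -1.0])
data = [rng.normal(size=(20, 2)) for _ in range(3)]
labels = [A @ x_star for A in data]
W = np.full((3, 3), 1.0 / 3.0)
out = train_decentralized(data, labels, W, T=200, b=4, lr=0.02, beta=0.9)
print(out.shape)  # → (2,)
```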
A large-batch decentralized distributed image classifier training system comprises:

An initialization module: inputs the neural network model and randomly initializes the global neural network model parameters x^0; stores a subset of the image samples on each node (the complete image sample set is the union of the per-node subsets, and K denotes the number of nodes); sets the batch size b of each random draw of image samples, the learning rate η, the weight matrix W, the total number of iteration rounds T, and the momentum coefficient β; initializes the neural network model parameters of each node to x^0; initializes a counter t = 0; and initializes the momentum to zero.

A gradient calculation module: computes a stochastic gradient from the image samples stored on the node and then normalizes the gradient.

A training process module: updates the momentum and the node's neural network model parameters with the normalized gradient; each node communicates with its neighbor nodes to obtain their latest neural network model parameters and forms a weighted average of these with its own parameters, which serve as the node's new parameters in the next round of updating; the training is repeated until the stopping condition is reached, at which point every node stops and the average of the parameters across the nodes is taken as the final output, namely the parameters of the image classifier.
It will be apparent to those skilled in the art that the modules of the large-batch decentralized distributed image classifier training system, or the steps of the training method of the embodiments described above, may be implemented on a general-purpose computing device; they may be centralized on a single computing device or distributed over a network of computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps may be executed in a different order from that shown or described here. They may also be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The method of the present invention was tested on multiple image classification datasets. The experiments compared the classification accuracy of the finally trained image classifier on the test image set under different batch-size settings for random sampling. The results show that with the proposed method, the classification accuracy of the final classifier suffers no obvious loss even when the batch size is increased severalfold, so computing resources such as GPUs can be utilized more fully, the number of parameter updates is reduced, and machine learning training efficiency improves. Moreover, under the decentralized setting, the communication topology between nodes can be chosen more flexibly and communication congestion at a central node is avoided, which accelerates the training process.
Claims (7)
1. A large-batch decentralized distributed image classifier training method, characterized in that each working node uses its local image classifier parameters to compute a stochastic gradient from the image samples stored on the node, normalizes the gradient, and updates the momentum and the local parameters with the normalized gradient; each node communicates with its neighbor nodes to obtain their latest image classifier parameters and forms a weighted average of these with its own local parameters, which serve as the new local parameters in the next round of updating; these training steps are repeated until the stopping condition is reached, at which point every node stops and the average of the parameters across the nodes is taken as the final output.
2. The large-batch decentralized distributed image classifier training method according to claim 1, characterized in that a neural network model is used for training the image classifier; the neural network model is input and the global neural network model parameters x^0 are randomly initialized; each node stores a subset of the image samples, the union of which forms the complete image sample set, and K denotes the number of nodes; the batch size of each random draw of image samples is b, the learning rate is η, the weight matrix is W, the total number of iteration rounds is T, and the momentum coefficient is β; the neural network model parameters of each node are initialized to x^0, a counter is initialized to t = 0, and the momentum is initialized to zero.
3. The large-batch decentralized distributed image classifier training method according to claim 2, characterized in that a mini-batch of image data is randomly selected from the subset of images saved on the node; the randomly selected mini-batch is input into the neural network model, back propagation is executed, and the stochastic gradient is computed as the average of the per-sample gradients, each being the gradient of the loss of the i-th image ξ_i under the current node's neural network model parameters; the gradient is then normalized and used to update the momentum.
4. The large-batch decentralized distributed image classifier training method according to claim 2, characterized in that the current node's neural network model parameters are updated with the momentum; the current node communicates with its neighbor nodes to obtain their neural network model parameters, the relevant set consisting of the neighbor nodes of the current node k together with node k itself; a weighted average is formed according to the weight matrix W to obtain the new neural network model parameters of the current node k; the counter is updated to t = t + 1; whether the number t of completed iteration rounds has reached the total number of iteration rounds T is judged, and if so, the training process ends, otherwise training of the neural network model continues; after training ends, the neural network model parameters on the nodes are averaged to obtain the average neural network model parameters as the output parameters.
5. A large-batch decentralized distributed image classifier training system, comprising:

an initialization module, which inputs the neural network model and randomly initializes the global neural network model parameters x^0; stores a subset of the image samples on each node, the union of which forms the complete image sample set, with K denoting the number of nodes; sets the batch size b of each random draw of image samples, the learning rate η, the weight matrix W, the total number of iteration rounds T, and the momentum coefficient β; initializes the neural network model parameters of each node to x^0; initializes a counter t = 0; and initializes the momentum to zero;

a gradient calculation module, which computes a stochastic gradient from the image samples stored on the node and then normalizes the gradient; and

a training process module, which updates the momentum and the node's neural network model parameters with the normalized gradient; each node communicates with its neighbor nodes to obtain their latest neural network model parameters and forms a weighted average of these with its own parameters, which serve as the node's new parameters in the next round of updating; the training is repeated until the stopping condition is reached, at which point every node stops and the average of the parameters across the nodes is taken as the final output, namely the parameters of the image classifier.
6. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the large-batch decentralized distributed image classifier training method according to any one of claims 1 to 4.
7. A computer-readable storage medium storing a computer program for executing the large-batch decentralized distributed image classifier training method according to any one of claims 1 to 4.
Priority Applications (1)
CN202111516644.2A — priority date 2021-12-07, filing date 2021-12-07 — Large-batch decentralized distributed image classifier training method and system
Publications (1)
CN114186671A (en) — publication date 2022-03-15
Family ID: 80604642
Family Applications (1)
CN202111516644.2A — filed 2021-12-07 — Large-batch decentralized distributed image classifier training method and system — status Pending
Country Status (1): CN
Cited By (1)
CN115081024A (杭州金智塔科技有限公司) — priority date 2022-08-16, publication date 2022-09-20 — Decentralized business model training method and device based on privacy protection
Legal Events
PB01 — Publication
SE01 — Entry into force of request for substantive examination