CN114186671A - Large-batch decentralized distributed image classifier training method and system - Google Patents
Large-batch decentralized distributed image classifier training method and system
- Publication number
- CN114186671A (application CN202111516644.2A)
- Authority
- CN
- China
- Prior art keywords
- node
- neural network
- network model
- parameters
- batch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06N3/045
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
- G06N3/084—Back-propagation
Abstract
The invention discloses a large-batch decentralized distributed image classifier training method and system. Each working node uses its local image classifier parameters to compute a stochastic gradient from locally stored image samples, normalizes the gradient, and updates its momentum and local parameters with the normalized gradient. Each node then communicates with its neighbor nodes to obtain their latest image classifier parameters and takes a weighted average of these parameters and its own local parameters as the new local parameters for the next round of updating. These training steps are repeated until the stopping condition is reached; each node then stops, and the average of the parameters across the nodes is taken as the final output parameters. Because the method removes the central node, congestion at a central node cannot occur. At the same time, the method is suited to large-batch image classifier training, and large-batch training reduces the number of parameter updates and communications, so computing resources such as GPUs can be fully utilized and training efficiency is greatly improved.
Description
Technical Field
The invention relates to a large-batch decentralized distributed image classifier training method and system, and belongs to the technical fields of image classification and machine learning.
Background
Training of many image classifiers can be formalized as solving a finite-sum optimization problem of the following form:

$$\min_{x \in \mathbb{R}^d} F(x) = \frac{1}{n} \sum_{i=1}^{n} f(x; \xi_i),$$

where $x$ is the model parameter vector, $d$ is the dimension of the model parameters, $n$ is the total number of training samples, $\xi_i$ denotes the $i$-th sample, and $f(x; \xi_i)$ is the loss function corresponding to the $i$-th sample.
In recent years, deep learning has developed rapidly, and datasets and models keep growing, so the computing power of a single machine can no longer meet the demand. Distributed machine learning, in which multiple machines work cooperatively to complete a training task, has therefore become an important option for solving this problem. In addition, in scenarios such as federated learning and edge computing, training data can only be stored on individual terminal devices because of requirements such as privacy protection, and distributed machine learning techniques are likewise required in such cases.
The parameter server architecture is the most commonly used architecture in distributed machine learning. It comprises a server node (or cluster) and several working nodes (or clusters). The server node maintains the globally shared parameters, while the working nodes compute local quantities, such as gradients, from their locally stored training samples. The working nodes cannot communicate with each other directly; they can only read and update the parameters by communicating with the server node. Since all working nodes communicate only with the server node, the server node is usually called the central node, and the parameter server architecture is a typical centralized architecture. Such architectures place high demands on the underlying communication hardware: under high latency or low bandwidth, communication congestion at the central node can slow down the entire training process.
The decentralized setting removes the role of the central server from the distributed architecture, and each working node communicates peer-to-peer with its neighbor nodes on an equal footing. The most classical decentralized method is Gossip SGD: the connection weights between nodes are modeled as a weight matrix; in each iteration, every node randomly samples a small batch of its local data, computes a gradient at its local parameters, updates the local parameters with the computed gradient, then communicates with its neighbor nodes to obtain their parameters, and takes a weighted average of the parameters according to the weight matrix, which serves as the new local parameters for the next iteration. Once the stopping condition is met, the iterative updating stops and the average of the parameters across the nodes is output as the final model parameters.
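For concreteness, the following is a minimal single-process sketch of Gossip SGD as described above, using NumPy and a doubly-stochastic ring topology for the weight matrix; the function names (ring_weight_matrix, gossip_sgd) and the toy quadratic losses are illustrative assumptions, not part of the prior-art method itself.

```python
# Minimal single-process simulation of Gossip SGD on a ring of K nodes (sketch only).
import numpy as np

def ring_weight_matrix(K):
    """Doubly stochastic weights: each node averages itself and its two ring neighbors (assumes K >= 3)."""
    W = np.zeros((K, K))
    for i in range(K):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % K] = 1.0 / 3.0
        W[i, (i + 1) % K] = 1.0 / 3.0
    return W

def gossip_sgd(local_grad, x0, K, lr=0.1, T=100):
    """local_grad(k, x) returns a stochastic gradient of node k's local loss at x."""
    W = ring_weight_matrix(K)
    X = np.tile(x0, (K, 1))            # row k holds node k's parameters
    for _ in range(T):
        for k in range(K):             # 1) local SGD step on every node
            X[k] -= lr * local_grad(k, X[k])
        X = W @ X                      # 2) gossip step: weighted average with ring neighbors
    return X.mean(axis=0)              # consensus average as the output model

# Toy usage: node k holds the quadratic loss 0.5*||x - target_k||^2.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = rng.normal(size=(8, 4))
    grad = lambda k, x: x - targets[k]
    x_out = gossip_sgd(grad, np.zeros(4), K=8)
    print(x_out, targets.mean(axis=0))  # x_out should end up close to the mean of the targets
```

In a real deployment the rows of W would correspond to actual network links between workers, and the in-memory array operations would be replaced by peer-to-peer communication.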
Current image classifier training usually uses a relatively small batch size. Increasing the batch size makes fuller use of the computing power of multi-core GPU systems and speeds up training, and in a distributed environment it also reduces the number of parameter updates and communications. However, blindly increasing the batch size degrades the generalization performance of the trained image classifier, so a training method specifically suited to large-batch training needs to be designed.
Disclosure of Invention
The purpose of the invention is as follows: in current image classification training tasks, the stochastic gradient descent method in a decentralized setting usually uses a small batch size, i.e., only a small number of image samples are drawn each time, which makes it difficult to fully utilize the computing power of GPUs. In a distributed environment, small-batch stochastic gradient descent requires a large number of parameter updates, which brings frequent communication and large communication overhead. On the other hand, blindly increasing the batch size of sampled images reduces the prediction accuracy of the final trained image classifier on the test image set. To address these problems and shortcomings of the prior art, the invention provides a large-batch decentralized distributed image classifier training method and system. The method is simple to implement, incurs no extra overhead, can be applied to reduce the number of communications and parameter updates in distributed training under a decentralized setting, and improves the efficiency of distributed image classifier training.
The technical scheme is as follows: in the large-batch decentralized distributed image classifier training method, each working node uses its local image classifier parameters to compute a stochastic gradient from the image samples stored locally, normalizes the gradient, and updates its momentum and local parameters with the normalized gradient; each node then communicates with its neighbor nodes to obtain their latest image classifier parameters and takes a weighted average of these parameters and its own local parameters as the new local parameters for the next round of updating; these training steps are repeated until the stopping condition is reached, every node stops, and the average of the parameters on all nodes is taken as the final output parameters.
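The text above does not give the exact update formulas (they appear as figures in the original filing), so the following is only a minimal sketch of one plausible local update on a single node, assuming the gradient is scaled to unit norm before being folded into the momentum buffer; the function name and the precise form of the momentum and parameter updates are assumptions, not quotations of the claimed method.

```python
import numpy as np

def normalized_momentum_step(x, m, grad, lr, beta, eps=1e-8):
    """One local update with a normalized stochastic gradient (assumed form).

    x    -- local image classifier parameters
    m    -- local momentum buffer
    grad -- stochastic gradient computed on a local mini-batch
    """
    g_hat = grad / (np.linalg.norm(grad) + eps)  # gradient normalization
    m = beta * m + g_hat                         # momentum update with coefficient beta
    x = x - lr * m                               # parameter update with learning rate eta (lr)
    return x, m
```

In this sketch, normalization keeps the magnitude of each step controlled by the learning rate alone, a property often relied on in large-batch training; the gossip averaging step then follows as in the steps below.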
The training process comprises the following specific steps:
Step 100: input a neural network model and randomly initialize the global neural network model parameter x_0; save a subset of the image samples on each node; the batch size for randomly sampling image samples each time is b, the learning rate is η, the weight matrix is W, the total number of iteration rounds is T, and the momentum coefficient is β; the complete image sample set is the union of the subsets saved on the nodes, and K denotes the number of nodes; W = (w_{i,j})_{K×K} models whether nodes communicate with each other and provides the weights of the weighted average at each communication; each of its elements is a value between 0 and 1, w_{i,j} = 0 means that node i and node j do not communicate, and w_{i,j} > 0 is the weight used in the weighted average when they communicate;
Step 101: initialize the neural network model parameters of the current node from the global initial parameters x_0;
Step 102: initialize a counter t = 0;
Step 103: initialize the momentum of the current node;
Step 104: randomly select a small batch of b image samples from the partial image set saved on the current node;
Step 105: input the randomly selected small batch of image data into the neural network model, perform back-propagation, and compute the stochastic gradient over the mini-batch, where each per-sample term is the gradient of the loss function of the i-th image ξ_i under the neural network model parameters of the current node;
Step 106: normalize the stochastic gradient obtained in the previous step and use the normalized gradient to update the momentum;
Step 107: update the local neural network model parameters of the current node;
Step 108: the current node communicates with its neighbor nodes to obtain their neural network model parameters; the neighborhood is the set consisting of the neighbor nodes of the current node k together with node k itself;
Step 109: take a weighted average of these neural network model parameters according to the weight matrix W to obtain the new neural network model parameters of the current node k;
Step 110: update the counter t to t + 1;
Step 111: judge whether the number t of completed iteration rounds has reached the total number of iteration rounds T; if so, end the training process, otherwise return to step 104 and continue training;
Step 112: average the neural network model parameters on all nodes to obtain the average neural network model parameters, which are taken as the output parameters, giving the trained image classifier.

A large-batch decentralized distributed image classifier training system comprises:
an initialization module, which inputs a neural network model and randomly initializes the global neural network model parameter x_0; a subset of the image samples is saved on each node; the batch size for randomly sampling image samples each time is b, the learning rate is η, the weight matrix is W, the total number of iteration rounds is T, and the momentum coefficient is β; the complete image sample set is the union of the subsets saved on the nodes, and K denotes the number of nodes; the module initializes the neural network model parameters of each node, initializes a counter t = 0, and initializes the momentum;
a gradient calculation module, which computes a stochastic gradient from the image samples stored on the node and then normalizes the gradient;
a training process module, which updates the momentum and the node's neural network model parameters using the normalized gradient; each node communicates with its neighbor nodes to obtain their latest neural network model parameters and takes a weighted average of these parameters and its own parameters as the node's new neural network model parameters for the next round of updating; the training is repeated until the stopping condition is reached, every node stops, and the average of the parameters on all nodes is taken as the final output parameters, yielding the parameters of the image classifier.
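Purely as an illustration of how the three modules above might be organized in code (the class and method names here are invented for this sketch and do not come from the patent), a minimal skeleton could look like this:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class NodeState:
    x: np.ndarray   # local neural network model parameters
    m: np.ndarray   # local momentum buffer
    t: int = 0      # iteration counter

class InitializationModule:
    def init_node(self, x0: np.ndarray) -> NodeState:
        return NodeState(x=x0.copy(), m=np.zeros_like(x0))

class GradientModule:
    def normalized_gradient(self, grad: np.ndarray, eps: float = 1e-8) -> np.ndarray:
        return grad / (np.linalg.norm(grad) + eps)

class TrainingModule:
    def __init__(self, lr: float, beta: float):
        self.lr, self.beta = lr, beta

    def local_update(self, s: NodeState, g_hat: np.ndarray) -> None:
        # assumed update form: momentum accumulates normalized gradients
        s.m = self.beta * s.m + g_hat
        s.x = s.x - self.lr * s.m
        s.t += 1

    def gossip(self, s: NodeState, neighbor_params: list, weights: list) -> None:
        # weighted average of own and neighbors' parameters; weights are the
        # nonzero entries of the corresponding row of the weight matrix W
        parts = [s.x] + list(neighbor_params)
        s.x = sum(w * p for w, p in zip(weights, parts))
```

A real system would back normalized_gradient with back-propagation through the neural network and back gossip with inter-node communication.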
A computer device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the large-batch decentralized distributed image classifier training method described above.
A computer-readable storage medium stores a computer program for executing the large-batch decentralized distributed image classifier training method described above.
Beneficial effects: compared with the prior art, the large-batch decentralized distributed image classifier training method provided by the invention is simple to implement, incurs no extra overhead, removes the central node, and therefore does not suffer from communication congestion at a central node.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples. It should be understood that these examples are intended only to illustrate the invention and not to limit its scope; after reading the present disclosure, various equivalent modifications made by those skilled in the art fall within the scope defined by the appended claims.
The large-batch decentralized distributed image classifier training method is suitable for scenarios in which the image dataset to be processed contains a large number of samples and the model is large. Taking distributed training of a neural-network image classifier as an example, the specific workflow of the method in this embodiment is as follows:
The workflow of the large-batch decentralized distributed image classifier training method on the k-th working node is shown in FIG. 1. First, a neural network model is input and the global neural network model parameter x_0 is randomly initialized; a subset of the image samples is saved on each node (the complete image sample set is the union of these subsets); the batch size b for randomly sampling image samples each time, the learning rate η, the weight matrix W, the total number of iteration rounds T, and the momentum coefficient β are given (step 100). The neural network model parameters of the node are initialized (step 101), the counter is initialized to t = 0 (step 102), and the momentum is initialized (step 103). Next, a small batch of b image samples is randomly selected from the partial image set saved locally on the node (step 104); the selected images are input into the neural network model, back-propagation is performed, and the stochastic gradient is computed, where each per-sample term is the gradient of the loss function of the i-th image ξ_i under the current local neural network model parameters (step 105). The stochastic gradient is then normalized and the momentum is updated (step 106), and the local neural network model parameters are updated (step 107). The node communicates with its neighbor nodes to obtain their neural network model parameters, where the neighborhood is the set consisting of the neighbor nodes of node k together with node k itself (step 108), and a weighted average is taken according to the weight matrix W to obtain the new local neural network model parameters (step 109). The counter is updated to t = t + 1 (step 110), and it is judged whether the number t of completed iteration rounds has reached the total number of iteration rounds T; if so, the training process ends, otherwise the method returns to step 104 to continue training (step 111). Finally, the neural network model parameters on all nodes are averaged and taken as the output parameters (step 112).
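The exact formulas behind steps 105-109 appear only as figures in the original filing, so the following single-process NumPy simulation is just a sketch of the per-node workflow under stated assumptions: the momentum update is taken as m ← β·m + g/‖g‖, the parameter update as x ← x − η·m, the loss is a toy squared loss instead of a real neural network, and all function and variable names are illustrative.

```python
import numpy as np

def train(datasets, x0, W, b=32, lr=0.1, beta=0.9, T=200, eps=1e-8, seed=0):
    """Simulate the workflow of steps 100-112 on one machine (sketch only).

    datasets[k] = (features, labels) held by node k (each with at least b samples);
    W is the K x K doubly-stochastic weight matrix.
    """
    rng = np.random.default_rng(seed)
    K = len(datasets)
    X = np.tile(x0, (K, 1))                   # step 101: per-node parameters
    M = np.zeros_like(X)                      # step 103: per-node momentum
    for t in range(T):                        # steps 102/110/111: iteration counter
        X_half = np.empty_like(X)
        for k in range(K):
            feats, labels = datasets[k]
            idx = rng.choice(len(labels), size=b, replace=False)   # step 104
            err = feats[idx] @ X[k] - labels[idx]                  # step 105: backprop replaced
            g = feats[idx].T @ err / b                             #           by a toy squared loss
            M[k] = beta * M[k] + g / (np.linalg.norm(g) + eps)     # step 106 (assumed form)
            X_half[k] = X[k] - lr * M[k]                           # step 107 (assumed form)
        X = W @ X_half                                             # steps 108-109: gossip averaging
    return X.mean(axis=0)                                          # step 112: average over nodes
```

Replacing the toy squared loss with back-propagation through an actual neural network and replacing the in-memory arrays with communication between working nodes recovers the distributed setting described above.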
A large-batch decentralized distributed image classifier training system comprises:
an initialization module, which inputs a neural network model and randomly initializes the global neural network model parameter x_0; a subset of the image samples is saved on each node; the batch size for randomly sampling image samples each time is b, the learning rate is η, the weight matrix is W, the total number of iteration rounds is T, and the momentum coefficient is β; the complete image sample set is the union of the subsets saved on the nodes, and K denotes the number of nodes; the module initializes the neural network model parameters of each node, initializes a counter t = 0, and initializes the momentum;
a gradient calculation module, which computes a stochastic gradient from the image samples stored on the node and then normalizes the gradient;
a training process module, which updates the momentum and the node's neural network model parameters using the normalized gradient; each node communicates with its neighbor nodes to obtain their latest neural network model parameters and takes a weighted average of these parameters and its own parameters as the node's new neural network model parameters for the next round of updating; the training is repeated until the stopping condition is reached, every node stops, and the average of the parameters on all nodes is taken as the final output parameters, yielding the parameters of the image classifier.
It will be apparent to those skilled in the art that the modules of the large-batch decentralized distributed image classifier training system or the steps of the large-batch decentralized distributed image classifier training method according to the embodiments described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device, and in some cases the steps shown or described may be performed in an order different from that given here. Alternatively, they may be fabricated into individual integrated circuit modules, or several of the modules or steps may be fabricated into a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The method of the present invention was tested on multiple image classification datasets. The experiments compare the classification accuracy of the finally trained image classifier on the test image set under different batch-size settings for random sampling. The results show that, with the proposed method, the classification accuracy of the final image classifier shows no significant loss even when the batch size is increased severalfold, so computing resources such as GPUs can be utilized more fully, the number of parameter updates is reduced, and machine learning training efficiency is improved. Meanwhile, under the decentralized setting, the communication topology between nodes can be chosen more flexibly and the problem of communication congestion at a central node is avoided, which accelerates the training process.
Claims (7)
1. A large-batch decentralized distributed image classifier training method, characterized in that each working node uses its local image classifier parameters to compute a stochastic gradient from the image samples stored on the node, normalizes the gradient, and updates its momentum and the local parameters with the normalized gradient; each node communicates with its neighbor nodes to obtain their latest image classifier parameters and takes a weighted average of these parameters and its own local parameters as the new local parameters for the next round of updating; and the above training steps are repeated until a stopping condition is reached, whereupon each node stops and the average of the parameters on all nodes is taken as the final output parameters.
2. The large-batch decentralized distributed image classifier training method according to claim 1, characterized in that a neural network model is used for training the image classifier; a neural network model is input and the global neural network model parameter x_0 is randomly initialized; a subset of the image samples is saved on each node; the batch size for randomly sampling image samples each time is b, the learning rate is η, the weight matrix is W, the total number of iteration rounds is T, and the momentum coefficient is β; the complete image sample set is the union of the subsets saved on the nodes, and K denotes the number of nodes; and the neural network model parameters of each node are initialized, a counter t = 0 is initialized, and the momentum is initialized.
3. The large-batch decentralized distributed image classifier training method according to claim 2, characterized in that a small batch of b image samples is randomly selected from the partial image set saved on the node; the randomly selected small batch of image data is input into the neural network model, back-propagation is performed, and the stochastic gradient is computed, wherein each per-sample term is the gradient of the loss function of the i-th image ξ_i under the neural network model parameters of the current node; and the gradient is normalized and the momentum is updated.
4. The large-batch decentralized distributed image classifier training method according to claim 2, characterized in that the neural network model parameters of the current node are updated; the current node communicates with its neighbor nodes to obtain their neural network model parameters, wherein the neighborhood is the set consisting of the neighbor nodes of the current node k together with node k itself; a weighted average is taken according to the weight matrix W to obtain the new neural network model parameters of the current node k; the counter t is updated to t + 1; it is judged whether the number t of completed iteration rounds has reached the total number of iteration rounds T; if so, the training process ends, otherwise training of the neural network model continues; and after training ends, the neural network model parameters on all nodes are averaged to obtain the average neural network model parameters, which are taken as the output parameters.
5. A large-batch decentralized distributed image classifier training system, comprising:
an initialization module, which inputs a neural network model and randomly initializes the global neural network model parameter x_0; a subset of the image samples is saved on each node; the batch size for randomly sampling image samples each time is b, the learning rate is η, the weight matrix is W, the total number of iteration rounds is T, and the momentum coefficient is β; the complete image sample set is the union of the subsets saved on the nodes, and K denotes the number of nodes; the module initializes the neural network model parameters of each node, initializes a counter t = 0, and initializes the momentum;
a gradient calculation module, which computes a stochastic gradient from the image samples stored on the node and then normalizes the gradient;
a training process module, which updates the momentum and the node's neural network model parameters using the normalized gradient; each node communicates with its neighbor nodes to obtain their latest neural network model parameters and takes a weighted average of these parameters and its own parameters as the node's new neural network model parameters for the next round of updating; and the training is repeated until the stopping condition is reached, every node stops, and the average of the parameters on all nodes is taken as the final output parameters, yielding the parameters of the image classifier.
6. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the large-batch decentralized distributed image classifier training method according to any one of claims 1-4.
7. A computer-readable storage medium storing a computer program for executing the large-batch decentralized distributed image classifier training method according to any one of claims 1-4.
Priority Applications (1)
- CN202111516644.2A, priority date 2021-12-07, filing date 2021-12-07: Large-batch decentralized distributed image classifier training method and system
Publications (1)
- CN114186671A, published 2022-03-15
Family
- ID=80604642

Family Applications (1)
- CN202111516644.2A (CN114186671A, Pending), priority date 2021-12-07, filing date 2021-12-07

Country Status (1)
- CN: CN114186671A (en)
Cited By (1)
- CN115081024A, priority date 2022-08-16, publication date 2022-09-20, assignee 杭州金智塔科技有限公司: Decentralized business model training method and device based on privacy protection
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination