CN114186671A - Large-batch decentralized distributed image classifier training method and system - Google Patents

Large-batch decentralized distributed image classifier training method and system

Info

Publication number
CN114186671A
Authority
CN
China
Prior art keywords
node
neural network
network model
parameters
batch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111516644.2A
Other languages
Chinese (zh)
Inventor
李武军
史长伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202111516644.2A priority Critical patent/CN114186671A/en
Publication of CN114186671A publication Critical patent/CN114186671A/en
Pending legal-status Critical Current

Classifications

    • G06N3/045
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Computing arrangements based on biological models using neural network models
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding or deleting nodes or connections, pruning
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Computing arrangements based on biological models using neural network models
    • G06N3/08Learning methods
    • G06N3/084Back-propagation

Abstract

The invention discloses a large-batch decentralized distributed image classifier training method and system. Each working node computes a stochastic gradient on its locally stored image samples using its local image classifier parameters, normalizes the gradient, and uses the normalized gradient to update its momentum and local parameters. Each node then communicates with its neighbor nodes to obtain their latest image classifier parameters and takes a weighted average of these and its own local parameters as the new local parameters for the next round of updates. These training steps are repeated until the stopping condition is reached, at which point every node stops and the average of the parameters across the nodes is taken as the final output parameters. Because the method removes the central node, congestion at a central node does not occur; at the same time the method is suited to large-batch image classifier training, which reduces the number of parameter updates and communication rounds, so that computing resources such as GPUs can be fully utilized and training efficiency is greatly improved.

Description

Large-batch decentralized distributed image classifier training method and system
Technical Field
The invention relates to a large-batch decentralized distributed image classifier training method and system, and belongs to the technical field of image classification and machine learning.
Background
Training of many image classifiers can be formalized as solving a finite-sum optimization problem of the following form:
$$\min_{x \in \mathbb{R}^d} F(x) = \frac{1}{n} \sum_{i=1}^{n} f(x; \xi_i),$$
where x is the model parameter, d is the dimension of the model parameter, n is the total number of training samples, ξ_i denotes the i-th sample, and f(x; ξ_i) is the loss function corresponding to the i-th sample.
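As a concrete illustration of the finite-sum objective above, the following Python sketch evaluates the empirical average loss over a training set. It is only illustrative: the `model`, `loss_fn`, and `dataset` objects are assumed PyTorch-style placeholders, not components defined by the patent.

```python
import torch

def finite_sum_objective(model, loss_fn, dataset):
    # F(x) = (1/n) * sum_i f(x; xi_i), evaluated at the model's current parameters x.
    model.eval()
    total = 0.0
    with torch.no_grad():
        for image, label in dataset:                     # xi_i is the i-th (image, label) pair
            prediction = model(image.unsqueeze(0))       # add a batch dimension
            target = torch.tensor([label])
            total += loss_fn(prediction, target).item()  # f(x; xi_i): per-sample loss
    return total / len(dataset)                          # average over all n samples
```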
In recent years, deep learning has developed rapidly, and datasets and models keep growing, so that the computing power of a single machine can no longer meet the demand. Distributed machine learning, in which multiple machines cooperate to complete a training task, has therefore become an important option. In addition, in scenarios such as federated learning and edge computing, training data can only be stored on individual terminal devices because of privacy-protection and similar requirements, and distributed machine learning techniques are likewise needed in such cases.
The parameter server architecture is the most commonly used architecture in distributed machine learning. It consists of a server node (or cluster) and several working nodes (or clusters). The server node maintains the globally shared parameters, while the working nodes use locally stored training samples to compute local quantities such as gradients. The working nodes cannot communicate with one another directly; they can only read and update the parameters by communicating with the server node. Because all working nodes communicate only with the server node, the server node is usually called the central node, and the parameter server architecture is a typical centralized architecture. Such an architecture places high demands on the underlying communication hardware: under high latency or low bandwidth, communication congestion at the central node can slow down the entire training process.
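For contrast with the decentralized setting discussed next, the interaction between one working node and a parameter server can be sketched as follows. This is a minimal sketch under assumed interfaces: `server.pull_parameters`, `server.push_gradient`, and `compute_gradient` are hypothetical placeholders for the RPC layer and the local back-propagation, not APIs from the patent.

```python
def parameter_server_worker(server, local_data, compute_gradient, batch_size, num_steps):
    """Centralized training as seen from a single working node."""
    for _ in range(num_steps):
        x = server.pull_parameters()              # read the globally shared parameters
        batch = local_data.sample(batch_size)     # draw a local mini-batch
        grad = compute_gradient(x, batch)         # local computation only
        server.push_gradient(grad)                # the server applies the update; since every
                                                  # worker talks only to the server, the server
                                                  # can become a communication bottleneck
```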
A decentralized setting removes the role of the central server from the distributed architecture; each working node communicates with its neighbor nodes in a peer-to-peer, "equal" manner. The most classical decentralized method is Gossip SGD: the connection weights between nodes are modeled as a weight matrix; in each iteration, every node randomly samples a small batch of its local data, computes a gradient at its local parameters, updates the local parameters with this gradient, then communicates with its neighbor nodes to obtain their parameters and takes a weighted average of the parameters according to the weight matrix, which becomes the new local parameters for the next iteration. Iterative updating stops once the stopping condition is met, and the average of the parameters across the nodes is output as the final model parameters.
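The Gossip SGD baseline described above can be simulated in a single process as follows. The sketch assumes a doubly-stochastic weight matrix `W` (W[i, j] > 0 exactly when nodes i and j communicate) and a `compute_gradient(x, batch)` helper; both are assumptions for illustration rather than definitions taken from the patent.

```python
import numpy as np

def gossip_sgd(x0, local_datasets, W, compute_gradient,
               lr=0.1, batch_size=32, num_iters=1000, seed=0):
    """Simulated Gossip SGD over K nodes with weight matrix W."""
    rng = np.random.default_rng(seed)
    K = len(local_datasets)
    x = [x0.copy() for _ in range(K)]                 # every node starts from the same parameters
    for _ in range(num_iters):
        half = []
        for k in range(K):                            # local SGD step on each node
            idx = rng.choice(len(local_datasets[k]), size=batch_size, replace=False)
            batch = [local_datasets[k][i] for i in idx]
            half.append(x[k] - lr * compute_gradient(x[k], batch))
        for k in range(K):                            # gossip step: weighted average with neighbors
            x[k] = sum(W[k, j] * half[j] for j in range(K) if W[k, j] > 0)
    return sum(x) / K                                 # output: average of the parameters over nodes
```

Because each node only exchanges parameters with the neighbors for which W[k, j] > 0, there is no central node, but the small batch size means the loop body runs, and communicates, many times per epoch.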
Current image classifier training usually uses a relatively small batch size. Increasing the batch size makes fuller use of the computing power of multi-core GPU systems and speeds up training, and in a distributed environment it also reduces the number of parameter updates and the number of communication rounds. However, blindly increasing the batch size degrades the generalization performance of the finally trained image classifier, so a training method specifically suited to large-batch training needs to be designed.
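To make the effect of the batch size on communication concrete: each iteration consumes one mini-batch per node and ends with one round of communication, so the number of communication rounds per epoch shrinks in proportion to the per-node batch size. The sample counts below are arbitrary illustrative numbers.

```python
def communication_rounds_per_epoch(num_samples, num_nodes, batch_size_per_node):
    # One round of parameter communication per iteration; each iteration
    # consumes num_nodes * batch_size_per_node samples in total.
    return num_samples // (num_nodes * batch_size_per_node)

print(communication_rounds_per_epoch(1_280_000, 8, 32))   # 5000 rounds per epoch
print(communication_rounds_per_epoch(1_280_000, 8, 256))  # 625 rounds: 8x fewer with an 8x larger batch
```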
Disclosure of Invention
The purpose of the invention is as follows: in current image classification training tasks, the stochastic gradient descent method in a decentralized setting usually uses a small batch size, that is, only a small number of image samples are drawn each time, which makes it difficult to fully utilize the computing power of the GPU. In a distributed environment, small-batch stochastic gradient descent requires a large number of parameter updates, which brings frequent communication and large communication overhead. On the other hand, blindly increasing the batch size of the sampled images lowers the prediction accuracy of the finally trained image classifier on the test image set. To address these problems and shortcomings of the prior art, the invention provides a large-batch decentralized distributed image classifier training method and system. The method is simple to implement, incurs no extra overhead, can be applied to reduce the number of communications and parameter updates in distributed training under a decentralized setting, and improves the efficiency of distributed image classifier training.
The technical scheme is as follows: in the large-batch decentralized distributed image classifier training method, each working node uses its local image classifier parameters to compute a stochastic gradient on locally stored image samples, normalizes the gradient, and uses the normalized gradient to update its momentum and local parameters; each node then communicates with its neighbor nodes to obtain their latest image classifier parameters and takes a weighted average of these and its own local parameters as the new local parameters for the next round of updates; the above training steps are repeated until the stopping condition is reached, at which point every node stops and the average of the parameters across the nodes is taken as the final output parameters.
The training process comprises the following specific steps:
Step 100, inputting a neural network model and randomly initializing the global neural network model parameters x0, and saving a subset of the image samples on each node; the batch size of image samples randomly drawn each time is b, the learning rate is η, the weight matrix is W, the total number of iteration rounds is T, and the momentum coefficient is β; the complete image sample set is the union of the subsets saved on the K nodes, where K denotes the number of nodes; W = (w_{i,j})_{K×K} models whether nodes communicate with each other and provides the weights for the weighted average in each communication; each of its elements is a value between 0 and 1, w_{i,j} = 0 denotes that node i and node j do not communicate, and w_{i,j} > 0 is the weight used in the weighted average of the communicated parameters;
Step 101, initializing the neural network model parameters of the current node to the global initial parameters x0;
Step 102, initializing a counter t to 0;
Step 103, initializing the momentum of the current node;
Step 104, randomly selecting a small batch of b image samples from the partial image set saved on the node;
Step 105, inputting the randomly selected small batch of image data into the neural network model, executing back propagation, and computing the stochastic gradient as the average of the per-sample gradients ∇f(x; ξ_i) over the batch, where ∇f(x; ξ_i) denotes the gradient of the loss function of the i-th image ξ_i at the neural network model parameters of the current node;
Step 106, normalizing the gradient from the previous step and using the normalized gradient to update the momentum;
Step 107, updating the neural network model parameters of the current node using the updated momentum and the learning rate η;
Step 108, the current node communicates with its neighbor nodes to obtain their neural network model parameters; the neighborhood set consists of the neighbor nodes of the current node k and node k itself;
Step 109, carrying out a weighted average according to the weight matrix W to obtain the new neural network model parameters of the current node k;
Step 110, updating a counter t to t + 1;
Step 111, judging whether the number t of completed iteration rounds has reached the total number of iteration rounds T; if so, ending the training process, otherwise returning to step 104 to continue training;
Step 112, averaging the neural network model parameters on the nodes to obtain the average neural network model parameters, which are output as the parameters of the trained image classifier.
A large-batch decentralized distributed image classifier training system comprises:
An initialization module: inputting a neural network model and randomly initializing the global neural network model parameters x0, and saving a subset of the image samples on each node; the batch size of image samples randomly drawn each time is b, the learning rate is η, the weight matrix is W, the total number of iteration rounds is T, and the momentum coefficient is β; the complete image sample set is the union of the subsets saved on the K nodes, where K denotes the number of nodes; initializing the neural network model parameters of each node, initializing the counter t to 0, and initializing the momentum;
A gradient calculation module: computing a stochastic gradient from the image samples saved on the node and then normalizing the gradient;
A training process module: updating the momentum and the neural network model parameters of the node using the normalized gradient; each node communicates with its neighbor nodes to obtain their latest neural network model parameters and takes a weighted average of these and its own parameters as the node's new neural network model parameters for the next round of updates; the training is repeated until the stopping condition is reached, at which point every node stops and the average of the parameters on the nodes is taken as the final output, giving the parameters of the image classifier.
A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the large-batch decentralized distributed image classifier training method described above.
A computer-readable storage medium stores a computer program for executing the large-batch decentralized distributed image classifier training method described above.
Advantageous effects: compared with the prior art, the large-batch decentralized distributed image classifier training method provided by the invention is simple to implement, incurs no extra overhead, removes the central node, and therefore does not suffer from communication congestion at a central node.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
The large-batch decentralized distributed image classifier training method is suitable for scenarios in which the image dataset to be processed has a large number of samples and the model is large. Taking distributed training of an image classifier with a neural network model as an example, the specific workflow of the method of this embodiment is as follows:
The workflow of the large-batch decentralized distributed image classifier training method on the k-th working node is shown in FIG. 1. First, a neural network model is input and the global neural network model parameters x0 are randomly initialized; each node saves a subset of the complete image sample set; the batch size b of image samples randomly drawn each time, the learning rate η, the weight matrix W, the total number of iteration rounds T, and the momentum coefficient β are given (step 100). The neural network model parameters of the node are initialized (step 101), the counter t is initialized to 0 (step 102), and the momentum is initialized (step 103). Next, a small batch of image data is randomly selected from the partial image set saved locally on the node (step 104); the randomly selected small batch of image data is input into the neural network model, back propagation is executed, and the stochastic gradient is computed as the average of the per-sample gradients ∇f(x; ξ_i), where ∇f(x; ξ_i) denotes the gradient of the loss function of the i-th image ξ_i at the current local neural network model parameters (step 105). The stochastic gradient from the previous step is then normalized and used to update the momentum (step 106), and the local neural network model parameters are updated (step 107). The node communicates with its neighbor nodes to obtain their neural network model parameters, where the neighborhood set consists of the neighbor nodes of node k and node k itself (step 108), and a weighted average is taken according to the weight matrix W to obtain the new local neural network model parameters (step 109). The counter is updated to t = t + 1 (step 110), and it is judged whether the number t of completed iteration rounds has reached the total number of iteration rounds T; if so, the training process ends, otherwise the method returns to step 104 to continue training (step 111). Finally, the average of the neural network model parameters on the nodes is taken as the output parameters (step 112).
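The per-node procedure of steps 100 to 112 can be summarized in the following single-process simulation sketch. Because the equation images of the original patent are not reproduced in this text, the exact normalization and momentum update formulas are assumptions: the sketch uses a common normalized-momentum form (m ← β·m + g/‖g‖₂ followed by x ← x − η·m), a ring topology as one possible choice of weight matrix W, and a placeholder `compute_gradient` for the back-propagation of step 105.

```python
import numpy as np

def ring_weight_matrix(K):
    """Illustrative doubly-stochastic weight matrix for a ring topology:
    each node averages with itself and its two neighbors."""
    W = np.zeros((K, K))
    for k in range(K):
        W[k, k] = W[k, (k - 1) % K] = W[k, (k + 1) % K] = 1.0 / 3.0
    return W

def train(model_dim, local_datasets, compute_gradient,
          b=256, lr=0.1, T=1000, beta=0.9, seed=0):
    """Large-batch decentralized training sketch (steps 100-112)."""
    rng = np.random.default_rng(seed)
    K = len(local_datasets)                        # number of nodes (step 100)
    W = ring_weight_matrix(K)
    x0 = rng.normal(size=model_dim)                # random global initialization (step 100)
    x = [x0.copy() for _ in range(K)]              # step 101: every node starts from x0
    m = [np.zeros(model_dim) for _ in range(K)]    # step 103: initialize momentum
    for t in range(T):                             # steps 102, 110, 111: iteration counter
        half = [None] * K
        for k in range(K):
            idx = rng.choice(len(local_datasets[k]), size=b, replace=False)
            batch = [local_datasets[k][i] for i in idx]           # step 104: sample a large batch
            g = compute_gradient(x[k], batch)                     # step 105: back-propagation
            m[k] = beta * m[k] + g / (np.linalg.norm(g) + 1e-12)  # step 106 (assumed form)
            half[k] = x[k] - lr * m[k]                            # step 107 (assumed form)
        for k in range(K):                                        # steps 108-109: gossip averaging
            x[k] = sum(W[k, j] * half[j] for j in range(K) if W[k, j] > 0)
    return sum(x) / K                                             # step 112: average over nodes
```

In an actual multi-machine deployment, the loop over k runs in parallel on the K working nodes, and in the averaging step each node exchanges its locally updated parameters only with the neighbors j for which W[k, j] > 0, so no central server is involved.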
A large-batch decentralized distributed image classifier training system comprises:
An initialization module: inputting a neural network model and randomly initializing the global neural network model parameters x0, and saving a subset of the image samples on each node; the batch size of image samples randomly drawn each time is b, the learning rate is η, the weight matrix is W, the total number of iteration rounds is T, and the momentum coefficient is β; the complete image sample set is the union of the subsets saved on the K nodes, where K denotes the number of nodes; initializing the neural network model parameters of each node, initializing the counter t to 0, and initializing the momentum;
A gradient calculation module: computing a stochastic gradient from the image samples saved on the node and then normalizing the gradient;
A training process module: updating the momentum and the neural network model parameters of the node using the normalized gradient; each node communicates with its neighbor nodes to obtain their latest neural network model parameters and takes a weighted average of these and its own parameters as the node's new neural network model parameters for the next round of updates; the training is repeated until the stopping condition is reached, at which point every node stops and the average of the parameters on the nodes is taken as the final output, giving the parameters of the image classifier.
It will be apparent to those skilled in the art that the modules of the large-batch decentralized distributed image classifier training system, or the steps of the large-batch decentralized distributed image classifier training method, according to the embodiments of the present invention described above may be implemented on a general-purpose computing device. They may be centralized on a single computing device or distributed over a network of multiple computing devices. Alternatively, they may be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device, and in some cases the steps shown or described may be executed in a different order than here. They may also be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The method of the present invention was tested on multiple image classification datasets. The experiments compared the classification accuracy of the finally trained image classifier on the test image set under different batch-size settings for random sampling. The results show that with the method provided by the invention, the classification accuracy of the finally trained image classifier suffers no obvious loss even when the batch size is increased severalfold, so computing resources such as GPUs can be utilized more fully, the number of parameter updates is reduced, and machine learning training efficiency is improved. Meanwhile, under the decentralized setting, the communication topology among the nodes can be chosen more flexibly and the problem of communication congestion at a central node is avoided, thereby accelerating the training process.

Claims (7)

1. A large-batch decentralized distributed image classifier training method, characterized in that each working node uses its local image classifier parameters to compute a stochastic gradient on the image samples stored on the node, normalizes the gradient, and updates the momentum and the local parameters using the normalized gradient; each node communicates with its neighbor nodes to obtain their latest image classifier parameters and takes a weighted average of these and its own local parameters as the new local parameters for the next round of updates; and the above training steps are repeated until the stopping condition is reached, at which point every node stops and the average of the parameters on the nodes is taken as the final output parameters.
2. The large-batch decentralized distributed image classifier training method according to claim 1, wherein a neural network model is used for training the image classifier; a neural network model is input and the global neural network model parameters x0 are randomly initialized; a subset of the image samples is saved on each node; the batch size of image samples randomly drawn each time is b, the learning rate is η, the weight matrix is W, the total number of iteration rounds is T, and the momentum coefficient is β; the complete image sample set is the union of the subsets saved on the K nodes, where K denotes the number of nodes; the neural network model parameters of each node are initialized, a counter t is initialized to 0, and the momentum is initialized.
3. The large-batch decentralized distributed image classifier training method according to claim 2, wherein a small batch of image data is randomly selected from the partial image set saved on the node; the randomly selected small batch of image data is input into the neural network model, back propagation is executed, and the gradient is computed as the average of the per-sample gradients ∇f(x; ξ_i) over the batch, where ∇f(x; ξ_i) denotes the gradient of the loss function of the i-th image ξ_i at the neural network model parameters of the current node; and the gradient is normalized and the momentum is updated.
4. The large-batch decentralized distributed image classifier training method according to claim 2, wherein the neural network model parameters of the current node are updated; the current node communicates with its neighbor nodes to obtain their neural network model parameters, where the neighborhood set consists of the neighbor nodes of the current node k and node k itself; a weighted average is taken according to the weight matrix W to obtain the new neural network model parameters of the current node k; the counter t is updated to t + 1; whether the number t of completed iteration rounds has reached the total number of iteration rounds T is judged, and if so, the training process ends, otherwise training of the neural network model continues; after training ends, the neural network model parameters on the nodes are averaged to obtain the average neural network model parameters, which are taken as the output parameters.
5. A large-batch decentralized distributed image classifier training system, comprising:
an initialization module: inputting a neural network model and randomly initializing the global neural network model parameters x0, and saving a subset of the image samples on each node; the batch size of image samples randomly drawn each time is b, the learning rate is η, the weight matrix is W, the total number of iteration rounds is T, and the momentum coefficient is β; the complete image sample set is the union of the subsets saved on the K nodes, where K denotes the number of nodes; initializing the neural network model parameters of each node, initializing the counter t to 0, and initializing the momentum;
a gradient calculation module: computing a stochastic gradient from the image samples saved on the node and then normalizing the gradient; and
a training process module: updating the momentum and the neural network model parameters of the node using the normalized gradient; each node communicates with its neighbor nodes to obtain their latest neural network model parameters and takes a weighted average of these and its own parameters as the node's new neural network model parameters for the next round of updates; the training is repeated until the stopping condition is reached, at which point every node stops and the average of the parameters on the nodes is taken as the final output, giving the parameters of the image classifier.
6. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the large-batch decentralized distributed image classifier training method according to any one of claims 1-4.
7. A computer-readable storage medium storing a computer program for executing the large-batch decentralized distributed image classifier training method according to any one of claims 1 to 4.
CN202111516644.2A 2021-12-07 2021-12-07 Large-batch decentralized distributed image classifier training method and system Pending CN114186671A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111516644.2A CN114186671A (en) 2021-12-07 2021-12-07 Large-batch decentralized distributed image classifier training method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111516644.2A CN114186671A (en) 2021-12-07 2021-12-07 Large-batch decentralized distributed image classifier training method and system

Publications (1)

Publication Number Publication Date
CN114186671A true CN114186671A (en) 2022-03-15

Family

ID=80604642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111516644.2A Pending CN114186671A (en) 2021-12-07 2021-12-07 Large-batch decentralized distributed image classifier training method and system

Country Status (1)

Country Link
CN (1) CN114186671A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081024A (en) * 2022-08-16 2022-09-20 杭州金智塔科技有限公司 Decentralized business model training method and device based on privacy protection

Similar Documents

Publication Publication Date Title
CN108711141B (en) Motion blurred image blind restoration method using improved generation type countermeasure network
CN106951926B (en) Deep learning method and device of hybrid architecture
CN110263921B (en) Method and device for training federated learning model
JP2019528502A (en) Method and apparatus for optimizing a model applicable to pattern recognition and terminal device
CN109617888B (en) Abnormal flow detection method and system based on neural network
EP3540652A1 (en) Method, device, chip and system for training neural network model
US11429853B2 (en) Systems and methods for determining an artificial intelligence model in a communication system
WO2022042123A1 (en) Image recognition model generation method and apparatus, computer device and storage medium
CN110689136B (en) Deep learning model obtaining method, device, equipment and storage medium
CN114186671A (en) Large-batch decentralized distributed image classifier training method and system
CN111079921A (en) Efficient neural network training and scheduling method based on heterogeneous distributed system
CN112509600A (en) Model training method and device, voice conversion method and device and storage medium
CN109657794B (en) Instruction queue-based distributed deep neural network performance modeling method
US20200151584A1 (en) Systems and methods for determining an artificial intelligence model in a communication system
CN115186821B (en) Core particle-oriented neural network inference overhead estimation method and device and electronic equipment
CN111353534B (en) Graph data category prediction method based on adaptive fractional order gradient
CN110321799B (en) Scene number selection method based on SBR and average inter-class distance
CN112328715A (en) Visual positioning method, training method of related model, related device and equipment
CN110414569B (en) Clustering implementation method and device
CN112529165A (en) Deep neural network pruning method, device, terminal and storage medium
CN113253989B (en) Software and hardware cooperative integration architecture method based on embedded system
CN114237869B (en) Ray double-layer scheduling method and device based on reinforcement learning and electronic equipment
CN114445692B (en) Image recognition model construction method and device, computer equipment and storage medium
US20220318412A1 (en) Privacy-aware pruning in machine learning
US20220343162A1 (en) Method for structure learning and model compression for deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination