CN118133929A - Method and device for accelerating neural network training based on node freezing

Method and device for accelerating neural network training based on node freezing

Info

Publication number
CN118133929A
CN118133929A
Authority
CN
China
Prior art keywords
node
frozen
neural network
training
freezing
Prior art date
Legal status (assumed; not a legal conclusion)
Granted
Application number
CN202410545941.7A
Other languages
Chinese (zh)
Other versions
CN118133929B (en)
Inventor
陈翯
李石坚
张犁
潘纲
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202410545941.7A
Priority claimed from CN202410545941.7A
Publication of CN118133929A
Application granted
Publication of CN118133929B
Status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for accelerating neural network training based on node freezing. The method comprises the following steps: (1) acquire the neural network model to be trained; (2) initialize the neural network model and train it by cycling through steps (2-1)-(2-4): (2-1) select the next node to be frozen from all candidate nodes of the neural network model; (2-2) generate the corresponding evaluation subgraph for the node to be frozen, the evaluation subgraph being used to evaluate whether the node can be frozen; (2-3) if the convergence of the node to be frozen does not reach the set threshold, update the weights of the neural network model and return to step (2-2); if the convergence reaches the set threshold, execute step (2-4); (2-4) freeze the node whose convergence reaches the set threshold, skipping its backward-propagation and weight-update steps; (3) after model training finishes, apply the model to image classification. The invention accelerates model training while keeping the loss of training accuracy as small as possible.

Description

Method and device for accelerating neural network training based on node freezing
Technical Field
The invention belongs to the technical field of neural network training, and particularly relates to a method and a device for accelerating neural network training based on node freezing.
Background
A deep neural network (DNN) is a multi-layer neural network that obtains progressively better feature representations of its input data samples through layer-by-layer feature learning.
Modern DNNs consist of tens to hundreds of layers, each of which receives an input feature tensor and outputs corresponding activation values. Feature extraction and representation are achieved by training the DNN over many iterations while minimizing a loss function. A DNN training iteration comprises three steps: forward propagation, backward propagation, and parameter update. In each iteration, forward propagation takes a mini-batch of data and computes the loss value relative to the target labels layer by layer through the model. Backward propagation then computes the parameter gradients sequentially from back to front according to the chain rule of derivatives. After backward propagation finishes, the model parameters are updated using an optimization algorithm. In data-parallel distributed training, the gradients computed independently on all worker nodes must additionally be aggregated before updating the shared model parameters.
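For concreteness, this three-step iteration can be sketched in PyTorch as follows; the framework, model shape, and hyperparameters here are illustrative assumptions, not part of this disclosure.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def train_step(inputs, labels):
    optimizer.zero_grad()
    outputs = model(inputs)          # forward propagation, layer by layer
    loss = loss_fn(outputs, labels)  # loss relative to the target labels
    loss.backward()                  # backward propagation by the chain rule
    optimizer.step()                 # parameter update via the optimizer
    return loss.item()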
In recent years, as the scale of deep neural network models and the volume of training data have continued to grow, so has the cost of DNN training. Reducing training cost and accelerating model training has therefore become an urgent problem.
Many studies focus on reducing communication cost in multi-machine, multi-GPU environments, increasing the parallelism of model training by overlapping computation and communication or by adopting different scheduling strategies. Structural optimization of the computational graph is another common approach: subgraph substitution, a typical computational-graph optimization, speeds up training by replacing computation subgraphs with cheaper equivalent ones. These methods effectively reduce training time, but the total amount of computation remains essentially unchanged.
To actually reduce the total computation in large-scale DNN model training, researchers have proposed accelerating DNN training by introducing node freezing into the training process. The essence of node freezing is to stop the parameter updates of a given node before training completes, skipping its backward-propagation and parameter-update steps and thereby reducing computation. Because the computation of backward propagation is significantly larger than that of forward propagation, node freezing can markedly improve training efficiency and accelerate model training.
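In an autograd framework such as PyTorch, skipping a node's backward propagation and parameter updates can be expressed by disabling gradients on its parameters. The following minimal sketch assumes, as in the shallow-to-deep schemes discussed here, that every node upstream of the frozen node is already frozen; it is an illustration, not the patented implementation.

def freeze_node(node: torch.nn.Module) -> None:
    # With no parameter requiring a gradient, the optimizer no longer
    # updates the node, and when all upstream nodes are frozen as well,
    # autograd stops backward propagation before reaching this node.
    for param in node.parameters():
        param.requires_grad_(False)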
However, existing node-freezing schemes still have drawbacks. First, the freezing granularity is mostly a single layer, frozen layer by layer from shallow to deep in layer order, which is inflexible for modern multi-branch network structures. Second, there is currently no unified criterion for when each node of a model converges during training. How to freeze layers while sacrificing as little training accuracy as possible still requires further study.
Disclosure of Invention
The invention provides a method and a device for accelerating neural network training based on node freezing, which accelerate model training while keeping the loss of training accuracy as small as possible and improve image-classification efficiency.
A method for accelerating neural network training based on node freezing, comprising the following steps:
(1) Acquiring a neural network model to be trained, wherein the neural network model is a multi-branch structure model for image classification;
(2) Initializing the in-degree of each node in the neural network model to its number of predecessor nodes, and training the neural network model by cycling through the following steps (2-1)-(2-4);
(2-1) selecting a next node to be frozen from all candidate nodes of the neural network model, the candidate nodes being nodes whose predecessors are all frozen and which are not themselves frozen;
(2-2) generating a corresponding evaluation subgraph for the node to be frozen, wherein the evaluation subgraph is used to evaluate whether the node to be frozen can be frozen;
Wherein the evaluation subgraph consists of a reference node and a similarity-measurement node: the reference node is a copy of the node to be frozen; the intermediate activation layers of the node to be frozen and of the reference node contain semantic information, which jointly serves as the input of the similarity-measurement node; the similarity-measurement node computes the semantic similarity between the node to be frozen and the reference node, from which the convergence of the node to be frozen is obtained;
(2-3) if the convergence of the node to be frozen does not reach the set threshold, updating the weights of the neural network model and returning to step (2-2); if the convergence of the node to be frozen reaches the set threshold, executing step (2-4);
(2-4) freezing the node whose convergence reaches the set threshold and skipping its backward-propagation and weight-update steps, thereby accelerating model training;
(3) After training of the neural network model finishes, applying the trained model to image classification.
Further, the specific process of step (2-1) is as follows:
All candidate nodes of the neural network model are added to a priority queue based on a custom comparator, and the head element of the queue is selected each time as the next node to be frozen.
The custom comparator of the priority queue jointly evaluates the depth of each candidate node and its computation amount; candidate nodes with smaller depth and smaller computation amount have higher priority.
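One plausible rendering of such a comparator orders candidates lexicographically by a (depth, computation) key; the class below is a hypothetical sketch using Python's heapq, with estimated FLOPs standing in for the computation amount.

import heapq

class CandidateQueue:
    # Priority queue over candidate nodes: smaller depth first,
    # then smaller computation amount (e.g., estimated FLOPs).
    def __init__(self):
        self._heap = []

    def push(self, node_id: str, depth: int, flops: int) -> None:
        heapq.heappush(self._heap, (depth, flops, node_id))

    def pop(self) -> str:
        depth, flops, node_id = heapq.heappop(self._heap)
        return node_id

    def __bool__(self) -> bool:
        return bool(self._heap)

queue.pop() then yields the shallowest, cheapest candidate, i.e., the one assumed most likely to converge first.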
In step (2-2), the reference node has the same input as the node to be frozen, and the weights of the reference node are periodically aligned with those of the node to be frozen.
In step (2-2), the corresponding evaluation subgraph is generated by an evaluation-subgraph generation module, which is an extension of the neural network model to be trained.
In step (2-3), the choice of the set threshold is related to the type and parameter settings of the node to be frozen; different nodes to be frozen have different set thresholds.
The specific process of step (2-4) is as follows:
The node whose convergence reaches the set threshold is marked as frozen, its backward propagation and weight update are skipped, and the in-degree of each of its successor nodes is decremented by one; if a successor's in-degree drops to zero, that successor is added to the priority queue of step (2-1).
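This in-degree bookkeeping amounts to a Kahn-style topological release of successors. The sketch below assumes the graph is given as predecessor/successor adjacency lists; the helper names and the queue object (the CandidateQueue sketched earlier) are hypothetical.

def init_in_degrees(predecessors: dict) -> dict:
    # Step (2): the initial in-degree of a node is its number of
    # predecessor nodes; nodes with in-degree zero are the first candidates.
    return {node: len(preds) for node, preds in predecessors.items()}

def release_successors(frozen_id, successors, in_degree, queue, depth, flops):
    # Step (2-4): after freezing a node, decrement each successor's
    # in-degree; a successor whose in-degree drops to zero has all of
    # its predecessors frozen and becomes a candidate.
    for succ in successors.get(frozen_id, []):
        in_degree[succ] -= 1
        if in_degree[succ] == 0:
            queue.push(succ, depth[succ], flops[succ])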
The invention also provides a device for accelerating neural network training based on node freezing, comprising a memory and one or more processors, the memory storing executable code; when executing the executable code, the one or more processors implement the above method for accelerating neural network training based on node freezing.
Compared with the prior art, the invention has the following beneficial effects:
1. For modern multi-branch network structures, the invention adopts a freezing strategy that treats nodes as the freezing unit and selects the nodes to be frozen via a priority queue with a custom comparator. The structural characteristics of the network model are fully exploited, and a better-defined solution is provided for node selection, which existing freezing theory leaves relatively vague;
2. The invention extends the training network with an evaluation-subgraph module and evaluates the convergence of the node to be frozen from its semantic information. Nodes that reach the threshold are frozen, and in subsequent backward propagation their gradients are no longer computed and their parameters are no longer updated, which reduces the computation of the network training process, effectively shortens training time, and improves training efficiency.
Drawings
FIG. 1 is a flow chart of a method for accelerating neural network training based on node freezing.
Fig. 2 is a block diagram of the modules according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of node selection to be frozen in an embodiment of the invention.
Fig. 4 is a schematic diagram of evaluation sub-graph generation in an embodiment of the present invention.
Fig. 5 is a schematic diagram of similarity metric node generation in an embodiment of the present invention.
Detailed Description
The invention will be described in further detail with reference to the drawings and embodiments. It should be noted that the embodiments below are intended to facilitate understanding of the invention and not to limit it in any way.
As shown in fig. 1, a method for accelerating neural network training based on node freezing comprises the following steps:
Step 1: obtain the neural network model to be trained. In this embodiment, the neural network model is a multi-branch structure model for image classification.
Specifically, a relevant training data set (an image data set in this embodiment) is first loaded and all data are normalized. The neural network model to be trained is then loaded. Finally, reasonable training hyperparameters are set, completing initialization before training. Initialization further comprises: initializing the in-degree of each node in the neural network model to its number of predecessor nodes.
Step 2: train the neural network, cycling through the following steps (2.1)-(2.4) until training finishes.
Specifically, in this embodiment the connections between the modules are shown in Fig. 2. Candidate nodes to be frozen are selected as the input of the node selection module according to the structure of the training network and the current freezing state. The node selection module selects, among the candidates, the node most likely to converge as the next node to be frozen. The evaluation-subgraph generation module generates the corresponding evaluation subgraph for the node to be frozen, which serves as an extension of the training network. The convergence evaluation module takes the evaluation subgraph and the intermediate activation layer of the node to be frozen in the original training network as joint inputs, evaluates the convergence of the current node to be frozen, and sends a freeze signal to the node freezing module according to the evaluation result. Upon receiving the freeze signal, the node freezing module marks the node whose convergence reaches the expected value as frozen, skips its backward propagation and parameter update, and starts a new round of selecting the node to be frozen.
(2.1) Node selection module: selects the next node to be frozen from the candidate nodes. Candidate nodes are nodes whose predecessors are all frozen and which are not themselves frozen. The node selection module is implemented as a priority queue based on a custom comparator: all candidate nodes are added to the priority queue, and the head element is selected each time as the next node to be frozen.
As shown in Fig. 3, the black nodes in the left graph represent frozen nodes and the gray nodes represent candidate nodes; the candidate nodes in the left graph are the input of the node selection module. The node selection module is implemented as a priority queue based on a custom comparator, which jointly evaluates the depth of each candidate node and its computation amount; nodes with smaller depth and smaller computation amount have higher priority. Based on node depth and computation amount, the candidate node most likely to converge is selected each time as the next node to be frozen.
Specifically, at the beginning all nodes without predecessor dependencies are added to the priority queue in the node selection module. Before training starts, the head element of the queue is taken out as the first node to be frozen and passed to the evaluation-subgraph generation module to generate the corresponding evaluation subgraph. Once training iterations begin, convergence evaluation and iterative training of the network model proceed in parallel. When the current node to be frozen reaches the set freezing threshold, the node freezing module freezes it and a new round of selecting the node to be frozen begins. After the candidate set is updated, the head element of the priority queue is taken out each time as the next node to be frozen.
(2.2) Evaluation-subgraph generation module: generates the evaluation subgraph used to evaluate whether the node to be frozen can be frozen. The evaluation subgraph consists of a reference node and a similarity-measurement node: the reference node is a copy of the node to be frozen, and the convergence of the node to be frozen is estimated from its semantic information; the inputs of the similarity-measurement node are the intermediate activation layers of the node to be frozen and of the reference node.
As shown in Fig. 4, black nodes represent frozen nodes and light gray nodes represent nodes to be frozen. In the evaluation-subgraph generation module, the evaluation subgraph is an extension of the model to be trained, connected to it as follows: the reference node is a copy of the node to be frozen and has the same input as the node to be frozen; the reference node and the intermediate activation layer of the node to be frozen jointly serve as the input of the similarity-measurement node; the output of the similarity-measurement node is the output of the evaluation subgraph.
Specifically, the reference node reflects the node to be frozen as it was several iterations earlier, and its weights are periodically re-aligned with those of the node to be frozen. The intermediate activation layers of the reference node and of the node to be frozen contain semantic information, which is used as the input of the similarity-measurement node; the similarity-measurement node evaluates the semantic similarity between the reference node and the node to be frozen, from which the convergence of the node to be frozen is obtained.
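The disclosure does not fix a concrete similarity metric, so the sketch below instantiates the evaluation subgraph with cosine similarity between the two nodes' activations; the copy-and-periodic-alignment logic follows the description above, while the metric choice and the sync_every interval are assumptions.

import copy
import torch
import torch.nn.functional as F

class EvaluationSubgraph:
    def __init__(self, node: torch.nn.Module, sync_every: int = 100):
        self.node = node                      # node to be frozen (live)
        self.reference = copy.deepcopy(node)  # reference node: a copy
        for p in self.reference.parameters():
            p.requires_grad_(False)
        self.sync_every = sync_every
        self.steps = 0

    def convergence(self, shared_input: torch.Tensor) -> float:
        # Both nodes receive the same input; the similarity-measurement
        # node compares their activations.
        with torch.no_grad():
            a = self.node(shared_input).flatten(1)
            b = self.reference(shared_input).flatten(1)
            score = F.cosine_similarity(a, b, dim=1).mean().item()
        self.steps += 1
        if self.steps % self.sync_every == 0:
            # periodically re-align the reference with the live node
            self.reference.load_state_dict(self.node.state_dict())
        return score

A score near 1 indicates that the node's semantics have barely changed over the last sync_every iterations, i.e., that the node is converging.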
The application scenario of the neural network model to be trained may be other than image classification. As shown in Fig. 5, reference tasks and a similarity-metric selection mechanism for different application scenarios are constructed in advance. When a network model is passed in, the best-suited metric is adaptively selected according to the scenario the model serves, and the corresponding similarity-measurement node is generated.
Specifically, the inputs of the similarity-measurement node are the intermediate activation layers of the node to be frozen and of the reference node, and the similarity-measurement node analyzes the semantic similarity between the two using the specified metric. The convergence of the node to be frozen is finally obtained from this semantic similarity and passed as input to the convergence evaluation module for further evaluation.
(2.3) Convergence evaluation module: obtains the convergence of the node to be frozen from the evaluation subgraph, repeats this module's steps until the convergence reaches the set threshold, then proceeds to the next module.
Specifically, the convergence of the node to be frozen is obtained from the evaluation subgraph; if it reaches the set threshold, processing proceeds to the next module, otherwise this module's step is repeated. The choice of the set threshold is related to the node type and hyperparameter settings, and different thresholds are set for different nodes to be frozen.
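Tying the modules together, one iteration of the evaluate-then-freeze logic of modules (2.3) and (2.4) could look like the following sketch, reusing the hypothetical freeze_node and EvaluationSubgraph helpers from the earlier sketches; it would be called once per training iteration, after the weight update.

def evaluate_and_freeze(node, probe, node_input, threshold, on_frozen):
    # Module (2.3): obtain the convergence of the node to be frozen from
    # the evaluation subgraph and compare it with the node-specific threshold.
    if probe.convergence(node_input) >= threshold:
        # Module (2.4): freeze the node and update the candidate bookkeeping.
        freeze_node(node)   # no further backward pass or parameter updates
        on_frozen()         # decrement successors' in-degrees, select next
        return True
    return False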
(2.4) Node freezing module: freezes the node whose convergence reaches the expected value and skips its backward-propagation and parameter-update steps, thereby accelerating model training.
Specifically, the node whose convergence reaches the expected value is marked as frozen, its backward propagation and parameter update are skipped, and the in-degree of each of its successor nodes is decremented by one; if a successor's in-degree drops to zero, that successor is added to the priority queue in the node selection module.
Step 3: after training of the neural network model finishes, return the required result and apply the trained model to image classification.
Based on the same inventive principles, an embodiment of the invention also provides a device for accelerating neural network training based on node freezing, comprising a memory and one or more processors, the memory storing executable code; when executing the executable code, the one or more processors implement the method for accelerating neural network training based on node freezing of the above embodiment.
To verify the effect of the invention, the classic multi-branch network GoogLeNet was used as the experimental network, and the results (training time and accuracy) of the neural network trained with freezing were compared with those of the original neural network, yielding the comparison shown in Table 1.
TABLE 1
As can be seen from the table, the method for accelerating neural network training reduces each training run's duration by 15%, greatly shortening neural network training time. Meanwhile, the accelerated training incurs essentially no loss of accuracy: the average accuracy change on the test set is 0.06%. The improved procedure therefore greatly increases training speed while essentially not sacrificing accuracy.
The foregoing embodiments describe the technical solution and advantages of the invention in detail. It should be understood that the embodiments above are merely illustrative and do not limit the invention; any modifications, additions, or equivalent substitutions made within the scope of the principles of the invention shall fall within the protection scope of the invention.

Claims (8)

1. A method for accelerating neural network training based on node freezing, characterized by comprising the following steps:
(1) Acquiring a neural network model to be trained, wherein the neural network model is a multi-branch structure model for image classification;
(2) Initializing the in-degree of each node in the neural network model to its number of predecessor nodes, and training the neural network model by cycling through the following steps (2-1)-(2-4);
(2-1) selecting a next node to be frozen from all candidate nodes of the neural network model, the candidate nodes being nodes whose predecessors are all frozen and which are not themselves frozen;
(2-2) generating a corresponding evaluation subgraph for the node to be frozen, wherein the evaluation subgraph is used to evaluate whether the node to be frozen can be frozen;
Wherein the evaluation subgraph consists of a reference node and a similarity-measurement node: the reference node is a copy of the node to be frozen; the intermediate activation layers of the node to be frozen and of the reference node contain semantic information, which jointly serves as the input of the similarity-measurement node; the similarity-measurement node computes the semantic similarity between the node to be frozen and the reference node, from which the convergence of the node to be frozen is obtained;
(2-3) if the convergence of the node to be frozen does not reach the set threshold, updating the weights of the neural network model and returning to step (2-2); if the convergence of the node to be frozen reaches the set threshold, executing step (2-4);
(2-4) freezing the node whose convergence reaches the set threshold and skipping its backward-propagation and weight-update steps, thereby accelerating model training;
(3) after training of the neural network model finishes, applying the trained model to image classification.
2. The method for accelerating neural network training based on node freezing according to claim 1, characterized in that the specific process of step (2-1) is as follows:
all candidate nodes of the neural network model are added to a priority queue based on a custom comparator, and the head element of the queue is selected each time as the next node to be frozen.
3. The method for accelerating neural network training based on node freezing according to claim 2, characterized in that the custom comparator of the priority queue jointly evaluates the depth of a candidate node and its computation amount, a candidate node with smaller depth and smaller computation amount having higher priority.
4. The method for accelerating neural network training based on node freezing according to claim 1, characterized in that in step (2-2), the reference node has the same input as the node to be frozen, and the weights of the reference node are periodically aligned with those of the node to be frozen.
5. The method for accelerating neural network training based on node freezing according to claim 1, characterized in that in step (2-2), the corresponding evaluation subgraph is generated by an evaluation-subgraph generation module, which is an extension of the neural network model to be trained.
6. The method for accelerating neural network training based on node freezing according to claim 1, characterized in that in step (2-3), the choice of the set threshold is related to the type and parameter settings of the node to be frozen, different nodes to be frozen having different set thresholds.
7. The method for accelerating neural network training based on node freezing according to claim 1, characterized in that the specific process of step (2-4) is as follows:
the node whose convergence reaches the set threshold is marked as frozen, its backward propagation and weight update are skipped, and the in-degree of each of its successor nodes is decremented by one; if a successor's in-degree drops to zero, that successor is added to the priority queue of step (2-1).
8. An apparatus for accelerating neural network training based on node freezing, comprising a memory and one or more processors, the memory storing executable code, characterized in that the one or more processors, when executing the executable code, implement the method for accelerating neural network training based on node freezing of any one of claims 1-7.
CN202410545941.7A 2024-05-06 Method and device for accelerating neural network training based on node freezing Active CN118133929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410545941.7A CN118133929B (en) 2024-05-06 Method and device for accelerating neural network training based on node freezing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410545941.7A CN118133929B (en) 2024-05-06 Method and device for accelerating neural network training based on node freezing

Publications (2)

Publication Number Publication Date
CN118133929A 2024-06-04
CN118133929B CN118133929B (en) 2024-08-02


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190347538A1 (en) * 2018-05-11 2019-11-14 Qualcomm Incorporated Neural network with frozen nodes
CN109146000A (en) * 2018-09-07 2019-01-04 电子科技大学 Method and device for improving convolutional neural networks based on frozen weights
CN109635936A (en) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 Neural network pruning and quantization method based on retraining
US20200293870A1 (en) * 2019-09-06 2020-09-17 Intel Corporation Partially-frozen neural networks for efficient computer vision systems
CN113177639A (en) * 2020-01-27 2021-07-27 通用电气精准医疗有限责任公司 Freezing as a regularizer in training neural networks
US20240064045A1 (en) * 2020-12-30 2024-02-22 Orange Methods and devices for freezing or adapting parameters of a neural network which are used in a telecommunication network
US20230316729A1 (en) * 2022-04-01 2023-10-05 Deepmind Technologies Limited Training neural networks
CN116167413A (en) * 2023-04-20 2023-05-26 国网山东省电力公司济南供电公司 Method and system for quantized pruning joint optimization of deep convolutional neural network
CN117241376A (en) * 2023-09-14 2023-12-15 兰州理工大学 WSN node positioning method for multi-strategy improved BP neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ZHANG LI: "Answering medical questions in Chinese using automatically mined knowledge and deep neural networks: an end-to-end solution", ARXIV, 22 March 2019 (2019-03-22) *
侯俊龙;潘文林;王璐;何翠玲;王翠;: "基于剪枝AlexNet的普米语孤立词识别", 云南民族大学学报(自然科学版), no. 04, 10 July 2020 (2020-07-10) *
孙荧;王荆;: "基于粒子群优化的神经网络算法的英语翻译", 科学技术与工程, no. 18, 28 June 2020 (2020-06-28) *
平嘉蓉;张正华;沈逸;陈豪;刘源;杨意;尤倩;苏权;: "基于轻量级神经网络的人群计数模型设计", 无线电工程, no. 06, 19 May 2020 (2020-05-19) *

Similar Documents

Addanki et al. Placeto: Learning generalizable device placement algorithms for distributed machine learning
CN110533183B (en) Task placement method for heterogeneous network perception in pipeline distributed deep learning
CN111832627B (en) Image classification model training method, classification method and system for suppressing label noise
CN113191484A (en) Federal learning client intelligent selection method and system based on deep reinforcement learning
CN111027732B (en) Method and system for generating multi-wind power plant output scene
Zheng et al. Model compression based on differentiable network channel pruning
CN113469325A (en) Layered federated learning method, computer equipment and storage medium for edge aggregation interval adaptive control
CN109558898B (en) Multi-choice learning method with high confidence based on deep neural network
CN113971089A (en) Method and device for selecting equipment nodes of federal learning system
CN113159287B (en) Distributed deep learning method based on gradient sparsity
CN115361301B (en) Distributed computing network cooperative traffic scheduling system and method based on DQN
CN117150821B (en) Construction method of equipment efficiency evaluation data set based on intelligent simulation
Addanki et al. Placeto: Efficient progressive device placement optimization
WO2021055442A1 (en) Small and fast video processing networks via neural architecture search
CN114897274A (en) Method and system for improving time sequence prediction effect
CN118133929B (en) Method and device for accelerating neural network training based on node freezing
CN111913887B (en) Software behavior prediction method based on beta distribution and Bayesian estimation
CN116502774B (en) Time sequence prediction method based on time sequence decomposition and Legend projection
CN118133929A (en) Method and device for accelerating neural network training based on node freezing
CN115577797B (en) Federal learning optimization method and system based on local noise perception
Norouzi et al. A survey on proposed methods to address Adam optimizer deficiencies
CN115345303A (en) Convolutional neural network weight tuning method, device, storage medium and electronic equipment
CN115796006A (en) New energy storage power station wind speed prediction method based on WOA-ELM prediction model
CN114722490A (en) Agent model global optimization method based on mixed increase and interval reduction
CN115129471A (en) Distributed local random gradient descent method for large-scale GPU cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant