CN116992943B - Link pruning method, device, equipment and medium of deep neural network model - Google Patents
Link pruning method, device, equipment and medium of deep neural network model
- Publication number
- CN116992943B (application CN202311255746.2A)
- Authority
- CN
- China
- Prior art keywords
- link
- pruning
- vertex
- model
- pruned
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the technical field of deep learning and discloses a link pruning method, device, equipment and medium for a deep neural network model. A first link graph is constructed from the graph structure of the model to be pruned, and each pair of adjacent vertices is checked in turn for a link relation. When a target adjacent vertex pair with a link relation appears, that link relation is filled into the first link graph; once all adjacent vertices in the first link graph have been traversed, the final first link graph serves as the second link graph. Based on the pruning requirement, all layers to be pruned and the pruning dimensions associated with that requirement are determined from the second link graph, and each layer to be pruned is pruned along its corresponding pruning dimension, so that link pruning of the model to be pruned is completed quickly. Because the second link graph reflects the link relations among vertices, a structured pruning strategy can be carried out automatically and rapidly, greatly improving the pruning efficiency of the model.
Description
Technical Field
The invention relates to the technical field of deep learning, in particular to a link pruning method, device, equipment and medium of a deep neural network model.
Background
As deep learning models grow larger and more complex, so does their demand for computational resources and memory. Researchers have found that many deep neural networks contain large numbers of redundant parameters and connections, and these redundancies make the models excessively complex. Owing to limits on hardware resources and energy consumption, such large models often face performance bottlenecks and latency problems in practical deployment, especially in compute-constrained settings such as edge devices.
To solve this problem, model pruning techniques were introduced. Traditional structured pruning at the model-layer level is mostly simulated by zeroing out parameters. However, this approach cannot obtain the true performance improvement of pruning by directly running the pruned model; the improvement can only be estimated, and as models continue to grow, accurate estimation becomes increasingly difficult. Moreover, for real pruned deployment on hardware, the zeroed parts must still be adjusted away before genuine acceleration is obtained.
It is desirable to implement pruning by actually removing the corresponding model structure rather than by simple zeroing. But this way of operating brings a new problem: when part of a model is cut away along some dimension, that dimension changes, and the change propagates to the linked dimensions of adjacent layers, so the dependent layers must be pruned correspondingly. For each model, one must manually trace the link relations between layers forward from the pruning starting point, according to the model's characteristics, and prune them in sequence. Because the whole process is manual, pruning efficiency is low.
It can therefore be seen that how to improve the pruning efficiency of a model is a problem to be solved by those skilled in the art.
Disclosure of Invention
The embodiment of the invention aims to provide a link pruning method, device and equipment of a deep neural network model and a computer readable storage medium, which can solve the problem of low pruning efficiency of the model.
In order to solve the above technical problems, an embodiment of the present invention provides a link pruning method of a deep neural network model, including:
constructing a first link diagram according to a model diagram structure corresponding to the model to be pruned; in the first link diagram, the input of each layer of the to-be-pruned model and the output of each layer are taken as vertexes, the link relation between the vertexes is taken as an edge, and the link relation is empty in an initialized state;
sequentially judging whether each adjacent vertex in the first link graph has a link relation or not; the link relation comprises a link relation among layers in the model diagram structure and a link relation among input and output of each layer;
under the condition that a target adjacent vertex with a link relation appears, filling the link relation corresponding to the target adjacent vertex into the first link diagram; taking the final first link diagram as a second link diagram until all adjacent vertexes in the first link diagram are traversed;
Based on pruning requirements, determining all to-be-pruned layers and pruning dimensions associated with the pruning requirements from the second link graph, and pruning the to-be-pruned layers according to the pruning dimensions corresponding to the to-be-pruned layers so as to complete link pruning of the to-be-pruned model.
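As a hedged illustration of the four claimed steps, the sketch below applies them to a hypothetical two-layer model (a convolution followed by a ReLU). The layer names and link relations are assumptions for illustration only, not the patent's actual implementation:

```python
from collections import deque

# Step 1: first link graph -- one input vertex and one output vertex per layer,
# with the link relations (edges) empty in the initialized state.
layers = ["conv1", "relu1"]
vertices = [f"{l}.{io}" for l in layers for io in ("in", "out")]
first_link_graph = {v: set() for v in vertices}

# Hypothetical link relations: conv1's output feeds relu1's input (inter-layer),
# and relu1's input and output are dimension-linked (intra-layer, same dims).
known_relations = {("conv1.out", "relu1.in"), ("relu1.in", "relu1.out")}

# Steps 2-3: visit adjacent vertex pairs; when a pair has a link relation,
# fill it into the graph. The fully filled graph is the second link graph.
for a, b in known_relations:
    first_link_graph[a].add(b)
    first_link_graph[b].add(a)
second_link_graph = first_link_graph

# Step 4: every vertex reachable from the pruning start point must be pruned
# along the same dimension -- a breadth-first walk over the second link graph.
start = "conv1.out"
to_prune, seen = [], {start}
queue = deque([start])
while queue:
    v = queue.popleft()
    to_prune.append(v)
    for nxt in second_link_graph[v]:
        if nxt not in seen:
            seen.add(nxt)
            queue.append(nxt)
```

Note that `conv1.in` is not reached: a convolution's input dimension is independent of its output dimension, so pruning the output channel does not propagate backward through the layer.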
In one aspect, determining, based on the pruning requirement, all to-be-pruned layers and pruning dimensions associated with the pruning requirement from the second link graph, pruning the to-be-pruned layers according to the pruning dimensions corresponding to each to-be-pruned layer, so as to complete link pruning of the to-be-pruned model includes:
determining mutually independent link subgraphs according to the link relation among the vertexes in the second link graph;
selecting a target link subgraph matched with the pruning requirement from each link subgraph; and pruning corresponding links are carried out on the layer to be pruned corresponding to the target link subgraph in the model to be pruned according to pruning dimensions corresponding to the vertexes in the target link subgraph.
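The mutually independent link subgraphs described above correspond to the connected components of the second link graph. A minimal sketch, assuming the graph is stored as adjacency sets keyed by vertex (the function name is hypothetical):

```python
def link_subgraphs(graph):
    """Split a link graph (vertex -> set of linked vertices) into its
    mutually independent subgraphs, i.e. its connected components."""
    seen, subgraphs = set(), []
    for start in graph:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(graph[v])
        seen |= component
        subgraphs.append(component)
    return subgraphs
```

Vertices in different components have no dimension dependence on one another, so each component can be pruned independently.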
In one aspect, selecting a target link subgraph matching the pruning requirement from each of the link subgraphs includes:
determining the link score of each link sub-graph based on the weight between adjacent vertexes in each link sub-graph;
And determining a target link subgraph to be pruned according to the link score of each link subgraph and a score threshold carried by pruning requirements.
In one aspect, the determining the link score of each link sub-graph based on the weights between the adjacent vertices in each link sub-graph includes:
determining the weight corresponding to each adjacent vertex in each link subgraph according to the weight between each adjacent vertex in the model graph structure;
and taking an average value of L2 norms of weights corresponding to all adjacent vertexes in each link subgraph to obtain the link score of each link subgraph.
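A minimal sketch of this scoring rule, with each edge's weight tensor flattened to a plain list (an assumption for illustration; `link_score` is a hypothetical name):

```python
import math

def link_score(edge_weights):
    """Average the L2 norms of the weights on a subgraph's edges,
    one weight vector per adjacent-vertex pair."""
    norms = [math.sqrt(sum(w * w for w in vec)) for vec in edge_weights]
    return sum(norms) / len(norms)
```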
In one aspect, the determining the link score of each link sub-graph based on the weights between the adjacent vertices in each link sub-graph includes:
determining the weight corresponding to each adjacent vertex in each link subgraph according to the weight between each adjacent vertex in the model graph structure;
and averaging the weights corresponding to all adjacent vertexes in each link subgraph to obtain the link score of each link subgraph.
In one aspect, determining the target link subgraph to be pruned according to the link score of each link subgraph and the score threshold carried by the pruning requirement includes:
Selecting a maximum value and a minimum value of the link scores in all the link subgraphs;
normalizing the link score of each link sub-graph based on the link score maximum value and the link score minimum value to obtain the normalized score of each link sub-graph;
and taking the link subgraph with the normalized score smaller than the score threshold as a target link subgraph to be pruned.
In one aspect, normalizing the link score of each link sub-graph based on the link score maximum value and the link score minimum value to obtain a normalized score of each link sub-graph includes:
calculating a difference between the maximum value of the link score and the minimum value of the link score;
and taking the ratio of the link score of each link sub-graph to the difference value as the normalized score of each link sub-graph.
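A sketch of the claimed normalization and threshold selection. Note that the claim divides each score by the spread (maximum minus minimum) rather than shifting by the minimum first; the sketch follows the claim as written, and a real implementation would additionally guard against a zero spread. Names are illustrative assumptions:

```python
def select_targets(scores, threshold):
    """scores: subgraph id -> link score.  Returns the subgraphs whose
    normalized score (score / (max - min)) falls below the threshold."""
    spread = max(scores.values()) - min(scores.values())
    normalized = {sg: s / spread for sg, s in scores.items()}
    return [sg for sg, s in normalized.items() if s < threshold]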
In one aspect, the pruning of the corresponding link for the to-be-pruned layer corresponding to the target link sub-graph in the to-be-pruned model according to the pruning dimension corresponding to each vertex in the target link sub-graph includes:
and deleting layers corresponding to all vertexes in the target link subgraph from the model to be pruned.
In one aspect, selecting a target link subgraph matched with the pruning requirement from each link subgraph, and pruning corresponding to a to-be-pruned layer corresponding to the target link subgraph in the to-be-pruned model according to pruning dimensions corresponding to each vertex in the target link subgraph includes:
determining a target link subgraph matched with the vertex identification according to the vertex identification carried by the pruning requirement;
and pruning corresponding links are carried out on the layer to be pruned corresponding to the target link subgraph in the model to be pruned according to the target link subgraph and the dimension carried by the pruning requirement.
In one aspect, the pruning of the corresponding link for the to-be-pruned layer corresponding to the target link sub-graph in the to-be-pruned model according to the target link sub-graph and the dimension carried by the pruning requirement includes:
determining a layer of pruning required and pruning dimension according to the input and output dimensions of each vertex in the target link subgraph and the dimension carried by the pruning requirement; wherein each vertex corresponds to a layer of the model to be pruned;
pruning is carried out on the layers of the required pruning in the model to be pruned according to the pruning dimension corresponding to each layer of the required pruning.
In one aspect, pruning the layer of the required pruning in the to-be-pruned model according to the pruning dimension corresponding to each layer of the required pruning includes:
and deleting the layers corresponding to all vertexes in the target link subgraph from the to-be-pruned model under the condition that the input and output dimensions of the layers requiring pruning are equal to the corresponding pruning dimensions.
In one aspect, pruning the layer of the required pruning in the to-be-pruned model according to the pruning dimension corresponding to each layer of the required pruning includes:
and pruning the layer of the required pruning in the to-be-pruned model according to the pruning dimension under the condition that the input/output dimension of the layer of the required pruning is larger than the corresponding pruning dimension.
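The two cases above can be sketched as a single dimension rule (the function and its return conventions are illustrative assumptions, not the patent's API):

```python
def prune_layer(io_dim, prune_dim):
    """Decide the pruning action for one layer: removing the layer's whole
    input/output dimension deletes the layer; removing less cuts channels."""
    if io_dim == prune_dim:
        return "delete"            # whole layer removed from the model
    if io_dim > prune_dim:
        return io_dim - prune_dim  # channels remaining after pruning
    raise ValueError("pruning dimension exceeds the layer dimension")
```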
In one aspect, the constructing the first link map according to the model map structure corresponding to the to-be-pruned model includes:
taking each layer in the to-be-pruned model as a vertex, and taking the link relation between the vertices as an edge to construct a model graph structure; wherein, each side in the model diagram structure has a corresponding weight;
splitting each vertex in the model diagram structure into an input vertex and an output vertex to obtain a first link diagram.
In one aspect, the sequentially determining whether each adjacent vertex in the first link graph has a link relationship includes:
traversing whether each adjacent vertex in the first link graph has a link relation according to a selected search traversal algorithm; wherein the search traversal algorithm comprises a depth-first search algorithm or a breadth-first search algorithm.
In one aspect, traversing each adjacent vertex in the first link graph sequentially according to the selected search traversal algorithm includes:
judging whether a first vertex is matched with a set network layer set or not, and judging whether the first vertex has a link relation with the next vertex adjacent to the first vertex or not; the network layer set comprises network layers with the same input dimension and output dimension; the first vertex is any vertex in the first link graph;
when the first vertex is matched with a set network layer set and/or the first vertex has a link relation with the next vertex adjacent to the first vertex, the first vertex and the next vertex adjacent to the first vertex are used as target adjacent vertices with the link relation;
and under the condition that the first vertex is not matched with the set network layer set and the next vertex adjacent to the first vertex does not have a link relation or the link relation corresponding to the target adjacent vertex is filled into the first link graph, selecting any one vertex from the rest vertexes in the first link graph as the first vertex, and returning to the step of judging whether the first vertex is matched with the set network layer set or not and judging whether the first vertex has the link relation with the next vertex adjacent to the first vertex or not until the judgment of all adjacent vertexes in the first link graph is completed.
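The three steps above reduce to a per-vertex decision rule: a vertex whose layer type matches the same-dimension layer set links its own input to its output, and a known connection links it to the next adjacent vertex. A sketch, with a hypothetical layer set and type names:

```python
# Hypothetical set of layer types whose input and output dimensions are equal.
SAME_DIM_LAYERS = {"relu", "batchnorm", "pooling"}

def relations_for(vertex_type, has_edge_to_next):
    """Return which link relations hold for a first vertex: 'intra' links the
    layer's input vertex to its output vertex; 'inter' links it to the next
    adjacent vertex."""
    rels = []
    if vertex_type in SAME_DIM_LAYERS:
        rels.append("intra")
    if has_edge_to_next:
        rels.append("inter")
    return rels
```

When `relations_for` returns an empty list, the traversal moves on to the next unvisited vertex, as in the third step above.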
In one aspect, the filling the link relation corresponding to the target adjacent vertex in the first link graph when the target adjacent vertex with the link relation appears includes:
and filling the link relation between the first vertex and the next vertex adjacent to the first vertex into the first link graph under the condition that the first vertex is matched with the set network layer set.
In one aspect, the filling the link relation corresponding to the target adjacent vertex in the first link graph when the target adjacent vertex with the link relation appears includes:
and filling the link relation between the first vertex and the next vertex adjacent to the first vertex into the first link graph under the condition that the first vertex has the link relation with the next vertex adjacent to the first vertex.
In one aspect, for the determination of the link relationship, the method includes:
counting network layers with the same input dimension and output dimension in the deep learning model to obtain a network layer set;
and setting a link relation based on the link relation among the vertexes in the model diagram structure and the network layer set.
In one aspect, after the network layers with the same input dimension and output dimension in the statistical deep learning model are obtained to obtain a network layer set, the method further includes:
Determining a target network layer with the same input dimension and output dimension according to the type of the model to be pruned;
and supplementing the target network layer to the network layer set.
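A sketch of building and supplementing the network layer set; the layer dimensions and the supplemented dropout layer are hypothetical examples for one model type:

```python
# Collect layer types whose input and output dimensions are equal.
layer_dims = {"conv": (64, 128), "relu": (128, 128), "batchnorm": (128, 128)}
network_layer_set = {name for name, (din, dout) in layer_dims.items()
                     if din == dout}

# Supplement a target layer known to preserve dimensions for this model type.
network_layer_set.add("dropout")
```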
The embodiment of the invention also provides a link pruning device of the deep neural network model, which comprises a construction unit, a judging unit, a filling unit, a determining unit and a pruning unit;
the construction unit is used for constructing a first link diagram according to a model diagram structure corresponding to the to-be-pruned model; in the first link diagram, the input of each layer of the to-be-pruned model and the output of each layer are taken as vertexes, the link relation between the vertexes is taken as an edge, and the link relation is empty in an initialized state;
the judging unit is used for sequentially judging whether each adjacent vertex in the first link graph has a link relation or not; the link relation comprises a link relation among layers in the model diagram structure and a link relation among input and output of each layer;
the filling unit is used for filling the link relation corresponding to the target adjacent vertex into the first link graph under the condition that the target adjacent vertex with the link relation appears;
the determining unit is used for taking the final first link diagram as the second link diagram after all adjacent vertexes in the first link diagram have been traversed;
the pruning unit is configured to determine, based on a pruning requirement, all to-be-pruned layers and pruning dimensions associated with the pruning requirement from the second link graph, and prune the to-be-pruned layers according to the pruning dimensions corresponding to the to-be-pruned layers, so as to complete link pruning of the to-be-pruned model.
In one aspect, the pruning unit is configured to determine link subgraphs that are independent of each other according to a link relationship between vertices in the second link graph; selecting a target link subgraph matched with the pruning requirement from each link subgraph, and pruning corresponding to a to-be-pruned layer corresponding to the target link subgraph in the to-be-pruned model according to pruning dimensions corresponding to each vertex in the target link subgraph.
In one aspect, the pruning unit comprises a score determination subunit and a sub-graph determination subunit;
the score determining subunit is configured to determine a link score of each link sub-graph based on weights between adjacent vertices in each link sub-graph;
The sub-graph determining sub-unit is used for determining a target link sub-graph to be pruned according to the link score of each link sub-graph and the score threshold carried by the pruning requirement.
In one aspect, the score determining subunit is configured to determine, according to weights between adjacent vertices in the model graph structure, weights corresponding to the adjacent vertices in each link subgraph;
and taking an average value of L2 norms of weights corresponding to all adjacent vertexes in each link subgraph to obtain the link score of each link subgraph.
In one aspect, the score determining subunit is configured to determine, according to weights between adjacent vertices in the model graph structure, weights corresponding to the adjacent vertices in each link subgraph;
and averaging the weights corresponding to all adjacent vertexes in each link subgraph to obtain the link score of each link subgraph.
In one aspect, the sub-graph determining subunit is configured to select a maximum value and a minimum value of a link score in all the link sub-graphs;
normalizing the link score of each link sub-graph based on the link score maximum value and the link score minimum value to obtain the normalized score of each link sub-graph;
And taking the link subgraph with the normalized score smaller than the score threshold as a target link subgraph to be pruned.
In one aspect, the sub-graph determining subunit is configured to calculate a difference between the link score maximum value and the link score minimum value;
and taking the ratio of the link score of each link sub-graph to the difference value as the normalized score of each link sub-graph.
In one aspect, the pruning unit is configured to delete layers corresponding to all vertices in the target link subgraph from the to-be-pruned model.
In one aspect, the pruning unit is configured to determine, according to a vertex identifier carried by the pruning requirement, a target link subgraph matched with the vertex identifier; and pruning corresponding links are carried out on the layer to be pruned corresponding to the target link subgraph in the model to be pruned according to the target link subgraph and the dimension carried by the pruning requirement.
In one aspect, the pruning unit comprises a dimension determination subunit and a pruning subunit;
the dimension determining subunit is configured to determine a layer of pruning required and a pruning dimension according to an input/output dimension of each vertex in the target link subgraph and a dimension carried by the pruning requirement; wherein each vertex corresponds to a layer of the model to be pruned;
The pruning subunit is configured to prune the layer of the required pruning in the to-be-pruned model according to the pruning dimension corresponding to each layer of the required pruning.
In one aspect, the pruning subunit is configured to delete, from the to-be-pruned model, a layer corresponding to all vertices in the target link subgraph, where an input/output dimension of the layer of the required pruning is equal to the corresponding pruning dimension.
In one aspect, the pruning subunit is configured to prune the layer of the required pruning in the to-be-pruned model according to the pruning dimension when the input/output dimension of the layer of the required pruning is greater than the corresponding pruning dimension.
In one aspect, the building unit is configured to use each layer in the to-be-pruned model as a vertex, and a link relationship between the vertices is used as an edge, so as to build a model graph structure; wherein, each side in the model diagram structure has a corresponding weight; splitting each vertex in the model diagram structure into an input vertex and an output vertex to obtain a first link diagram.
In one aspect, the judging unit is configured to traverse whether each adjacent vertex in the first link graph has a link relation according to the selected search traversal algorithm; wherein the search traversal algorithm comprises a depth-first search algorithm or a breadth-first search algorithm.
In one aspect, the judging unit includes a vertex judging subunit, a target determining subunit and a selecting subunit;
the vertex judging subunit is used for judging whether a first vertex is matched with a set network layer set and whether the first vertex has a link relation with the next vertex adjacent to the first vertex; the network layer set comprises network layers with the same input dimension and output dimension; the first vertex is any vertex in the first link graph;
the target determining subunit is configured to, when the first vertex matches the set network layer set and/or the first vertex has a link relationship with a next vertex adjacent to the first vertex, take the first vertex and the next vertex adjacent to the first vertex as a target adjacent vertex having a link relationship;
and the selecting subunit is configured to, when the first vertex is not matched with the set network layer set and a next vertex adjacent to the first vertex does not have a link relationship or the link relationship corresponding to the target adjacent vertex is completely filled in the first link graph, select any one vertex from the remaining vertices in the first link graph as the first vertex, and return to the step of determining whether the first vertex is matched with the set network layer set and whether the first vertex has a link relationship with the next vertex adjacent to the first vertex until the determination of all adjacent vertices in the first link graph is completed.
In one aspect, the filling unit is configured to, when the first vertex matches the set network layer set, fill the link relationship between the first vertex and the next vertex adjacent to the first vertex in the first link map.
In one aspect, the filling unit is configured to fill, in the first link graph, the link relationship between the first vertex and the next vertex adjacent to the first vertex if the first vertex has the link relationship with the next vertex adjacent to the first vertex.
In one aspect, for determining the link relationship, the apparatus includes a statistics unit and a setting unit;
the statistics unit is used for counting network layers with the same input dimension and output dimension in the deep learning model to obtain a network layer set;
the setting unit is configured to set a link relationship based on the link relationship between the vertices in the model graph structure and the network layer set.
In one aspect, the device further comprises a supplementing unit;
the supplementing unit is used for determining a target network layer with the same input dimension and output dimension according to the type of the model to be pruned; and supplementing the target network layer to the network layer set.
The embodiment of the invention also provides electronic equipment, which comprises:
A memory for storing a computer program;
a processor for executing the computer program to implement the steps of the link pruning method of the deep neural network model as described above.
The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium is stored with a computer program, and the computer program realizes the steps of the link pruning method of the deep neural network model when being executed by a processor.
According to the technical scheme, a first link graph is constructed from the model graph structure corresponding to the model to be pruned; in the first link graph, the input and output of each layer of the model are taken as vertices, the link relations between vertices are taken as edges, and the link relations are empty in the initialized state. To capture the associations both between different layers and within the same layer, a link relation is defined that covers the links between layers in the model graph structure and the links between each layer's input and output. Each pair of adjacent vertices in the first link graph is then checked in turn for a link relation. When a target adjacent vertex pair with a link relation appears, that relation is filled into the first link graph; once all adjacent vertices have been traversed and every link relation has been filled in, the final first link graph serves as the second link graph. Based on the second link graph and the pruning requirement, all layers to be pruned and their associated dimensions can be determined, and each layer to be pruned is pruned along its corresponding pruning dimension, so that link pruning of the model is completed rapidly. Because the second link graph reflects the link relations among vertices, a structured pruning strategy can be carried out automatically and quickly, without manually adjusting the dimensions of related layers step by step as in the traditional way, thereby greatly improving the pruning efficiency of the model.
Drawings
For a clearer description of the embodiments of the present invention, the drawings required by the embodiments are briefly described below. Apparently, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a link pruning method of a deep neural network model according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a link correlation according to an embodiment of the present invention;
fig. 3 is a flowchart of a method for performing link pruning on a model to be pruned according to an embodiment of the present invention;
fig. 4 is a flowchart of a method for pruning a corresponding link of a to-be-pruned model according to a target link subgraph according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of constructing a link relation model for a ResNet18 model according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a link pruning device of a deep neural network model according to an embodiment of the present invention;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without making any inventive effort are within the scope of the present invention.
The terms "comprising" and "having" in the description of the invention and the claims and in the above-mentioned figures, as well as any variations thereof that relate to "comprising" and "having", are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description.
Model pruning is a technique for optimizing deep learning models aimed at reducing the computational resource requirements and memory space occupation of the model while maintaining its performance. Model pruning may be based on a variety of criteria, such as importance of parameters, gradient information, weight size, and the like. Model pruning retains the main feature representation and knowledge by identifying and removing redundant parameters and connections in the model, thereby significantly reducing the computational and storage requirements of the model while maintaining similar performance.
With the continued development of model pruning techniques, more and more research effort is devoted to exploring more efficient and accurate pruning methods. Meanwhile, pruning technology is often combined with other model optimization technologies, such as quantization, distillation and the like, so that a lightweight and efficient deep learning model is constructed together, and the application of deep learning in various devices and scenes is promoted. In general, the development of model pruning technology has important technical background and application value for promoting the practicability and popularization of the deep learning model.
Model pruning is mainly divided into two categories: unstructured pruning and structured pruning. Unstructured pruning compresses the model by directly removing individual weights or parameters in the neural network, but results in an irregular model structure and sparsity that is inconvenient to store. Structured pruning, by contrast, reduces parameters by pruning entire neurons, channels, or layers, maintains a regular model structure, and is better suited to hardware deployment. Structured pruning has higher storage and computation efficiency than unstructured pruning because it generates a more compact model, suitable for a variety of resource-limited devices and environments, such as mobile devices and edge computing.
However, with unstructured pruning, a real performance improvement cannot be obtained by directly running the pruned model; the actual performance gain can only be estimated. And as models continue to evolve, it becomes increasingly difficult for such estimates to remain accurate. For real pruned hardware deployment, additional effort is still required: the zeroed-out portions still need to be adjusted to achieve real acceleration.
It is desirable to be able to implement pruning by actually pruning the corresponding model structure, rather than simple zeroing. The model pruning realized in this way can obtain a real pruned model, and from the software layer, the model performance evaluation after pruning can be obtained directly through model operation, and the optimization work can be greatly reduced for hardware deployment, so that the hardware coding deployment efficiency is greatly improved.
However, an actual pruning operation brings a new problem: when part of a model is cut in one dimension, that dimension changes, which requires the dimensions of a series of dependent layers, which may be referred to as a "relation link", to be adjusted in turn; that is, pruning must proceed sequentially along the relation link. This raises a further problem: for different models, the relation links must be explored manually forward from the starting point of pruning according to the characteristics of the model, and the pruning operations carried out in turn, so the approach generalizes poorly. And as models keep growing and larger models receive more attention, manually searching link relations and pruning is inefficient and impractical.
Accordingly, the embodiment of the invention provides a link pruning method, device, equipment and computer-readable storage medium for a deep neural network model. A first link graph is constructed according to the model graph structure corresponding to the model to be pruned; in the first link graph, the input and the output of each layer of the model to be pruned are taken as vertices, the link relations between the vertices are taken as edges, and the link relations are null in the initialized state. Link relations are then set, including the link relations between layers in the model graph structure and the link relation between the input and output of each layer. The corresponding link relations are filled in for the vertices of the first link graph in sequence, and once all adjacent vertices in the first link graph have been traversed, the second link graph is obtained. Based on the second link graph and the pruning requirement, all layers to be pruned and the dimensions related to the pruning requirement can be determined, and each layer to be pruned is pruned according to its corresponding pruning dimension, completing the link pruning of the model to be pruned rapidly; the dimensions of the related layers never have to be pruned manually step by step, which greatly improves the pruning efficiency of the model.
Next, a link pruning method of the deep neural network model provided by the embodiment of the invention is described in detail. Fig. 1 is a flowchart of a link pruning method of a deep neural network model according to an embodiment of the present invention, where the method includes:
s101: and constructing a first link diagram according to a model diagram structure corresponding to the model to be pruned.
In the embodiment of the invention, each layer in the to-be-pruned model can be used as a vertex, and the link relation between the vertices is used as an edge to construct a model diagram structure; wherein each edge in the model graph structure has its corresponding weight. Splitting each vertex in the model graph structure into an input vertex and an output vertex to obtain a first link graph. Namely, the first link graph takes the input of each layer and the output of each layer of the to-be-pruned model as vertexes, takes the link relation between the vertexes as edges, and takes the link relation as blank in the initialized state.
In practical application, the whole model to be pruned can be regarded as a graph, and the model graph structure is defined as G = (L, E), where L represents the layers in the model to be pruned (one layer is one vertex) and E represents the link relationships between layers.
Each layer can be expressed as L_i. The weight on each edge may be denoted W, where W_ij represents the weight from L_i to L_j. If there is no link relationship from L_i to L_j, the weight may be set to null.
The first link graph may be defined as G′ = (V′, E′);

where V′ represents the vertices in the first link graph. Different from L in the model graph structure G of the model to be pruned, V′ contains two separate elements, input and output, for each layer, so the number of vertices in V′ is twice the total number of layers.

Because the elements of V′ are obtained by splitting each element of L into the two separate elements input and output, the two vertices derived from layer L_i are specifically denoted in_i and out_i.
The first link graph contains all layers of the model to be pruned, each layer corresponding to two separate elements, input and output. The first link graph does not yet include the link relationships between layers and within each layer; filling the first link graph is achieved by the operations of S102 and S103, yielding the required second link graph.
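As a minimal sketch of this construction (the function and variable names are illustrative, not taken from the patent), each layer is split into an input vertex and an output vertex, and the edge set starts out empty:

```python
def build_first_link_graph(num_layers):
    """Split every layer i into an input vertex ('in', i) and an
    output vertex ('out', i); link relations start out empty (null)."""
    vertices = [(kind, i) for i in range(1, num_layers + 1)
                for kind in ("in", "out")]
    edges = set()  # to be filled while traversing adjacent vertices
    return vertices, edges

# The four-layer example of Fig. 2 yields 8 vertices and no edges yet.
vertices, edges = build_first_link_graph(4)
```

The doubled vertex count matches the definition above: V′ has twice as many elements as the model has layers.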
S102: and sequentially judging whether each adjacent vertex in the first link graph has a link relation.
Which link relations exist depends on the link dependencies described below.
Fig. 2 is a schematic diagram of constructing a second link graph based on the link correlations of a model to be pruned according to an embodiment of the present invention. Fig. 2 takes four layers as an example, denoted L_1, L_2, L_3 and L_4, where L_1, L_2 and L_4 are matrix-multiplication layers and L_3 is an activation layer (ReLU). A first link graph needs to be constructed first, i.e., the input and the output of each layer are each taken as a vertex, so a total of 8 vertices are constructed for the 4 layers. Next, it is necessary to determine whether each pair of adjacent vertices in the first link graph has a link relationship.
From the model to be pruned in Fig. 2, it can be seen that for the four layers L_1, L_2, L_3 and L_4, the output dimension of each previous layer has a dependency relationship with the input dimension of the next layer. And the input and output dimensions of L_3 depend on each other.
For example, if the second row of L_2 is cut out, i.e., the input dimension of L_2 is modified, this change is not governed by the link between L_2 and L_3 but is related only to the output dimension of L_1; at this time the second column of L_1 must be cut out correspondingly so that the dimensions of L_1 and L_2 match and they can link normally, which shows that the output of L_1 and the input of L_2 belong to the same related link. Likewise, if the second row of L_3's input is cut out, i.e., the input dimension of L_3 is modified, then the output of L_2 needs to be modified correspondingly, i.e., the second column of L_2 needs to be trimmed out, which shows that the output of L_2 and the input of L_3 belong to the same related link.
Based on such inter-layer related links, where the output of L_i feeds the input of L_j, the link relationship of layers belonging to the same related link can be defined as the following formula (1):

E′(out_i, in_j) = 1, if W_ij ≠ null    (1);

where out_i represents the output of the i-th layer, in_j represents the input of the j-th layer, and E′(out_i, in_j) = 1 indicates that in the first link graph the output of the i-th layer and the input of the j-th layer have a link relationship.
In addition to the link relationships between different layers, the model to be pruned also has layers whose input and output dimensions are the same, such as L_3 in Fig. 2; when the input dimension of a layer of this type changes, the output dimension of the layer needs to be modified correspondingly.
Common layers with the same input and output dimensions include the activation layer (ReLU), batch normalization (BatchNorm), layer normalization (LayerNorm), anti-overfitting (Dropout), and so on. The defining property of these computing layers is that the dimensions of the input and output are the same, so they may be referred to as Inplace layers, abbreviated as I layers; because the input and output dimensions are the same, the operations of these layers can be computed and updated in place without intermediate variables. Thus, the input and output of such a computing layer are naturally dependent, i.e., they belong to the same related link.
In the embodiment of the invention, a layer whose input and output have a natural dependency relationship can be denoted as belonging to I, based on which a link relationship can be defined as the following formula (2):

E′(in_i, out_i) = 1, if L_i ∈ I    (2);

where in_i represents the input of the i-th layer, out_i represents the output of the i-th layer, E′(in_i, out_i) = 1 indicates that in the first link graph the input and output of the i-th layer have a link relationship, and L_i ∈ I indicates that the i-th layer belongs to the layers whose input and output have a natural dependency.
Therefore, in the embodiment of the present invention, a link relationship may be set, where the link relationship includes a link relationship between layers in the model graph structure and a link relationship between input and output of each layer. The linking relation may be presented in the form of the above-described formula (1) and formula (2).
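A sketch of how formulas (1) and (2) could fill the edge set is given below; the dictionary-based weight representation and the names are assumptions made for illustration, not part of the patent:

```python
def fill_link_relations(weights, inplace_layers):
    """weights: dict mapping (i, j) -> weight for each direct
    layer-to-layer link W_ij; inplace_layers: indices of layers in I,
    i.e. layers whose input and output dimensions are the same."""
    edges = set()
    for (i, j) in weights:                    # formula (1)
        edges.add((("out", i), ("in", j)))
    for i in inplace_layers:                  # formula (2)
        edges.add((("in", i), ("out", i)))
    return edges

# Fig. 2 example: L1 -> L2 -> L3 -> L4, with L3 a ReLU (Inplace) layer.
edges = fill_link_relations({(1, 2): "W12", (2, 3): "W23", (3, 4): "W34"},
                            inplace_layers={3})
```

With the Fig. 2 topology this produces three inter-layer edges and one Inplace edge, matching the discrimination conditions described for S103.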
The judgment mode of the link relation of each vertex in the first link graph is similar, and in practical application, whether each adjacent vertex in the first link graph has the link relation can be judged in sequence.
In the embodiment of the invention, the network layers with the same input dimension and output dimension in the deep learning model can be counted to obtain a network layer set; and setting a link relation based on the link relation among the vertexes in the model diagram structure and the network layer set.
The network layer set may record information such as names or other identifications of network layers with the same input dimension and output dimension.
In practical application, beyond the network layers with the same input and output dimensions found in conventional deep learning models, different models may use modified network layers, which may be characterized by different names or identifiers. In order to analyze the relevance of each layer of the model to be pruned more comprehensively and accurately, after the network layers with the same input and output dimensions in deep learning models have been counted to obtain the network layer set, a target network layer with the same input and output dimensions can further be determined according to the type of the model to be pruned, and the target network layer supplemented into the network layer set.
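Maintaining and supplementing the network layer set can be sketched as follows; the default layer names are the common examples listed above, and `supplement_layer_set` is an illustrative name:

```python
# Conventional layers whose input and output dimensions are the same.
DEFAULT_INPLACE_SET = {"ReLU", "BatchNorm", "LayerNorm", "Dropout"}

def supplement_layer_set(base_set, target_layers):
    """Return a new network layer set extended with the target network
    layers determined from the type of the model to be pruned."""
    return set(base_set) | set(target_layers)

# e.g. a model that uses a renamed or additional Inplace-style layer.
layer_set = supplement_layer_set(DEFAULT_INPLACE_SET, {"GELU"})
```

Returning a fresh set keeps the conventional default untouched, so different models to be pruned can each supplement their own target layers.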
S103: and under the condition that the target adjacent vertexes with the link relation appear, filling the link relation corresponding to the target adjacent vertexes into the first link graph until all the adjacent vertexes in the first link graph are traversed, and taking the final first link graph as a second link graph.
In practical application, each pair of adjacent vertices in the first link graph can be traversed in sequence, judging whether each vertex and its next adjacent vertex satisfy formula (1) and whether the vertex itself satisfies formula (2). Once all adjacent vertices have been traversed, the filling of the corresponding link relations is finished, and the required second link graph is obtained.
After the link relations corresponding to the adjacent vertices are filled into the first link graph, E′ contains the following content: if any two layers L_i and L_j have a link relationship between them, then E′(out_i, in_j) = 1.
The first discrimination condition can be obtained from the link relation between different layers: the pruned dimension of the output of a previous layer in the model to be pruned is directly correlated with that of the input of the next layer, so such link relations are added. With reference to Fig. 2, the related links (edges) to be added in the model graph structure at the lower side of Fig. 2 are the edge between out_1 and in_2, the edge between out_2 and in_3, and the edge between out_3 and in_4.
The second discrimination condition can be obtained from layers whose input and output have a natural dependency: when a layer in the model to be pruned has the property that its input dimension is the same as its output dimension (e.g., an activation layer), the input and output of that layer are related, and such I-type related links (edges) are added. With reference to Fig. 2, the edge between in_3 and out_3 needs to be added in the model graph structure at the lower side of Fig. 2.
Traversing each adjacent vertex of the first link graph according to the above discrimination conditions, and finally completing construction of the second link graph, so as to obtain a plurality of related links, such as related link 1, related link 2 and the like in fig. 2, wherein one related link is an independent link subgraph.
S104: based on the pruning requirement, determining all layers and dimensions to be pruned which are related to the pruning requirement from a second link diagram, pruning the layers to be pruned according to the pruning dimensions corresponding to the layers to be pruned, and completing the link pruning of the model to be pruned.
Each layer may have multiple dimensions, and the pruning requirement may include pruning an entire vertex, or pruning the input and output dimensions of one or more vertices, down to which specific rows or columns of a vertex are pruned.
After the second link graph is obtained, the other vertices having a link relationship with each vertex can be determined quickly, so the link pruning of the model to be pruned can be completed rapidly in combination with the pruning requirement. Taking a pruning requirement carrying a vertex identifier and a dimension as an example, assume that the second dimension of the output of layer L_2 shown in Fig. 2 needs to be pruned; then, accordingly, the relevant dimensions of all layers in the relation link where the output vertex of L_2 is located, namely related link 2, are pruned automatically in sequence.
The implementation of completing link pruning in combination with a score threshold is described with reference to Fig. 3 and is not repeated here. Note that, besides being carried in the pruning requirement, the score threshold may also be preset; when the pruning requirement indicates pruning according to weights, the links and dimensions to be pruned can be screened out in combination with the score threshold.
The implementation manner of completing link pruning by combining the vertex identifier and the dimension carried in the pruning requirement can be referred to in the description of fig. 4, and will not be described herein.
According to the technical scheme, a first link graph is constructed according to the model graph structure corresponding to the model to be pruned; in the first link graph, the input and the output of each layer of the model to be pruned are taken as vertices, the link relations between the vertices are taken as edges, and the link relations are null in the initialized state. In order to identify the associations between different layers and within the same layer, link relations may be set, including the link relations between layers in the model graph structure and the link relation between the input and output of each layer. Whether each pair of adjacent vertices in the first link graph has a link relation is then judged in sequence. Whenever target adjacent vertices with a link relation appear, the link relation corresponding to the target adjacent vertices is filled into the first link graph; once all adjacent vertices in the first link graph have been traversed, the filling of the link relations is finished and the final first link graph can be used as the second link graph. Based on the second link graph and the pruning requirement, all layers to be pruned and the dimensions related to the pruning requirement can be determined, and each layer to be pruned is pruned according to its corresponding pruning dimension, so that the link pruning of the model to be pruned is completed rapidly. By constructing a second link graph reflecting the link relations between vertices, a structured pruning strategy can be realized automatically and rapidly, without manually adjusting the dimensions of the related layers step by step in the traditional way, which greatly improves the pruning efficiency of the model.
Fig. 3 is a flowchart of a method for performing link pruning on a model to be pruned according to an embodiment of the present invention, where the method includes:
s301: and determining mutually independent link subgraphs according to the link relation among the vertexes in the second link graph.
In the embodiment of the invention, in order to quickly locate all adjacent vertexes associated with each vertex, mutually independent link subgraphs can be determined according to the link relation among the vertexes in the second link graph.
For example, assuming a total of 10 vertices, vertex 1 through vertex 10, where vertex 1 and vertex 2 have a link relationship, vertices 3 through 6 have a link relationship, and vertices 7 through 10 have a link relationship, then 3 link subgraphs can be determined: the first link subgraph includes vertices 1 and 2, the second includes vertices 3 through 6, and the third includes vertices 7 through 10.
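Determining the mutually independent link subgraphs amounts to finding the connected components of the second link graph; a sketch using depth-first traversal (the names are illustrative, not from the patent):

```python
from collections import defaultdict

def link_subgraphs(vertices, edges):
    """Group vertices into mutually independent link subgraphs, i.e.
    connected components under the filled link relations."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, subgraphs = set(), []
    for start in vertices:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:                      # depth-first traversal
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        subgraphs.append(component)
    return subgraphs

# Ten vertices linked as 1-2, 3-4-5-6 and 7-8-9-10: three subgraphs.
parts = link_subgraphs(range(1, 11),
                       {(1, 2), (3, 4), (4, 5), (5, 6),
                        (7, 8), (8, 9), (9, 10)})
```

The ten-vertex example above falls into exactly the three subgraphs described in the text.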
When link pruning is required, a target link subgraph matched with the score threshold value can be selected from the link subgraphs, and pruning of a corresponding link is performed on a layer to be pruned corresponding to the target link subgraph in a model to be pruned according to pruning dimensions corresponding to vertexes in the target link subgraph, and the operations of S302 to S304 can be seen specifically.
S302: and determining the link score of each link sub-graph based on the weight between each adjacent vertex in each link sub-graph.
In the embodiment of the invention, the pruning requirement can be a layer of the required pruning designated by the user, or can evaluate which layers can prune based on the link weight.
Taking link weight evaluation as an example, the weight corresponding to each pair of adjacent vertices in each link subgraph can be determined from the weights between adjacent vertices in the model graph structure, and the L2 norms of the weights of all adjacent vertices in each link subgraph averaged to obtain the link score of each link subgraph. Besides averaging the L2 norms, the weights themselves corresponding to all adjacent vertices in each link subgraph can be averaged directly to obtain the link score of each link subgraph.
Taking the average of the L2 norms of the weights as an example, the link score corresponding to the i-th link subgraph can be calculated according to the following formula (3):

S_i = (1/M) Σ_{w ∈ C_i} ||w||_2    (3);

where S_i represents the link score corresponding to the i-th link subgraph, w runs over the weights between vertices of the i-th link subgraph that have a link relationship, C_i represents the i-th link subgraph, and M represents the total number of weights in the i-th link subgraph.
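A pure-Python sketch of this score, assuming each weight is given as a flattened vector (the representation and names are illustrative):

```python
import math

def link_score(subgraph_weights):
    """Average the L2 norms of all M weights attached to one link
    subgraph, following the mean-of-norms scoring described above."""
    norms = [math.sqrt(sum(x * x for x in w)) for w in subgraph_weights]
    return sum(norms) / len(norms)

# Two weights with L2 norms 5.0 and 0.0 give a link score of 2.5.
score = link_score([[3.0, 4.0], [0.0, 0.0]])
```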
S303: and determining a target link subgraph to be pruned according to the link score of each link subgraph and the score threshold carried by the pruning requirement.
In the embodiment of the invention, the link score of each link sub-graph can be directly compared with the score threshold value, and the link sub-graph with the link score smaller than the score threshold value is selected as the target link sub-graph to be pruned.
In order to realize unified comparison of the link scores of all the link subgraphs, the link scores of all the link subgraphs can be normalized.
In practical application, the maximum value and the minimum value of the link scores in all the link subgraphs can be selected; and carrying out normalization processing on the link scores of the link subgraphs based on the maximum value and the minimum value of the link scores to obtain the normalization scores of the link subgraphs.
A specific way of normalizing is to calculate the difference between the maximum and the minimum link score, and take the ratio of each link subgraph's score minus the minimum score to that difference as the normalized score of the link subgraph.
In the embodiment of the invention, the link scores may be normalized according to the following formula, and the links to be pruned selected against the score threshold:

Prune = { C_i | (S_i − S_min) / (S_max − S_min) < α };

where Prune represents the set of links to be pruned, S_max represents the maximum link score, S_min represents the minimum link score, α represents the score threshold, S_i represents the link score of the i-th link subgraph, and C_i represents the i-th link subgraph.
After the normalized score of each link sub-graph is obtained, the link sub-graph with the normalized score smaller than the score threshold value can be used as the target link sub-graph to be pruned.
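Min-max normalization and threshold selection of the target link subgraphs could look like this sketch (the function name and the guard for identical scores are assumptions):

```python
def select_target_subgraphs(scores, alpha):
    """Normalize each link score to [0, 1] with the min-max rule and
    keep the subgraphs whose normalized score is below alpha."""
    s_min, s_max = min(scores), max(scores)
    span = s_max - s_min or 1.0   # guard against all-equal scores
    return [i for i, s in enumerate(scores)
            if (s - s_min) / span < alpha]

# Scores 1, 5, 9 normalize to 0, 0.5, 1; alpha = 0.5 keeps subgraph 0.
targets = select_target_subgraphs([1.0, 5.0, 9.0], alpha=0.5)
```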
S304: and pruning corresponding links is carried out on the layer to be pruned corresponding to the target link subgraph in the model to be pruned according to pruning dimensionalities corresponding to the vertexes in the target link subgraph.
After the target link subgraph is evaluated based on the link weights, layers corresponding to all vertices in the target link subgraph can be deleted from the model to be pruned, and the dimension of pruning can be set, which is not limited in detail.
In the embodiment of the invention, the weight evaluation strategies of different link subgraphs are constructed, so that not only can the second link graph be utilized to adjust the dimension of the existing pruning scheme, but also the relation links with normalized scores smaller than the score threshold value can be directly pruned after the link weight values are ordered, thereby achieving more efficient and universal pruning effect.
Fig. 4 is a flowchart of a method for pruning a corresponding link of a to-be-pruned model according to a target link subgraph, where the method includes:
s401: and determining mutually independent link subgraphs according to the link relation among the vertexes in the second link graph.
The implementation of S401 may be referred to the description of S301, and will not be described herein.
S402: and determining a target link subgraph matched with the vertex identification according to the vertex identification carried by the pruning requirement.
In the embodiment of the present invention, the pruning requirement may be a layer of the required pruning specified by the user, and the pruning requirement may carry the vertex identifier and the dimension of the required pruning.
One layer of the model to be pruned serves as one vertex, and in order to facilitate distinguishing between different vertices, a unique vertex identifier of the model to be pruned can be set for each vertex. The vertex identification may be provided by numbers or letters or a combination of both, and is not limited herein.
In practical application, according to the vertex identifier carried by the pruning requirement, a link sub-graph matched with the vertex identifier can be determined, and in order to facilitate distinguishing with other link sub-graphs, the link sub-graph matched with the vertex identifier can be called a target link sub-graph.
S403: and pruning corresponding links are carried out on the layer to be pruned corresponding to the target link subgraph in the model to be pruned according to the target link subgraph and the dimension carried by pruning requirements.
In the embodiment of the invention, the layers to be pruned and their pruning dimensions can be determined according to the input and output dimensions of each vertex in the target link subgraph and the dimension carried by the pruning requirement, where each vertex corresponds to one layer of the model to be pruned. Each layer to be pruned in the model to be pruned is then pruned according to its corresponding pruning dimension.
If the input/output dimension of the layer to be pruned equals the corresponding pruning dimension, the layer must be removed entirely, and the layers having a link relationship with it must be removed entirely as well; therefore, in this case, the layers corresponding to all vertices in the target link subgraph can be deleted from the model to be pruned.
When the input/output dimension of the layer to be pruned is larger than the corresponding pruning dimension, the layer only needs partial pruning; in this case, the elements of the relevant dimensions of that layer in the model to be pruned are pruned according to the pruning dimension.
With reference to the schematic diagram shown in Fig. 2, out_2, in_3, out_3 and in_4 belong to the same link subgraph. If the vertex identifier carried by the pruning requirement is that of the second layer and the dimension is the second column, then, according to the dependency between the input and output dimensions of the layers, it can be determined that the second-column elements of the second layer, the second-row and second-column elements of the third layer, and the second-row elements of the fourth layer need to be pruned.
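A sketch of pruning one dimension along a relation link, using the row-as-input / column-as-output convention of the Fig. 2 discussion (the helper names are illustrative): cutting an output column of one matrix layer forces the matching input row of the next matrix layer to be cut as well.

```python
def prune_output_column(matrix, dim):
    """Remove output dimension `dim`: drop that column from every row."""
    return [row[:dim] + row[dim + 1:] for row in matrix]

def prune_input_row(matrix, dim):
    """Remove input dimension `dim`: drop that row of the matrix."""
    return matrix[:dim] + matrix[dim + 1:]

# A layer with 2 inputs / 3 outputs feeds a layer with 3 inputs / 2
# outputs; pruning output dimension 1 shrinks both matrices to 2x2.
w_prev = [[1, 2, 3], [4, 5, 6]]
w_next = [[1, 2], [3, 4], [5, 6]]
w_prev = prune_output_column(w_prev, 1)
w_next = prune_input_row(w_next, 1)
```

Any Inplace layer sitting between the two matrix layers carries the dimension change through without weights of its own, so only the matrix layers along the link need their rows or columns cut.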
In the embodiment of the invention, the link subgraphs which are independent of each other are determined, so that the link subgraphs where the layer of the needed pruning is positioned can be rapidly determined according to the vertex mark carried by the pruning requirement. And combining the dimension carried by the pruning requirement, the pruning dimension of each vertex in the link subgraph can be determined, so that the pruning operation of the model to be pruned is completed rapidly.
In the embodiment of the invention, the adjacent vertices of the first link graph can be traversed in sequence according to a selected search traversal algorithm to judge whether each pair has a link relation.
The search traversal algorithm may include a depth-first search algorithm or a breadth-first search algorithm, among others. The traversal of all adjacent vertices in the first link graph can be automatically completed by selecting any one of the search traversal algorithms.
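Either traversal order can be sketched over a small adjacency map; the vertex names and the `traverse` helper below are illustrative only, not the patent's implementation:

```python
from collections import deque

def traverse(adj, start, mode="bfs"):
    """Visit all vertices reachable from `start` in depth-first or
    breadth-first order; `adj` maps each vertex to its neighbours."""
    frontier, seen, order = deque([start]), {start}, []
    while frontier:
        # A deque works as a stack (DFS) or a queue (BFS).
        v = frontier.pop() if mode == "dfs" else frontier.popleft()
        order.append(v)
        for nxt in adj.get(v, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return order

adj = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
bfs_order = traverse(adj, "a")          # ['a', 'b', 'c', 'd']
dfs_order = traverse(adj, "a", "dfs")   # ['a', 'c', 'b', 'd']
```

Both orders visit every adjacent vertex pair exactly once, which is all the filling step requires; the choice between them does not change the resulting second link graph.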
The judging manner is the same for each pair of adjacent vertices. Taking any vertex in the first link graph, referred to as the first vertex, as an example, it can be judged whether the first vertex matches the set network layer set and whether the first vertex has a link relationship with the next vertex adjacent to it.
The network layer set comprises network layers with the same input dimension and output dimension.
When the first vertex matches the set network layer set and/or the first vertex has a link relationship with its next adjacent vertex, it means that the first vertex and its next adjacent vertex have a corresponding link relationship; at this time, the first vertex and its next adjacent vertex can be taken as target adjacent vertices having a link relationship.
In the case where the first vertex matches the set network layer set, the first link graph may be populated with the linking relationship of the first vertex to its next adjacent vertex.
In the case where a first vertex has a linking relationship with its next vertex adjacent thereto, the linking relationship of the first vertex with its next vertex adjacent thereto may be filled into the first link graph.
When the first vertex does not match the set network layer set and has no link relationship with its next adjacent vertex, or when the link relationship corresponding to the target adjacent vertices has been filled into the first link graph, any vertex can be selected from the remaining vertices in the first link graph as the new first vertex, and the step of judging whether the first vertex matches the set network layer set and whether it has a link relationship with its next adjacent vertex is repeated, until the judgment of all adjacent vertices in the first link graph is completed.
In the embodiment of the invention, by judging in sequence whether each vertex matches the set network layer set and whether it has a link relationship with its next adjacent vertex, the link relationship of each vertex can be effectively identified and the first link graph filled in, finally yielding a second link graph that reflects the link relationships of the vertices. With this second link graph, pruning of the model can be completed rapidly and automatically.
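A minimal sketch of this judge-and-fill loop follows; the set of layer types whose input and output dimensions are equal, and every name in the code, are assumptions for illustration rather than the patent's definitions:

```python
# Hypothetical "network layer set": layer types whose input and output
# dimensions are the same (names assumed for illustration).
SAME_DIM_TYPES = {"relu", "bn", "pooling"}

def judge_and_fill(vertices, layer_type, has_edge):
    """Walk adjacent vertex pairs in order; when the first vertex
    matches the network layer set and/or has a link relationship with
    its next vertex, record the pair as target adjacent vertices,
    i.e. fill the link relation into the first link graph."""
    links = []
    for v, nxt in zip(vertices, vertices[1:]):
        if layer_type.get(v) in SAME_DIM_TYPES or has_edge(v, nxt):
            links.append((v, nxt))
    return links

filled = judge_and_fill(
    ["conv1", "bn1", "relu1", "fc"],
    {"conv1": "conv", "bn1": "bn", "relu1": "relu", "fc": "fc"},
    lambda a, b: a == "conv1",  # hypothetical edge test on the model graph
)
```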
Fig. 5 is a schematic diagram of constructing a link relation model for a ResNet18 model according to an embodiment of the present invention. The ResNet18 model is a network whose basic architecture is ResNet (Residual Neural Network, a deep convolutional neural network model) and whose weighted depth is 18 layers.
The middle portion of FIG. 5 corresponds to the ResNet18 model. Since the ResNet18 model includes a large number of layers, not all of them are shown in FIG. 5; the construction is mainly illustrated for the conv1 layer (convolutional layer), bn layer (batch normalization layer), relu layer (activation layer) and pooling layer, the 1.0.conv1, 1.0.bn1, 1.0.relu, 1.0.conv2 and 1.0.bn2 layers, the 1.1.conv1, 1.1.bn1, 1.1.relu, 1.1.conv2 and 1.1.bn2 layers, the 2.0.conv1, 2.0.bn1, 2.0.relu, 2.0.conv2 and 2.0.bn2 layers, the 2.0.downsample.0 layer (convolutional layer) and 2.0.downsample.1 layer (batch normalization layer), and the corresponding layers of the subsequent blocks, which together represent the ResNet18 model. Based on the link relations between adjacent vertices in the ResNet18 model, a second link diagram can be constructed; the second link diagram comprises 12 link subgraphs, namely link 1 to link 12.
The second link diagram provided by the embodiment of the invention for a structured model can serve as a basic tool for structured pruning, and its construction method can be used as a general pruning tool applicable to different types of models; it is particularly well suited to compressing and optimizing emerging large models with complex structures.
Fig. 6 is a schematic structural diagram of a link pruning device of a deep neural network model according to an embodiment of the present invention, which includes a construction unit 61, a judgment unit 62, a filling unit 63, a serving unit 64 and a pruning unit 65;
a construction unit 61, configured to construct a first link map according to a model map structure corresponding to the to-be-pruned model; in the first link diagram, the input of each layer of the to-be-pruned model and the output of each layer are taken as vertexes, the link relation between the vertexes is taken as an edge, and the link relation is empty in an initialized state;
a judging unit 62, configured to sequentially judge whether each adjacent vertex in the first link graph has a link relationship; the link relation comprises the link relation among layers in the model diagram structure and the link relation among input and output of each layer;
a filling unit 63, configured to fill, in the first link graph, a link relationship corresponding to a target adjacent vertex when the target adjacent vertex having the link relationship appears;
a serving unit 64, configured to take the final first link graph as the second link graph once all adjacent vertices in the first link graph have been traversed;
the pruning unit 65 is configured to determine, based on the pruning requirement, all to-be-pruned layers and dimensions associated with the pruning requirement from the second link graph, and prune the to-be-pruned layers according to the pruning dimensions corresponding to each to-be-pruned layer, so as to complete link pruning of the to-be-pruned model.
In some embodiments, the pruning unit is configured to determine link subgraphs that are independent of each other according to a link relationship between vertices in the second link graph; selecting a target link subgraph matched with the vertex mark and the dimension carried by the pruning requirement or the carried score threshold value from each link subgraph, and pruning corresponding to a to-be-pruned layer corresponding to the target link subgraph in the to-be-pruned model according to the pruning dimension corresponding to each vertex in the target link subgraph.
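Determining mutually independent link subgraphs amounts to finding the connected components of the filled link graph. The sketch below is one possible implementation, with hypothetical names:

```python
def link_subgraphs(links):
    """Group vertices into mutually independent link subgraphs, i.e.
    the connected components of the (second) link graph, where `links`
    is the list of filled link relations (vertex pairs)."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, comps = set(), []
    for v in adj:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:  # flood-fill one component
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

comps = link_subgraphs([("a", "b"), ("b", "c"), ("x", "y")])
```

Because the components share no vertices, pruning one target subgraph never forces dimension changes in another — which is what makes subgraph-level pruning safe.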
In some embodiments, the pruning unit includes a score determination subunit and a subgraph determination subunit;
the score determining subunit is used for determining the link score of each link sub-graph based on the weight between each adjacent vertex in each link sub-graph;
And the sub-graph determining sub-unit is used for determining a target link sub-graph to be pruned according to the link scores of the link sub-graphs and the score threshold carried by the pruning requirement.
In some embodiments, the score determining subunit is configured to determine, according to weights between adjacent vertices in the model graph structure, weights corresponding to the adjacent vertices in each link subgraph;
and taking an average value of the L2 norms of the weights corresponding to all the adjacent vertexes in each link subgraph to obtain the link score of each link subgraph.
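The L2-norm scoring just described can be sketched as follows; `link_score_l2` is a hypothetical helper illustrating the computation, not the patent's implementation:

```python
import numpy as np

def link_score_l2(edge_weights):
    """Link score of one subgraph: the mean of the L2 norms of the
    weight tensors attached to its adjacent-vertex pairs."""
    return float(np.mean([np.linalg.norm(w) for w in edge_weights]))

# e.g. norms 5.0 and 0.0 average to a link score of 2.5
score = link_score_l2([np.array([3.0, 4.0]), np.zeros(2)])
```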
In some embodiments, the score determining subunit is configured to determine, according to weights between adjacent vertices in the model graph structure, weights corresponding to the adjacent vertices in each link subgraph;
and averaging the weights corresponding to all adjacent vertexes in each link subgraph to obtain the link score of each link subgraph.
In some embodiments, the sub-graph determination subunit is configured to select a link score maximum value and a link score minimum value in all link sub-graphs;
based on the maximum value and the minimum value of the link scores, carrying out normalization processing on the link scores of all the link subgraphs to obtain normalized scores of all the link subgraphs;
And taking the link subgraphs with normalized scores smaller than the score threshold as target link subgraphs to be pruned.
In some embodiments, the sub-graph determination subunit is configured to calculate a difference between the link score maximum and the link score minimum;
and taking the ratio of the link score of each link sub-graph to the difference value as the normalized score of each link sub-graph.
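The normalization and selection steps above can be sketched as follows, assuming the score range is non-zero; the helper name and subgraph labels are hypothetical:

```python
def select_prune_targets(scores, threshold):
    """Normalize each subgraph's link score by the difference between
    the maximum and minimum scores, then keep the subgraphs whose
    normalized score falls below the threshold (assumes a non-zero
    score range; a degenerate all-equal case would need special
    handling not shown here)."""
    span = max(scores.values()) - min(scores.values())
    normalized = {name: s / span for name, s in scores.items()}
    return sorted(name for name, s in normalized.items() if s < threshold)

# Scores 1, 3, 5 give a range of 4 and normalized scores 0.25, 0.75, 1.25;
# with threshold 0.5 only "link1" is selected for pruning.
targets = select_prune_targets({"link1": 1.0, "link2": 3.0, "link3": 5.0}, 0.5)
```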
In some embodiments, the pruning unit is configured to delete layers corresponding to all vertices in the target link subgraph from the to-be-pruned model.
In some embodiments, the pruning unit is configured to determine, according to a vertex identifier carried by a pruning requirement, a target link subgraph matched with the vertex identifier; and pruning corresponding links are carried out on the layer to be pruned corresponding to the target link subgraph in the model to be pruned according to the target link subgraph and the dimension carried by pruning requirements.
In some embodiments, the pruning unit includes a dimension determination subunit and a pruning subunit;
the dimension determining subunit is used for determining a layer of pruning required and pruning dimension according to the input and output dimension of each vertex in the target link subgraph and the dimension carried by the pruning requirement; wherein each vertex corresponds to one layer of the model to be pruned;
The pruning subunit is used for pruning the layers requiring pruning in the to-be-pruned model according to the pruning dimension corresponding to each such layer.
In some embodiments, the pruning subunit is configured to delete, from the to-be-pruned model, a layer corresponding to all vertices in the target link subgraph, where an input-output dimension of the layer of the required pruning is equal to the corresponding pruning dimension.
In some embodiments, the pruning subunit is configured to prune the layer requiring pruning in the to-be-pruned model according to the pruning dimension if the input-output dimension of that layer is greater than the corresponding pruning dimension.
In some embodiments, the building unit is configured to use each layer in the to-be-pruned model as a vertex, and a link relationship between the vertices is used as an edge, so as to build a model graph structure; wherein, each side in the model diagram structure has its corresponding weight; splitting each vertex in the model graph structure into an input vertex and an output vertex to obtain a first link graph.
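The vertex-splitting construction of the first link graph can be sketched as follows; the graph representation and layer names are assumptions for illustration:

```python
def build_first_link_graph(model_layers):
    """Split every layer vertex of the model graph structure into an
    input vertex and an output vertex; the link-relation edge set is
    empty in the initialized state and is filled during traversal."""
    vertices = []
    for layer in model_layers:
        vertices += [f"{layer}.in", f"{layer}.out"]
    return {"vertices": vertices, "links": []}

graph = build_first_link_graph(["conv1", "bn1"])
```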
In some embodiments, the judging unit is configured to sequentially traverse, according to the selected search traversal algorithm, whether each adjacent vertex in the first link graph has a link relationship; the search traversal algorithm comprises a depth-first search algorithm or a breadth-first search algorithm.
In some embodiments, the judging unit includes a vertex judging subunit, a serving subunit and a selecting subunit;
the vertex judging subunit is used for judging whether a first vertex is matched with the set network layer set and whether the first vertex and the next vertex adjacent to the first vertex have a link relation; the network layer set comprises network layers with the same input dimension and output dimension; the first vertex is any vertex in the first link graph;
the serving subunit is used for taking the first vertex and its next adjacent vertex as target adjacent vertices having a link relationship when the first vertex matches the set network layer set and/or the first vertex has a link relationship with its next adjacent vertex;
and a selecting subunit, configured to, when the first vertex is not matched with the set network layer set and the next vertex adjacent to the first vertex does not have a link relationship or the link relationship corresponding to the target adjacent vertex is completely filled into the first link graph, select any one vertex from the remaining vertices in the first link graph as the first vertex, and return to the step of determining whether the first vertex is matched with the set network layer set and whether the first vertex has a link relationship with the next vertex adjacent to the first vertex until the determination of all adjacent vertices in the first link graph is completed.
In some embodiments, the filling unit is configured to fill, in the first link map, a link relationship between the first vertex and a next vertex adjacent to the first vertex, in a case where the first vertex matches the set network layer set.
In some embodiments, the filling unit is configured to fill the link relation between the first vertex and the next vertex adjacent thereto in the first link map in a case where the first vertex has the link relation with the next vertex adjacent thereto.
In some embodiments, for determination of a link relationship, an apparatus includes a statistics unit and a setting unit;
the statistics unit is used for counting network layers with the same input dimension and output dimension in the deep learning model to obtain a network layer set;
the setting unit is used for setting the link relation based on the link relation among the vertexes in the model diagram structure and the network layer set.
In some embodiments, a supplemental unit is also included;
the supplementing unit is used for determining a target network layer with the same input dimension and output dimension according to the type of the model to be pruned; the target network layer is supplemented to the network layer set.
The description of the features of the embodiment corresponding to fig. 6 may be referred to the related description of the embodiment corresponding to fig. 1 to 4, and will not be repeated here.
According to the technical scheme, a first link graph is constructed according to the model graph structure corresponding to the to-be-pruned model; in the first link graph, the input of each layer and the output of each layer of the to-be-pruned model are taken as vertices, the link relationships between the vertices are taken as edges, and the link relationships are empty in the initialized state. In order to identify the associations between different layers and within the same layer, link relationships may be set, including the link relationships between the layers in the model graph structure and the link relationship between the input and output of each layer. Whether each pair of adjacent vertices in the first link graph has a link relationship is then judged in sequence. When target adjacent vertices with a link relationship appear, the link relationship corresponding to the target adjacent vertices is filled into the first link graph; once all adjacent vertices in the first link graph have been traversed, the link relationships of all adjacent vertices have been filled in, and the final first link graph can be used as the second link graph. Based on the second link graph and the pruning requirement, all to-be-pruned layers and dimensions associated with the pruning requirement can be determined, and the to-be-pruned layers are pruned according to their corresponding pruning dimensions, so that the link pruning of the to-be-pruned model is completed rapidly. By constructing a second link graph that reflects the link relationships between vertices, a structured pruning strategy can be realized automatically and rapidly, without manually adjusting the dimensions of the related layers step by step in the traditional manner, which greatly improves the pruning efficiency of the model.
Fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention, as shown in fig. 7, where the electronic device includes: a memory 70 for storing a computer program;
a processor 71 for implementing the steps of the link pruning method of the deep neural network model of the above embodiment when executing a computer program.
The electronic device provided in this embodiment may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like.
Processor 71 may include one or more processing cores, such as a 4-core processor, an 8-core processor, etc. The processor 71 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) or PLA (Programmable Logic Array). The processor 71 may also include a main processor and a coprocessor; the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), while the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 71 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 71 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 70 may include one or more computer-readable storage media, which may be non-transitory. Memory 70 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 70 is at least used to store a computer program 701, which, when loaded and executed by the processor 71, is capable of implementing the relevant steps of the link pruning method of the deep neural network model disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 70 may further include an operating system 702, data 703, and the like, where the storage manner may be transient or permanent. The operating system 702 may include Windows, Unix, Linux, and the like. The data 703 may include, but is not limited to, the link relationships between the layers in the model graph structure, the link relationships between the inputs and outputs of each layer, and the like.
In some embodiments, the electronic device may further include a display screen 72, an input-output interface 73, a communication interface 74, a power supply 75, and a communication bus 77.
Those skilled in the art will appreciate that the structure shown in fig. 7 is not limiting of the electronic device and may include more or fewer components than shown.
It will be appreciated that if the link pruning method of the deep neural network model in the above embodiment is implemented in the form of a software functional unit and sold or used as a separate product, it may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in part or in whole or in part in the form of a software product stored in a storage medium for performing all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random-access Memory (Random Access Memory, RAM), an electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, a magnetic disk, or an optical disk, etc. various media capable of storing program codes.
Based on the above, the embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the link pruning method of the deep neural network model.
The method, the device, the equipment and the computer readable storage medium for pruning the link of the deep neural network model provided by the embodiment of the invention are described in detail. In the description, each embodiment is described in a progressive manner, and each embodiment is mainly described by the differences from other embodiments, so that the same similar parts among the embodiments are mutually referred. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The method, the device, the equipment and the computer readable storage medium for pruning the link of the deep neural network model provided by the invention are described in detail. The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.
Claims (21)
1. A structured link pruning method for a deep neural network large model of an edge computing device, comprising:
constructing a first link diagram according to a model diagram structure corresponding to the model to be pruned; in the first link diagram, the input of each layer of the to-be-pruned model and the output of each layer are taken as vertexes, the link relation between the vertexes is taken as an edge, and the link relation is empty in an initialized state;
sequentially judging whether each adjacent vertex in the first link graph has a link relation or not; the link relation comprises a link relation among layers in the model diagram structure and a link relation among input and output of each layer;
Under the condition that a target adjacent vertex with a link relation appears, filling the link relation corresponding to the target adjacent vertex into the first link diagram; taking the final first link diagram as a second link diagram until all adjacent vertexes in the first link diagram are traversed;
taking a second link diagram as a structured pruning tool, determining all layers to be pruned and pruning dimensions associated with pruning requirements from the second link diagram based on the pruning requirements, and pruning the layers to be pruned according to the pruning dimensions corresponding to the layers to be pruned so as to complete link pruning of the model to be pruned;
determining all to-be-pruned layers and pruning dimensions associated with the pruning requirement from the second link graph based on the pruning requirement, pruning the to-be-pruned layers according to the pruning dimensions corresponding to the to-be-pruned layers to complete link pruning of the to-be-pruned model, wherein the step of pruning the to-be-pruned layers to complete link pruning of the to-be-pruned model comprises the following steps:
determining mutually independent link subgraphs according to the link relation among the vertexes in the second link graph;
selecting a target link subgraph matched with the pruning requirement from each link subgraph; and pruning the corresponding links of the to-be-pruned layer corresponding to the target link subgraph in the to-be-pruned model according to the pruning dimensions corresponding to each vertex in the target link subgraph, wherein the structured link pruning reduces parameters by pruning whole neurons, channels or layers, maintains a regular model structure, and adapts to hardware deployment.
2. The structured link pruning method for a deep neural network large model of an edge computing device of claim 1, wherein the selecting a target link subgraph from the link subgraphs that matches the pruning requirement comprises:
determining the link score of each link sub-graph based on the weight between adjacent vertexes in each link sub-graph;
and determining a target link subgraph to be pruned according to the link score of each link subgraph and a score threshold carried by pruning requirements.
3. The structured link pruning method for a deep neural network large model of an edge computing device of claim 2, wherein the determining the link score for each of the link subgraphs based on weights between adjacent vertices in each of the link subgraphs comprises:
determining the weight corresponding to each adjacent vertex in each link subgraph according to the weight between each adjacent vertex in the model graph structure;
and taking an average value of L2 norms of weights corresponding to all adjacent vertexes in each link subgraph to obtain the link score of each link subgraph.
4. The structured link pruning method for a deep neural network large model of an edge computing device of claim 2, wherein the determining the link score for each of the link subgraphs based on weights between adjacent vertices in each of the link subgraphs comprises:
Determining the weight corresponding to each adjacent vertex in each link subgraph according to the weight between each adjacent vertex in the model graph structure;
and averaging the weights corresponding to all adjacent vertexes in each link subgraph to obtain the link score of each link subgraph.
5. The method for structured link pruning of a deep neural network large model for an edge computing device according to claim 2, wherein determining the target link subgraph to be pruned according to the link score of each link subgraph and the score threshold carried by the pruning requirement comprises:
selecting a maximum value and a minimum value of the link scores in all the link subgraphs;
normalizing the link score of each link sub-graph based on the link score maximum value and the link score minimum value to obtain the normalized score of each link sub-graph;
and taking the link subgraph with the normalized score smaller than the score threshold as a target link subgraph to be pruned.
6. The structured link pruning method for a deep neural network large model of an edge computing device of claim 5, wherein normalizing the link score of each link sub-graph based on the link score maximum and the link score minimum to obtain a normalized score for each link sub-graph comprises:
Calculating a difference between the maximum value of the link score and the minimum value of the link score;
and taking the ratio of the link score of each link sub-graph to the difference value as the normalized score of each link sub-graph.
7. The method for structured link pruning of a deep neural network large model for an edge computing device according to claim 5, wherein the pruning of the corresponding link for the to-be-pruned layer of the to-be-pruned model corresponding to the target link sub-graph according to pruning dimensions corresponding to each vertex in the target link sub-graph comprises:
and deleting layers corresponding to all vertexes in the target link subgraph from the model to be pruned.
8. The structured link pruning method for a deep neural network large model of an edge computing device of claim 1, wherein the selecting a target link subgraph that matches the pruning requirement from the link subgraphs; according to pruning dimensions corresponding to each vertex in the target link subgraph, pruning corresponding links to a to-be-pruned layer corresponding to the target link subgraph in the to-be-pruned model comprises:
determining a target link subgraph matched with the vertex identification according to the vertex identification carried by the pruning requirement;
Pruning corresponding to the target link subgraph in the to-be-pruned model according to the target link subgraph and the dimension carried by the pruning requirement; the model to be pruned is a RESNET18 model.
9. The method for structured link pruning of a deep neural network large model of an edge computing device according to claim 8, wherein the pruning of the corresponding link to the target link subgraph in the to-be-pruned model according to the target link subgraph and the dimension carried by the pruning requirement comprises:
determining a layer of pruning required and pruning dimension according to the input and output dimensions of each vertex in the target link subgraph and the dimension carried by the pruning requirement; wherein each vertex corresponds to a layer of the model to be pruned;
pruning is carried out on the layers of the required pruning in the model to be pruned according to the pruning dimension corresponding to each layer of the required pruning.
10. The structured link pruning method for a deep neural network large model of an edge computing device of claim 9, wherein pruning the layers of desired pruning in the to-be-pruned model according to their respective pruning dimensions comprises:
And deleting the layers corresponding to all vertexes in the target link subgraph from the to-be-pruned model under the condition that the input and output dimensions of the layers of the needed prune are equal to the corresponding prune dimensions.
11. The structured link pruning method for a deep neural network large model of an edge computing device of claim 9, wherein pruning the layers of desired pruning in the to-be-pruned model according to their respective pruning dimensions comprises:
and pruning the layer of the required pruning in the to-be-pruned model according to the pruning dimension under the condition that the input/output dimension of the layer of the required pruning is larger than the corresponding pruning dimension.
12. The method for structured link pruning of a deep neural network large model of an edge computing device according to claim 1, wherein constructing a first link graph according to a model graph structure corresponding to a model to be pruned comprises:
taking each layer in the to-be-pruned model as a vertex, and taking the link relation between the vertices as an edge to construct a model graph structure; wherein, each side in the model diagram structure has a corresponding weight;
Splitting each vertex in the model diagram structure into an input vertex and an output vertex to obtain a first link diagram.
13. The structured link pruning method for a deep neural network large model of an edge computing device of claim 1, wherein the sequentially determining whether each neighboring vertex in the first link graph has a link relationship comprises:
traversing whether each adjacent vertex in the first link graph has a link relation according to a selected search traversal algorithm; wherein the search traversal algorithm comprises a depth-first search algorithm or a breadth-first search algorithm.
14. The structured link pruning method for a deep neural network large model of an edge computing device of claim 13, wherein traversing each adjacent vertex in the first link graph in sequence according to the selected search traversal algorithm comprises:
judging whether a first vertex matches a set network layer set, and whether the first vertex has a link relation with the next vertex adjacent to it; wherein the network layer set comprises network layers whose input dimension and output dimension are the same, and the first vertex is any vertex in the first link graph;
taking the first vertex and its next adjacent vertex as target adjacent vertices having a link relation when the first vertex matches the set network layer set and/or the first vertex has a link relation with its next adjacent vertex;
and, in the case that the first vertex neither matches the set network layer set nor has a link relation with its next adjacent vertex, or after the link relation corresponding to the target adjacent vertices has been filled into the first link graph, selecting any one of the remaining vertices in the first link graph as the new first vertex and returning to the judging step, until the judgment of all adjacent vertices in the first link graph is completed.
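The traversal of claims 13-14 can be sketched as a depth-first walk that marks a pair of adjacent vertices as linked when either condition holds: the current vertex's layer type belongs to a dimension-preserving layer set, or the pair already carries a known link relation. This is a hedged illustration under assumed data structures (`adjacency`, `layer_type`, `dim_preserving`), not the patent's code.

```python
def fill_links(adjacency, layer_type, dim_preserving, linked_pairs):
    """adjacency: vertex -> list of next vertices; layer_type: vertex -> type;
    dim_preserving: layer types with equal input/output dimensions;
    linked_pairs: vertex pairs already known to share a link relation.
    Returns the filled set of link edges."""
    links = set()
    visited = set()

    def dfs(v):
        if v in visited:
            return
        visited.add(v)
        for nxt in adjacency.get(v, []):
            # Either condition makes (v, nxt) a target adjacent pair.
            if layer_type.get(v) in dim_preserving or (v, nxt) in linked_pairs:
                links.add((v, nxt))
            dfs(nxt)

    for v in list(adjacency):
        dfs(v)
    return links

# Example: a ReLU preserves dimensions, so its outgoing pair becomes linked.
links = fill_links(
    {"conv1": ["relu1"], "relu1": ["conv2"]},
    {"conv1": "conv", "relu1": "relu"},
    {"relu", "bn"},
    set(),
)
```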
15. The structured link pruning method for a deep neural network large model of an edge computing device of claim 14, wherein, in the case that target adjacent vertices with a link relation occur, filling the link relation corresponding to the target adjacent vertices into the first link graph comprises:
filling the link relation between the first vertex and its next adjacent vertex into the first link graph in the case that the first vertex matches the set network layer set.
16. The structured link pruning method for a deep neural network large model of an edge computing device of claim 15, wherein, in the case that target adjacent vertices with a link relation occur, filling the link relation corresponding to the target adjacent vertices into the first link graph comprises:
filling the link relation between the first vertex and its next adjacent vertex into the first link graph in the case that the first vertex has a link relation with its next adjacent vertex.
17. The structured link pruning method for a deep neural network large model of an edge computing device of any one of claims 1-16, wherein the determination of the link relation comprises:
counting the network layers whose input dimension and output dimension are the same in the deep learning model to obtain a network layer set;
and setting the link relation based on the link relations among the vertices in the model graph structure and the network layer set.
18. The structured link pruning method for a deep neural network large model of an edge computing device of claim 17, further comprising, after counting the network layers whose input dimension and output dimension are the same in the deep learning model to obtain the network layer set:
determining a target network layer whose input dimension and output dimension are the same according to the type of the to-be-pruned model;
and supplementing the network layer set with the target network layer.
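Claims 17-18 can be illustrated as assembling a base set of dimension-preserving layer types and then supplementing it according to the model type. The concrete layer names and model-type keys below are assumptions for illustration only.

```python
# Base set: layer types whose input and output dimensions are the same
# (typical examples; the patent does not enumerate them).
DIM_PRESERVING_BASE = {"relu", "gelu", "batchnorm", "layernorm", "dropout"}

# Hypothetical model-type-specific supplements (claim 18).
EXTRA_BY_MODEL_TYPE = {
    "transformer": {"softmax", "residual_add"},
    "cnn": {"pooling_same"},
}

def build_layer_set(model_type):
    """Return the network layer set for the given to-be-pruned model type."""
    layer_set = set(DIM_PRESERVING_BASE)
    layer_set |= EXTRA_BY_MODEL_TYPE.get(model_type, set())
    return layer_set
```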
19. A structured link pruning device for a deep neural network large model of an edge computing device, comprising a construction unit, a judging unit, a filling unit, a serving unit and a pruning unit;
the construction unit is configured to construct a first link graph according to the model graph structure corresponding to the to-be-pruned model; in the first link graph, the input and the output of each layer of the to-be-pruned model are taken as vertices, the link relations between the vertices are taken as edges, and the link relation is empty in the initialized state;
the judging unit is configured to judge in sequence whether each pair of adjacent vertices in the first link graph has a link relation; the link relation comprises the link relations among the layers in the model graph structure and the link relation between the input and the output of each layer;
the filling unit is configured to fill the link relation corresponding to the target adjacent vertices into the first link graph in the case that target adjacent vertices with a link relation occur;
the serving unit is configured to take the final first link graph as a second link graph once all adjacent vertices in the first link graph have been traversed;
the pruning unit is configured to use the second link graph as a tool for structured pruning, determine from the second link graph, based on a pruning requirement, all to-be-pruned layers and dimensions associated with the pruning requirement, and prune the to-be-pruned layers according to the pruning dimension corresponding to each to-be-pruned layer, so as to complete the link pruning of the to-be-pruned model;
the pruning unit is specifically configured to determine mutually independent link subgraphs according to the link relations among the vertices in the second link graph, select from the link subgraphs a target link subgraph matching the pruning requirement, and prune the to-be-pruned layers corresponding to the target link subgraph in the to-be-pruned model according to the pruning dimension corresponding to each vertex in the target link subgraph; the structured link pruning reduces parameters by pruning whole neurons, channels or layers, maintains a regular model structure, and adapts to hardware deployment.
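The pruning unit's first step, determining mutually independent link subgraphs from the second link graph, can be sketched as extracting connected components when the link relations are treated as undirected edges; each component then groups the layers whose dimensions must be pruned together. Function and variable names are illustrative, not from the patent.

```python
from collections import defaultdict

def link_subgraphs(vertices, links):
    """Split the second link graph into mutually independent link subgraphs
    (connected components over the undirected link relations)."""
    adj = defaultdict(set)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for v in vertices:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:           # iterative flood fill from v
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        components.append(comp)
    return components

comps = link_subgraphs(["a", "b", "c", "d"], [("a", "b")])
```

A subgraph matching the pruning requirement is then chosen from `comps`, and every vertex in it supplies a (layer, pruning dimension) pair for the actual prune.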
20. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the structured link pruning method for deep neural network large models of edge computing devices as claimed in any one of claims 1 to 18.
21. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the steps of the structured link pruning method for deep neural network large models of edge computing devices according to any one of claims 1 to 18.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311255746.2A CN116992943B (en) | 2023-09-27 | 2023-09-27 | Link pruning method, device, equipment and medium of deep neural network model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116992943A CN116992943A (en) | 2023-11-03 |
CN116992943B true CN116992943B (en) | 2024-02-09 |
Family
ID=88525212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311255746.2A Active CN116992943B (en) | 2023-09-27 | 2023-09-27 | Link pruning method, device, equipment and medium of deep neural network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116992943B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112700056A (en) * | 2021-01-06 | 2021-04-23 | 中国互联网络信息中心 | Complex network link prediction method, complex network link prediction device, electronic equipment and medium |
CN115222042A (en) * | 2022-07-08 | 2022-10-21 | 中国科学院计算技术研究所 | Structured pruning method and system |
WO2023005085A1 (en) * | 2021-07-29 | 2023-02-02 | 浪潮电子信息产业股份有限公司 | Method and apparatus for pruning neural network, and device and storage medium |
Non-Patent Citations (2)
Title |
---|
Filter Clustering for Compressing CNN Model With Better Feature Diversity; Zhenyu Wang et al.; IEEE Transactions on Circuits and Systems for Video Technology, Vol. 33, No. 12; full text * |
Lightweight Convolutional Neural Network Design Based on MobileNet and YOLOv3; Shao Weiping et al.; Journal of Computer Applications (No. S1); full text * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020098531A1 (en) | Object loading method, device, storage medium, and electronic device | |
CN106407408A (en) | A spatial index construction method and device for mass point cloud data | |
CN113326377B (en) | Name disambiguation method and system based on enterprise association relationship | |
CN114721833A (en) | Intelligent cloud coordination method and device based on platform service type | |
CN115661374B (en) | Rapid retrieval method based on space division and model voxelization | |
CN110751355A (en) | Scientific and technological achievement assessment method and device | |
CN110490203A (en) | Image partition method and device, electronic equipment and computer readable storage medium | |
CN106709503A (en) | Large spatial data clustering algorithm K-DBSCAN based on density | |
CN103678436A (en) | Information processing system and information processing method | |
CN115860081B (en) | Core algorithm scheduling method, system, electronic equipment and storage medium | |
CN110737805A (en) | Method and device for processing graph model data and terminal equipment | |
CN114186073A (en) | Operation and maintenance fault diagnosis and analysis method based on subgraph matching and distributed query | |
US20240062469A1 (en) | Data processing method, apparatus, device, and medium | |
CN113129447A (en) | Three-dimensional model generation method and device based on single hand-drawn sketch and electronic equipment | |
CN112101577A (en) | XGboost-based cross-sample federal learning and testing method, system, device and medium | |
CN115423931A (en) | Real tree three-dimensional model reconstruction method based on point voxels | |
CN113592885B (en) | SegNet-RS network-based large obstacle contour segmentation method | |
CN113127697B (en) | Method and system for optimizing graph layout, electronic device and readable storage medium | |
CN116992943B (en) | Link pruning method, device, equipment and medium of deep neural network model | |
Taghavi et al. | Visualization of multi-objective design space exploration for embedded systems | |
Hu et al. | Parallel BVH construction using locally density clustering | |
CN117275748A (en) | Visual analysis method for RNA-disease association relationship based on density relationship graph | |
CN109982095B (en) | CNN and GEP-based fractal image compression coding method | |
CN115330971B (en) | Geometric model lightweight method oriented to rendering performance optimization | |
CN110222055A (en) | The single-wheel core value maintaining method of multiple edge update under a kind of Dynamic Graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||