CN112884123A - Neural network optimization method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN112884123A
Authority
CN
China
Prior art keywords
fusion
network
subnet
layer
optimal
Prior art date
Legal status
Granted
Application number
CN202110204808.1A
Other languages
Chinese (zh)
Other versions
CN112884123B (en)
Inventor
张凯
谭文明
李哲暘
张如意
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202110204808.1A priority Critical patent/CN112884123B/en
Publication of CN112884123A publication Critical patent/CN112884123A/en
Application granted granted Critical
Publication of CN112884123B publication Critical patent/CN112884123B/en
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods


Abstract

The application provides a neural network optimization method and apparatus, an electronic device and a readable storage medium. The neural network optimization method comprises the following steps: dividing a neural network to be optimized into subnets, and performing network layer fusion on each subnet according to a preset fusion rule and a fusion target to obtain an optimal fusion result of each subnet; and performing layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target to obtain the optimal fusion result of the neural network to be optimized. The method can improve the efficiency of determining the optimal fusion result of the neural network to be optimized while still guaranteeing that an optimal fusion result satisfying the preset fusion rule and the fusion target is obtained.

Description

Neural network optimization method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a neural network optimization method and apparatus, an electronic device, and a readable storage medium.
Background
Neural networks (NNs) are a research hotspot and focus in the field of artificial intelligence, and their huge computation and bandwidth requirements have become the main bottlenecks of neural network deployment.
In order to reduce the bandwidth requirement of a neural network deployed on a computing platform, the neural network can be optimized by merging multiple layers (network layers) of the neural network into one level (a hardware basic computing unit, which may be referred to as a level) according to the constraints of the computing platform, so that the inputs and outputs of the layers merged into that level occupy bandwidth only once, reducing the number of data interactions between the computing platform and external storage.
Practice shows that different fusion modes differ in how much they reduce the bandwidth requirement, and how to minimize the bandwidth requirement of a neural network through fusion has become an urgent technical problem to be solved.
Disclosure of Invention
In view of the above, the present application provides a neural network optimization method, apparatus, electronic device and readable storage medium.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of embodiments of the present application, there is provided a neural network optimization method, including:
dividing the neural network to be optimized into subnetworks, and performing network layer fusion on each subnet according to a preset fusion rule and a fusion target to obtain an optimal fusion result of each subnet;
performing layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target to obtain the optimal fusion result of the neural network to be optimized; wherein, for any level in the optimal fusion result of the neural network to be optimized that comprises a plurality of layers, the structure of the level is consistent with the structure of the optimal fusion result of a target subnet, and the target subnet is the subnet consisting of the same layers as those contained in the level.
According to a second aspect of embodiments of the present application, there is provided a neural network optimization apparatus, including:
the dividing unit is used for carrying out subnet division on the neural network to be optimized;
the optimization unit is used for respectively carrying out network layer fusion on each subnet according to a preset fusion rule and a fusion target to obtain an optimal fusion result of each subnet;
the optimization unit is further configured to perform layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target, so as to obtain the optimal fusion result of the neural network to be optimized; wherein, for any level in the optimal fusion result of the neural network to be optimized that comprises a plurality of layers, the structure of the level is consistent with the structure of the optimal fusion result of a target subnet, and the target subnet is the subnet consisting of the same layers as those contained in the level.
According to a third aspect of embodiments of the present application, there is provided an electronic device, including a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being configured to execute the machine-executable instructions to implement the above-mentioned neural network optimization method.
According to a fourth aspect of embodiments of the present application, there is provided a machine-readable storage medium having stored therein machine-executable instructions that, when executed by a processor, implement the above-mentioned neural network optimization method.
The technical scheme provided by the application can at least bring the following beneficial effects:
the method comprises the steps of dividing the neural network to be optimized, performing network layer fusion on each subnet according to a preset fusion rule and a fusion target to obtain an optimal fusion result of each subnet, performing layer fusion on the neural network to be optimized according to the optimal fusion result, the preset fusion rule and the fusion target of each subnet to obtain the optimal fusion result of the neural network to be optimized, and improving the efficiency of determining the optimal fusion result of the neural network to be optimized under the condition that the optimal fusion result meeting the preset fusion rule and the fusion target is ensured to be obtained.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a neural network optimization method in accordance with an exemplary embodiment of the present disclosure;
fig. 2 is a schematic flow chart illustrating a process of performing subnet division on a neural network to be optimized and performing network layer fusion on each subnet according to a preset fusion rule and a fusion target, according to an exemplary embodiment of the present application;
FIG. 3A is a schematic diagram of a neural network shown in an exemplary embodiment of the present application;
FIG. 3B is a diagram illustrating an optimal fusion result under a greedy fusion scheme according to an exemplary embodiment of the present application;
fig. 3C is a schematic diagram illustrating an optimal fusion result obtained by using the technical solution provided by the embodiment of the present application according to the exemplary embodiment of the present application;
FIG. 4 is a schematic flow chart diagram illustrating a neural network optimization method in accordance with an exemplary embodiment of the present application;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to make those skilled in the art better understand the technical solutions provided by the embodiments of the present application, a brief description will be given below of some technical terms related to the embodiments of the present application.
layer: on the network side, basic constituent units in the neural network, such as convolutional layers, pooling layers, and the like;
level: the system comprises a hierarchy, a basic computing unit and one or more layers, wherein the basic computing unit is a basic computing unit when a neural network is deployed in a computing platform;
in practical applications, there may be a case where one layer is split into a plurality of levels, but the probability of occurrence of this case is low.
BW (bandwidth): data throughput; the input/output BW when the neural network runs on a computing platform can be understood as the bandwidth required by the neural network;
ker: broadly, the convolutional layer weights (coefficients);
coefficient cache: a cache in the computing platform that holds ker;
Map cache: a cache in the computing platform that stores feature maps.
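As a rough illustration of how these terms relate (a minimal sketch with assumed field names, not the data model of this application), a layer and a level might be modeled as follows; only a level's external input and output cross the bandwidth boundary, while feature maps between fused layers stay in the on-chip Map cache:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Layer:
    """Network-side basic unit, e.g. a convolutional or pooling layer."""
    name: str
    ker_bytes: int      # size of the layer's weights (ker / coefficients)
    in_map_bytes: int   # size of the layer's input feature map
    out_map_bytes: int  # size of the layer's output feature map

@dataclass
class Level:
    """Basic computing unit on the platform; holds one or more fused layers."""
    layers: List[Layer] = field(default_factory=list)

    def io_bandwidth(self) -> int:
        # Only the first layer's input and the last layer's output occupy bandwidth;
        # intermediate feature maps are kept in the on-chip Map cache.
        return self.layers[0].in_map_bytes + self.layers[-1].out_map_bytes
```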
In order to make the aforementioned objects, features and advantages of the embodiments of the present application more comprehensible, embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, which is a schematic flow chart of a neural network optimization method provided by an embodiment of the present application, as shown in fig. 1, the neural network optimization method may include the following steps:
and S100, carrying out subnet division on the neural network to be optimized, and respectively carrying out network layer fusion on each subnet according to a preset fusion rule and a fusion target to obtain an optimized fusion result of each subnet.
In the embodiment of the application, in order to reduce the bandwidth requirement of the neural network deployed on a computing platform (such as a chip), the interaction bandwidth between the neural network and external storage can be reduced by fusing layers in the neural network, thereby realizing optimization of the neural network.
For example, for a neural network to be optimized (referred to as a neural network to be optimized herein), subnet division may be performed first to obtain a plurality of subnets, and layer fusion is performed on each subnet according to a preset fusion rule and a fusion target (referred to as a preset fusion rule and a fusion target herein) to obtain an optimal fusion result of each subnet.
For example, for a neural network with N layers (N ≥ 3), the neural network can be divided into a plurality of 2-layer subnets.
For example, assuming that N = 3 (the neural network includes layer1, layer2, and layer3), the 2-layer subnets may include a subnet consisting of layer1 and layer2 and a subnet consisting of layer2 and layer3; for a nonlinear network, a subnet consisting of layer1 and layer3 may also be included.
For ease of understanding and explanation, a linear network is used as an example below, i.e., layer1 is directly connected to layer2, and layer2 is directly connected to layer3.
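For such a linear network, the k-layer subnets are simply the windows of k consecutive layers. A minimal sketch of this division (names assumed; nonlinear topologies are not handled here):

```python
def divide_subnets(layer_names, k):
    """Return all k-layer subnets of a linear network as lists of consecutive layers."""
    return [layer_names[i:i + k] for i in range(len(layer_names) - k + 1)]

print(divide_subnets(["layer1", "layer2", "layer3"], 2))
# [['layer1', 'layer2'], ['layer2', 'layer3']]
```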
Illustratively, for the neural network to be optimized, subnet division can be performed according to various different subnet division modes to obtain various different types of subnets, and layer fusion is performed on the different types of subnets respectively.
For example, still taking an N-layer neural network as an example, assuming that N is 4, the neural network may be divided into a plurality of 2-layer subnets according to one subnet division mode, with layer fusion performed on each 2-layer subnet, and divided into a plurality of 3-layer subnets according to another subnet division mode, with layer fusion performed on each 3-layer subnet.
Illustratively, the fusion rule is used to limit the layers participating in the fusion, and may include, but is not limited to, a ker cache limit (i.e., coefficient cache limit), a Map cache limit, a limit on which layers may be fused with each other, and the like.
For example, a maximum ker cache size (i.e., coefficient cache size) may be preset, and the coefficients of a fused level cannot exceed this preset maximum, thereby limiting the number of layers participating in layer fusion.
The fusion target characterizes the purpose of performing layer fusion on the neural network, for example, minimizing the bandwidth requirement of neural network deployment (i.e., minimizing the throughput of data interaction between the neural network and external storage).
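As an illustration of such a rule check (the cache size and field names are assumptions of this sketch, reusing the Layer sketch above), a candidate set of layers is only allowed to form one level if its fused coefficients fit the preset ker cache:

```python
MAX_KER_CACHE_BYTES = 512 * 1024  # assumed coefficient-cache capacity of the platform

def satisfies_ker_limit(layers) -> bool:
    """Fusion rule: the coefficients of a fused level must fit in the ker cache."""
    return sum(layer.ker_bytes for layer in layers) <= MAX_KER_CACHE_BYTES
```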
Step S110, performing layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target, to obtain the optimal fusion result of the neural network to be optimized; wherein, for any level in the optimal fusion result of the neural network to be optimized that comprises a plurality of layers, the structure of the level is consistent with the structure of the optimal fusion result of a target subnet, and the target subnet is the subnet consisting of the same layers as those contained in the level.
In the embodiment of the present application, it is considered that, in the process of performing layer fusion on the neural network to be optimized, at least one candidate fusion scheme needs to use the optimal fusion results of the subnets obtained in step S100.
For example, for an N-layer neural network, assuming that N is 3, its candidate fusion schemes (which need to satisfy the preset fusion rule, the same applies below) may include the following schemes:
Scheme 1: no layers are fused;
Scheme 2: layer1 and layer2 are fused, and layer3 does not participate in the fusion;
Scheme 3: layer2 and layer3 are fused, and layer1 does not participate in the fusion;
Scheme 4: layer1, layer2 and layer3 are all fused (assuming that fusing 3 layers can satisfy the preset fusion rule).
Both scheme 2 and scheme 3 require the optimal fusion results of 2-layer subnets.
For example, for scheme 2, when layer1 and layer2 are fused, the optimal result of that fusion is the optimal fusion result of the 2-layer subnet consisting of layer1 and layer2.
That is, when determining the optimal fusion result of the neural network to be optimized, the optimal fusion results of its subnets need to be used. Therefore, by first dividing the neural network to be optimized into subnets and performing layer fusion on each subnet to obtain the optimal fusion result of each subnet, and then performing layer fusion on the neural network to be optimized according to the optimal fusion results of the subnets, the computation for determining the optimal fusion result of the neural network to be optimized can be simplified, and the efficiency of determining that result improved.
For example, for any level in the optimal fusion result of the neural network to be optimized that includes multiple layers, if there is a subnet (referred to herein as a target subnet) consisting of the same layers as those included in the level, then the structure of that level is consistent with the structure of the optimal fusion result of the target subnet.
For the above example, assuming that the optimal fusion result of the neural network to be optimized is scheme 3, then for the level including layer2 and layer3, the structure of that level is consistent with the structure of the optimal fusion result of the 2-layer subnet consisting of layer2 and layer3.
As can be seen, in the method flow shown in fig. 1, the neural network to be optimized is divided into subnets, layer fusion is performed on each subnet according to the preset fusion rule and the fusion target to obtain the optimal fusion result of each subnet, and layer fusion is then performed on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target, to obtain the optimal fusion result of the neural network to be optimized, which improves the efficiency of determining that result while guaranteeing that it satisfies the preset fusion rule and the fusion target.
In some embodiments, as shown in fig. 2, in step S100, the to-be-optimized neural network is divided into subnetworks, and network layer fusion is performed on each subnet according to a preset fusion rule and a fusion target, which may be implemented by the following steps:
step S101, respectively dividing the subnets of the neural network to be optimized according to a plurality of different subnet division modes to obtain a plurality of different types of subnets, wherein the number of layers included in the subnets obtained in different subnet division modes is different;
Step S102, performing layer fusion on each lowest-layer subnet according to the preset fusion rule and the fusion target to obtain the optimal fusion result of each lowest-layer subnet; the lowest-layer subnet is the type of subnet that includes the fewest layers among the various different types of subnets;
Step S103, performing layer fusion on a higher-layer subnet according to the optimal fusion results of the lower-layer subnets, the preset fusion rule and the fusion target to obtain the optimal fusion result of the higher-layer subnet, where the number of layers included in the higher-layer subnet is greater than the number of layers included in a lower-layer subnet; for any level in the optimal fusion result of the higher-layer subnet that includes a plurality of layers, the structure of the level is consistent with the structure of the optimal fusion result of a target lower-layer subnet, and the target lower-layer subnet is the lower-layer subnet consisting of the same layers as those included in the level.
For example, when the to-be-optimized neural network is divided into subnetworks, the to-be-optimized neural network may be divided into subnetworks of different types according to different subnet division modes.
Illustratively, different types of subnets include different numbers of layers.
It is considered that, when layer fusion is performed on the subnets, the layer fusion of a higher-layer subnet needs to use the optimal fusion results of the lower-layer subnets.
It should be noted that, in the embodiment of the present application, lower-layer and higher-layer subnets are relative rather than absolute: for two different types of subnets, the one with the larger number of layers is the higher-layer subnet and the one with the smaller number of layers is the lower-layer subnet.
For example, for a 2-layer subnet (a subnet including 2 layers) and a 3-layer subnet (a subnet including 3 layers), the 2-layer subnet is the lower-layer subnet and the 3-layer subnet is the higher-layer subnet.
For a 3-layer subnet and a 4-layer subnet (a subnet including 4 layers), the 3-layer subnet is the lower-layer subnet and the 4-layer subnet is the higher-layer subnet.
It should be noted that, relative to a 4-layer subnet, both 2-layer subnets and 3-layer subnets are lower-layer subnets.
When performing layer fusion on the subnets, layer fusion may first be performed on the lowest-layer subnets (i.e., the subnets including the fewest layers) to obtain the optimal fusion result of each lowest-layer subnet.
After the optimal fusion results of the lowest-layer subnets are determined, layer fusion can be performed on the higher-layer subnets in order of increasing number of layers, each time according to the optimal fusion results of the lower-layer subnets, the preset fusion rule and the fusion target, to obtain the optimal fusion results of the higher-layer subnets.
For example, for any level in the optimal fusion result of any higher-layer subnet that includes multiple layers, if there is a lower-layer subnet (referred to herein as a target lower-layer subnet) consisting of the same layers as those included in the level, then the structure of that level in the optimal fusion result of the higher-layer subnet is consistent with the structure of the optimal fusion result of the target lower-layer subnet.
For example, for a 4-layer subnet (assumed to include layer1 to layer4), if the optimal fusion result is that layer1 to layer3 are fused and layer4 does not participate in the fusion (which can also be understood as layer4 forming a level by itself, and similarly for other single layers), then for the level obtained by fusing layer1 to layer3, the structure of that level is consistent with the structure of the optimal fusion result of the 3-layer subnet consisting of layer1 to layer3.
In one example, the lowest-layer subnet includes 1 layer; the subnets obtained under the different subnet division modes include 2-layer subnets, where a 2-layer subnet includes 2 layers; and the optimal fusion result is the fusion result with the minimum in-out bandwidth.
In step S103, performing layer fusion on the higher-layer subnets according to the optimal fusion results of the lower-layer subnets, the preset fusion rule and the fusion target may include:
for any 2-layer subnet, respectively determining a first in-out bandwidth of the 2-layer subnet under the condition that the two layers in the 2-layer subnet are fused into one level, and a second in-out bandwidth of the 2-layer subnet under the condition that the two layers are not fused;
if the first in-out bandwidth is smaller than the second in-out bandwidth, determining that fusing the two layers into one level is the optimal fusion result of the 2-layer subnet;
and if the first in-out bandwidth is larger than the second in-out bandwidth, determining that not fusing the two layers is the optimal fusion result of the 2-layer subnet.
It should be noted that, for a subnet including 1 layer (which may be referred to as a 1-layer subnet), the optimal fusion result is that the single layer forms one level by itself.
Illustratively, the optimal fusion result is taken as the fusion result with the minimum in-out bandwidth, that is, the fusion target is to minimize the throughput of data interaction between the neural network and the external storage.
For any 2-layer subnet (taking the subnet consisting of layer1 and layer2 as an example), the candidate fusion schemes may include the following:
scheme 1: layer1 and layer2 are fused into 1 level;
scheme 2: layer1 and layer2 do not fuse (i.e., layer1 as a level and layer2 as a level).
The in-out bandwidth corresponding to scheme 1 (referred to herein as the first in-out bandwidth) and the in-out bandwidth corresponding to scheme 2 (referred to herein as the second in-out bandwidth) may be determined respectively.
For example, for any fusion scheme, the in-out bandwidth corresponding to the fusion scheme is the sum of the bandwidths of the input features and output features of each level in the fusion scheme; a specific implementation is described below with reference to a specific example and is not repeated here.
The first in-out bandwidth and the second in-out bandwidth can then be compared: if the first in-out bandwidth is smaller than the second in-out bandwidth, scheme 1 is determined to be the optimal fusion result; if the first in-out bandwidth is larger than the second in-out bandwidth, scheme 2 is determined to be the optimal fusion result.
It should be noted that, for the case where the first in-out bandwidth is equal to the second in-out bandwidth, either scheme 1 or scheme 2 may be taken as the optimal fusion result.
In addition, when subnet division is performed, 1-layer subnet division may be omitted, that is, the lowest-layer subnet need not be a 1-layer subnet; for example, 2-layer subnet division, 3-layer subnet division, ..., and (N-1)-layer subnet division may be performed, in which case the lowest-layer subnet is a 2-layer subnet.
In one example, if the first in-out bandwidth is equal to the second in-out bandwidth, scheme 2 is determined to be the optimal fusion result.
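Putting the comparison above into code, a minimal sketch of the 2-layer decision (bandwidth model and field names as in the Layer sketch above; the tie goes to the unfused scheme, matching this example, and a real implementation would also check the fusion rule, e.g. satisfies_ker_limit):

```python
def best_fusion_2layer(a, b):
    """Return (levels, bandwidth) for the better of scheme 1 (fuse a, b) and scheme 2 (no fusion)."""
    first_bw = a.in_map_bytes + b.out_map_bytes                      # scheme 1: one level
    second_bw = (a.in_map_bytes + a.out_map_bytes
                 + b.in_map_bytes + b.out_map_bytes)                 # scheme 2: two levels
    if first_bw < second_bw:
        return [[a, b]], first_bw
    return [[a], [b]], second_bw
```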
In one example, in step S103, performing layer fusion on the upper-level subnet according to the optimal fusion result of the lower-level subnet, the preset fusion rule, and the fusion target to obtain the optimal fusion result of the upper-level subnet, which may further include:
for any higher-layer subnet whose number of layers is k, fusing the higher-layer subnet according to the optimal fusion results of the lower-layer subnets whose numbers of layers are less than k, the preset fusion rule and the fusion target, to obtain the optimal fusion result of the higher-layer subnet, where 2 < k < N and N is the number of layers of the network to be optimized.
For example, when the neural network to be optimized (assumed to have N layers) is divided into subnets, it may be divided according to a 2-layer subnet division mode (i.e., each subnet is a 2-layer subnet), according to a 3-layer subnet division mode (i.e., each subnet is a 3-layer subnet), ..., and according to an (N-1)-layer subnet division mode (i.e., each subnet is an (N-1)-layer subnet).
When layer fusion is performed on the subnets, the optimal fusion result of each 2-layer subnet can be determined first; then the optimal fusion result of each 3-layer subnet is determined according to the optimal fusion results of the 1-layer subnets and the 2-layer subnets; then the optimal fusion result of each 4-layer subnet is determined according to the optimal fusion results of the 1-layer, 2-layer and 3-layer subnets; and so on, until the optimal fusion result of each highest-layer subnet (e.g., each (N-1)-layer subnet) is determined.
It should be noted that, when layer fusion is performed on the sub-network, the fusion rule and the fusion target are the same as those when layer fusion is performed on the neural network to be optimized.
In one example, for any subnet including k layers, a candidate fusion scheme for performing layer fusion on the subnet fuses at least 2 layers and at most m layers, where m is less than or equal to k and m satisfies the preset fusion rule limit.
Illustratively, the layers participating in the fusion may include, but are not limited to, a Conv (convolutional) layer, a nonlinear layer, a pool (pooling) layer, a fully connected layer, a deconvolution layer, an upsampling layer, or another basic network layer.
Illustratively, the nonlinear layer may include a rectified linear unit (ReLU) or another activation function.
For example, when layer fusion is performed on any subnet or neural network to be optimized, the maximum number of fusible layers (denoted as m herein) may be determined according to the layers participating in the fusion and a preset fusion rule.
It should be noted that, for different layers, the maximum number of layers that can be fused may be different under the same fusion rule.
In one example, for a network to be fused (the neural network to be optimized, or a subnet of it), the candidate fusion schemes may include fusing all layers, or dividing the network to be fused into two optimal sub-networks (one containing x layers and the other containing y-x layers, where y is the total number of layers in the network to be fused). For the optimal sub-network containing x layers, its structure is consistent with the structure under the optimal fusion scheme of the x-layer subnet of the network to be fused (i.e., the subnet consisting of those x layers); for the optimal sub-network containing y-x layers, its structure is consistent with the structure under the optimal fusion scheme of the (y-x)-layer subnet of the network to be fused (i.e., the subnet consisting of those y-x layers).
For example, for a k-layer subnet, if m equals k, the optimal fusion scheme may be that all k layers are fused, provided the fusion rule constraint is satisfied; if m is less than k, the optimal fusion scheme may be to divide the k-layer subnet into 2 optimal subnets, one containing m layers and the other containing k-m layers. The structure of the optimal subnet containing m layers is consistent with the structure under the optimal fusion scheme of the m-layer subnet, and the structure of the optimal subnet containing k-m layers is consistent with the structure under the optimal fusion scheme of the (k-m)-layer subnet.
In some embodiments, before performing subnet division on the neural network to be optimized in step S100, the method may further include:
acquiring a network splitting configuration instruction;
splitting the neural network to be optimized into at least two parts to be optimized according to the acquired network splitting configuration instruction;
in step S100, performing subnet division on the neural network to be optimized may include:
respectively carrying out subnet division on each part to be optimized;
in step S110, performing layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target may include:
for any part to be optimized, performing layer fusion on the part to be optimized according to the optimal fusion result of each subnet of the part to be optimized, a preset fusion rule and a fusion target to obtain the optimal fusion result of the part to be optimized;
and determining the optimal fusion result of the neural network to be optimized according to the optimal fusion result of each part to be optimized.
For example, in order to further improve the efficiency of optimizing the neural network, before the neural network is optimized according to the method flow shown in fig. 1, a network splitting configuration instruction may first be obtained, and the neural network to be optimized is split into at least two parts to be optimized according to the obtained instruction; the optimal fusion scheme of each part to be optimized can then be determined separately, and the optimal fusion scheme of the neural network to be optimized obtained from them.
For any part to be optimized, an optimal fusion scheme can be determined according to the method flow shown in fig. 1.
Illustratively, the network split configuration instructions may be determined from a priori knowledge.
In the method flow shown in fig. 1, the optimal fusion schemes of the subnets are determined first, and the optimal fusion scheme of the neural network to be optimized is then determined from them; therefore, if it can be determined from existing prior knowledge that certain layers of the neural network to be optimized will not be fused into one level, the neural network to be optimized may be split first, before it is divided into subnets.
For example, for an N-layer neural network, if it is known from prior knowledge that, on the premise of meeting the fusion target, the front n layers and the rear (N-n) layers need to be kept apart (where 1 ≤ n < N), the neural network may be split into two parts consisting of the front n layers and the rear (N-n) layers; the optimal fusion result of each part is determined according to the flow shown in fig. 1, and the optimal fusion result of the neural network is then determined from them, further improving the efficiency of determining the optimal fusion result of the neural network to be optimized.
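A minimal sketch of this pre-splitting step (the split point n comes from the network splitting configuration instruction / prior knowledge; the names are assumptions of the sketch):

```python
def split_network(layer_names, split_points):
    """Split the layer list into parts to be optimized at the configured split points."""
    parts, start = [], 0
    for p in sorted(split_points):
        parts.append(layer_names[start:p])
        start = p
    parts.append(layer_names[start:])
    return parts

# e.g. prior knowledge says the front n layers and rear N-n layers should not be fused together:
# parts = split_network(layer_names, [n]); each part is then optimized separately as in Fig. 1.
```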
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the technical solutions provided by the embodiments of the present application are described below with reference to specific examples.
In this embodiment, the N-layer neural network is divided into subnets according to subnet division modes in which a subnet includes 2 layers, 3 layers, ..., and (N-1) layers respectively; in order of increasing number of layers, the optimal fusion result of each subnet is determined in a bottom-up manner, and the optimal fusion result of the N-layer neural network is then determined (this may be referred to as a pyramid layer fusion scheme).
Illustratively, for a neural network with N layers, a bottom-up calculation strategy is adopted: the optimal fusion results of the 2-layer subnets are calculated first; the optimal fusion results of the 3-layer subnets are then calculated from the optimal fusion results of the 2-layer subnets; the optimal fusion results of the 4-layer subnets are calculated from the optimal fusion results of the 2-layer and 3-layer subnets; and so on, until the optimal fusion result of the N-layer neural network is determined.
In the process of fusing any subnet or the neural network, at least 2 adjacent layers are selected for fusion according to the fusion rule (which may also be regarded as an optimization rule); adjacent layers are either horizontally adjacent (layers sharing the same feature map input) or vertically adjacent (the feature map computed by the previous layer is at least part of the input of the next layer), so as to determine the optimal fusion result meeting the fusion target.
It should be noted that the number of layers participating in the fusion cannot exceed the fusion rule limit.
Illustratively, the adjacent layers may include a Conv layer, a nonlinear layer, a pool layer, a fully connected layer, a deconvolution layer, an upsampling layer, and the like.
Illustratively, define the problem f(i, j) as determining the optimal fusion result of the subnet consisting of layer i to layer j. The problem f(1, N) then depends on subproblems f(1, h), i.e., in the process of determining the optimal fusion result of the N-layer neural network, the optimal fusion results of the subnets consisting of layer1 to layer h need to be determined (h < N).
Accordingly, in solving f(1, N), every subproblem f(i, j) with 1 ≤ i ≤ j ≤ N in the triangular table below needs to be solved (column j lists f(1, j), f(2, j), ..., f(j, j)):

f(1,1)  f(1,2)  f(1,3)  ...  f(1,h)    ...  f(1,N)
        f(2,2)  f(2,3)  ...  f(2,h)    ...  f(2,N)
                f(3,3)  ...  f(3,h)    ...  f(3,N)
                             ...            ...
                             f(h-1,h)  ...  f(N-2,N)
                             f(h,h)    ...  f(N-1,N)
                                            f(N,N)
For example, each f(i, j) can be solved sequentially, from right to left and from bottom to top, until f(1, N) is solved, so as to obtain the optimal fusion result of the N-layer neural network.
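The bottom-up computation of all f(i, j) can equivalently be written as a memoized recursion: a span of layers either forms one level (when the fusion rule allows it) or is split into two optimal sub-networks, as in the decomposition described earlier. The sketch below is illustrative only, using the simplified linear-network and bandwidth model of the earlier Layer sketch; satisfies_rule stands for the preset fusion rule (e.g. the ker cache check above), and all names are assumptions of the sketch:

```python
from functools import lru_cache

def pyramid_fuse(layers, satisfies_rule):
    """Solve f(i, j) for a linear network and return (bandwidth, levels) for f(1, N).

    levels is a list of levels, each level being a list of consecutive layer indices.
    """
    n = len(layers)

    def level_bw(i, j):
        # In-out bandwidth of a single level covering layers i..j (inclusive).
        return layers[i].in_map_bytes + layers[j].out_map_bytes

    @lru_cache(maxsize=None)
    def f(i, j):
        if i == j:
            # A 1-layer subnet: the layer forms a level by itself.
            return level_bw(i, i), ((i,),)
        best = None
        # Option 1: fuse layers i..j into one level, if the fusion rule permits it.
        if satisfies_rule(layers[i:j + 1]):
            best = (level_bw(i, j), (tuple(range(i, j + 1)),))
        # Option 2: split into two optimal sub-networks i..s and s+1..j.
        for s in range(i, j):
            left_bw, left_levels = f(i, s)
            right_bw, right_levels = f(s + 1, j)
            cand = (left_bw + right_bw, left_levels + right_levels)
            if best is None or cand[0] < best[0]:
                best = cand
        return best

    bw, levels = f(0, n - 1)
    return bw, [list(level) for level in levels]
```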
The following describes the effects of the technical solutions provided by the embodiments of the present application with reference to examples.
Taking a comparison with the following greedy fusion scheme as an example, assume that the greedy fusion scheme is implemented as follows:
1. set the on-chip cache size and input the network structure of the neural network;
2. if the current layer fits in the on-chip cache, put it in;
3. if the current layer does not fit in the on-chip cache, end the current level, empty the current cache, and put the current layer into a new level;
4. if all layers have been assigned to levels, end; otherwise, go to step 2.
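For comparison, a minimal sketch of this greedy baseline under an assumed per-layer cache-footprint model (footprint values and names are illustrative, not from the patent):

```python
def greedy_fuse(layer_footprints, cache_size):
    """Greedily pack consecutive layers into levels while they fit the on-chip cache."""
    levels, current, used = [], [], 0
    for name, footprint in layer_footprints:
        if current and used + footprint > cache_size:
            levels.append(current)          # step 3: end the current level, empty the cache
            current, used = [], 0
        current.append(name)                # step 2: the layer fits, put it in
        used += footprint
    if current:
        levels.append(current)              # step 4: all layers assigned
    return levels

# e.g. greedy_fuse([("conv1", 60), ("conv2", 60), ("conv3", 60)], cache_size=100)
# -> [['conv1'], ['conv2'], ['conv3']]   (each additional layer would overflow the cache)
```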
For the neural network shown in fig. 3A, the optimal fusion results obtained according to the greedy fusion scheme and the technical scheme provided by the embodiment of the present application may be respectively shown in fig. 3B and fig. 3C.
Here, 3 × 128 × 64 indicates that the convolution kernel is 3 × 3 and the numbers of input and output channels are 128 and 64 ("×" may also be written as "*"); stride (step size) is 1, and the input feature map size is w × h.
For the fusion result shown in fig. 3B, the bandwidth is 128wh + 256wh + 256wh + 128wh = 768wh.
For the fusion result shown in fig. 3C, the bandwidth is 128wh + 64wh + 64wh + 128wh = 384wh.
As can be seen, for the neural network shown in fig. 3A, the bandwidth of the optimal fusion result obtained by using the technical scheme provided in the embodiment of the present application is half of the bandwidth of the fusion result obtained by using the greedy fusion scheme.
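The two totals can be checked directly from the per-level terms given above (w and h are the feature-map width and height; the sketch just reproduces the arithmetic, in units of w*h):

```python
greedy_bw_terms = [128, 256, 256, 128]   # Fig. 3B: in/out feature maps of each level, in units of w*h
pyramid_bw_terms = [128, 64, 64, 128]    # Fig. 3C: in/out feature maps of each level, in units of w*h

print(sum(greedy_bw_terms), sum(pyramid_bw_terms))   # 768 384, i.e. half the bandwidth
```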
The methods provided herein are described above. The following describes the apparatus provided in the present application:
referring to fig. 4, a schematic structural diagram of a neural network optimization device according to an embodiment of the present disclosure is shown in fig. 4, where the neural network optimization device may include:
a dividing unit 410, configured to perform subnet division on the neural network to be optimized;
the optimizing unit 420 is configured to perform network layer fusion on each subnet according to a preset fusion rule and a fusion target, so as to obtain an optimal fusion result of each subnet;
the optimizing unit 420 is further configured to perform layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target, so as to obtain the optimal fusion result of the neural network to be optimized; wherein, for any level in the optimal fusion result of the neural network to be optimized that comprises a plurality of layers, the structure of the level is consistent with the structure of the optimal fusion result of a target subnet, and the target subnet is the subnet consisting of the same layers as those contained in the level.
In some embodiments, the dividing unit 410 performs subnet division on the neural network to be optimized, including:
respectively dividing the to-be-optimized neural network into subnets according to a plurality of different subnet division modes to obtain a plurality of different types of subnets, wherein the number of layers included in the subnets obtained in different subnet division modes is different;
the optimizing unit 420 performs network layer fusion on each subnet according to a preset fusion rule and a fusion target, respectively, including:
performing layer fusion on each lowest-layer subnet according to the preset fusion rule and the fusion target to obtain the optimal fusion result of each lowest-layer subnet, where the lowest-layer subnet is the subnet with the fewest layers among the plurality of different types of subnets;
and performing layer fusion on a higher-layer subnet according to the optimal fusion results of the lower-layer subnets, the preset fusion rule and the fusion target to obtain the optimal fusion result of the higher-layer subnet, where the number of layers included in the higher-layer subnet is greater than the number of layers included in a lower-layer subnet; for any level in the optimal fusion result of the higher-layer subnet that includes a plurality of layers, the structure of the level is consistent with the structure of the optimal fusion result of a target lower-layer subnet, and the target lower-layer subnet is the lower-layer subnet consisting of the same layers as those included in the level.
In some embodiments, the lowest-layer subnet includes 1 layer; the subnets obtained under the different subnet division modes include 2-layer subnets, where a 2-layer subnet includes 2 layers; and the optimal fusion result is the fusion result with the minimum in-out bandwidth;
the optimizing unit 420 performs layer fusion on the higher-layer subnets according to the optimal fusion results of the lower-layer subnets, the preset fusion rule and the fusion target, including:
for any 2-layer subnet, respectively determining a first in-out bandwidth of the 2-layer subnet under the condition that the two layers in the 2-layer subnet are fused into one level, and a second in-out bandwidth of the 2-layer subnet under the condition that the two layers are not fused;
if the first in-out bandwidth is smaller than the second in-out bandwidth, determining that fusing the two layers into one level is the optimal fusion result of the 2-layer subnet;
and if the first in-out bandwidth is larger than the second in-out bandwidth, determining that not fusing the two layers is the optimal fusion result of the 2-layer subnet.
In some embodiments, the optimizing unit 420 performs layer fusion on the higher-layer subnet according to the optimal fusion results of the lower-layer subnets, the preset fusion rule and the fusion target to obtain the optimal fusion result of the higher-layer subnet, and further includes:
for any higher-layer subnet whose number of layers is k, fusing the higher-layer subnet according to the optimal fusion results of the lower-layer subnets whose numbers of layers are less than k, the preset fusion rule and the fusion target, to obtain the optimal fusion result of the higher-layer subnet, where 2 < k < N and N is the number of layers of the network to be optimized.
In some embodiments, for any subnet including k layers, a candidate fusion scheme for performing layer fusion on the subnet fuses at least 2 layers and at most m layers, where m is less than or equal to k and m satisfies the fusion rule limit;
the layer comprises a convolutional (Conv) layer, a nonlinear layer, a pooling (pool) layer, a fully connected layer, a deconvolution layer, or an upsampling layer.
In some embodiments, before the dividing unit 410 performs subnet division on the neural network to be optimized, the method further includes:
acquiring a network splitting configuration instruction;
splitting the neural network to be optimized into at least two parts to be optimized according to the network splitting configuration instruction;
the dividing unit 410 performs subnet division on the neural network to be optimized, including:
respectively carrying out subnet division on each part to be optimized;
the optimizing unit 420 performs layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target, and includes:
for any part to be optimized, performing layer fusion on the part to be optimized according to the optimal fusion result of each subnet of the part to be optimized, the preset fusion rule and the fusion target to obtain the optimal fusion result of the part to be optimized;
and determining the optimal fusion result of the neural network to be optimized according to the optimal fusion result of each part to be optimized.
Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure. The electronic device may include a processor 501 and a memory 502 storing machine-executable instructions. The processor 501 and the memory 502 may communicate via a system bus 503. The processor 501 may perform the neural network optimization method described above by reading and executing the machine-executable instructions in the memory 502 corresponding to the neural network optimization logic.
The memory 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard disk drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), a similar storage medium, or a combination thereof.
In some embodiments, there is also provided a machine-readable storage medium, such as the memory 502 in fig. 5, having stored therein machine-executable instructions that, when executed by a processor, implement the neural network optimization method described above. For example, the machine-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and so forth.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (10)

1. A neural network optimization method, comprising:
dividing a neural network to be optimized into subnets, and performing network layer fusion on each subnet according to a preset fusion rule and a fusion target to obtain an optimal fusion result of each subnet;
performing network layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target to obtain the optimal fusion result of the neural network to be optimized; wherein, for any hierarchy in the optimal fusion result of the neural network to be optimized that comprises a plurality of network layers, the structure of the hierarchy is consistent with the structure of the optimal fusion result of a target subnet, and the target subnet is the subnet consisting of the same network layers as those included in the hierarchy.
2. The method of claim 1, wherein the dividing the neural network to be optimized into subnetworks and performing network layer fusion on each subnet according to a preset fusion rule and a fusion target respectively comprises:
respectively carrying out subnet division on the neural network to be optimized according to a plurality of different subnet division modes to obtain a plurality of different types of subnets, wherein the number of network layers included in the subnets obtained in different subnet division modes is different;
respectively performing network layer fusion on each lowest subnet according to the preset fusion rule and the fusion target to obtain the optimal fusion result of each lowest subnet; the lowest subnet is the subnet with the fewest network layers among the plurality of different types of subnets;
and performing network layer fusion on a high-level subnet according to the optimal fusion results of the bottom-level subnets, the preset fusion rule and the fusion target to obtain the optimal fusion result of the high-level subnet, wherein the number of network layers included in the high-level subnet is larger than the number of network layers included in a bottom-level subnet; for any hierarchy in the optimal fusion result of the high-level subnet that includes a plurality of network layers, the structure of the hierarchy is consistent with the structure of the optimal fusion result of a target bottom-level subnet, and the target bottom-level subnet is the bottom-level subnet consisting of the same network layers as those included in the hierarchy.
3. The method of claim 2, wherein the number of network layers included in the lowest subnet is 1; the subnets obtained under the different subnet division modes comprise 2-layer subnets, and the number of network layers included in a 2-layer subnet is 2; and the optimal fusion result is the fusion result with the minimum in-out bandwidth;
the network layer fusion of the high-level sub-network is carried out according to the optimal fusion result of the bottom sub-network, the preset fusion rule and the fusion target, and the method comprises the following steps:
for any 2-layer subnet, respectively determining a first in-out bandwidth of the lowest-layer subnet under the condition that two network layers in the 2-layer subnet are fused into one layer, and a second in-out bandwidth of the lowest-layer subnet under the condition that the two network layers are not fused;
if the first in-out bandwidth is smaller than the second in-out bandwidth, determining that the two network layers are fused into a layer which is the optimal fusion result of the 2-layer subnet;
and if the first access bandwidth is larger than the second access bandwidth, determining that the network layer is not fused into the optimal fusion result of the 2-layer subnet.
4. The method according to claim 3, wherein the performing network layer fusion on the high-level subnet according to the optimal fusion results of the bottom-level subnets, the preset fusion rule and the fusion target to obtain the optimal fusion result of the high-level subnet further comprises:
for any high-level subnet whose number of network layers is k, fusing the high-level subnet according to the optimal fusion results of the bottom-level subnets whose numbers of network layers are less than k, the preset fusion rule and the fusion target, to obtain the optimal fusion result of the high-level subnet, wherein 2 < k < N and N is the number of network layers of the network to be optimized.
5. The method according to claim 4, wherein, for any subnet including k network layers, a candidate fusion scheme for performing network layer fusion on the subnet fuses at least 2 network layers and at most m network layers, wherein m is less than or equal to k and m satisfies the fusion rule limit;
the network layer comprises a convolution layer, a nonlinear layer, a pooling layer, a full-link layer, an anti-convolution layer or an up-sampling layer.
6. The method according to any one of claims 1 to 5, wherein before the sub-network partitioning of the neural network to be optimized, further comprising:
acquiring a network splitting configuration instruction;
splitting the neural network to be optimized into at least two parts to be optimized according to the network splitting configuration instruction;
the subnet division is performed on the neural network to be optimized, and the subnet division comprises the following steps:
respectively carrying out subnet division on each part to be optimized;
the network layer fusion of the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target comprises the following steps:
for any part to be optimized, carrying out network layer fusion on the part to be optimized according to the optimal fusion result of each subnet of the part to be optimized, the preset fusion rule and the fusion target to obtain the optimal fusion result of the part to be optimized;
and determining the optimal fusion result of the neural network to be optimized according to the optimal fusion result of each part to be optimized.
7. An apparatus for neural network optimization, comprising:
the dividing unit is used for carrying out subnet division on the neural network to be optimized;
the optimization unit is used for respectively carrying out network layer fusion on each subnet according to a preset fusion rule and a fusion target to obtain an optimal fusion result of each subnet;
the optimization unit is further configured to perform network layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target, so as to obtain the optimal fusion result of the neural network to be optimized; wherein, for any hierarchy in the optimal fusion result of the neural network to be optimized that comprises a plurality of network layers, the structure of the hierarchy is consistent with the structure of the optimal fusion result of a target subnet, and the target subnet is the subnet consisting of the same network layers as those included in the hierarchy.
8. The apparatus of claim 7,
the dividing unit performing subnet division on the neural network to be optimized comprises:
performing subnet division on the neural network to be optimized according to a plurality of different subnet division modes respectively, to obtain a plurality of different types of subnets, wherein subnets obtained by different subnet division modes comprise different numbers of network layers;
the optimization unit performing network layer fusion on each subnet according to the preset fusion rule and the fusion target respectively comprises:
performing network layer fusion on each bottom-level subnet according to the preset fusion rule and the fusion target respectively, to obtain an optimal fusion result of each bottom-level subnet, wherein a bottom-level subnet is a subnet with the fewest network layers among the plurality of different types of subnets;
performing network layer fusion on a high-level subnet according to the optimal fusion results of the bottom-level subnets, the preset fusion rule and the fusion target, to obtain an optimal fusion result of the high-level subnet, wherein the high-level subnet comprises more network layers than the bottom-level subnets, the structure of any hierarchy comprising a plurality of network layers in the optimal fusion result of the high-level subnet is consistent with the structure of the optimal fusion result of a target bottom-level subnet, and the target bottom-level subnet is the bottom-level subnet comprising the same network layers as that hierarchy;
wherein the bottom-level subnet comprises 1 network layer; the subnets obtained by the different subnet division modes comprise 2-layer subnets, each 2-layer subnet comprising 2 network layers; and the optimal fusion result is the fusion result with the minimum in-out bandwidth;
the optimization unit performing network layer fusion on the high-level subnet according to the optimal fusion results of the bottom-level subnets, the preset fusion rule and the fusion target comprises:
for any 2-layer subnet, determining a first in-out bandwidth of the 2-layer subnet in the case that the two network layers in the 2-layer subnet are fused into one layer, and a second in-out bandwidth of the 2-layer subnet in the case that the two network layers are not fused;
if the first in-out bandwidth is smaller than the second in-out bandwidth, determining that fusing the two network layers into one layer is the optimal fusion result of the 2-layer subnet;
if the first in-out bandwidth is larger than the second in-out bandwidth, determining that leaving the two network layers unfused is the optimal fusion result of the 2-layer subnet;
the optimization unit performing network layer fusion on the high-level subnet according to the optimal fusion results of the bottom-level subnets, the preset fusion rule and the fusion target to obtain the optimal fusion result of the high-level subnet further comprises:
for any high-level subnet whose number of network layers is k, fusing the high-level subnet according to the optimal fusion results of the bottom-level subnets having fewer than k network layers, the preset fusion rule and the fusion target, to obtain the optimal fusion result of the high-level subnet, wherein k is greater than 2 and less than N, and N is the number of network layers of the neural network to be optimized;
for any subnet whose number of network layers is k, a candidate fusion scheme for performing network layer fusion on the subnet fuses at least 2 and at most m network layers, wherein m is less than or equal to k and satisfies the constraint of the fusion rule;
the network layers include a convolutional layer, a nonlinear layer, a pooling layer, a fully-connected layer, a deconvolution layer, or an upsampling layer;
and/or,
before the dividing unit performs subnet division on the neural network to be optimized, the apparatus further performs:
acquiring a network splitting configuration instruction;
splitting the neural network to be optimized into at least two parts to be optimized according to the network splitting configuration instruction;
the dividing unit performing subnet division on the neural network to be optimized comprises:
respectively carrying out subnet division on each part to be optimized;
the optimization unit performing network layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target comprises:
for any part to be optimized, carrying out network layer fusion on the part to be optimized according to the optimal fusion result of each subnet of the part to be optimized, the preset fusion rule and the fusion target to obtain the optimal fusion result of the part to be optimized;
and determining the optimal fusion result of the neural network to be optimized according to the optimal fusion result of each part to be optimized (illustrative, non-limiting sketches of this bottom-up fusion search and of its split-and-merge variant follow the claims).
9. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor being configured to execute the machine executable instructions to implement the method of any one of claims 1 to 6.
10. A machine-readable storage medium having stored therein machine-executable instructions which, when executed by a processor, implement the method of any one of claims 1-6.
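
The bottom-up procedure recited in claims 4, 5, 7 and 8 above (solve the smallest subnets first, then reuse their optimal fusion results when fusing subnets with more layers, scoring every scheme by its in-out bandwidth) can be rendered as a short piece of code. The following is only a minimal sketch for a plain chain of layers, not the patented implementation: Layer, io_bandwidth, can_fuse, max_fuse and the byte counts are hypothetical stand-ins for the preset fusion rule and the fusion target.

# Minimal illustrative sketch only -- not the patented implementation. It renders
# the claimed bottom-up idea for a simple linear chain of layers: subnets with
# fewer layers are solved first and their optimal results are reused when fusing
# larger subnets. Layer, io_bandwidth, can_fuse, max_fuse and the byte counts are
# hypothetical stand-ins for the "preset fusion rule" and the "fusion target"
# (minimum in-out bandwidth).
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    kind: str        # e.g. "conv", "relu", "pool", "fc", "deconv", "upsample"
    in_bytes: int    # bytes read from off-chip memory when this layer starts a group
    out_bytes: int   # bytes written to off-chip memory when this layer ends a group


def io_bandwidth(group: List[Layer]) -> int:
    # Assumed cost model: inside a fused group, intermediate tensors stay on chip,
    # so only the first layer's input and the last layer's output cross the bus.
    return group[0].in_bytes + group[-1].out_bytes


def can_fuse(group: List[Layer], max_fuse: int = 3) -> bool:
    # Stand-in for the preset fusion rule: at most max_fuse layers per group and
    # no fully-connected layer except as the last layer of a group.
    if len(group) > max_fuse:
        return False
    return all(layer.kind != "fc" for layer in group[:-1])


def optimal_fusion(net: List[Layer]) -> Tuple[int, List[List[Layer]]]:
    """Return (total in-out bandwidth, chosen groups) for the whole chain.

    best[k] is the optimal result for the subnet made of the first k layers;
    it is computed from best[k - m] (subnets with fewer layers), mirroring the
    claimed bottom-up reuse of smaller subnets' optimal fusion results.
    """
    best: List[Tuple[int, List[List[Layer]]]] = [(0, [])]
    for k in range(1, len(net) + 1):
        candidates = []
        # The last group ends at layer k-1 and contains m layers; a group with
        # m >= 2 is an actual fusion and must satisfy the fusion rule.
        for m in range(1, k + 1):
            group = net[k - m:k]
            if m >= 2 and not can_fuse(group):
                continue
            prev_cost, prev_groups = best[k - m]
            candidates.append((prev_cost + io_bandwidth(group), prev_groups + [group]))
        best.append(min(candidates, key=lambda c: c[0]))
    return best[len(net)]


if __name__ == "__main__":
    net = [Layer("conv", 64, 64), Layer("relu", 64, 64),
           Layer("pool", 64, 16), Layer("fc", 16, 4)]
    cost, groups = optimal_fusion(net)
    print("total in-out bandwidth:", cost)
    print("fused groups:", [[layer.kind for layer in g] for g in groups])

On the toy 4-layer chain this prints a total bandwidth of 100 with groups [conv, relu, pool] and [fc]; the point is only that each k-layer prefix is scored by combining the optimal result of a shorter prefix with one new (possibly fused) group, never by re-enumerating all partitions from scratch.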
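Claim 6 (and the corresponding branch of claim 8) further allows the network to be split into parts according to a splitting configuration instruction, each part fused independently and the per-part results then combined. The sketch below is, again, only a hypothetical illustration: optimize_with_split, split_points and optimize_part are invented names, and optimize_part stands in for any per-part fusion search such as the optimal_fusion function sketched above.

# Illustrative sketch only (claim 6 flavor): cut the network into parts per a
# user-supplied split configuration, fuse each part independently, then
# concatenate the per-part results. All names here are hypothetical.
from typing import Callable, List, Sequence, Tuple


def optimize_with_split(net: Sequence,
                        split_points: List[int],
                        optimize_part: Callable[[Sequence], Tuple[int, list]]) -> Tuple[int, list]:
    """Cut the network at the given layer indices and fuse each part on its own.

    Splitting shrinks every individual search, but it forbids fusions that would
    span a cut, so the combined result can be no better than optimizing the
    whole network in one pass.
    """
    bounds = [0] + sorted(split_points) + [len(net)]
    total_cost, all_groups = 0, []
    for start, end in zip(bounds[:-1], bounds[1:]):
        cost, groups = optimize_part(net[start:end])  # each part optimized independently
        total_cost += cost
        all_groups.extend(groups)
    return total_cost, all_groups


# Example (reusing net and optimal_fusion from the previous sketch): cutting the
# 4-layer toy chain after its second layer would be
#     cost, groups = optimize_with_split(net, [2], optimal_fusion)

In this toy rendering the split only restricts which groups the search may consider; whether it helps overall depends on the fusion rule and on where the cuts fall.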
CN202110204808.1A 2021-02-23 2021-02-23 Neural network optimization method and device, electronic equipment and readable storage medium Active CN112884123B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110204808.1A CN112884123B (en) 2021-02-23 2021-02-23 Neural network optimization method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110204808.1A CN112884123B (en) 2021-02-23 2021-02-23 Neural network optimization method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN112884123A true CN112884123A (en) 2021-06-01
CN112884123B (en) 2024-03-01

Family

ID=76054229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110204808.1A Active CN112884123B (en) 2021-02-23 2021-02-23 Neural network optimization method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112884123B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001256212A (en) * 2000-03-09 2001-09-21 Fuji Electric Co Ltd Optimization learning method for neural network
CN110663048A (en) * 2017-09-05 2020-01-07 松下电器(美国)知识产权公司 Execution method, execution device, learning method, learning device, and program for deep neural network
CN110321999A * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 Neural network computation graph optimization method
CN110321064A (en) * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 Computing platform realization method and system for neural network
CN110766147A (en) * 2018-07-25 2020-02-07 赛灵思公司 Neural network compiler architecture and compiling method
US20200210836A1 (en) * 2019-01-02 2020-07-02 Samsung Electronics Co., Ltd. Neural network optimizing device and neural network optimizing method
WO2020187041A1 (en) * 2019-03-18 2020-09-24 北京灵汐科技有限公司 Neural network mapping method employing many-core processor and computing device
WO2020207082A1 (en) * 2019-04-08 2020-10-15 创新先进技术有限公司 Neural network model optimization processing method and device
CN110490302A * 2019-08-12 2019-11-22 北京中科寒武纪科技有限公司 Neural network compilation optimization method and device, and related product
CN110490309A * 2019-08-14 2019-11-22 北京中科寒武纪科技有限公司 Operator fusion method for neural network and related product
CN111260019A (en) * 2020-02-18 2020-06-09 深圳鲲云信息科技有限公司 Data processing method, device and equipment of neural network model and storage medium
CN111488983A (en) * 2020-03-24 2020-08-04 哈尔滨工业大学 Lightweight CNN model calculation accelerator based on FPGA
CN113554164A (en) * 2020-04-24 2021-10-26 上海商汤智能科技有限公司 Neural network model optimization method, neural network model data processing method, neural network model optimization device, neural network model data processing device and storage medium
CN112116084A (en) * 2020-09-15 2020-12-22 中国科学技术大学 Convolution neural network hardware accelerator capable of solidifying full network layer on reconfigurable platform
CN112116081A (en) * 2020-09-29 2020-12-22 杭州海康威视数字技术股份有限公司 Deep learning network optimization method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI YONGBO; WANG QIN; JIANG JIANFEI: "Design of a Sparse Convolutional Neural Network Accelerator", Microelectronics & Computer, no. 06, pages 34-38 *
WANG TING et al.: "Design of an FPGA-Based Parallel Accelerator for Convolutional Neural Networks", Application of Electronic Technique, vol. 47, no. 02, pages 81-84 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115796228A (en) * 2022-11-15 2023-03-14 北京百度网讯科技有限公司 Operator fusion method, device, equipment and storage medium
CN115796228B (en) * 2022-11-15 2024-04-05 北京百度网讯科技有限公司 Operator fusion method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112884123B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
CN103095804B Method and system for load balancing in a cluster storage system
Abraham et al. Routing in networks with low doubling dimension
CN108632235B (en) Network packet classification decision tree establishment method and device
CN111181792B (en) SDN controller deployment method and device based on network topology and electronic equipment
CN111756828B (en) Data storage method, device and equipment
CN107133228A Method and device for fast resampling
Johnson et al. Large-scale network partitioning for decentralized traffic management and other transportation applications
Shin et al. Enhanced partitioning of dnn layers for uploading from mobile devices to edge servers
CN112381231A (en) Quantum topological graph optimization method and device, terminal and storage medium
CN112884123A (en) Neural network optimization method and device, electronic equipment and readable storage medium
CN109963316B (en) Multipath routing method and equipment for mobile satellite network
CN115865912B (en) Network edge online service function chain deployment method, system and equipment
CN116863201A (en) Image classification method based on depth map neural network
CN111291085A (en) Hierarchical interest matching method and device, computer equipment and storage medium
CN116127661A (en) Path determination method, device, equipment and storage medium for terminal to access transformer substation
JP6894408B2 (en) Linked virtual network allocation method and equipment
Brimberg et al. Allocation of queuing facilities using a minimax criterion
Cheng et al. A novel approach to the fixed channel assignment problem
CN109040214B (en) Service deployment method for enhancing reliability in cloud environment
CN114998513B (en) Grid remapping method of earth simulation system with circulating boundary based on KD tree
CN111431815A (en) Virtual network mapping method and system
US20060004714A1 (en) Transposition search method for optimal topology construction
JPH10334063A (en) Dynamic load dispersing method in parallel calculation, dynamic load dispersing device, and recording medium in which dynamic load dispersing program is recorded
CN114265556B (en) Data storage method and device
CN113765708B (en) VLAN configuration comprehensive method based on DSL

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant