CN111931930A - Model pruning method and device and electronic equipment


Info

Publication number: CN111931930A
Application number: CN202010769414.6A
Authority: CN (China)
Prior art keywords: neural network, sample, weight, current, value
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 张弓
Current Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010769414.6A
Publication of CN111931930A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)

Abstract

The application discloses a model pruning method and device and electronic equipment. The method comprises the following steps: calculating second structure information through a second neural network based on the current first structure information of the first neural network and the corresponding set index value; the initial first neural network is constructed based on the third structural information of the third neural network; the corresponding set index value is used for updating the weight parameter of the second neural network, and the second neural network is used for outputting the second structure information based on the input first structure information after the weight parameter is updated; performing structure updating on the current first neural network based on the second structure information; under the condition that the second neural network reaches a set convergence condition, determining the first neural network with the updated structure as a model pruning result corresponding to the third neural network; in the case where the initial first neural network is constructed or the structure of the first neural network is updated, the first neural network is trained based on the set training samples.

Description

Model pruning method and device and electronic equipment
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a model pruning method and device and electronic equipment.
Background
A neural network model requires substantial computing and storage resources for support, while the computing and storage resources of a mobile terminal are limited, so the application of neural network models in mobile terminals is restricted. In the related art, the computation amount of a neural network is reduced by pruning the neural network model, so that the consumption of computing and storage resources by the neural network is reduced. However, the process of pruning a neural network model is complex, consumes much time and is inefficient.
Disclosure of Invention
In view of this, embodiments of the present application are expected to provide a model pruning method, a model pruning device, and an electronic device, so as to solve the technical problems in the related art that the process of pruning a neural network model is complex and time-consuming.
To achieve the above purpose, the technical solution of the present application is implemented as follows:
the embodiment of the application provides a model pruning method, which comprises the following steps:
calculating second structure information through a second neural network based on the current first structure information of the first neural network and the corresponding set index value; the current first neural network finishes training based on a set training sample; the initial first neural network is constructed based on the third structural information of the third neural network; the corresponding set index value is used for updating the weight parameter of the second neural network, and the second neural network is used for outputting the second structure information based on the input first structure information after the weight parameter is updated;
performing structure updating on the current first neural network based on the second structure information;
under the condition that the second neural network reaches a set convergence condition, determining the first neural network with the updated structure as a model pruning result corresponding to the third neural network; wherein,
in the case where the initial first neural network is constructed or the structure of the first neural network is updated, the first neural network is trained based on the set training samples.
In the above scheme, when the initial first neural network is constructed, the method includes:
inputting third structural information of the third neural network into the second neural network to obtain fourth structural information output by the second neural network;
and constructing an initial first neural network based on the fourth structural information.
In the above scheme, the calculating the second structural information by the second neural network includes:
testing the current first neural network by adopting at least one set test sample to obtain a test result corresponding to each test sample in the at least one set test sample; the test result represents a set index value corresponding to the corresponding test sample;
calculating a loss value corresponding to the second neural network by adopting a set loss function based on a test result corresponding to each test sample in the at least one set test sample;
updating the weight parameters of the second neural network according to the calculated loss values;
and under the condition that the weight parameters are updated, inputting the current first structural information of the first neural network into the second neural network to obtain second structural information output by the second neural network.
In the above solution, the performing structure update on the current first neural network based on the second structure information includes at least one of:
updating the current topology of the first neural network based on the topology included in the second structure information;
updating the current weight channel of the corresponding layer of the first neural network based on the number of the weight channels of the corresponding layer included in the second structural information; the weight channel number represents the number of input channels and the number of output channels of a corresponding layer;
updating the weight value of the corresponding layer of the current first neural network based on the weight precision of the corresponding layer included in the second structure information; the weight precision represents the bit number occupied by the weight value of the corresponding layer;
updating the output precision of the activation function of the corresponding layer of the current first neural network based on the output precision of the activation function of the corresponding layer included in the second structure information; the output precision of the activation function represents the precision of the output result of the corresponding activation function;
updating the current weight value of the corresponding layer of the first neural network based on the pruning threshold value of the weight of the corresponding layer included in the second structure information; when the absolute value of the weight value of the corresponding layer is smaller than the corresponding pruning threshold value, setting the corresponding weight value to zero; and when the absolute value of the weight value of the corresponding layer is greater than or equal to the corresponding pruning threshold, keeping the weight value of the corresponding layer unchanged.
In the above solution, the set index value includes at least one of:
a first index value; the first index value characterizes a performance parameter of the current first neural network;
at least one second index value; the second index value characterizes a cost parameter of the current first neural network in processing each of the at least one set test sample.
In the above scheme, when the first neural network is trained based on the set training samples, the method includes:
determining the set training sample; wherein,
the set training samples comprise at least one sample pair; each sample pair of the at least one sample pair comprises an input sample and a corresponding first calibration sample; the first calibration sample is used for comparing with an output result obtained by the first neural network under the condition of inputting a corresponding input sample; the first calibration sample is determined by a second calibration sample corresponding to the corresponding input sample and a corresponding reference sample; the second calibration sample represents a calibration sample corresponding to the input sample of the third neural network; the reference sample characterizes an output result of the third neural network obtained in case of inputting the corresponding input sample.
In the above scheme, when determining the set training sample, the method includes:
fusing a second calibration sample corresponding to the input sample and a corresponding reference sample based on the set weight value to obtain a corresponding first calibration sample; and the weight value of the second calibration sample corresponding to the input sample is greater than that of the corresponding reference sample.
The embodiment of the present application further provides a model pruning device, including:
the calculating unit is used for calculating second structure information through a second neural network based on the current first structure information of the first neural network and the corresponding set index value; the initial first neural network is constructed based on third structural information of a third neural network; the corresponding set index value is used for updating the weight parameter of the second neural network, and the second neural network is used for outputting the second structure information based on the input first structure information after the weight parameter is updated;
the updating unit is used for carrying out structure updating on the current first neural network based on the second structure information;
a determining unit, configured to determine, when the second neural network reaches a set convergence condition, the first neural network after structure update as a model pruning result corresponding to the third neural network; wherein,
in the case where the initial first neural network is constructed or the structure of the first neural network is updated, the first neural network is trained based on the set training samples.
An embodiment of the present application further provides an electronic device, including: a processor and a memory for storing a computer program capable of running on the processor,
wherein the processor is configured to implement any of the above model pruning methods when running the computer program.
The embodiment of the application also provides a storage medium, on which a computer program is stored, and the computer program is executed by a processor to implement any one of the model pruning methods.
According to the method and the device, an initial first neural network is constructed based on the structural information of a third neural network that needs pruning, second structural information is calculated by a second neural network used for pruning based on the first structural information of the first neural network and a corresponding set index value, a structure update is performed on the first neural network based on the calculated second structural information so as to prune the first neural network, and the first neural network after the structure update is determined as a model pruning result corresponding to the third neural network under the condition that the second neural network reaches a set convergence condition. In this way, the structure of the whole first neural network can be updated based on the structural information calculated by the second neural network, and the first neural network does not need to be pruned layer by layer, so that the model pruning process is simplified and the model pruning efficiency is improved; in addition, in the model pruning process, a differentiable function for pruning does not need to be designed, so the model pruning method has a wider application range.
Drawings
Fig. 1 is a schematic flow chart of an implementation of a model pruning method provided in an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating an implementation of a model pruning method according to another embodiment of the present application;
fig. 3 is a schematic diagram illustrating determination of second structure information in a model pruning method according to an embodiment of the present application;
fig. 4 is a schematic flow chart of an implementation process for constructing an initial first neural network in the model pruning method according to the embodiment of the present application;
fig. 5 is a schematic flow chart illustrating an implementation process of determining second structure information in a model pruning method according to another embodiment of the present application;
fig. 6 is a schematic structural diagram of a neural network according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a model pruning device provided in an embodiment of the present application;
fig. 8 is a schematic diagram of a hardware component structure of an electronic device according to an embodiment of the present application.
Detailed Description
Before the technical scheme of the application is introduced, a model pruning method in the related art is introduced.
The related art provides a model pruning method that mainly prunes a neural network layer by layer by reducing the filter dimension of the corresponding layer. In the process of pruning layer by layer, much time is needed to finally determine the filter dimension of each layer of the neural network. The filter dimension represents the size information of the filter; the size information can represent the length and width of the filter, and can also represent the depth of the filter.
Another model pruning method is also provided in the related art, in which a pruning module (e.g., a threshold function for sparsifying weights) is embedded in a neural network, and the embedded pruning module is updated along with the weight parameters of the neural network during training, so as to implement model pruning. In the course of training the neural network, to ensure that back propagation can proceed properly, the embedded pruning module must be a differentiable function. When the embedded pruning module is not differentiable, this method cannot be used for model pruning, which limits the application range of the model pruning method.
In order to solve the above technical problem, the present application provides a model pruning method, in which an initial first neural network is constructed based on the structural information of a third neural network that needs to be pruned, second structural information is calculated by a second neural network used for pruning based on the first structural information of the first neural network and a corresponding set index value, a structure update is performed on the first neural network based on the calculated second structural information so as to prune the first neural network, and the first neural network after the structure update is determined as a model pruning result corresponding to the third neural network when the second neural network reaches a set convergence condition. According to this technical solution, the structure of the whole first neural network can be updated based on the structural information calculated by the second neural network, the first neural network does not need to be pruned layer by layer, the model pruning process is simplified, and the model pruning efficiency is improved; in addition, in the model pruning process, a differentiable function for pruning does not need to be designed, so the model pruning method has a wider application range.
The technical solution of the present application is further described in detail with reference to the drawings and specific embodiments of the specification.
Fig. 1 shows a schematic flow chart of an implementation of a model pruning method provided in an embodiment of the present application. In the embodiment of the present application, the execution subject of the model pruning method may be an electronic device such as a terminal or a server.
Referring to fig. 1, a model pruning method provided in an embodiment of the present application includes:
s101: calculating second structure information through a second neural network based on the current first structure information of the first neural network and the corresponding set index value; the current first neural network finishes training based on a set training sample; the initial first neural network is constructed based on the third structural information of the third neural network; the corresponding set index value is used for updating the weight parameter of the second neural network, and the second neural network is used for outputting the second structure information based on the input first structure information after updating the weight parameter.
Here, the current first neural network is the trained first neural network. When the electronic equipment executes S101 for the first time, the current first neural network is obtained by training the initial first neural network; and when the step S101 is executed for the Nth time, the current first neural network is obtained by training the first neural network with the updated structure obtained by the step S102 executed for the (N-1) th time. N is an integer greater than or equal to 2.
The initial first neural network is a neural network constructed based on the third structural information of a trained third neural network, and the third neural network is the neural network needing pruning, i.e. the original network. The second neural network is used for pruning the current first neural network. The second neural network is different from the first neural network.
The corresponding set index value characterizes the performance of the current first neural network. The corresponding set index value can be obtained by the electronic device when the current first neural network is tested by adopting at least one set test sample. And the corresponding set index value is used for calculating a loss value of the second neural network, so that the electronic equipment updates the weight parameter of the second neural network based on the loss value of the second neural network, and outputs second structure information based on the current first structure information of the first neural network when the weight parameter is updated.
The training samples used for training the third neural network may be the same as or different from the training samples used for training the first neural network.
In practical applications, the structural information of the initial first neural network and the structural information of the third neural network may be the same or different. That is, the initial first neural network and the third neural network may be the same or different. When the initial first neural network is different from the third neural network, it may be embodied as any one of the following:
the structure of the initial first neural network is simpler than that of the third neural network; for example, the number of layers of the first neural network is less than the number of layers of the third neural network, and the number of neurons of the corresponding layer of the first neural network is less than the number of neurons of the corresponding layer in the third neural network.
The precision of the weight value of the initial first neural network is smaller than that of the weight value corresponding to the third neural network.
Referring to fig. 2 and fig. 3 together, fig. 2 is a schematic diagram illustrating an implementation flow of a model pruning method according to another embodiment of the present application; fig. 3 is a schematic diagram illustrating determination of second structure information in a model pruning method provided in an embodiment of the present application. The following describes the implementation process of the model pruning method with reference to fig. 2 and 3.
And under the condition that the electronic equipment completes training on the third neural network by adopting at least one set first training sample, constructing an initial first neural network based on third structural information of the third neural network.
The electronic equipment trains the initial first neural network by adopting at least one set second training sample to obtain a trained first neural network, and the trained first neural network is determined as the current first neural network.
In practical applications, the first training sample and the second training sample may be the same or different. Each training sample comprises an input sample and a corresponding calibration sample, and the calibration sample is used for comparing with a result output by the neural network under the condition of inputting the corresponding input sample.
Here, in the process of training the third neural network or the first neural network, the electronic device calculates a loss value of the neural network based on an output result corresponding to the input sample in the corresponding training sample and the corresponding calibration sample using the corresponding loss function. Updating a weight parameter of the neural network based on the calculated loss value. The electronic device performs back propagation on the calculated loss value in the neural network, calculates the gradient of the corresponding loss function according to the loss value in the process of back propagation of the calculated loss value to each layer of the neural network, and updates the weight parameter back propagated to the current layer along the descending direction of the gradient.
And when the neural network meets the set update-stopping condition, stopping updating the weight parameters of the neural network, and determining the weight parameters obtained by the last update as the weight parameters used by the trained neural network. The set update-stopping condition may be convergence of the loss function of the neural network, or reaching a set number of training rounds (epochs), where one training round is a process of training the neural network once on the input samples and corresponding calibration samples in the at least one training sample. Of course, the update-stopping condition is not limited to this, and may also be, for example, reaching a set mean average precision (mAP). The mean average precision is calculated based on the output results corresponding to the input samples in all the training samples participating in the training and the corresponding calibration samples.
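For illustration only, the following sketch shows a training loop of the kind described above, which stops either when the loss stops changing or after a set number of training rounds; the PyTorch-style model, loss function and data loader are assumed placeholders, not components prescribed by this application.

    # Illustrative sketch only; model, loss_fn and train_loader are assumed placeholders.
    import torch

    def train_network(model, train_loader, loss_fn, lr=1e-3, max_epochs=50, tol=1e-4):
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        prev_epoch_loss = float("inf")
        for epoch in range(max_epochs):                    # set number of training rounds (epochs)
            epoch_loss = 0.0
            for input_sample, calibration_sample in train_loader:
                output = model(input_sample)               # output result for the input sample
                loss = loss_fn(output, calibration_sample) # compare with the calibration sample
                optimizer.zero_grad()
                loss.backward()                            # back-propagate the loss value
                optimizer.step()                           # update weights along the gradient descent direction
                epoch_loss += loss.item()
            if abs(prev_epoch_loss - epoch_loss) < tol:    # stop when the loss no longer changes
                break
            prev_epoch_loss = epoch_loss
        return model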
Under the condition that the current first neural network is trained, the electronic equipment determines a set index value corresponding to the current first neural network, and calculates a loss value of the second neural network by using the set first loss function based on the determined set index value. The test sample is different from the training sample, and the training sample includes the first training sample and the second training sample.
In the case of calculating the loss value of the second neural network, the electronic device detects whether the second neural network reaches the set convergence condition. Here, the convergence condition may be that a set number of weight updates (or back-propagation iterations) has been reached, or that the first loss function has converged. Convergence of the first loss function means that the loss value of the first loss function tends to be stable, or that it approaches a certain constant.
And when the detection result represents that the second neural network does not reach the set convergence condition, the electronic equipment updates the weight parameters of the second neural network based on the calculated loss value. And when the detection result represents that the second neural network reaches the set convergence condition, stopping updating the weight parameters of the second neural network. The electronic device performs back propagation on the calculated loss value in the second neural network, calculates a gradient of the first loss function according to the calculated loss value in the process of back propagation of the calculated loss value to each layer of the second neural network, and updates the weight parameter back propagated to the current layer along the descending direction of the gradient.
After updating the weight parameters of the second neural network, or when stopping updating the weight parameters of the second neural network, the electronic device inputs the current first structural information of the first neural network into the second neural network to obtain second structural information output by the second neural network.
It should be noted that, the aforementioned that the first training sample is the same as the second training sample means that the input sample in the first training sample is the same as the input sample in the second training sample, and the corresponding calibration sample in the first training sample is also the same as the corresponding calibration sample in the second training sample.
When the first training sample is different from the second training sample, there are two cases:
the input sample in the first training sample is the same as the input sample in the second training sample, and the corresponding calibration sample is different; for example, the corresponding calibration sample in the first training sample is the calibration sample of the corresponding input sample, and the corresponding calibration sample in the second training sample is the output result obtained by the third neural network under the condition that the corresponding input sample is input;
the input samples in the first training sample are different from the input samples in the second training sample, and the corresponding calibration samples are also different.
S102: and updating the structure of the current first neural network based on the second structure information.
And under the condition that the second structure information calculated by the second neural network is obtained, the electronic equipment carries out pruning processing on the current first neural network based on the second structure information so as to carry out structure updating on the current first neural network.
The second structure information may include at least one of:
the topology, the number of weight channels of the corresponding layer, the weight precision of the corresponding layer, the output precision of the activation function of the corresponding layer, the pruning threshold value of the weight of the corresponding layer, and the like.
The weight channel number represents the number of input channels and the number of output channels of the corresponding layer. The weight precision of the corresponding layer represents the bit number occupied by the weight of the corresponding layer. The output precision of the activation function represents the precision of the output result of the corresponding activation function. The pruning threshold of the weights of the corresponding layer is used to set the weight values satisfying a set pruning condition to zero. Here, the set pruning condition includes any one of the following: the weight value is less than the pruning threshold of the corresponding weight; the absolute value of the weight value is less than the pruning threshold of the corresponding weight.
After the structure of the current first neural network is updated, the electronic equipment trains the first neural network by adopting at least one set second training sample to obtain a trained first neural network, and the trained first neural network is determined as the current first neural network.
And when the detection result represents that the second neural network does not reach the set convergence condition and the training of the first neural network after the structure update is finished, the process returns to S101, and S101-S102 are executed again. Here, when the detection result indicates that the second neural network does not reach the set convergence condition, the second neural network needs to be further trained based on the set index value corresponding to the first neural network after the structure update.
And when the detection result represents that the second neural network reaches the set convergence condition, stopping updating the weight parameters of the second neural network, and determining the weight parameters obtained by the last updating as the weight parameters used by the second neural network by the electronic equipment. The electronic device executes S103 after performing structure update on the current first neural network based on the second structure information output by the second neural network.
S103: under the condition that the second neural network reaches a set convergence condition, determining the first neural network with the updated structure as a model pruning result corresponding to the third neural network; and under the condition that the initial first neural network is constructed or the structure of the first neural network is updated, training the first neural network based on the set training samples.
Here, when the second neural network reaches the set convergence condition, the electronic device determines the first neural network after the structure update in S102 as the model pruning result corresponding to the third neural network. The first neural network after the structure update is the first neural network obtained by updating based on the second structure information output after the weight parameters of the second neural network were updated for the last time.
In some embodiments, the electronic device may also train the first neural network after the last structure update by using at least one set second training sample, and determine the trained first neural network as a model pruning result corresponding to the third neural network.
In the technical scheme provided in this embodiment, an initial first neural network is constructed based on the structural information of a third neural network (i.e., the original network), second structural information is calculated by a second neural network used for pruning based on the first structural information of the first neural network and a corresponding set index value, the first neural network is structurally updated based on the calculated second structural information so as to prune the constructed first neural network, and the first neural network after the structure update is determined as the model pruning result corresponding to the third neural network when the second neural network reaches the set convergence condition. In this way, the structure of the whole first neural network can be updated based on the structural information calculated by the second neural network, and the first neural network does not need to be pruned layer by layer, so that the model pruning process is simplified and the model pruning efficiency is improved; in addition, in the model pruning process, a differentiable function for pruning does not need to be designed, so the model pruning method has a wider application range.
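For readability, the overall flow of S101 to S103 can be summarized in the following Python-style sketch; every helper name here (structure_of, build_network, train, evaluate_metrics, controller_loss, update_weights, update_structure, converged and the predict method) is a hypothetical placeholder for an operation described above, not an interface defined by this application.

    # High-level sketch of the pruning flow (S101 to S103); all helpers are placeholders.
    def prune_model(third_net, second_net, train_samples, test_samples):
        # Construct the initial first neural network from the third network's structure
        fourth_struct = second_net.predict(structure_of(third_net))
        first_net = build_network(fourth_struct)
        first_net = train(first_net, train_samples)              # train before the first S101

        while not converged(second_net):
            metrics = evaluate_metrics(first_net, test_samples)  # set index values (S301)
            loss = controller_loss(metrics)                      # loss of the second neural network (S302)
            update_weights(second_net, loss)                     # S303
            second_struct = second_net.predict(structure_of(first_net))  # S304 / S101
            first_net = update_structure(first_net, second_struct)       # S102
            first_net = train(first_net, train_samples)          # retrain after the structure update

        return first_net                                         # model pruning result (S103)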
As another embodiment of the present application, when training the first neural network based on the set training samples, the method includes:
determining the set training sample; wherein,
the set training samples comprise at least one sample pair; each sample pair of the at least one sample pair comprises an input sample and a corresponding first calibration sample; the first calibration sample is used for comparing with an output result obtained by the first neural network under the condition of inputting a corresponding input sample; the first calibration sample is determined by a second calibration sample corresponding to the corresponding input sample and a corresponding reference sample; the second calibration sample represents a calibration sample corresponding to the input sample of the third neural network; the reference sample characterizes an output result of the third neural network obtained in case of inputting the corresponding input sample.
Here, the second calibration sample is used for comparison with an output result obtained when the corresponding input sample is input to the third neural network.
In the technical solution provided in this embodiment, the first calibration sample in the training samples of the first neural network is calculated from the second calibration sample corresponding to the corresponding input sample and the corresponding reference sample, so that the performance of the first neural network is closer to the performance of the third neural network.
As another embodiment of the present application, in the determining the set training sample, the method includes:
fusing a second calibration sample corresponding to the input sample and a corresponding reference sample based on the set weight value to obtain a corresponding first calibration sample; and the weight value of the second calibration sample corresponding to the input sample is greater than that of the corresponding reference sample.
Here, the sum of the weight value of the second calibration sample corresponding to the input sample and the weight value of the corresponding reference sample is 1.
In order to improve the accuracy or precision of the first neural network, the weight value of the second calibration sample corresponding to the input sample is greater than the weight value of the corresponding reference sample. For example, the weight value of the second calibration sample corresponding to the input sample is 0.8, and the weight value of the reference sample corresponding to the input sample is 0.2.
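As a minimal sketch (assuming the samples are numeric arrays, which is an illustrative assumption rather than a limitation of this application), the weighted fusion described above could look like:

    import numpy as np

    def fuse_calibration(second_calibration, reference, w_calibration=0.8, w_reference=0.2):
        # The two weight values sum to 1 and the calibration weight is the larger one,
        # as described above (0.8 / 0.2 being the example given in the text).
        assert abs(w_calibration + w_reference - 1.0) < 1e-9
        assert w_calibration > w_reference
        return w_calibration * np.asarray(second_calibration) + w_reference * np.asarray(reference)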
As another embodiment of the present application, fig. 4 shows an implementation flow diagram of constructing an initial first neural network in the model pruning method provided by the embodiment of the present application. Referring to fig. 4, in constructing an initial first neural network, the method includes:
s201: and inputting the third structural information of the third neural network into the second neural network to obtain fourth structural information output by the second neural network.
And the electronic equipment initializes the second neural network, and under the condition that the training of the third neural network is finished, the electronic equipment inputs the third structural information of the third neural network into the second neural network to obtain the fourth structural information output by the second neural network.
S202: and constructing an initial first neural network based on the fourth structural information.
Here, when the third structural information and the fourth structural information are different, an initial first neural network is constructed based on the fourth structural information. In practical application, an initial first neural network can be constructed based on information such as a topological structure and a weight parameter in the fourth structural information; or copying a third neural network, and pruning the copied neural network based on the fourth structural information to obtain the initial first neural network.
And under the condition that the third structural information and the fourth structural information are the same, copying the third neural network to obtain the initial first neural network.
In the technical solution provided in this embodiment, an initial first neural network having performance close to that of a third neural network may be constructed through the second neural network based on structural information of the third neural network.
As another embodiment of the present application, fig. 5 shows a schematic implementation flow diagram for determining second structure information in a model pruning method provided by another embodiment of the present application. Referring to fig. 5, the calculating of the second structural information by the second neural network includes:
s301: testing the current first neural network by adopting at least one set test sample to obtain a test result corresponding to each test sample in the at least one set test sample; and the test result represents a set index value corresponding to the corresponding test sample.
Under the condition that the current first neural network is trained, the electronic equipment inputs at least one set test sample into the current first neural network, and obtains a test result corresponding to each test sample in the at least one set test sample output by the current first neural network.
Here, the test result corresponding to the test sample may be an output result obtained when the corresponding test sample is input to the first neural network, or may be related data acquired in a process of testing the first neural network.
In practical application, the training sample and the test sample corresponding to the current first neural network are different. In testing the current first neural network, the test is performed once each time with one test sample.
In one embodiment, the set index value includes at least one of:
a first index value; the first index value characterizes a performance parameter of the current first neural network;
at least one second index value; the second index value characterizes a cost parameter of the current first neural network in processing each of the at least one set test sample; wherein,
the second index value includes at least one of:
the computation amount of the current first neural network;
the bandwidth of the current first neural network; the bandwidth represents the total amount of data transmitted by the current first neural network per unit time;
the storage space occupied by the weight parameters of the current first neural network;
the execution time of the current first neural network; the execution time represents the duration taken by the current first neural network to execute one round of testing;
the execution power consumption of the current first neural network; the execution power consumption represents the power consumed by the current first neural network to execute one round of testing.
The current computation amount of the first neural network may be measured by the number of times of Multiply Accumulate operations (MAC) performed by the first neural network.
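As a hedged illustration of how some of these cost parameters could be collected in one test round (assuming a PyTorch model and a tensor test sample, which are not prescribed here; computation amount, bandwidth and power consumption would need dedicated profiling tools and are omitted):

    import time
    import torch

    def measure_cost(model, test_sample):
        # Storage occupied by the current weight parameters, in bytes
        weight_storage = sum(p.numel() * p.element_size() for p in model.parameters())
        # Execution time of one test with one test sample
        start = time.perf_counter()
        with torch.no_grad():
            _ = model(test_sample)
        execution_time = time.perf_counter() - start
        return {"weight_storage_bytes": weight_storage, "execution_time_s": execution_time}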
S302: and calculating a loss value corresponding to the second neural network by adopting a set loss function based on the test result corresponding to each test sample in the at least one set test sample.
The electronic device may determine a set index value corresponding to the test sample based on a test result corresponding to the test sample, and calculate a loss value corresponding to the second neural network using the set loss function.
In practical applications, when the set index value includes the first index value and the at least one second index value, the set loss function may be expressed as: Loss = cost - λ × effect.
Wherein Loss represents the loss value; cost represents the at least one second index value of the current first neural network, and the higher the value of cost, the higher the cost of the first neural network; effect represents the first index value of the current first neural network, and the higher the value of effect, the better the performance of the first neural network; λ is a set constant. Here, when the set index value includes at least two second index values, cost corresponds to the total second index value. The electronic device may perform weighting processing on the at least two second index values to obtain the total second index value.
In one embodiment, the set loss function may also be expressed as:
Loss_n = (cost_n - λ × effect_n) - (cost_(n-1) - λ × effect_(n-1)).
Wherein Loss_n represents the loss value corresponding to the n-th test of the current first neural network; cost_n represents the second index value corresponding to the n-th test of the current first neural network; effect_n represents the first index value corresponding to the n-th test of the current first neural network; cost_(n-1) represents the second index value corresponding to the (n-1)-th test of the current first neural network; effect_(n-1) represents the first index value corresponding to the (n-1)-th test of the current first neural network. n is an integer greater than or equal to 1.
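A minimal sketch of the two loss forms above, assuming cost and effect have already been obtained as scalars and that the weighting of several second index values into a total cost is merely illustrative:

    def controller_loss(cost, effect, lam):
        # Basic form: Loss = cost - lambda * effect
        return cost - lam * effect

    def controller_loss_delta(cost_n, effect_n, cost_prev, effect_prev, lam):
        # Difference form; for the first test (n = 1) both previous terms may be taken as zero
        return (cost_n - lam * effect_n) - (cost_prev - lam * effect_prev)

    def total_cost(second_index_values, weights):
        # Weighted combination of at least two second index values into a total cost
        # (the weighting scheme here is an illustrative assumption)
        return sum(w * v for w, v in zip(weights, second_index_values))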
S303: and updating the weight parameters of the second neural network according to the calculated loss values.
In the case where a loss value of the second neural network is calculated, the electronic device updates the weight parameter of the second neural network based on the calculated loss value. The electronic device performs back propagation on the calculated loss value in the second neural network, calculates a gradient of the first loss function according to the calculated loss value in the process of back propagation of the calculated loss value to each layer of the second neural network, and updates the weight parameter back propagated to the current layer along the descending direction of the gradient.
S304: and under the condition that the weight parameters are updated, inputting the current first structural information of the first neural network into the second neural network to obtain second structural information output by the second neural network.
In practical application, the first neural network and the third neural network may both be image super-resolution networks. As shown in fig. 6, the first neural network includes a first convolutional layer, a second convolutional layer, and a third convolutional layer. The first neural network is used for performing super-resolution processing on a first image and outputting a second image, where the resolution of the second image is greater than the resolution of the first image. The number of convolution kernels (i.e., the number of output channels) of the first convolutional layer is m1, the number of convolution kernels of the second convolutional layer is m2, the number of convolution kernels of the third convolutional layer is 1, and the size of the convolution kernels of all the convolutional layers may be 3 × 3.
Since the number of output channels of the third convolutional layer in the first neural network is limited by its function and must be 1, the second neural network is used to optimize the number of output channels m1 of the first convolutional layer and the number of output channels m2 of the second convolutional layer, thereby optimizing the first index value and the second index value of the first neural network. In practical applications, m1 is 64 and m2 is 32; m′1 and m′2 in the second structure information are both positive integers, and m′1 and m′2 satisfy any one of: m′1 is less than m1; m′2 is less than m2.
The first index value of the first neural network at least comprises the Peak Signal-to-Noise Ratio (PSNR) of the second image; the second index value of the first neural network at least comprises the number of weights of the neural network, which is calculated based on the size of the convolution kernels and the number of weight channels. The size of a convolution kernel characterizes the length and width of the convolution kernel, and may also characterize the depth of the convolution kernel.
The number of weights of the first neural network is 3 × 3 × (m1 + m1 × m2 + m2), and the number of weight channels of the first neural network corresponds to (m1 + m1 × m2 + m2).
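For illustration only, a sketch of such a three-convolutional-layer super-resolution network is given below (PyTorch is used purely as an example; the single-channel input and the padding value are assumptions made so that the weight count matches the formula above):

    import torch.nn as nn

    class SuperResolutionNet(nn.Module):
        # Three convolutional layers with 3 x 3 kernels; a single-channel input is assumed
        # so that the weight count matches 3 x 3 x (m1 + m1*m2 + m2) (biases ignored).
        def __init__(self, m1=64, m2=32):
            super().__init__()
            self.conv1 = nn.Conv2d(1, m1, kernel_size=3, padding=1)   # first convolutional layer
            self.conv2 = nn.Conv2d(m1, m2, kernel_size=3, padding=1)  # second convolutional layer
            self.conv3 = nn.Conv2d(m2, 1, kernel_size=3, padding=1)   # third layer, output channel fixed to 1
            self.relu = nn.ReLU()

        def forward(self, x):
            return self.conv3(self.relu(self.conv2(self.relu(self.conv1(x)))))

    def weight_count(m1, m2):
        # Number of weights of the first neural network: 3 x 3 x (m1 + m1*m2 + m2)
        return 3 * 3 * (m1 + m1 * m2 + m2)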
Accordingly, the loss function of the second neural network may be expressed as:
Loss_n = (PSNR_n - λ × quantity_n) - (PSNR_(n-1) - λ × quantity_(n-1)).
Wherein PSNR_n represents the peak signal-to-noise ratio corresponding to the n-th test of the first neural network, and quantity_n represents the number of weights corresponding to the n-th test of the first neural network; PSNR_(n-1) represents the peak signal-to-noise ratio corresponding to the (n-1)-th test of the first neural network, and quantity_(n-1) represents the number of weights corresponding to the (n-1)-th test of the first neural network. n is an integer greater than or equal to 1.
When n is 1, PSNR_(n-1) and quantity_(n-1) may both be zero.
The number m′1 of output channels of the first convolutional layer and the number m′2 of output channels of the second convolutional layer, which are included in the second structure information output by the second neural network, are both positive integers.
In the technical solution provided in this embodiment, a loss function corresponding to the second neural network is calculated based on the first index value and the second index value of the first neural network, the weight parameter of the second neural network is updated based on the calculated loss function, and the first structural information of the first neural network is input to the second neural network under the condition that the weight parameter of the second neural network is updated, so as to obtain the second structural information output by the second neural network. The first index value and the second index value of the first neural network are considered in the process of training the second neural network, and the first neural network obtained by updating based on the second structure information can achieve balance between performance and cost.
As another embodiment of the present application, in S102, the performing a structure update on the current first neural network based on the second structure information includes at least one of:
updating the current topology of the first neural network based on the topology included in the second structure information; the topological structure can represent information including the number of layers, the name of each layer, the topological structure of each layer and the like of the first neural network; the topological structure of the corresponding layer can represent the number of neurons included in the corresponding layer, the connection relation among the neurons and the like;
updating the current weight channel of the corresponding layer of the first neural network based on the number of the weight channels of the corresponding layer included in the second structural information; the weight channel number represents the number of input channels and the number of output channels of a corresponding layer;
updating the weight value of the corresponding layer of the current first neural network based on the weight precision of the corresponding layer included in the second structure information; the weight precision represents the bit number occupied by the weight value of the corresponding layer;
updating the output precision of the activation function of the corresponding layer of the current first neural network based on the output precision of the activation function of the corresponding layer included in the second structure information; the output precision of the activation function represents the precision of the output result of the corresponding activation function;
updating the current weight value of the corresponding layer of the first neural network based on the pruning threshold value of the weight of the corresponding layer included in the second structure information; when the absolute value of the weight value of the corresponding layer is smaller than the corresponding pruning threshold value, setting the corresponding weight value to zero; and when the absolute value of the weight value of the corresponding layer is greater than or equal to the corresponding pruning threshold, keeping the weight value of the corresponding layer unchanged.
In practical applications, the second structure information may be represented in the form of an array, a matrix, or the like. For example, a first array is used to represent the topology of each layer in the first neural network; a second array is used to represent the number of weight channels of each layer in the first neural network; a third array is used to represent the weight precision of each layer in the first neural network; a fourth array is used to represent the output precision of the activation function of each layer in the first neural network; and a fifth array is used to represent the pruning threshold of the weights of each layer in the first neural network. Wherein,
when the number of weight channels of the corresponding layer in the first neural network is less than or equal to zero, the characterization electronic device deletes the layer when updating the structural information of the first neural network. The weight channel number less than or equal to zero means that both the input channel number and the output channel number of the corresponding layer are less than or equal to zero.
When the weight precision included in the second structure information of the first neural network indicates that the weight value of a corresponding layer occupies k bits, the electronic device updates the weight value of the corresponding layer based on the absolute value of the weight value w of that layer when performing the structure update on the current first neural network. When the absolute value of the weight value of the corresponding layer is less than 1, the weight value is updated to int(w × 2^k)/2^k, where int(w × 2^k) represents truncating w × 2^k to an integer. When the absolute value of the weight value of the corresponding layer is greater than or equal to 1, the weight value is updated to int(w/2^k) × 2^k, where int(w/2^k) represents truncating w/2^k to an integer.
It should be noted that the weight precision of the corresponding layer of the first neural network is used to reduce the number of bits occupied by the weight values of that layer, so as to reduce the storage amount and the computation amount of the first neural network. For example, if the weight precision in the first structure information indicates that the weight value of the corresponding layer occupies 32 bits, while the weight precision in the second structure information indicates that the weight value of the corresponding layer occupies 8 bits (that is, only the value corresponding to the upper eight bits is retained), then after the structure of the first neural network is updated based on the second structure information, the storage space occupied by the first neural network is reduced by three quarters, and the hardware multiplier resources consumed by the operation are also greatly reduced.
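A minimal sketch of the k-bit weight quantization described above (the function name is illustrative; Python's int() truncates toward zero, matching the truncate-and-round operation in the text):

    def quantize_weight(w, k):
        # |w| < 1  ->  int(w * 2**k) / 2**k
        # |w| >= 1 ->  int(w / 2**k) * 2**k
        if abs(w) < 1:
            return int(w * 2 ** k) / 2 ** k
        return int(w / 2 ** k) * 2 ** k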
When the output precision of the activation function included in the second structure information of the first neural network indicates that the output result of the activation function of a corresponding layer occupies j bits, the electronic device updates the output result of the activation function of the corresponding layer based on its absolute value after the structure of the current first neural network is updated. When the absolute value of the output result q of the activation function of the corresponding layer is less than 1, the output result is updated to int(q × 2^j)/2^j, where int(q × 2^j) represents truncating q × 2^j to an integer. When the absolute value of the output result q of the activation function of the corresponding layer is greater than or equal to 1, the output result is updated to int(q/2^j) × 2^j, where int(q/2^j) represents truncating q/2^j to an integer.
It should be noted that, by reducing the output accuracy of the activation function of the first neural network, the bandwidth required by the first neural network can be reduced, and the computation amount of the first neural network can also be reduced.
When the absolute value of a weight value of the corresponding layer of the first neural network is smaller than the corresponding pruning threshold, the weight value is set to zero, so that the pruning threshold of the weights can increase the proportion of zeros among the weight values and thereby improve the compression rate of the weights.
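A minimal sketch of applying a per-layer pruning threshold to the weight values (NumPy is used only for illustration):

    import numpy as np

    def apply_pruning_threshold(weights, threshold):
        # Zero out weights whose absolute value is below the layer's pruning threshold;
        # weights with an absolute value greater than or equal to the threshold stay unchanged.
        weights = np.asarray(weights, dtype=float)
        return np.where(np.abs(weights) < threshold, 0.0, weights)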
In order to implement the method according to the embodiment of the present application, an embodiment of the present application further provides a model pruning device, which is disposed on an electronic device such as a terminal or a server, and as shown in fig. 7, the model pruning device includes:
a calculating unit 71, configured to calculate second structure information through a second neural network based on current first structure information of the first neural network and a corresponding setting index value; the current first neural network finishes training based on a set training sample; the initial first neural network is constructed based on the third structural information of the third neural network; the corresponding set index value is used for updating the weight parameter of the second neural network, and the second neural network is used for outputting the second structure information based on the input first structure information after the weight parameter is updated;
an updating unit 72, configured to perform a structure update on the current first neural network based on the second structure information;
a determining unit 73, configured to determine, when the second neural network reaches a set convergence condition, the first neural network after structure update as a model pruning result corresponding to the third neural network; wherein,
in the case where the initial first neural network is constructed or the structure of the first neural network is updated, the first neural network is trained based on the set training samples.
In an embodiment, in constructing the initial first neural network, the computing unit 71 is further configured to:
inputting third structural information of the third neural network into the second neural network to obtain fourth structural information output by the second neural network;
and constructing an initial first neural network based on the fourth structural information.
In an embodiment, the calculating unit 71, when calculating the second structure information through the second neural network, is configured to:
testing the current first neural network by adopting at least one set test sample to obtain a test result corresponding to each test sample in the at least one set test sample; the test result represents a set index value corresponding to the corresponding test sample;
calculating a loss value corresponding to the second neural network by adopting a set loss function based on a test result corresponding to each test sample in the at least one set test sample;
updating the weight parameters of the second neural network according to the calculated loss values;
and under the condition that the weight parameters of the second neural network are updated, inputting the current first structural information of the first neural network into the second neural network to obtain second structural information output by the second neural network.
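For illustration only, the following sketch outlines one possible shape of a single iteration of the steps above. The encoding of the structure information as tensors, the surrogate loss that makes the measured index values differentiable with respect to the second neural network's parameters (for example a policy-gradient style surrogate), and all names are assumptions, since this embodiment does not prescribe them.

```python
import torch

def controller_step(controller, optimizer, first_structure, index_values, surrogate_loss):
    """One schematic iteration: update the second (controller) network from the
    measured set index values, then emit second structure information."""
    proposal = controller(first_structure)             # controller output before the update
    loss = surrogate_loss(proposal, index_values)      # stands in for the "set loss function"

    optimizer.zero_grad()
    loss.backward()                                    # update the controller's weight parameters
    optimizer.step()

    with torch.no_grad():                              # with updated weights, compute the
        second_structure = controller(first_structure) # second structure information
    return second_structure
```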
In an embodiment, the performing the structure update on the current first neural network based on the second structure information includes at least one of:
updating the current topology of the first neural network based on the topology included in the second structure information;
updating the current weight channel of the corresponding layer of the first neural network based on the number of the weight channels of the corresponding layer included in the second structural information; the weight channel number represents the number of input channels and the number of output channels of a corresponding layer;
updating the weight value of the corresponding layer of the current first neural network based on the weight precision of the corresponding layer included in the second structure information; the weight precision represents the bit number occupied by the weight value of the corresponding layer;
updating the output precision of the activation function of the corresponding layer of the current first neural network based on the output precision of the activation function of the corresponding layer included in the second structure information; the output precision of the activation function represents the precision of the output result of the corresponding activation function;
updating the current weight value of the corresponding layer of the first neural network based on the pruning threshold value of the weight of the corresponding layer included in the second structure information; when the absolute value of the weight value of the corresponding layer is smaller than the corresponding pruning threshold value, setting the corresponding weight value to zero; and when the absolute value of the weight value of the corresponding layer is greater than or equal to the corresponding pruning threshold, keeping the weight value of the corresponding layer unchanged.
In one embodiment, the set index value comprises at least one of:
a first index value; the first index value characterizes the current performance parameter of the first neural network;
a second index value; the second index value characterizes a cost parameter of the current first neural network in processing each of the at least one set test sample; wherein,
the second index value includes at least one of:
the current computation load of the first neural network;
a current bandwidth of the first neural network; the bandwidth represents the sum of the data quantity transmitted by the current first neural network in unit time;
the storage space occupied by the current weight parameter of the first neural network;
a current execution time of the first neural network; the execution time represents the duration corresponding to the current first neural network executing one round of test;
current execution power consumption of the first neural network; the execution power consumption represents the power consumption corresponding to the current first neural network executing one round of test.
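Purely as an illustration, the index values listed above could be combined into a single scalar objective along the following lines; the weighting scheme, the field names and the numeric values are assumptions, as this embodiment does not specify how the index values are aggregated.

```python
def aggregate_index_values(performance: float, costs: dict, cost_weights: dict) -> float:
    """Combine the first index value (performance) with the second index values
    (cost parameters) into one scalar; larger is better in this sketch."""
    penalty = sum(cost_weights.get(name, 0.0) * value for name, value in costs.items())
    return performance - penalty

score = aggregate_index_values(
    performance=0.91,                       # e.g. accuracy of the current first neural network
    costs={"computation": 1.2e9,            # operations per round of test
           "bandwidth": 3.4e6,              # bytes transmitted per unit time
           "storage": 2.5e6,                # bytes occupied by the weight parameters
           "execution_time": 0.021,         # seconds per round of test
           "power": 0.8},                   # watts consumed per round of test
    cost_weights={"computation": 1e-10, "bandwidth": 1e-8, "storage": 1e-8,
                  "execution_time": 1.0, "power": 0.05},
)
```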
In an embodiment, in the case where the initial first neural network is constructed or the structure of the first neural network is updated, when the first neural network is trained based on the set training samples, the determining unit 73 is configured to determine the set training sample; wherein,
the set training samples comprise at least one sample pair; each sample pair of the at least one sample pair comprises an input sample and a corresponding first calibration sample; the first calibration sample is used for comparing with an output result obtained by the first neural network under the condition of inputting a corresponding input sample; the first calibration sample is determined by a second calibration sample corresponding to the corresponding input sample and a corresponding reference sample; the second calibration sample represents a calibration sample corresponding to the input sample of the third neural network; the reference sample characterizes an output result of the third neural network obtained in case of inputting the corresponding input sample.
In an embodiment, the determining unit 73 is configured to: fusing a second calibration sample corresponding to the input sample and a corresponding reference sample based on the set weight value to obtain a corresponding first calibration sample; and the weight value of the second calibration sample corresponding to the input sample is greater than that of the corresponding reference sample.
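A minimal sketch of this fusion, assuming the calibration sample and the reference sample are arrays of the same shape (for example class-probability vectors), is given below; the weight value 0.7 is only an illustrative choice satisfying the constraint that the calibration sample carries the larger weight.

```python
import numpy as np

def fuse_calibration_samples(second_calibration: np.ndarray,
                             reference: np.ndarray,
                             alpha: float = 0.7) -> np.ndarray:
    """Weighted fusion of the second calibration sample (e.g. a one-hot label of the
    third neural network) with the reference sample (the third neural network's
    output for the same input); alpha must exceed 0.5 so the calibration sample
    carries the larger weight."""
    assert alpha > 0.5, "the second calibration sample must carry the larger weight"
    return alpha * second_calibration + (1.0 - alpha) * reference

first_calibration = fuse_calibration_samples(
    second_calibration=np.array([0.0, 1.0, 0.0]),   # calibration (ground-truth) sample
    reference=np.array([0.1, 0.8, 0.1]),            # third neural network's output
)
```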
In practical applications, the units included in the model pruning device can be implemented by a processor in the model pruning device. Of course, the processor needs to run the program stored in the memory to realize the functions of the above-described program modules.
It should be noted that: in the model pruning device provided in the above embodiment, when performing model pruning, only the division of each program module is illustrated, and in practical applications, the processing may be distributed to different program modules according to needs, that is, the internal structure of the model pruning device is divided into different program modules to complete all or part of the processing described above. In addition, the model pruning device provided by the above embodiment and the model pruning method embodiment belong to the same concept, and the specific implementation process thereof is described in the method embodiment and will not be described herein again.
Based on the hardware implementation of the program module, in order to implement the method of the embodiment of the present application, an embodiment of the present application further provides an electronic device. Fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application, and as shown in fig. 8, the electronic device includes:
a communication interface 1 capable of information interaction with other devices such as network devices and the like;
and the processor 2, connected with the communication interface 1 to exchange information with other devices, is used for executing the model pruning method provided by one or more of the foregoing technical solutions when running a computer program; the computer program is stored in the memory 3.
In practice, of course, the various components in the electronic device are coupled together by the bus system 4. It will be appreciated that the bus system 4 is used to enable connection communication between these components. The bus system 4 comprises, in addition to a data bus, a power bus, a control bus and a status signal bus. For the sake of clarity, however, the various buses are labeled as bus system 4 in fig. 8.
The memory 3 in the embodiment of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program for operating on an electronic device.
It will be appreciated that the memory 3 may be a volatile memory or a nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferromagnetic random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 3 described in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
The method disclosed in the above embodiment of the present application may be applied to the processor 2, or implemented by the processor 2. The processor 2 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 2. The processor 2 described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor 2 may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the memory 3, and the processor 2 reads the program in the memory 3 and in combination with its hardware performs the steps of the aforementioned method.
When the processor 2 executes the program, the corresponding processes in the methods according to the embodiments of the present application are realized, and for brevity, are not described herein again.
In an exemplary embodiment, the present application further provides a storage medium, i.e. a computer storage medium, specifically a computer readable storage medium, for example, including a memory 3 storing a computer program, which can be executed by a processor 2 to implement the steps of the foregoing method. The computer readable storage medium may be Memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface Memory, optical disk, or CD-ROM.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing module, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The technical means described in the embodiments of the present application may be arbitrarily combined without conflict.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of model pruning, comprising:
calculating second structure information through a second neural network based on the current first structure information of the first neural network and the corresponding set index value; the initial first neural network is constructed based on third structural information of a third neural network; the corresponding set index value is used for updating the weight parameter of the second neural network, and the second neural network is used for outputting the second structure information based on the input first structure information after the weight parameter is updated;
performing structure updating on the current first neural network based on the second structure information;
under the condition that the second neural network reaches a set convergence condition, determining the first neural network with the updated structure as a model pruning result corresponding to the third neural network; wherein,
in the case where the initial first neural network is constructed or the structure of the first neural network is updated, the first neural network is trained based on the set training samples.
2. The method of claim 1, wherein in constructing the initial first neural network, the method comprises:
inputting third structural information of the third neural network into the second neural network to obtain fourth structural information output by the second neural network;
and constructing an initial first neural network based on the fourth structural information.
3. The method of claim 1, wherein calculating the second structural information by the second neural network comprises:
testing the current first neural network by adopting at least one set test sample to obtain a test result corresponding to each test sample in the at least one set test sample; the test result represents a set index value corresponding to the corresponding test sample;
calculating a loss value corresponding to the second neural network by adopting a set loss function based on a test result corresponding to each test sample in the at least one set test sample;
updating the weight parameters of the second neural network according to the calculated loss values;
and under the condition that the weight parameters of the second neural network are updated, inputting the current first structural information of the first neural network into the second neural network to obtain the second structural information output by the second neural network.
4. The method of any one of claims 1-3, wherein the performing the structure update on the current first neural network based on the second structure information comprises at least one of:
updating the current topology of the first neural network based on the topology included in the second structure information;
updating the current weight channel of the corresponding layer of the first neural network based on the number of the weight channels of the corresponding layer included in the second structural information; the weight channel number represents the number of input channels and the number of output channels of a corresponding layer;
updating the weight value of the corresponding layer of the current first neural network based on the weight precision of the corresponding layer included in the second structure information; the weight precision represents the bit number occupied by the weight value of the corresponding layer;
updating the output precision of the activation function of the corresponding layer of the current first neural network based on the output precision of the activation function of the corresponding layer included in the second structure information; the output precision of the activation function represents the precision of the output result of the corresponding activation function;
updating the current weight value of the corresponding layer of the first neural network based on the pruning threshold value of the weight of the corresponding layer included in the second structure information; when the absolute value of the weight value of the corresponding layer is smaller than the corresponding pruning threshold value, setting the corresponding weight value to zero; and when the absolute value of the weight value of the corresponding layer is greater than or equal to the corresponding pruning threshold, keeping the weight value of the corresponding layer unchanged.
5. The method according to claim 1 or 3, wherein the set index value comprises at least one of:
a first index value; the first index value characterizes the current performance parameter of the first neural network;
at least one second index value; the second index value characterizes a cost parameter of the current first neural network in processing each of the at least one set test sample.
6. The method of claim 1, wherein when training the first neural network based on the set training samples, the method comprises:
determining the set training sample; wherein,
the set training samples comprise at least one sample pair; each sample pair of the at least one sample pair comprises an input sample and a corresponding first calibration sample; the first calibration sample is used for comparing with an output result obtained by the first neural network under the condition of inputting a corresponding input sample; the first calibration sample is determined by a second calibration sample corresponding to the corresponding input sample and a corresponding reference sample; the second calibration sample represents a calibration sample corresponding to the input sample of the third neural network; the reference sample characterizes an output result of the third neural network obtained in case of inputting the corresponding input sample.
7. The method of claim 6, wherein in the determining the set training sample, the method comprises:
fusing a second calibration sample corresponding to the input sample and a corresponding reference sample based on the set weight value to obtain a corresponding first calibration sample; and the weight value of the second calibration sample corresponding to the input sample is greater than that of the corresponding reference sample.
8. A model pruning device, comprising:
the calculating unit is used for calculating second structure information through a second neural network based on the current first structure information of the first neural network and the corresponding set index value; the initial first neural network is constructed based on third structural information of a third neural network; the corresponding set index value is used for updating the weight parameter of the second neural network, and the second neural network is used for outputting the second structure information based on the input first structure information after the weight parameter is updated;
the updating unit is used for carrying out structure updating on the current first neural network based on the second structure information;
a determining unit, configured to determine, when the second neural network reaches a set convergence condition, the first neural network after structure update as a model pruning result corresponding to the third neural network; wherein,
in the case where the initial first neural network is constructed or the structure of the first neural network is updated, the first neural network is trained based on the set training samples.
9. An electronic device, comprising: a processor and a memory for storing a computer program capable of running on the processor,
wherein the processor is adapted to perform the steps of the method of any one of claims 1 to 7 when running the computer program.
10. A storage medium having a computer program stored thereon, the computer program, when being executed by a processor, realizing the steps of the method of any one of claims 1 to 7.
CN202010769414.6A 2020-08-03 2020-08-03 Model pruning method and device and electronic equipment Pending CN111931930A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010769414.6A CN111931930A (en) 2020-08-03 2020-08-03 Model pruning method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010769414.6A CN111931930A (en) 2020-08-03 2020-08-03 Model pruning method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111931930A true CN111931930A (en) 2020-11-13

Family

ID=73308122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010769414.6A Pending CN111931930A (en) 2020-08-03 2020-08-03 Model pruning method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111931930A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023217263A1 (en) * 2022-05-13 2023-11-16 北京字跳网络技术有限公司 Data processing method and apparatus, device, and medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination