CN115374936A - Neural network model clipping method, device, equipment and medium - Google Patents


Info

Publication number: CN115374936A
Authority: CN (China)
Prior art keywords: neural network, network model, cutting, reliability, energy consumption
Legal status: Pending
Application number: CN202211250546.3A
Other languages: Chinese (zh)
Inventors: 赵东艳, 李德建, 种挺, 任增民, 马俊, 李雷
Current Assignee: Beijing Smartchip Microelectronics Technology Co Ltd
Original Assignee: Beijing Smartchip Microelectronics Technology Co Ltd
Application filed by Beijing Smartchip Microelectronics Technology Co Ltd
Priority application: CN202211250546.3A
Publication: CN115374936A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The disclosure relates to the technical field of neural network compression, and in particular to a neural network model clipping method, apparatus, device, and medium. The clipping method includes: acquiring environment state information of a neural network model to be clipped through reinforcement learning; acquiring a target clipping rate for each layer of the neural network model to be clipped according to the environment state information; obtaining a target reward value according to the target clipping rate and a reward function; in response to the target reward value being greater than or equal to a preset threshold, clipping the neural network model to be clipped according to the target clipping rate; and performing image processing on a target image according to the resulting target neural network model. The method not only reduces the computation amount of the neural network model but also reduces the impact on the overall system of errors that may occur while the clipped model runs, thereby ensuring that the reliability and energy consumption of the clipped neural network model meet requirements.

Description

Neural network model clipping method, device, equipment and medium
Technical Field
The disclosure relates to the technical field of neural network compression, in particular to a neural network model clipping method, device, equipment and medium.
Background
With the rapid development of neural network algorithms and neural network hardware chips, neural networks have become a promising solution in industries such as image processing and power systems. The neural network model offers powerful and fast parallel computing capability, high fault tolerance, strong learning capability, and the like. In general, however, a neural network model requires very large computational cost and memory space, and its parameters or memory footprint can be reduced by neural network compression.
In the related art, the model clipping technology is a mainstream compression scheme. Specifically, the model clipping technique is to remove unimportant weights (connections) or neurons in the neural network model, and greatly reduce the size and the calculation amount of the model without losing the model precision, thereby achieving the effect of reducing energy consumption.
However, even if the neural network model is clipped according to the above scheme and its computation amount is reduced, the clipped model still has a high probability of errors when running under some conditions. Such an error may propagate layer by layer through the network and affect the output of the whole system, greatly reducing the system's reliability and hence the accuracy of image processing performed with the clipped model. How to ensure that the reliability and energy consumption of the clipped neural network model meet the demand has therefore become an urgent problem to be solved.
Disclosure of Invention
In order to solve the problems in the related art, embodiments of the present disclosure provide a neural network model clipping method, apparatus, device, and medium.
In a first aspect, a neural network model clipping method is provided in the embodiments of the present disclosure.
Specifically, the neural network model clipping method includes:
acquiring environment state information of a neural network model to be cut through reinforcement learning, wherein the environment state information comprises basic characteristic information and enhanced characteristic information of the neural network model to be cut;
acquiring the target clipping rate of each layer in the neural network model to be clipped according to the environment state information;
obtaining a target reward value according to the target clipping rate and a reward function, where the reward function indicates the correspondence between the clipping rate and the reliability and energy consumption of the clipped neural network model, and the target reward value indicates the benefit of the target neural network model obtained after the neural network model to be clipped is clipped according to the target clipping rate;
in response to the target reward value being larger than or equal to a preset threshold value, cutting the neural network model to be cut according to the target cutting rate;
and performing image processing on a target image according to the target neural network model obtained after clipping.
With reference to the first aspect, in a first implementation manner of the first aspect, before the target reward value is obtained according to the target clipping rate and the reward function, the method further includes:
obtaining the reliability of the cut neural network model according to the reliability evaluation function;
acquiring the energy consumption of the cut neural network model according to an energy consumption evaluation function;
and acquiring the return function according to the reliability of the cut neural network model and the energy consumption of the cut neural network model.
With reference to the first aspect and the first implementation manner of the first aspect, in a second implementation manner of the first aspect, before the obtaining, according to the reliability evaluation function, the reliability of the trimmed neural network model, the method further includes:
acquiring a system architecture reliability parameter and a neuron sensitivity parameter;
constructing the reliability evaluation function according to the system architecture reliability parameter and the neuron sensitivity parameter;
the system architecture reliability parameter is used for evaluating the influence of each layer of faults in the cut neural network model on the reliability of the cut neural network model, and the neuron sensitivity parameter is used for evaluating the influence of the faults of neurons of the cut neural network model on the reliability of the whole cut neural network model.
With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the obtaining the architecture reliability parameter includes:

obtaining the architecture reliability parameter ARF_i of the i-th layer according to

ARF_i = λ_i · (p/(1 + p) · C_i/C + 1/(1 + p) · M_i/M)

(the published formula is rendered only as an image; this form is reconstructed from the symbol definitions, reading p/(1 + p) and 1/(1 + p) as the fractions of time spent on computation and on memory access), where λ_i is the fault rate of the i-th layer of the clipped neural network model, p is the compute-to-memory-access ratio of the i-th layer when run on hardware, C_i is the multiply-add computation amount of the i-th layer in the clipped neural network model, C is the multiply-add computation amount of all layers, M_i is the memory access amount of the i-th layer, and M is the memory access amount of all layers in the clipped neural network model.
With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the obtaining the reliability of the clipped neural network model according to the reliability evaluation function includes:

obtaining the reliability R of the clipped neural network model according to

R = ARF_i · S = λ_i · (p/(1 + p) · C_i/C + 1/(1 + p) · M_i/M) · S

(again reconstructed from the symbol definitions; the published formula is an image), where ARF_i is the architecture reliability parameter, S is the neuron sensitivity parameter, and λ_i, p, C_i, C, M_i, and M are as defined for ARF_i above.
With reference to the first aspect and the first implementation manner of the first aspect, in a fifth implementation manner of the first aspect, before the obtaining, according to the energy consumption evaluation function, energy consumption of the trimmed neural network model, the method further includes:
acquiring full-load operation energy consumption and bottleneck operation energy consumption of the cut neural network model;
and constructing the energy consumption evaluation function according to the full-load operation energy consumption and the bottleneck operation energy consumption.
With reference to the first aspect, the first implementation manner of the first aspect, and the fifth implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the obtaining full-load operation energy consumption of the trimmed neural network model includes:
acquiring a first calculated amount and a first memory access amount of the trimmed neural network model in a full-load running state;
obtaining a first estimated power according to the first calculated amount and the first memory access amount;
and acquiring the full-load operation energy consumption according to the first estimated power and the first operation time of the trimmed neural network model in the full-load operation state.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, the fifth implementation manner of the first aspect, and the sixth implementation manner of the first aspect, in a seventh implementation manner of the first aspect, the basic feature information includes at least one of the following:
the number of layers of the neural network model, the number of channels of the input feature map, the length of the input feature map, the width of the input feature map, the number of convolution kernels, the length of the convolution kernels, the width of the convolution kernels and the step length of the convolution kernels.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, the fifth implementation manner of the first aspect, and the sixth implementation manner of the first aspect, in an eighth implementation manner of the first aspect, the enhanced feature information includes at least one of the following:
energy consumption of the neural network model after cutting, reliability of the neural network model after cutting and distribution parameters of each layer in the neural network model after cutting.
In a second aspect, an embodiment of the present disclosure provides a neural network model clipping apparatus.
Specifically, the neural network model clipping device includes:
a first obtaining module configured to obtain, through reinforcement learning, environment state information of a neural network model to be clipped, where the environment state information includes basic feature information and enhanced feature information of the neural network model to be clipped;
the second acquisition module is configured to acquire a target clipping rate of each layer in the neural network model to be clipped according to the environment state information;
a third obtaining module, configured to obtain a target reward value according to the target cutting rate and a reward function, where the reward function is used to indicate a correspondence between the cutting rate and reliability and energy consumption of the neural network model after cutting, and the target reward value is used to indicate a benefit of the target neural network model obtained after cutting the neural network model to be cut according to the target cutting rate;
the first execution module is configured to respond to the target reward value being larger than or equal to a preset threshold value, and cut the neural network model to be cut according to the target cutting rate;
and the processing module is configured to perform image processing on the target image according to the target neural network model obtained after cutting.
With reference to the second aspect, in a first implementation manner of the second aspect, the neural network model clipping device further includes:
the fourth obtaining module is configured to obtain the reliability of the neural network model after cutting according to the reliability evaluation function;
the fourth obtaining module is further configured to obtain the energy consumption of the trimmed neural network model according to an energy consumption evaluation function;
and the fifth obtaining module is configured to obtain the return function according to the reliability of the trimmed neural network model and the energy consumption of the trimmed neural network model.
With reference to the second aspect and the first implementation manner of the second aspect, in a second implementation manner of the second aspect, an embodiment of the present disclosure further includes:
a sixth obtaining module configured to obtain an architecture reliability parameter and a neuron sensitivity parameter;
a second execution module configured to construct the reliability evaluation function according to the architecture reliability parameter and the neuron sensitivity parameter;
the system architecture reliability parameter is used for evaluating the influence of each layer of faults in the cut neural network model on the reliability of the cut neural network model, and the neuron sensitivity parameter is used for evaluating the influence of the faults of neurons of the cut neural network model on the reliability of the whole cut neural network model.
With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the sixth obtaining module is configured to:

obtain the architecture reliability parameter ARF_i according to

ARF_i = λ_i · (p/(1 + p) · C_i/C + 1/(1 + p) · M_i/M)

(reconstructed as in the first aspect), where λ_i is the fault rate of the i-th layer of the clipped neural network model, p is the compute-to-memory-access ratio of the i-th layer when run on hardware, C_i and C are the multiply-add computation amounts of the i-th layer and of all layers, and M_i and M are the memory access amounts of the i-th layer and of all layers in the clipped neural network model.
With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the fourth obtaining module is configured to:

obtain the reliability R of the clipped neural network model according to

R = ARF_i · S

(reconstructed as in the first aspect), where ARF_i is the architecture reliability parameter, S is the neuron sensitivity parameter, λ_i is the fault rate of the i-th layer of the clipped neural network model, p is the compute-to-memory-access ratio of the i-th layer when run on hardware, C_i and C are the multiply-add computation amounts of the i-th layer and of all layers, and M_i and M are the memory access amounts of the i-th layer and of all layers.
With reference to the second aspect and the first implementation manner of the second aspect, in a fifth implementation manner of the second aspect, an embodiment of the present disclosure further includes:
the seventh obtaining module is configured to obtain full-load operation energy consumption and bottleneck operation energy consumption of the cut neural network model;
a third execution module configured to construct the energy consumption assessment function according to the full load operation energy consumption and the bottleneck operation energy consumption.
With reference to the second aspect, the first implementation manner of the second aspect, and the fifth implementation manner of the second aspect, in a sixth implementation manner of the second aspect, the seventh obtaining module is configured to:
acquiring a first calculated amount and a first memory access amount of the trimmed neural network model in a full-load running state;
obtaining a first estimated power according to the first calculated amount and the first memory access amount;
and acquiring the full-load operation energy consumption according to the first estimated power and the first operation time of the trimmed neural network model in the full-load operation state.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, the fourth implementation manner of the second aspect, the fifth implementation manner of the second aspect, and the sixth implementation manner of the second aspect, in a seventh implementation manner of the second aspect, the basic feature information includes at least one of the following:
the number of layers of the neural network model to be cut, the number of channels for inputting the characteristic diagram, the length of the input characteristic diagram, the width of the input characteristic diagram, the number of convolution kernels, the length of the convolution kernels, the width of the convolution kernels and the step length of the convolution kernels.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, the fourth implementation manner of the second aspect, the fifth implementation manner of the second aspect, and the sixth implementation manner of the second aspect, in an eighth implementation manner of the second aspect, the enhanced feature information includes at least one of the following:
the energy consumption of the neural network model after cutting, the reliability of the neural network model after cutting and the distribution parameters of each layer in the neural network model after cutting.
In a third aspect, the disclosed embodiments provide an electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method according to any one of claims 1 to 9.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method of any one of claims 1 to 9.
According to the technical scheme provided by the embodiment of the disclosure, the neural network model is cut through the scheme, so that the calculated amount of the neural network model can be reduced, and the influence of errors possibly occurring in the operation of the cut neural network model on the whole system is reduced, thereby ensuring that the reliability and the energy consumption of the cut neural network model meet the requirements, and improving the accuracy of image processing of the cut neural network model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings.
Fig. 1 shows a flow diagram of a neural network model clipping method according to an embodiment of the present disclosure.
Fig. 2 shows a block diagram of a neural network model clipping device according to an embodiment of the present disclosure.
Fig. 3 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
FIG. 4 shows a schematic block diagram of a computer system suitable for use in implementing a method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should also be noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In the present disclosure, if an operation of acquiring user information or user data or an operation of presenting user information or user data to others is involved, the operations are all operations authorized, confirmed by a user, or actively selected by the user.
As mentioned above, with the rapid development of neural network algorithms and neural network hardware chips, neural networks have become a promising solution for power systems. The neural network model offers powerful and fast parallel computing capability, high fault tolerance, strong learning capability, and the like. In general, however, a neural network model requires very large computational cost and memory space, and its parameters or memory footprint can be reduced by neural network compression.
In the related art, the model clipping technique is a mainstream compression scheme. Specifically, the model clipping technique is to remove unimportant weights (connections) or neurons in the neural network model, and greatly reduce the size and the calculation amount of the model without losing the model precision, thereby achieving the effect of reducing energy consumption.
However, even if the neural network model is clipped according to the above scheme and its computation amount is reduced, the clipped model still has a high probability of errors when running under some conditions. Such an error propagates layer by layer through the network and affects the output of the whole system, greatly reducing the system's reliability. How to ensure that the reliability and energy consumption of the clipped neural network model meet the demand has therefore become an urgent problem to be solved.
In view of these technical defects, the embodiment of the present disclosure provides a neural network model clipping method. The method acquires, through reinforcement learning, the environment state information of a neural network model to be clipped, where the environment state information includes basic feature information and enhanced feature information of the model; acquires a target clipping rate for each layer of the model according to the environment state information; obtains a target reward value according to the target clipping rate and a reward function, where the reward function indicates the correspondence between the clipping rate and the reliability and energy consumption of the clipped model, and the target reward value indicates the benefit of the target neural network model obtained after clipping at the target clipping rate; clips the model according to the target clipping rate in response to the target reward value being greater than or equal to a preset threshold; and performs image processing on a target image with the resulting target neural network model. With this technical scheme, clipping the neural network model reduces its computation amount and the impact on the overall system of errors that may occur while the clipped model runs, ensuring that the reliability and energy consumption of the clipped model meet requirements and improving the accuracy of image processing performed with the clipped model.
Fig. 1 shows a flow diagram of a neural network model clipping method according to an embodiment of the present disclosure. As shown in fig. 1, the neural network model clipping method may include the following steps S101 to S105:
in step S101, the environmental state information of the neural network model to be clipped is acquired through reinforcement learning.
The environment state information comprises basic characteristic information and enhanced characteristic information of the neural network model to be cut.
In step S102, a target clipping rate of each layer in the neural network model to be clipped is obtained according to the environment state information.
In step S103, a target reward value is obtained according to the target clipping rate and the reward function.
The return function is used for indicating the corresponding relation between the cutting rate and the reliability and energy consumption of the neural network model after cutting, and the target reward value is used for indicating the income of the target neural network model obtained after the neural network model to be cut is cut according to the target cutting rate.
In step S104, in response to that the target reward value is greater than or equal to a preset threshold, the neural network model to be clipped is clipped according to the target clipping rate.
In step S105, image processing is performed on the target image according to the target neural network model obtained by the clipping.
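For orientation only, the following self-contained Python sketch shows how steps S101 to S104 fit together as a reinforcement-learning search loop; every helper here (get_state, propose_clipping_rates, reward, prune) is a toy stand-in invented for this sketch, and a real implementation would replace the random proposal with a trained RL agent:

```python
import random

def get_state(model):
    # S101: environment state; basic features (layer sizes) plus the
    # enhanced features described below would be collected here.
    return tuple(model["kernels_per_layer"])

def propose_clipping_rates(state):
    # S102: one target clipping rate per layer; an RL agent would map
    # state -> rates instead of sampling at random.
    return [random.uniform(0.0, 0.8) for _ in state]

def reward(model, rates):
    # S103: placeholder reward favouring moderate clipping; the patent
    # uses the reliability/energy reward functions described below.
    return 1.0 - sum(abs(r - 0.5) for r in rates) / len(rates)

def prune(model, rates):
    # S104: remove the chosen fraction of kernels from each layer.
    model["kernels_per_layer"] = [
        max(1, round(n * (1.0 - r)))
        for n, r in zip(model["kernels_per_layer"], rates)
    ]
    return model

model = {"kernels_per_layer": [64, 128, 256]}
PRESET_THRESHOLD = 0.8
for _ in range(1000):                       # search loop over clipping actions
    rates = propose_clipping_rates(get_state(model))
    if reward(model, rates) >= PRESET_THRESHOLD:
        model = prune(model, rates)         # S104: clip at the accepted rates
        break
print(model["kernels_per_layer"])           # S105 would then run inference
```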
In an embodiment of the present disclosure, the neural network model clipping method may be applied to a computer, an electronic device, and the like for neural network model clipping.
Reinforcement learning is a branch of machine learning and can be considered as a method of learning in the exploration process. In reinforcement learning, the subject of learning is the reinforcement learning agent, and the designer does not provide the agent with a supervisory signal. Instead, the agent predicts its next activity at each moment in time, and gets a reward signal for each activity in the interaction with the environment. Through the different reward signals, the intelligent agent can gradually change the behavior prediction rule of the intelligent agent, so that the reward accumulated by a series of behaviors is maximum, and the optimal solution of the target problem is automatically explored. Reference may be made to the detailed description in the related art, which is not repeated in the embodiments of the present disclosure.
In a possible application scenario, in an electric power system, a series of image data acquired by electric power equipment, a large amount of acquired historical operation data, failure log data and the like can be used as a training data set, so that the data sets can be input into a neural network and trained in combination with a preset target value to obtain a trained neural network model, namely a neural network model to be cut. The target value may be set manually by a technician or obtained according to historical data. For example, the neural network model may be applied to aspects of power system transmission stability analysis, load detection, static and dynamic stability analysis, fault prediction, and the like.
In another possible application scenario, the neural network model may also be applied in the field of image processing, for example, image recognition, edge monitoring of an image, image segmentation, image compression, image restoration, and the like.
In an embodiment of the present disclosure, the neural network model to be pruned may include a deep neural network model, a convolutional neural network model, a recurrent neural network model, and the like.
In an embodiment of the present disclosure, the basic feature information may be understood as inherent to the network model to be cut, and the basic feature information may be estimated in an offline manner.
In an embodiment of the present disclosure, the basic feature information may include at least one of:
the number of layers of the neural network model to be cut, the number of channels for inputting the characteristic diagram, the length of the input characteristic diagram, the width of the input characteristic diagram, the number of convolution kernels, the length of the convolution kernels, the width of the convolution kernels and the step length of the convolution kernels.
In an embodiment of the present disclosure, the enhanced feature information represents quantities of the neural network model that readily change: they vary with the clipping rate of each iteration and can be obtained online after each clipping of the neural network model to be clipped.
In an embodiment of the present disclosure, the enhanced feature information may include at least one of:
energy consumption of the neural network model after cutting, reliability of the neural network model after cutting and distribution parameters of each layer in the neural network model after cutting.
In an embodiment of the present disclosure, both the energy consumption of the trimmed neural network model and the reliability of the trimmed neural network model may be understood as the energy consumption and the reliability obtained by the evaluation, and specific reference may be made to the detailed description in the following examples.
In an embodiment of the present disclosure, the distribution parameters of each layer in the trimmed neural network model may be understood as statistical parameters, and may include any one of the following: maximum, minimum, median, mean, variance, and the like.
In an embodiment of the present disclosure, the enhanced feature information may further include information such as historical clipping rates.
In this embodiment, after the environment state information of the neural network model to be clipped is acquired through reinforcement learning, the proportion of unimportant convolution kernels in each layer of the model can be identified and a target clipping rate generated for each layer. Since the number of convolution kernels differs from layer to layer, the target clipping rate correspondingly differs from layer to layer.
In an embodiment of the present disclosure, after the target clipping rate of each layer is obtained, the number of convolution kernels to be clipped in each layer can be derived from that layer's target clipping rate and its total number of convolution kernels, i.e., the number of kernels to clip in each layer is determined for the current situation. The reliability of each layer's convolution kernels is then obtained and used as a score, and the kernels to be clipped in each layer are determined according to the target clipping rate, as the sketch after the next paragraph illustrates.
In this embodiment, the convolution kernels with high fault tolerance in each layer of the neural network model to be clipped are deleted in turn, while the convolution kernels with low fault tolerance are preserved.
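A minimal sketch of this per-layer selection; it assumes that a higher per-kernel score means higher fault tolerance, which is this sketch's reading of the description rather than a convention fixed by the patent:

```python
def select_kernels_to_clip(reliability_scores, target_rate):
    """reliability_scores: one score per convolution kernel in a layer,
    used as the ranking score as described above. Returns the indices of
    the kernels to clip under the layer's target clipping rate."""
    n_total = len(reliability_scores)
    n_clip = int(round(n_total * target_rate))  # kernels to remove in this layer
    # Rank kernels so the most fault-tolerant are clipped first and the
    # least fault-tolerant are preserved (assumption: higher score = more
    # fault tolerant).
    order = sorted(range(n_total), key=lambda i: reliability_scores[i],
                   reverse=True)
    return sorted(order[:n_clip])

# Example: a 6-kernel layer clipped at a 50% target rate.
print(select_kernels_to_clip([0.9, 0.2, 0.7, 0.4, 0.95, 0.1], 0.5))  # [0, 2, 4]
```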
Alternatively, the preset threshold may be understood as a maximum reward value; it may be set by a user in a customized manner or obtained from historical experience data.
In one embodiment of the present disclosure, the reward functions are different for the two different requirements of high reliability and high energy efficiency.
Illustratively, for the requirement of high reliability, the reliability of the neural network model is improved as far as possible on the basis of meeting the energy consumption requirement. In this case, the reward function may be as shown in formula (1) below, from which the target reward value r is obtained (the published formula is an image; the piecewise form below is reconstructed from the accompanying explanation):

r = Ra · (E_target/E_total)² if E_total > E_target; r = Ra otherwise.   (1)

where E_total is the energy consumed by the target neural network model at the target clipping rate, E_target is the required target energy consumption value, and Ra is the reliability evaluation result of the target neural network model at the target clipping rate.
It should be noted that the square in formula (1) aggravates the penalty when the energy consumption does not meet the condition, so that reducing energy consumption brings a larger return; once the energy consumption meets the condition, the energy term is set to 1, further energy reduction brings no additional benefit, and the benefit comes from the reliability.
Illustratively, for the requirement of high energy efficiency (i.e., low energy consumption), the energy efficiency of the neural network model is improved as far as possible on the basis of meeting the reliability requirement, i.e., the energy consumption of the model is saved. In this case, the reward function may be as shown in formula (2) below, from which the target reward value r is obtained (likewise reconstructed from the accompanying explanation):

r = Eff · (Ra/R_target)² if Ra < R_target; r = Eff otherwise.   (2)

where Ra is the evaluated reliability of the target neural network model obtained at the target clipping rate, R_target is the required target reliability, and Eff is the energy-efficiency evaluation result of the target neural network model after clipping at the target clipping rate. The squared weighting of reliability in formula (2) represents its higher optimization priority.
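As an illustration only, the two reconstructed reward functions can be written directly in code; Ra, E_total, E_target, Eff, and R_target follow the definitions above, and the piecewise forms are this reconstruction's assumption rather than the patent's verbatim formulas:

```python
def reward_high_reliability(ra, e_total, e_target):
    """Formula (1), as reconstructed above: once energy meets the target,
    the energy term is fixed at 1 and the payoff comes from reliability;
    the squared term aggravates the penalty while energy is over target."""
    energy_term = 1.0 if e_total <= e_target else (e_target / e_total) ** 2
    return ra * energy_term

def reward_high_efficiency(eff, ra, r_target):
    """Formula (2), as reconstructed above: the squared reliability term
    gives reliability the higher optimization priority."""
    reliability_term = 1.0 if ra >= r_target else (ra / r_target) ** 2
    return eff * reliability_term

# Example: energy twice over budget is penalized quadratically.
print(reward_high_reliability(ra=0.9, e_total=2.0, e_target=1.0))  # 0.9 * 0.25
```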
In an embodiment of the present disclosure, the target image may be one or more of any type of image captured by a computing device.
Specifically, the image processing on the target image according to the target neural network model obtained after the clipping may include processing modes such as image recognition, image segmentation, image restoration, and the like, which are not limited in the embodiment of the present disclosure.
In this embodiment, clipping the neural network model to be clipped according to the target clipping rate can be understood as compressing the computation amount and the storage space occupied by the model. The computation amount and storage footprint of the resulting target neural network model are therefore greatly reduced, the cost of the computing device on which it is deployed is saved, and the target model runs faster with higher accuracy. This ensures that a neural network model deployed on a hardware platform can operate normally while the requirements on reliability and energy consumption are met.
The embodiment of the disclosure provides a neural network model clipping method that acquires, through reinforcement learning, the environment state information of a neural network model to be clipped, the environment state information including basic feature information and enhanced feature information of the model; acquires a target clipping rate for each layer of the model according to the environment state information; obtains a target reward value according to the target clipping rate and the reward function, where the reward function indicates the correspondence between the clipping rate and the reliability and energy consumption of the clipped model, and the target reward value indicates the benefit of the target neural network model obtained after clipping at the target clipping rate; clips the model according to the target clipping rate in response to the target reward value being greater than or equal to a preset threshold; and performs image processing on a target image with the resulting target neural network model. With this technical scheme, clipping the neural network model reduces its computation amount and the impact on the overall system of errors that may occur while the clipped model runs, ensuring that the reliability and energy consumption of the clipped model meet requirements and improving the accuracy of image processing performed with the clipped model.
In an embodiment of the present disclosure, before the step of obtaining the target bonus value according to the target clipping rate and the reward function at step S103, the method may further include the steps of:
and obtaining the reliability of the cut neural network model according to the reliability evaluation function.
And acquiring the energy consumption of the neural network model after cutting according to the energy consumption evaluation function.
And acquiring the return function according to the reliability of the cut neural network model and the energy consumption of the cut neural network model.
In an embodiment of the present disclosure, the reliability evaluation function is used to evaluate the reliability of the entire neural network model, and the energy consumption evaluation function is used to evaluate the energy consumption of the entire neural network model.
In an embodiment of the present disclosure, the energy consumption and the reliability can be regarded as two optimization targets of the neural network model to be clipped, and the model is clipped so as to strike a balance between the two. The reliability and energy consumption of the model change as the clipping rate changes; therefore, after the model is clipped with the per-layer clipping rates, the energy consumption of the resulting clipped model should be as low as possible while its reliability remains as high as possible.
In an embodiment of the present disclosure, the reliability evaluation function and the energy consumption evaluation function are both related to a clipping rate of each layer in the neural network model after clipping.
In the disclosed embodiment, the reliability and the energy consumption of the neural network model after cutting can be respectively obtained according to the reliability evaluation function and the energy consumption evaluation function, so that the reliability and the energy consumption of the neural network model after cutting can be evaluated by means of the two evaluation models, and the optimization target of the neural network model to be cut is realized.
In an embodiment of the present disclosure, before the step of obtaining the reliability of the trimmed neural network model according to the reliability evaluation function, the method may further include the following steps:
acquiring a system architecture reliability parameter and a neuron sensitivity parameter;
and constructing a reliability evaluation function according to the reliability parameter of the system architecture and the sensitivity parameter of the neuron.
The system architecture reliability parameter is used for evaluating the influence of each layer of faults in the cut neural network model on the reliability of the cut neural network model, and the neuron sensitivity parameter is used for evaluating the influence of the faults of neurons of the cut neural network model on the reliability of the whole cut neural network model.
Illustratively, take the clipped neural network model to be a DNN model. Each layer of a DNN involves a huge computation amount and memory access amount, which characterizes its operation on hardware; when soft errors occur in the neural network at a certain failure rate, a large number of parameters are necessarily corrupted. Given these characteristics, when an error occurs in a certain layer of the neural network, it is reasonable to consider the reliability of that layer closely related to the memory access amount, the computation amount, and the on-chip residence time occupied by the layer. A parameter for evaluating the influence of soft errors in each layer on the reliability of the neural network model is therefore proposed, namely the Architecture Reliability parameter (ARF).
In an embodiment of the present disclosure, the step of obtaining the reliability parameter of the architecture may specifically include the following steps:
obtaining the architecture reliability parameter ARF_i according to

ARF_i = λ_i · (p/(1 + p) · C_i/C + 1/(1 + p) · M_i/M)

(reconstructed from the symbol definitions, as in the summary above), where λ_i is the fault rate of the i-th layer of the clipped neural network model, p is the compute-to-memory-access ratio of the i-th layer when run on hardware, C_i is the multiply-add computation amount of the i-th layer, C is the multiply-add computation amount of all layers, M_i is the memory access amount of the i-th layer, and M is the memory access amount of all layers in the clipped neural network model.
It should be noted that the larger λ_i in the above formula is, the more errors the clipped neural network model exhibits. p represents the operational intensity of the i-th layer in the clipped model and, by Amdahl's law, is closely related to the performance of the hardware. For simplicity, in actual calculation p is represented by the ratio of the number of cycles occupied by computation to the number of cycles occupied by memory access; a larger p means computation occupies more execution time and data reside on chip for more cycles during computation, so errors are more likely to occur in the computation stage of the whole system and the system's overall reliability is worse, and the same holds during the memory access period. Because each layer differs in computation amount and memory access amount, the error magnitude differs between layers, and the failure probability of a layer with a large compute-and-access amount is far higher than that of a layer with a small one; this probability is therefore expressed by the share of the layer's parameter amount and computation amount in the whole model. Further, C_i and M_i in the above formula are determined by the layer type of the i-th layer in the clipped neural network model.
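A small sketch of the reconstructed ARF computation; the p/(1 + p) time-fraction weighting is this reconstruction's assumption, as noted above:

```python
def arf(fault_rate, p, macs_i, macs_total, mem_i, mem_total):
    """Architecture reliability parameter of layer i per the reconstructed
    formula: lambda_i weighted by the layer's shares of compute and memory
    access, with p/(1+p) and 1/(1+p) as the fractions of time spent
    computing and accessing memory (p = compute cycles / memory cycles)."""
    compute_frac = p / (1.0 + p)
    return fault_rate * (compute_frac * macs_i / macs_total
                         + (1.0 - compute_frac) * mem_i / mem_total)

# Example: a compute-heavy layer (p = 4) holding 30% of the model's MACs
# and 10% of its memory traffic, with a per-layer fault rate of 1e-3.
print(arf(1e-3, 4.0, 3e8, 1e9, 1e7, 1e8))  # ~2.6e-4
```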
In the disclosed embodiment, the calculation amount, the memory access amount and the running time are included in the influence factors of the reliability, so that the reliability of the neural network model after being cut is evaluated according to the reliability parameters of the architecture.
In an embodiment of the present disclosure, since different neurons in the neural network model have different vulnerabilities, the neuron sensitivity parameter S is obtained by analyzing the respective vulnerabilities of the neurons in the clipped neural network model. Specifically, S can be obtained by the following formula (reconstructed from the symbol definitions; the published formula is an image):

S = (∂L/∂w) · Δw

where ∂L/∂w is the first-order partial derivative of the loss function L with respect to the neuron w, and Δw is the change in the neuron's magnitude before and after clipping.
It should be noted that S represents the sensitivity of a neuron: the sensitivity of the neural network depends primarily on the neuron's first-order partial derivative and the corresponding magnitude of the neuron's change. The greater the sensitivity, the greater the change in the loss function caused by the corresponding neuron's change and the lower the accuracy, and vice versa.
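A minimal PyTorch sketch of the reconstructed sensitivity S = (∂L/∂w) · Δw; the absolute value and the toy one-neuron loss are choices of this sketch, not details fixed by the patent:

```python
import torch

def neuron_sensitivity(loss, weight, pruned_weight):
    """S = (dL/dw) * delta_w: a first-order estimate of the loss change
    caused by clipping the neuron/weight (reconstructed form)."""
    (grad,) = torch.autograd.grad(loss, weight)
    delta = pruned_weight - weight.detach()   # change before/after clipping
    return (grad * delta).abs()               # larger S -> more sensitive

# Tiny example: one linear neuron, loss = (w*x - y)^2, pruning sets w to 0.
w = torch.tensor([0.7], requires_grad=True)
x, y = torch.tensor([2.0]), torch.tensor([1.0])
loss = ((w * x - y) ** 2).sum()
print(neuron_sensitivity(loss, w, torch.zeros(1)))  # tensor([1.1200])
```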
In the disclosed embodiment, the reliability evaluation function is constructed by combining the architecture reliability parameter and the neuron sensitivity parameter, so that the reliability of the clipped neural network model can be conveniently evaluated from both the software level and the system architecture level.
In an embodiment of the present disclosure, the step of obtaining the reliability of the trimmed neural network model according to the reliability evaluation function may specifically include the following steps:
obtaining the reliability R of the clipped neural network model according to

R = ARF_i · S

(reconstructed from the symbol definitions; the published formula is an image), where ARF_i is the architecture reliability parameter, S is the neuron sensitivity parameter, λ_i is the fault rate of the i-th layer of the clipped neural network model, p is the compute-to-memory-access ratio of the i-th layer when run on hardware, C_i and C are the multiply-add computation amounts of the i-th layer and of all layers, and M_i and M are the memory access amounts of the i-th layer and of all layers.
In one embodiment, R can be understood as an evaluation of the reliability of a single convolution kernel in the clipped neural network model. If the reliability of a certain layer of the model is needed, it can be obtained as the ratio of the sum of the reliabilities of all convolution kernels of that layer to the number of convolution kernels of that layer.
In the disclosed embodiment, the reliability of the clipped neural network model can be obtained from information such as the fault rate of each layer, the compute-to-memory-access ratio on hardware, and the multiply-add computation amount, i.e., a reliability result drawn from a comprehensive analysis of the software level and the system architecture level, which improves the accuracy of the reliability analysis.
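Under the reconstructed per-kernel form R = ARF_i · S, the layer-level reliability described above reduces to a mean over the layer's kernels; a sketch:

```python
def kernel_reliability(arf_i, sensitivity):
    """R = ARF_i * S for a single convolution kernel (reconstructed form)."""
    return arf_i * sensitivity

def layer_reliability(kernel_reliabilities):
    """Layer reliability: the sum of the reliabilities of all kernels of
    the layer divided by the number of kernels, as described above."""
    return sum(kernel_reliabilities) / len(kernel_reliabilities)

# Example: three kernels in one layer sharing the layer's ARF.
arf_i = 2.6e-4
scores = [kernel_reliability(arf_i, s) for s in (0.2, 0.5, 1.1)]
print(layer_reliability(scores))
```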
In an embodiment of the present disclosure, before the step of obtaining the energy consumption of the trimmed neural network model according to the energy consumption evaluation function, the method may further include the following steps:
acquiring full-load operation energy consumption and bottleneck operation energy consumption of the cut neural network model;
and constructing the energy consumption evaluation function according to the full-load operation energy consumption and the bottleneck operation energy consumption.
In an embodiment of the present disclosure, in actual operation the neural network model often runs on different hardware platforms whose resources, such as computational capability, storage size, and bandwidth, differ greatly, so the same model also differs greatly in energy consumption across platforms. The energy consumption of the clipped neural network model when running on a hardware system therefore needs to be obtained, namely the full-load operation energy consumption and the bottleneck operation energy consumption.
In the disclosed embodiment, the energy consumption evaluation function is constructed by obtaining the full-load operation energy consumption and the bottleneck operation energy consumption of the trimmed neural network model, so that the energy consumption of the trimmed neural network model can be conveniently evaluated in a multi-aspect mode by combining a model scale level and a hardware characteristic level.
In an embodiment of the present disclosure, the step of obtaining the full-load operating energy consumption of the trimmed neural network model may specifically include the following steps:
acquiring a first calculated amount and a first memory access amount of the trimmed neural network model in a full-load running state;
obtaining a first estimated power according to the first calculated amount and the first memory access amount;
and acquiring full-load operation energy consumption according to the first estimated power and the first operation time of the trimmed neural network model in the full-load operation state.
In an embodiment of the present disclosure, different types of network layers in the neural network model correspond to different computation amounts and memory access amounts, so the computation amount and memory access amount of every layer of the clipped model can be obtained. Table 1 below lists the multiply-add count and memory access amount corresponding to several common layer types; from it, the multiply-add computation and memory access of each layer in the clipped neural network model can be estimated, and hence the computation amount and memory access amount of the whole model.
Table 1: Multiply-add count and memory access amount corresponding to different types of network layers (the table itself is rendered only as an image in the source and is not reproduced here).
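Since Table 1 is not reproduced, the sketch below uses the standard textbook counting formulas for convolution and fully connected layers as a stand-in for what such a table typically lists; these exact expressions are an assumption, not the patent's table:

```python
def conv2d_costs(c_in, h_in, w_in, c_out, h_out, w_out, k_h, k_w):
    """Standard estimates for a convolution layer: multiply-adds and
    memory accesses (in elements), assumed as a stand-in for Table 1."""
    macs = k_h * k_w * c_in * h_out * w_out * c_out
    mem = (k_h * k_w * c_in * c_out      # weights read
           + c_in * h_in * w_in          # input feature map read
           + c_out * h_out * w_out)      # output feature map written
    return macs, mem

def fc_costs(n_in, n_out):
    """Standard estimates for a fully connected layer."""
    macs = n_in * n_out
    mem = n_in * n_out + n_in + n_out    # weights + input + output
    return macs, mem

# Example: a 3x3 conv, 64 -> 128 channels on a 56x56 map (stride 1, pad 1),
# and a 4096 -> 1000 fully connected layer.
print(conv2d_costs(64, 56, 56, 128, 56, 56, 3, 3))
print(fc_costs(4096, 1000))
```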
In an embodiment of the present disclosure, a gaussian regression model may be adopted to perform fitting according to the first calculated amount and the first memory access amount, so as to obtain a first estimated power.
In an embodiment of the present disclosure, the full-load operation energy consumption is obtained as the product of the first estimated power and the first operation time. The first operation time can be understood as the shortest achievable runtime, that is, the time consumed when computation and memory access fully overlap.
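A minimal sketch of these two steps, assuming scikit-learn's GaussianProcessRegressor as the Gaussian regression model; the training samples and the runtime below are hypothetical measurements from a target platform, not values from the disclosure.

```python
# Sketch: fit the first estimated power from the first computation amount
# and first memory access amount, then form energy as power x time.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.array([[1e9, 2e7], [5e9, 8e7], [2e10, 3e8]])  # [multiply-adds, accesses]
y = np.array([1.2, 2.9, 6.5])                        # measured power in watts

gpr = GaussianProcessRegressor().fit(X, y)

first_power = gpr.predict(np.array([[4.2e9, 1.1e8]]))[0]  # first estimated power
first_runtime = 0.02                                      # first operation time (s)
full_load_energy = first_power * first_runtime            # E = P * t
```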
In an embodiment of the present disclosure, when the hardware platform presents a memory bottleneck, the bottleneck operation energy consumption can be understood as the memory-access energy consumption; when the hardware platform presents a computation bottleneck, it can be understood as the energy consumed by computation.
It can be understood that different hardware platforms have different hardware characteristics, and accordingly, performance bottlenecks are different, so that different bottleneck operation energy consumption is obtained.
For example, the energy consumption evaluation function can be expressed by the following formulas, from which the energy consumption of the trimmed neural network model is obtained:

$E_{\mathrm{full}} = P_{\mathrm{full}} \cdot t_{\mathrm{full}}$

$E_{\mathrm{bneck}} = P_{\mathrm{bneck}} \cdot t_{\mathrm{bneck}}$

wherein $E_{\mathrm{full}}$ is the full-load operation energy consumption, $E_{\mathrm{bneck}}$ is the bottleneck operation energy consumption, $P_{\mathrm{full}}$ represents the power at full-load operation (i.e., the first estimated power), $t_{\mathrm{full}}$ represents the run time at full-load operation (i.e., the first operation time), $P_{\mathrm{bneck}}$ represents the power under the bottleneck operation condition, and $t_{\mathrm{bneck}}$ represents the run time of the bottleneck operation.
For the power under the bottleneck operation condition, reference may be made to the specific description of the first estimated power in the foregoing embodiment, which is not described in detail in the embodiments of the present disclosure.
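Putting the pieces together, a hedged Python sketch of the evaluation follows. Each term is power times time, per the formulas above; how the full-load and bottleneck terms are merged into a single score is not fixed by the text, so taking their maximum below is an assumption.

```python
# Sketch of the energy consumption evaluation: each term is power x time,
# per the formulas above. Combining the two terms into one score
# (here: their maximum) is an assumption made for illustration.
def full_load_energy(p_full, t_full):
    return p_full * t_full

def bottleneck_energy(p_bneck, t_bneck):
    return p_bneck * t_bneck

def energy_score(p_full, t_full, p_bneck, t_bneck):
    return max(full_load_energy(p_full, t_full),
               bottleneck_energy(p_bneck, t_bneck))
```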
In the disclosed embodiment, the computation amount and memory access amount of each layer in the trimmed neural network model are calculated at the model-scale level, and the power of the trimmed network is estimated with the Gaussian regression model, which improves the accuracy of the energy consumption estimate.
Fig. 2 illustrates a block diagram of a neural network model clipping device according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of both.
As shown in fig. 2, the neural network model clipping apparatus includes a first obtaining module 201, a second obtaining module 202, a third obtaining module 203, a first executing module 204, and a processing module 205.
A first obtaining module 201, configured to obtain, through reinforcement learning, environment state information of a neural network model to be cut, where the environment state information includes basic feature information and enhanced feature information of the neural network model to be cut;
a second obtaining module 202, configured to obtain a target clipping rate of each layer in the neural network model to be clipped according to the environment state information;
a third obtaining module 203, configured to obtain a target reward value according to the target cutting rate and a reward function, where the reward function is used to indicate a corresponding relationship between the cutting rate and reliability and energy consumption of the neural network model after cutting, and the target reward value is used to indicate a benefit of the target neural network model obtained after cutting the neural network model to be cut according to the target cutting rate;
a first executing module 204, configured to, in response to that the target reward value is greater than or equal to a preset threshold, crop the neural network model to be cropped according to the target cropping rate;
and the processing module 205 is configured to perform image processing on the target image according to the target neural network model obtained after the cropping.
In an embodiment of the present disclosure, the neural network model clipping apparatus further includes:
the fourth obtaining module is configured to obtain the reliability of the neural network model after cutting according to the reliability evaluation function;
the fourth obtaining module is further configured to obtain the energy consumption of the trimmed neural network model according to an energy consumption evaluation function;
and the fifth obtaining module is configured to obtain the return function according to the reliability of the trimmed neural network model and the energy consumption of the trimmed neural network model.
In an embodiment of the present disclosure, the neural network model clipping device further includes:
a sixth obtaining module configured to obtain an architecture reliability parameter and a neuron sensitivity parameter;
a second execution module configured to construct the reliability evaluation function according to the architecture reliability parameter and the neuron sensitivity parameter;
the system architecture reliability parameter is used for evaluating the influence of each layer of faults in the neural network model after cutting on the reliability of the neural network model after cutting, and the neuron sensitivity parameter is used for evaluating the influence of the faults of neurons of the neural network model after cutting on the reliability of the whole neural network model after cutting.
In an embodiment of the present disclosure, the sixth obtaining module is configured to:
obtain the system architecture reliability parameter $\alpha$ according to

$\alpha = \sum_{i=1}^{n} \lambda_i \left( p \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + (1 - p) \cdot \frac{M_i}{\sum_{j=1}^{n} M_j} \right)$

wherein $\lambda_i$ is the fault rate of the i-th layer of the trimmed neural network model, $p$ represents the compute-to-memory-access ratio of the i-th layer of the trimmed neural network model when run on hardware, $C_i$ is the multiply-add computation amount of the i-th layer, $\sum_{j} C_j$ is the multiply-add computation amount of all layers, $M_i$ is the memory access amount of the i-th layer, $\sum_{j} M_j$ is the memory access amount of all layers, and $n$ is the number of layers in the trimmed neural network model.
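A minimal Python sketch of this parameter under the weighted-sum reading given above; the exact way the computation and memory-access shares are combined is reconstructed from the symbol definitions and should be treated as an assumption.

```python
# Sketch: system architecture reliability parameter (alpha). Each layer's
# fault rate is weighted by its share of total computation and of total
# memory access, mixed by the compute-to-memory-access ratio p
# (an assumed form reconstructed from the symbol definitions).
def architecture_reliability(fault_rates, p, macs, mems):
    total_macs, total_mems = sum(macs), sum(mems)
    return sum(
        lam * (p * c / total_macs + (1 - p) * m / total_mems)
        for lam, c, m in zip(fault_rates, macs, mems)
    )

alpha = architecture_reliability([1e-6, 2e-6], p=0.6,
                                 macs=[1e9, 5e8], mems=[2e7, 1e7])
```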
In an embodiment of the present disclosure, the fourth obtaining module is configured to:
obtain the reliability $R$ of the trimmed neural network model according to

$R = \beta \cdot \alpha = \beta \sum_{i=1}^{n} \lambda_i \left( p \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + (1 - p) \cdot \frac{M_i}{\sum_{j=1}^{n} M_j} \right)$

wherein $\alpha$ is the system architecture reliability parameter, $\beta$ is the neuron sensitivity parameter, and $\lambda_i$, $p$, $C_i$, $\sum_{j} C_j$, $M_i$, $\sum_{j} M_j$ and $n$ are as defined above for the system architecture reliability parameter.
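Under the same assumed reading, the reliability value simply scales the architecture parameter by the neuron sensitivity parameter; the sketch below reuses architecture_reliability from the previous sketch.

```python
# Sketch: reliability of the trimmed model under the assumed reading
# R = beta * alpha, with beta the neuron sensitivity parameter.
def model_reliability(beta, fault_rates, p, macs, mems):
    return beta * architecture_reliability(fault_rates, p, macs, mems)
```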
In an embodiment of the present disclosure, the neural network model clipping device further includes:
the seventh obtaining module is configured to obtain full-load operation energy consumption and bottleneck operation energy consumption of the cut neural network model;
a third execution module configured to construct the energy consumption assessment function according to the full load operation energy consumption and the bottleneck operation energy consumption.
In an embodiment of the present disclosure, the seventh obtaining module is configured to:
acquiring a first calculated amount and a first memory access amount of the trimmed neural network model in a full-load running state;
obtaining a first estimated power according to the first calculated amount and the first memory access amount;
and acquiring the full-load operation energy consumption according to the first estimated power and the first operation time of the trimmed neural network model in the full-load operation state.
In an embodiment of the present disclosure, the basic feature information includes at least one of:
the number of layers of the neural network model to be cut, the number of channels of the input feature map, the length of the input feature map, the width of the input feature map, the number of convolution kernels, the length of the convolution kernels, the width of the convolution kernels and the step length of the convolution kernels.
In an embodiment of the present disclosure, the enhanced feature information includes at least one of:
energy consumption of the neural network model after cutting, reliability of the neural network model after cutting and distribution parameters of each layer in the neural network model after cutting.
The embodiment of the disclosure provides a neural network model clipping device, which can acquire environment state information of a neural network model to be clipped through reinforcement learning, wherein the environment state information comprises basic feature information and enhanced feature information of the neural network model to be clipped; acquiring the target cutting rate of each layer in the neural network model to be cut according to the environment state information; obtaining a target reward value according to the target cutting rate and a return function, wherein the return function is used for indicating the corresponding relation between the cutting rate and the reliability and energy consumption of the neural network model after cutting, and the target reward value is used for indicating the income of the target neural network model obtained after the neural network model to be cut is cut according to the target cutting rate; in response to the target reward value being larger than or equal to a preset threshold value, cutting the neural network model to be cut according to the target cutting rate; and carrying out image processing on the target image according to the target neural network model obtained after cutting. The device is used for cutting the neural network model, so that the calculated amount of the neural network model can be reduced, and the influence of errors possibly occurring during the operation of the cut neural network model on the whole system is reduced, thereby ensuring that the reliability and the energy consumption of the neural network model obtained after cutting meet the requirements, and improving the accuracy of image processing of the cut neural network model.
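To make the flow through these modules concrete, here is a high-level, hypothetical Python sketch of the search loop: observe the environment state, propose per-layer clipping rates, score them with the reliability- and energy-based reward, and clip only once the reward clears the preset threshold. Every component below is a placeholder stub, not the disclosure's implementation.

```python
# High-level, hypothetical sketch of the reinforcement-learning clipping
# loop. All components are placeholder stubs standing in for the agent,
# reward function, and clipping step of the disclosure.
import random

def observe_environment(model):
    return model                          # stub: basic + enhanced features

class Agent:                              # stub RL agent
    def propose(self, state):
        return [random.uniform(0.0, 0.5) for _ in range(len(state))]
    def update(self, state, rates, reward):
        pass                              # stub: policy update

def reward_fn(rates):                     # stub: reliability/energy reward
    return 1.0 - max(rates)

def clip(model, rates):
    return [layer * (1 - r) for layer, r in zip(model, rates)]  # toy "clip"

def search_clipping_rates(model, agent, threshold=0.7, max_steps=100):
    for _ in range(max_steps):
        state = observe_environment(model)
        rates = agent.propose(state)      # target clipping rate per layer
        r = reward_fn(rates)
        agent.update(state, rates, r)
        if r >= threshold:                # clip only when reward clears threshold
            return clip(model, rates)
    return model

clipped = search_clipping_rates([64, 128, 256], Agent())
```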
The present disclosure also discloses an electronic device, and fig. 3 shows a block diagram of the electronic device according to an embodiment of the present disclosure.
As shown in fig. 3, the electronic device includes a memory and a processor, where the memory is to store one or more computer instructions, where the one or more computer instructions are executed by the processor to implement a method according to an embodiment of the disclosure.
FIG. 4 shows a schematic block diagram of a computer system suitable for use in implementing a method according to an embodiment of the present disclosure.
As shown in fig. 4, the computer system includes a processing unit that can execute the various methods in the above-described embodiments according to a program stored in a Read Only Memory (ROM) or a program loaded from a storage section into a Random Access Memory (RAM). In the RAM, various programs and data necessary for the operation of the computer system are also stored. The processing unit, the ROM, and the RAM are connected to each other through a bus. An input/output (I/O) interface is also connected to the bus.
The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN card, a modem, or the like. The communication section performs a communication process via a network such as the internet. The drive is also connected to the I/O interface as needed. A removable medium such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive as necessary, so that a computer program read out therefrom is mounted into the storage section as necessary. The processing unit can be realized as a CPU, a GPU, a TPU, an FPGA, an NPU and other processing units.
In particular, the methods described above may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the above-described method. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or by programmable hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the electronic device or the computer system in the above embodiments; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.

Claims (20)

1. A neural network model clipping method, the method comprising:
acquiring environment state information of a neural network model to be cut through reinforcement learning, wherein the environment state information comprises basic characteristic information and enhanced characteristic information of the neural network model to be cut;
acquiring the target clipping rate of each layer in the neural network model to be clipped according to the environment state information;
obtaining a target reward value according to the target cutting rate and a return function, wherein the return function is used for indicating the corresponding relation between the cutting rate and the reliability and energy consumption of the neural network model after cutting, and the target reward value is used for indicating the income of the target neural network model obtained after the neural network model to be cut is cut according to the target cutting rate;
in response to the target reward value being larger than or equal to a preset threshold value, cutting the neural network model to be cut according to the target cutting rate;
and carrying out image processing on the target image according to the target neural network model obtained after cutting.
2. The method according to claim 1, wherein before obtaining a target reward value according to the target clipping rate and a reward function, the method further comprises:
obtaining the reliability of the cut neural network model according to the reliability evaluation function;
acquiring the energy consumption of the cut neural network model according to an energy consumption evaluation function;
and acquiring the return function according to the reliability of the cut neural network model and the energy consumption of the cut neural network model.
3. The method of claim 2, wherein before obtaining the reliability of the pruned neural network model according to the reliability evaluation function, the method further comprises:
acquiring a system architecture reliability parameter and a neuron sensitivity parameter;
constructing the reliability evaluation function according to the system architecture reliability parameter and the neuron sensitivity parameter;
the system architecture reliability parameter is used for evaluating the influence of faults in each layer of the trimmed neural network model on the reliability of the trimmed neural network model, and the neuron sensitivity parameter is used for evaluating the influence of faults in individual neurons of the trimmed neural network model on the reliability of the trimmed neural network model as a whole.
4. The method of claim 3, wherein obtaining the architecture reliability parameter comprises:
according to

$\alpha = \sum_{i=1}^{n} \lambda_i \left( p \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + (1 - p) \cdot \frac{M_i}{\sum_{j=1}^{n} M_j} \right)$

obtaining the system architecture reliability parameter $\alpha$, wherein $\lambda_i$ is the fault rate of the i-th layer of the trimmed neural network model, $p$ represents the compute-to-memory-access ratio of the i-th layer of the trimmed neural network model when run on hardware, $C_i$ is the multiply-add computation amount of the i-th layer of the trimmed neural network model, $\sum_{j} C_j$ is the multiply-add computation amount of all layers of the trimmed neural network model, $M_i$ is the memory access amount of the i-th layer of the trimmed neural network model, $\sum_{j} M_j$ is the memory access amount of all layers of the trimmed neural network model, and $n$ is the number of layers in the trimmed neural network model.
5. The method according to claim 3, wherein obtaining the reliability of the neural network model after the clipping according to the reliability evaluation function comprises:
according to

$R = \beta \cdot \alpha = \beta \sum_{i=1}^{n} \lambda_i \left( p \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + (1 - p) \cdot \frac{M_i}{\sum_{j=1}^{n} M_j} \right)$

obtaining the reliability $R$ of the trimmed neural network model, wherein $\alpha$ is the system architecture reliability parameter, $\beta$ is the neuron sensitivity parameter, $\lambda_i$ is the fault rate of the i-th layer of the trimmed neural network model, $p$ represents the compute-to-memory-access ratio of the i-th layer of the trimmed neural network model when run on hardware, $C_i$ is the multiply-add computation amount of the i-th layer of the trimmed neural network model, $\sum_{j} C_j$ is the multiply-add computation amount of all layers of the trimmed neural network model, $M_i$ is the memory access amount of the i-th layer of the trimmed neural network model, $\sum_{j} M_j$ is the memory access amount of all layers of the trimmed neural network model, and $n$ is the number of layers in the trimmed neural network model.
6. The method of claim 2, wherein before obtaining the energy consumption of the pruned neural network model according to the energy consumption evaluation function, the method further comprises:
acquiring full-load operation energy consumption and bottleneck operation energy consumption of the cut neural network model;
and constructing the energy consumption evaluation function according to the full-load operation energy consumption and the bottleneck operation energy consumption.
7. The method of claim 6, wherein obtaining full-load running energy consumption of the pruned neural network model comprises:
acquiring a first calculated amount and a first memory access amount of the trimmed neural network model in a full-load running state;
obtaining a first estimated power according to the first calculated amount and the first memory access amount;
and acquiring the full-load operation energy consumption according to the first estimated power and the first operation time of the trimmed neural network model in the full-load operation state.
8. The method according to any one of claims 1 to 7, wherein the base feature information comprises at least one of:
the number of layers of the neural network model to be cut, the number of channels of the input feature map, the length of the input feature map, the width of the input feature map, the number of convolution kernels, the length of the convolution kernels, the width of the convolution kernels and the step length of the convolution kernels.
9. The method according to any of claims 1 to 7, wherein the enhanced feature information comprises at least one of:
the energy consumption of the neural network model after cutting, the reliability of the neural network model after cutting and the distribution parameters of each layer in the neural network model after cutting.
10. A neural network model clipping device, characterized in that the neural network model clipping device includes:
the device comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is configured to obtain environment state information of a neural network model to be cut through reinforcement learning, and the environment state information comprises basic characteristic information and enhanced characteristic information of the neural network model to be cut;
the second obtaining module is configured to obtain a target clipping rate of each layer in the neural network model to be clipped according to the environment state information;
a third obtaining module, configured to obtain a target reward value according to the target cutting rate and a reward function, where the reward function is used to indicate a correspondence between the cutting rate and reliability and energy consumption of the neural network model after cutting, and the target reward value is used to indicate a benefit of the target neural network model obtained after cutting the neural network model to be cut according to the target cutting rate;
the first execution module is configured to respond to the target reward value being larger than or equal to a preset threshold value, and cut the neural network model to be cut according to the target cutting rate;
and the processing module is configured to perform image processing on the target image according to the target neural network model obtained after cutting.
11. The apparatus of claim 10, wherein the neural network model clipping means further comprises:
the fourth obtaining module is configured to obtain the reliability of the neural network model after cutting according to the reliability evaluation function;
the fourth obtaining module is further configured to obtain the energy consumption of the trimmed neural network model according to an energy consumption evaluation function;
and the fifth obtaining module is configured to obtain the return function according to the reliability of the trimmed neural network model and the energy consumption of the trimmed neural network model.
12. The apparatus of claim 11, wherein the neural network model clipping means further comprises:
a sixth obtaining module configured to obtain an architecture reliability parameter and a neuron sensitivity parameter;
a second execution module configured to construct the reliability evaluation function according to the architecture reliability parameter and the neuron sensitivity parameter;
the system architecture reliability parameter is used for evaluating the influence of faults in each layer of the trimmed neural network model on the reliability of the trimmed neural network model, and the neuron sensitivity parameter is used for evaluating the influence of faults in individual neurons of the trimmed neural network model on the reliability of the trimmed neural network model as a whole.
13. The apparatus of claim 12, wherein the sixth obtaining module is configured to:
obtain the system architecture reliability parameter $\alpha$ according to

$\alpha = \sum_{i=1}^{n} \lambda_i \left( p \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + (1 - p) \cdot \frac{M_i}{\sum_{j=1}^{n} M_j} \right)$

wherein $\lambda_i$ is the fault rate of the i-th layer of the trimmed neural network model, $p$ represents the compute-to-memory-access ratio of the i-th layer of the trimmed neural network model when run on hardware, $C_i$ is the multiply-add computation amount of the i-th layer of the trimmed neural network model, $\sum_{j} C_j$ is the multiply-add computation amount of all layers of the trimmed neural network model, $M_i$ is the memory access amount of the i-th layer of the trimmed neural network model, $\sum_{j} M_j$ is the memory access amount of all layers of the trimmed neural network model, and $n$ is the number of layers in the trimmed neural network model.
14. The apparatus of claim 12, wherein the fourth obtaining module is configured to:
obtain the reliability $R$ of the trimmed neural network model according to

$R = \beta \cdot \alpha = \beta \sum_{i=1}^{n} \lambda_i \left( p \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + (1 - p) \cdot \frac{M_i}{\sum_{j=1}^{n} M_j} \right)$

wherein $\alpha$ is the system architecture reliability parameter, $\beta$ is the neuron sensitivity parameter, $\lambda_i$ is the fault rate of the i-th layer of the trimmed neural network model, $p$ represents the compute-to-memory-access ratio of the i-th layer of the trimmed neural network model when run on hardware, $C_i$ is the multiply-add computation amount of the i-th layer of the trimmed neural network model, $\sum_{j} C_j$ is the multiply-add computation amount of all layers of the trimmed neural network model, $M_i$ is the memory access amount of the i-th layer of the trimmed neural network model, $\sum_{j} M_j$ is the memory access amount of all layers of the trimmed neural network model, and $n$ is the number of layers in the trimmed neural network model.
15. The apparatus of claim 11, wherein the neural network model clipping means further comprises:
the seventh obtaining module is configured to obtain full-load operation energy consumption and bottleneck operation energy consumption of the cut neural network model;
a third execution module configured to construct the energy consumption assessment function according to the full load operation energy consumption and the bottleneck operation energy consumption.
16. The apparatus of claim 15, wherein the seventh obtaining module is configured to:
acquiring a first calculated amount and a first memory access amount of a clipped neural network model in a full-load running state;
obtaining a first estimated power according to the first calculated amount and the first memory access amount;
and acquiring the full-load operation energy consumption according to the first estimated power and the first operation time of the trimmed neural network model in the full-load operation state.
17. The apparatus according to any one of claims 10 to 16, wherein the base feature information comprises at least one of:
the number of layers of the neural network model to be cut, the number of channels of the input feature map, the length of the input feature map, the width of the input feature map, the number of convolution kernels, the length of the convolution kernels, the width of the convolution kernels and the step length of the convolution kernels.
18. The apparatus according to any of claims 10 to 16, wherein the enhanced feature information comprises at least one of:
the energy consumption of the neural network model after cutting, the reliability of the neural network model after cutting and the distribution parameters of each layer in the neural network model after cutting.
19. An electronic device comprising a memory and a processor; wherein the memory is to store one or more computer instructions, wherein the one or more computer instructions are to be executed by the processor to implement the method steps of any of claims 1 to 9.
20. A computer-readable storage medium, on which computer instructions are stored, characterized in that the computer instructions, when executed by a processor, implement the method steps of any of claims 1 to 9.
CN202211250546.3A 2022-10-13 2022-10-13 Neural network model clipping method, device, equipment and medium Pending CN115374936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211250546.3A CN115374936A (en) 2022-10-13 2022-10-13 Neural network model clipping method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN115374936A true CN115374936A (en) 2022-11-22

Family

ID=84072612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211250546.3A Pending CN115374936A (en) 2022-10-13 2022-10-13 Neural network model clipping method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115374936A (en)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination