CN115374936A - Neural network model clipping method, device, equipment and medium - Google Patents


Info

Publication number: CN115374936A
Authority: CN (China)
Prior art keywords: neural network, network model, cutting, reliability, energy consumption
Legal status: Pending
Application number: CN202211250546.3A
Other languages: Chinese (zh)
Inventors: 赵东艳, 李德建, 种挺, 任增民, 马俊, 李雷
Current Assignee: Beijing Smartchip Microelectronics Technology Co Ltd
Original Assignee: Beijing Smartchip Microelectronics Technology Co Ltd
Application filed by Beijing Smartchip Microelectronics Technology Co Ltd
Priority application: CN202211250546.3A
Publication: CN115374936A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The disclosure relates to the technical field of neural network compression, and in particular to a neural network model clipping method, apparatus, device, and medium. The clipping method includes: acquiring environment state information of a neural network model to be clipped through reinforcement learning; acquiring a target clipping rate for each layer of the neural network model to be clipped according to the environment state information; obtaining a target reward value according to the target clipping rate and a reward function; in response to the target reward value being greater than or equal to a preset threshold, clipping the neural network model to be clipped according to the target clipping rate; and performing image processing on a target image according to the resulting target neural network model. The method not only reduces the computation amount of the neural network model but also reduces the impact on the overall system of errors that may occur while the clipped model runs, thereby ensuring that the reliability and energy consumption of the clipped neural network model meet requirements.

Description

Neural network model clipping method, device, equipment and medium
Technical Field
The disclosure relates to the technical field of neural network compression, in particular to a neural network model clipping method, device, equipment and medium.
Background
With the rapid development of neural network algorithms and neural network hardware chips, neural networks have become a promising solution in industries such as image processing and power systems. The neural network model offers powerful and fast parallel computing capability, high fault tolerance, strong learning capability, and the like. In general, however, a neural network model requires very large computational cost and memory space, and its parameters or memory footprint can be reduced by neural network compression.
In the related art, the model clipping technology is a mainstream compression scheme. Specifically, the model clipping technique is to remove unimportant weights (connections) or neurons in the neural network model, and greatly reduce the size and the calculation amount of the model without losing the model precision, thereby achieving the effect of reducing energy consumption.
However, even if the neural network model is clipped according to the above scheme and its computation amount is reduced, the clipped model still has a high probability of errors when running under some conditions. Such an error may propagate layer by layer through the network and affect the output of the whole system, greatly reducing the system's reliability and hence the accuracy of image processing performed with the clipped model. How to ensure that the reliability and energy consumption of the clipped neural network model meet the demand has therefore become an urgent problem to be solved.
Disclosure of Invention
In order to solve the problems in the related art, embodiments of the present disclosure provide a neural network model clipping method, apparatus, device, and medium.
In a first aspect, a neural network model clipping method is provided in the embodiments of the present disclosure.
Specifically, the neural network model clipping method includes:
acquiring environment state information of a neural network model to be cut through reinforcement learning, wherein the environment state information comprises basic characteristic information and enhanced characteristic information of the neural network model to be cut;
acquiring the target clipping rate of each layer in the neural network model to be clipped according to the environment state information;
obtaining a target reward value according to the target clipping rate and a reward function, where the reward function indicates the correspondence between the clipping rate and the reliability and energy consumption of the clipped neural network model, and the target reward value indicates the benefit of the target neural network model obtained after the neural network model to be clipped is clipped according to the target clipping rate;
in response to the target reward value being larger than or equal to a preset threshold value, cutting the neural network model to be cut according to the target cutting rate;
and performing image processing on a target image according to the target neural network model obtained after clipping.
With reference to the first aspect, in a first implementation manner of the first aspect, before the target reward value is obtained according to the target clipping rate and the reward function, the method further includes:
obtaining the reliability of the cut neural network model according to the reliability evaluation function;
acquiring the energy consumption of the cut neural network model according to an energy consumption evaluation function;
and acquiring the return function according to the reliability of the cut neural network model and the energy consumption of the cut neural network model.
With reference to the first aspect and the first implementation manner of the first aspect, in a second implementation manner of the first aspect, before the obtaining, according to the reliability evaluation function, the reliability of the trimmed neural network model, the method further includes:
acquiring a system architecture reliability parameter and a neuron sensitivity parameter;
constructing the reliability evaluation function according to the system architecture reliability parameter and the neuron sensitivity parameter;
the system architecture reliability parameter is used for evaluating the influence of each layer of faults in the cut neural network model on the reliability of the cut neural network model, and the neuron sensitivity parameter is used for evaluating the influence of the faults of neurons of the cut neural network model on the reliability of the whole cut neural network model.
With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the obtaining the architecture reliability parameter includes:

obtaining the architecture reliability parameter ARF_i of the i-th layer according to

ARF_i = λ_i · (p/(1 + p) · C_i/C + 1/(1 + p) · M_i/M)

(the published formula is rendered only as an image; this form is reconstructed from the symbol definitions, reading p/(1 + p) and 1/(1 + p) as the fractions of time spent on computation and on memory access), where λ_i is the fault rate of the i-th layer of the clipped neural network model, p is the compute-to-memory-access ratio of the i-th layer when run on hardware, C_i is the multiply-add computation amount of the i-th layer in the clipped neural network model, C is the multiply-add computation amount of all layers, M_i is the memory access amount of the i-th layer, and M is the memory access amount of all layers in the clipped neural network model.
With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the obtaining the reliability of the clipped neural network model according to the reliability evaluation function includes:

obtaining the reliability R of the clipped neural network model according to

R = ARF_i · S = λ_i · (p/(1 + p) · C_i/C + 1/(1 + p) · M_i/M) · S

(again reconstructed from the symbol definitions; the published formula is an image), where ARF_i is the architecture reliability parameter, S is the neuron sensitivity parameter, and λ_i, p, C_i, C, M_i, and M are as defined for ARF_i above.
With reference to the first aspect and the first implementation manner of the first aspect, in a fifth implementation manner of the first aspect, before the obtaining, according to the energy consumption evaluation function, energy consumption of the trimmed neural network model, the method further includes:
acquiring full-load operation energy consumption and bottleneck operation energy consumption of the cut neural network model;
and constructing the energy consumption evaluation function according to the full-load operation energy consumption and the bottleneck operation energy consumption.
With reference to the first aspect, the first implementation manner of the first aspect, and the fifth implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the obtaining full-load operation energy consumption of the trimmed neural network model includes:
acquiring a first calculated amount and a first memory access amount of the trimmed neural network model in a full-load running state;
obtaining a first estimated power according to the first calculated amount and the first memory access amount;
and acquiring the full-load operation energy consumption according to the first estimated power and the first operation time of the trimmed neural network model in the full-load operation state.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, the fifth implementation manner of the first aspect, and the sixth implementation manner of the first aspect, in a seventh implementation manner of the first aspect, the basic feature information includes at least one of the following:
the number of layers of the neural network model, the number of channels of the input feature map, the length of the input feature map, the width of the input feature map, the number of convolution kernels, the length of the convolution kernels, the width of the convolution kernels and the step length of the convolution kernels.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, the fourth implementation manner of the first aspect, the fifth implementation manner of the first aspect, and the sixth implementation manner of the first aspect, in an eighth implementation manner of the first aspect, the enhanced feature information includes at least one of the following:
energy consumption of the neural network model after cutting, reliability of the neural network model after cutting and distribution parameters of each layer in the neural network model after cutting.
In a second aspect, an embodiment of the present disclosure provides a neural network model clipping apparatus.
Specifically, the neural network model clipping device includes:
a first obtaining module configured to obtain, through reinforcement learning, environment state information of a neural network model to be clipped, where the environment state information includes basic feature information and enhanced feature information of the neural network model to be clipped;
the second acquisition module is configured to acquire a target clipping rate of each layer in the neural network model to be clipped according to the environment state information;
a third obtaining module, configured to obtain a target reward value according to the target cutting rate and a reward function, where the reward function is used to indicate a correspondence between the cutting rate and reliability and energy consumption of the neural network model after cutting, and the target reward value is used to indicate a benefit of the target neural network model obtained after cutting the neural network model to be cut according to the target cutting rate;
the first execution module is configured to respond to the target reward value being larger than or equal to a preset threshold value, and cut the neural network model to be cut according to the target cutting rate;
and the processing module is configured to perform image processing on the target image according to the target neural network model obtained after cutting.
With reference to the second aspect, in a first implementation manner of the second aspect, the neural network model clipping device further includes:
the fourth obtaining module is configured to obtain the reliability of the neural network model after cutting according to the reliability evaluation function;
the fourth obtaining module is further configured to obtain the energy consumption of the trimmed neural network model according to an energy consumption evaluation function;
and the fifth obtaining module is configured to obtain the return function according to the reliability of the trimmed neural network model and the energy consumption of the trimmed neural network model.
With reference to the second aspect and the first implementation manner of the second aspect, in a second implementation manner of the second aspect, an embodiment of the present disclosure further includes:
a sixth obtaining module configured to obtain an architecture reliability parameter and a neuron sensitivity parameter;
a second execution module configured to construct the reliability evaluation function according to the architecture reliability parameter and the neuron sensitivity parameter;
the system architecture reliability parameter is used for evaluating the influence of each layer of faults in the cut neural network model on the reliability of the cut neural network model, and the neuron sensitivity parameter is used for evaluating the influence of the faults of neurons of the cut neural network model on the reliability of the whole cut neural network model.
With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the sixth obtaining module is configured to:

obtain the architecture reliability parameter ARF_i according to

ARF_i = λ_i · (p/(1 + p) · C_i/C + 1/(1 + p) · M_i/M)

(reconstructed as in the first aspect), where λ_i is the fault rate of the i-th layer of the clipped neural network model, p is the compute-to-memory-access ratio of the i-th layer when run on hardware, C_i and C are the multiply-add computation amounts of the i-th layer and of all layers, and M_i and M are the memory access amounts of the i-th layer and of all layers in the clipped neural network model.
With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the fourth obtaining module is configured to:

obtain the reliability R of the clipped neural network model according to

R = ARF_i · S

(reconstructed as in the first aspect), where ARF_i is the architecture reliability parameter, S is the neuron sensitivity parameter, λ_i is the fault rate of the i-th layer of the clipped neural network model, p is the compute-to-memory-access ratio of the i-th layer when run on hardware, C_i and C are the multiply-add computation amounts of the i-th layer and of all layers, and M_i and M are the memory access amounts of the i-th layer and of all layers.
With reference to the second aspect and the first implementation manner of the second aspect, in a fifth implementation manner of the second aspect, an embodiment of the present disclosure further includes:
the seventh obtaining module is configured to obtain full-load operation energy consumption and bottleneck operation energy consumption of the cut neural network model;
a third execution module configured to construct the energy consumption assessment function according to the full load operation energy consumption and the bottleneck operation energy consumption.
With reference to the second aspect, the first implementation manner of the second aspect, and the fifth implementation manner of the second aspect, in a sixth implementation manner of the second aspect, the seventh obtaining module is configured to:
acquiring a first calculated amount and a first memory access amount of the trimmed neural network model in a full-load running state;
obtaining a first estimated power according to the first calculated amount and the first memory access amount;
and acquiring the full-load operation energy consumption according to the first estimated power and the first operation time of the trimmed neural network model in the full-load operation state.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, the fourth implementation manner of the second aspect, the fifth implementation manner of the second aspect, and the sixth implementation manner of the second aspect, in a seventh implementation manner of the second aspect, the basic feature information includes at least one of the following:
the number of layers of the neural network model to be cut, the number of channels for inputting the characteristic diagram, the length of the input characteristic diagram, the width of the input characteristic diagram, the number of convolution kernels, the length of the convolution kernels, the width of the convolution kernels and the step length of the convolution kernels.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, the fourth implementation manner of the second aspect, the fifth implementation manner of the second aspect, and the sixth implementation manner of the second aspect, in an eighth implementation manner of the second aspect, the enhanced feature information includes at least one of the following:
the energy consumption of the neural network model after cutting, the reliability of the neural network model after cutting and the distribution parameters of each layer in the neural network model after cutting.
In a third aspect, the disclosed embodiments provide an electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method according to any one of claims 1 to 9.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method of any one of claims 1 to 9.
According to the technical scheme provided by the embodiment of the disclosure, the neural network model is cut through the scheme, so that the calculated amount of the neural network model can be reduced, and the influence of errors possibly occurring in the operation of the cut neural network model on the whole system is reduced, thereby ensuring that the reliability and the energy consumption of the cut neural network model meet the requirements, and improving the accuracy of image processing of the cut neural network model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings.
Fig. 1 shows a flow diagram of a neural network model clipping method according to an embodiment of the present disclosure.
Fig. 2 shows a block diagram of a neural network model clipping device according to an embodiment of the present disclosure.
Fig. 3 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
FIG. 4 shows a schematic block diagram of a computer system suitable for use in implementing a method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should also be noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In the present disclosure, if an operation of acquiring user information or user data or an operation of presenting user information or user data to others is involved, the operations are all operations authorized, confirmed by a user, or actively selected by the user.
As mentioned above, with the rapid development of neural network algorithms and neural network hardware chips, neural networks have become a promising solution for power systems. The neural network model offers powerful and fast parallel computing capability, high fault tolerance, strong learning capability, and the like. In general, however, a neural network model requires very large computational cost and memory space, and its parameters or memory footprint can be reduced by neural network compression.
In the related art, the model clipping technique is a mainstream compression scheme. Specifically, the model clipping technique is to remove unimportant weights (connections) or neurons in the neural network model, and greatly reduce the size and the calculation amount of the model without losing the model precision, thereby achieving the effect of reducing energy consumption.
However, even if the neural network model is clipped according to the above scheme and its computation amount is reduced, the clipped model still has a high probability of errors when running under some conditions. Such an error propagates layer by layer through the network and affects the output of the whole system, greatly reducing the system's reliability. How to ensure that the reliability and energy consumption of the clipped neural network model meet the demand has therefore become an urgent problem to be solved.
In view of these technical defects, the embodiment of the present disclosure provides a neural network model clipping method. The method acquires, through reinforcement learning, the environment state information of a neural network model to be clipped, where the environment state information includes basic feature information and enhanced feature information of the model; acquires a target clipping rate for each layer of the model according to the environment state information; obtains a target reward value according to the target clipping rate and a reward function, where the reward function indicates the correspondence between the clipping rate and the reliability and energy consumption of the clipped model, and the target reward value indicates the benefit of the target neural network model obtained after clipping at the target clipping rate; clips the model according to the target clipping rate in response to the target reward value being greater than or equal to a preset threshold; and performs image processing on a target image with the resulting target neural network model. With this technical scheme, clipping the neural network model reduces its computation amount and the impact on the overall system of errors that may occur while the clipped model runs, ensuring that the reliability and energy consumption of the clipped model meet requirements and improving the accuracy of image processing performed with the clipped model.
Fig. 1 shows a flow diagram of a neural network model clipping method according to an embodiment of the present disclosure. As shown in fig. 1, the neural network model clipping method may include the following steps S101 to S105:
in step S101, the environmental state information of the neural network model to be clipped is acquired through reinforcement learning.
The environment state information comprises basic characteristic information and enhanced characteristic information of the neural network model to be cut.
In step S102, a target clipping rate of each layer in the neural network model to be clipped is obtained according to the environment state information.
In step S103, a target reward value is obtained according to the target clipping rate and the reward function.
The return function is used for indicating the corresponding relation between the cutting rate and the reliability and energy consumption of the neural network model after cutting, and the target reward value is used for indicating the income of the target neural network model obtained after the neural network model to be cut is cut according to the target cutting rate.
In step S104, in response to that the target reward value is greater than or equal to a preset threshold, the neural network model to be clipped is clipped according to the target clipping rate.
In step S105, image processing is performed on the target image according to the target neural network model obtained by the clipping.
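For orientation only, the following self-contained Python sketch shows how steps S101 to S104 fit together as a reinforcement-learning search loop; every helper here (get_state, propose_clipping_rates, reward, prune) is a toy stand-in invented for this sketch, and a real implementation would replace the random proposal with a trained RL agent:

```python
import random

def get_state(model):
    # S101: environment state; basic features (layer sizes) plus the
    # enhanced features described below would be collected here.
    return tuple(model["kernels_per_layer"])

def propose_clipping_rates(state):
    # S102: one target clipping rate per layer; an RL agent would map
    # state -> rates instead of sampling at random.
    return [random.uniform(0.0, 0.8) for _ in state]

def reward(model, rates):
    # S103: placeholder reward favouring moderate clipping; the patent
    # uses the reliability/energy reward functions described below.
    return 1.0 - sum(abs(r - 0.5) for r in rates) / len(rates)

def prune(model, rates):
    # S104: remove the chosen fraction of kernels from each layer.
    model["kernels_per_layer"] = [
        max(1, round(n * (1.0 - r)))
        for n, r in zip(model["kernels_per_layer"], rates)
    ]
    return model

model = {"kernels_per_layer": [64, 128, 256]}
PRESET_THRESHOLD = 0.8
for _ in range(1000):                       # search loop over clipping actions
    rates = propose_clipping_rates(get_state(model))
    if reward(model, rates) >= PRESET_THRESHOLD:
        model = prune(model, rates)         # S104: clip at the accepted rates
        break
print(model["kernels_per_layer"])           # S105 would then run inference
```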
In an embodiment of the present disclosure, the neural network model clipping method may be applied to a computer, an electronic device, and the like for neural network model clipping.
Reinforcement learning is a branch of machine learning and can be considered as a method of learning in the exploration process. In reinforcement learning, the subject of learning is the reinforcement learning agent, and the designer does not provide the agent with a supervisory signal. Instead, the agent predicts its next activity at each moment in time, and gets a reward signal for each activity in the interaction with the environment. Through the different reward signals, the intelligent agent can gradually change the behavior prediction rule of the intelligent agent, so that the reward accumulated by a series of behaviors is maximum, and the optimal solution of the target problem is automatically explored. Reference may be made to the detailed description in the related art, which is not repeated in the embodiments of the present disclosure.
In a possible application scenario, in an electric power system, a series of image data acquired by electric power equipment, a large amount of acquired historical operation data, failure log data and the like can be used as a training data set, so that the data sets can be input into a neural network and trained in combination with a preset target value to obtain a trained neural network model, namely a neural network model to be cut. The target value may be set manually by a technician or obtained according to historical data. For example, the neural network model may be applied to aspects of power system transmission stability analysis, load detection, static and dynamic stability analysis, fault prediction, and the like.
In another possible application scenario, the neural network model may also be applied in the field of image processing, for example, image recognition, edge monitoring of an image, image segmentation, image compression, image restoration, and the like.
In an embodiment of the present disclosure, the neural network model to be pruned may include a deep neural network model, a convolutional neural network model, a recurrent neural network model, and the like.
In an embodiment of the present disclosure, the basic feature information may be understood as inherent to the network model to be cut, and the basic feature information may be estimated in an offline manner.
In an embodiment of the present disclosure, the basic feature information may include at least one of:
the number of layers of the neural network model to be cut, the number of channels for inputting the characteristic diagram, the length of the input characteristic diagram, the width of the input characteristic diagram, the number of convolution kernels, the length of the convolution kernels, the width of the convolution kernels and the step length of the convolution kernels.
In an embodiment of the present disclosure, the enhanced feature information represents quantities of the neural network model that readily change: they vary with the clipping rate of each iteration and can be obtained online after each clipping of the neural network model to be clipped.
In an embodiment of the present disclosure, the enhanced feature information may include at least one of:
energy consumption of the neural network model after cutting, reliability of the neural network model after cutting and distribution parameters of each layer in the neural network model after cutting.
In an embodiment of the present disclosure, both the energy consumption of the trimmed neural network model and the reliability of the trimmed neural network model may be understood as the energy consumption and the reliability obtained by the evaluation, and specific reference may be made to the detailed description in the following examples.
In an embodiment of the present disclosure, the distribution parameters of each layer in the trimmed neural network model may be understood as statistical parameters, and may include any one of the following: maximum, minimum, median, mean, variance, and the like.
In an embodiment of the present disclosure, the enhanced feature information may further include information such as historical clipping rates.
In this embodiment, after the environment state information of the neural network model to be clipped is acquired through reinforcement learning, the proportion of unimportant convolution kernels in each layer of the model can be identified and a target clipping rate generated for each layer. Since the number of convolution kernels differs from layer to layer, the target clipping rate correspondingly differs from layer to layer.
In an embodiment of the present disclosure, after the target clipping rate of each layer is obtained, the number of convolution kernels to be clipped in each layer can be derived from that layer's target clipping rate and its total number of convolution kernels, i.e., the number of kernels to clip in each layer is determined for the current situation. The reliability of each layer's convolution kernels is then obtained and used as a score, and the kernels to be clipped in each layer are determined according to the target clipping rate, as the sketch after the next paragraph illustrates.
In this embodiment, the convolution kernels with high fault tolerance in each layer of the neural network model to be clipped are deleted in turn, while the convolution kernels with low fault tolerance are preserved.
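A minimal sketch of this per-layer selection; it assumes that a higher per-kernel score means higher fault tolerance, which is this sketch's reading of the description rather than a convention fixed by the patent:

```python
def select_kernels_to_clip(reliability_scores, target_rate):
    """reliability_scores: one score per convolution kernel in a layer,
    used as the ranking score as described above. Returns the indices of
    the kernels to clip under the layer's target clipping rate."""
    n_total = len(reliability_scores)
    n_clip = int(round(n_total * target_rate))  # kernels to remove in this layer
    # Rank kernels so the most fault-tolerant are clipped first and the
    # least fault-tolerant are preserved (assumption: higher score = more
    # fault tolerant).
    order = sorted(range(n_total), key=lambda i: reliability_scores[i],
                   reverse=True)
    return sorted(order[:n_clip])

# Example: a 6-kernel layer clipped at a 50% target rate.
print(select_kernels_to_clip([0.9, 0.2, 0.7, 0.4, 0.95, 0.1], 0.5))  # [0, 2, 4]
```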
Alternatively, the preset threshold may be understood as a maximum reward value; it may be set by a user in a customized manner or obtained from historical experience data.
In one embodiment of the present disclosure, the reward functions are different for the two different requirements of high reliability and high energy efficiency.
Illustratively, for the requirement of high reliability, the reliability of the neural network model is improved as far as possible on the basis of meeting the energy consumption requirement. In this case, the reward function may be as shown in formula (1) below, from which the target reward value r is obtained (the published formula is an image; the piecewise form below is reconstructed from the accompanying explanation):

r = Ra · (E_target/E_total)² if E_total > E_target; r = Ra otherwise.   (1)

where E_total is the energy consumed by the target neural network model at the target clipping rate, E_target is the required target energy consumption value, and Ra is the reliability evaluation result of the target neural network model at the target clipping rate.
It should be noted that the square in formula (1) aggravates the penalty when the energy consumption does not meet the condition, so that reducing energy consumption brings a larger return; once the energy consumption meets the condition, the energy term is set to 1, further energy reduction brings no additional benefit, and the benefit comes from the reliability.
Illustratively, for the requirement of high energy efficiency (i.e., low energy consumption), the energy efficiency of the neural network model is improved as far as possible on the basis of meeting the reliability requirement, i.e., the energy consumption of the model is saved. In this case, the reward function may be as shown in formula (2) below, from which the target reward value r is obtained (likewise reconstructed from the accompanying explanation):

r = Eff · (Ra/R_target)² if Ra < R_target; r = Eff otherwise.   (2)

where Ra is the evaluated reliability of the target neural network model obtained at the target clipping rate, R_target is the required target reliability, and Eff is the energy-efficiency evaluation result of the target neural network model after clipping at the target clipping rate. The squared weighting of reliability in formula (2) represents its higher optimization priority.
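As an illustration only, the two reconstructed reward functions can be written directly in code; Ra, E_total, E_target, Eff, and R_target follow the definitions above, and the piecewise forms are this reconstruction's assumption rather than the patent's verbatim formulas:

```python
def reward_high_reliability(ra, e_total, e_target):
    """Formula (1), as reconstructed above: once energy meets the target,
    the energy term is fixed at 1 and the payoff comes from reliability;
    the squared term aggravates the penalty while energy is over target."""
    energy_term = 1.0 if e_total <= e_target else (e_target / e_total) ** 2
    return ra * energy_term

def reward_high_efficiency(eff, ra, r_target):
    """Formula (2), as reconstructed above: the squared reliability term
    gives reliability the higher optimization priority."""
    reliability_term = 1.0 if ra >= r_target else (ra / r_target) ** 2
    return eff * reliability_term

# Example: energy twice over budget is penalized quadratically.
print(reward_high_reliability(ra=0.9, e_total=2.0, e_target=1.0))  # 0.9 * 0.25
```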
In an embodiment of the present disclosure, the target image may be one or more of any type of image captured by a computing device.
Specifically, the image processing on the target image according to the target neural network model obtained after the clipping may include processing modes such as image recognition, image segmentation, image restoration, and the like, which are not limited in the embodiment of the present disclosure.
In this embodiment, clipping the neural network model to be clipped according to the target clipping rate can be understood as compressing the computation amount and the storage space occupied by the model. The computation amount and storage footprint of the resulting target neural network model are therefore greatly reduced, the cost of the computing device on which it is deployed is saved, and the target model runs faster with higher accuracy. This ensures that a neural network model deployed on a hardware platform can operate normally while the requirements on reliability and energy consumption are met.
The embodiment of the disclosure provides a neural network model clipping method that acquires, through reinforcement learning, the environment state information of a neural network model to be clipped, the environment state information including basic feature information and enhanced feature information of the model; acquires a target clipping rate for each layer of the model according to the environment state information; obtains a target reward value according to the target clipping rate and the reward function, where the reward function indicates the correspondence between the clipping rate and the reliability and energy consumption of the clipped model, and the target reward value indicates the benefit of the target neural network model obtained after clipping at the target clipping rate; clips the model according to the target clipping rate in response to the target reward value being greater than or equal to a preset threshold; and performs image processing on a target image with the resulting target neural network model. With this technical scheme, clipping the neural network model reduces its computation amount and the impact on the overall system of errors that may occur while the clipped model runs, ensuring that the reliability and energy consumption of the clipped model meet requirements and improving the accuracy of image processing performed with the clipped model.
In an embodiment of the present disclosure, before the step of obtaining the target bonus value according to the target clipping rate and the reward function at step S103, the method may further include the steps of:
and obtaining the reliability of the cut neural network model according to the reliability evaluation function.
And acquiring the energy consumption of the neural network model after cutting according to the energy consumption evaluation function.
And acquiring the return function according to the reliability of the cut neural network model and the energy consumption of the cut neural network model.
In an embodiment of the present disclosure, the reliability evaluation function is used to evaluate the reliability of the entire neural network model, and the energy consumption evaluation function is used to evaluate the energy consumption of the entire neural network model.
In an embodiment of the present disclosure, the energy consumption and the reliability can be regarded as two optimization targets of the neural network model to be clipped, and the model is clipped so as to strike a balance between the two. The reliability and energy consumption of the model change as the clipping rate changes; therefore, after the model is clipped with the per-layer clipping rates, the energy consumption of the resulting clipped model should be as low as possible while its reliability remains as high as possible.
In an embodiment of the present disclosure, the reliability evaluation function and the energy consumption evaluation function are both related to a clipping rate of each layer in the neural network model after clipping.
In the disclosed embodiment, the reliability and the energy consumption of the neural network model after cutting can be respectively obtained according to the reliability evaluation function and the energy consumption evaluation function, so that the reliability and the energy consumption of the neural network model after cutting can be evaluated by means of the two evaluation models, and the optimization target of the neural network model to be cut is realized.
In an embodiment of the present disclosure, before the step of obtaining the reliability of the trimmed neural network model according to the reliability evaluation function, the method may further include the following steps:
acquiring a system architecture reliability parameter and a neuron sensitivity parameter;
and constructing a reliability evaluation function according to the reliability parameter of the system architecture and the sensitivity parameter of the neuron.
The system architecture reliability parameter is used for evaluating the influence of each layer of faults in the cut neural network model on the reliability of the cut neural network model, and the neuron sensitivity parameter is used for evaluating the influence of the faults of neurons of the cut neural network model on the reliability of the whole cut neural network model.
Illustratively, take the clipped neural network model to be a DNN model. Each layer of a DNN involves a huge computation amount and memory access amount, which characterizes its operation on hardware; when soft errors occur in the neural network at a certain failure rate, a large number of parameters are necessarily corrupted. Given these characteristics, when an error occurs in a certain layer of the neural network, it is reasonable to consider the reliability of that layer closely related to the memory access amount, the computation amount, and the on-chip residence time occupied by the layer. A parameter for evaluating the influence of soft errors in each layer on the reliability of the neural network model is therefore proposed, namely the Architecture Reliability parameter (ARF).
In an embodiment of the present disclosure, the step of obtaining the reliability parameter of the architecture may specifically include the following steps:
obtaining the architecture reliability parameter ARF_i according to

ARF_i = λ_i · (p/(1 + p) · C_i/C + 1/(1 + p) · M_i/M)

(reconstructed from the symbol definitions, as in the summary above), where λ_i is the fault rate of the i-th layer of the clipped neural network model, p is the compute-to-memory-access ratio of the i-th layer when run on hardware, C_i is the multiply-add computation amount of the i-th layer, C is the multiply-add computation amount of all layers, M_i is the memory access amount of the i-th layer, and M is the memory access amount of all layers in the clipped neural network model.
It should be noted that the larger λ_i in the above formula is, the more errors the clipped neural network model exhibits. p represents the operational intensity of the i-th layer in the clipped model and, by Amdahl's law, is closely related to the performance of the hardware. For simplicity, in actual calculation p is represented by the ratio of the number of cycles occupied by computation to the number of cycles occupied by memory access; a larger p means computation occupies more execution time and data reside on chip for more cycles during computation, so errors are more likely to occur in the computation stage of the whole system and the system's overall reliability is worse, and the same holds during the memory access period. Because each layer differs in computation amount and memory access amount, the error magnitude differs between layers, and the failure probability of a layer with a large compute-and-access amount is far higher than that of a layer with a small one; this probability is therefore expressed by the share of the layer's parameter amount and computation amount in the whole model. Further, C_i and M_i in the above formula are determined by the layer type of the i-th layer in the clipped neural network model.
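A small sketch of the reconstructed ARF computation; the p/(1 + p) time-fraction weighting is this reconstruction's assumption, as noted above:

```python
def arf(fault_rate, p, macs_i, macs_total, mem_i, mem_total):
    """Architecture reliability parameter of layer i per the reconstructed
    formula: lambda_i weighted by the layer's shares of compute and memory
    access, with p/(1+p) and 1/(1+p) as the fractions of time spent
    computing and accessing memory (p = compute cycles / memory cycles)."""
    compute_frac = p / (1.0 + p)
    return fault_rate * (compute_frac * macs_i / macs_total
                         + (1.0 - compute_frac) * mem_i / mem_total)

# Example: a compute-heavy layer (p = 4) holding 30% of the model's MACs
# and 10% of its memory traffic, with a per-layer fault rate of 1e-3.
print(arf(1e-3, 4.0, 3e8, 1e9, 1e7, 1e8))  # ~2.6e-4
```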
In the disclosed embodiment, the calculation amount, the memory access amount and the running time are included in the influence factors of the reliability, so that the reliability of the neural network model after being cut is evaluated according to the reliability parameters of the architecture.
In an embodiment of the present disclosure, since different neurons in the neural network model have different vulnerabilities, the neuron sensitivity parameter S is obtained by analyzing the respective vulnerabilities of the neurons in the clipped neural network model. Specifically, S can be obtained by the following formula (reconstructed from the symbol definitions; the published formula is an image):

S = (∂L/∂w) · Δw

where ∂L/∂w is the first-order partial derivative of the loss function L with respect to the neuron w, and Δw is the change in the neuron's magnitude before and after clipping.
It should be noted that S represents the sensitivity of a neuron: the sensitivity of the neural network depends primarily on the neuron's first-order partial derivative and the corresponding magnitude of the neuron's change. The greater the sensitivity, the greater the change in the loss function caused by the corresponding neuron's change and the lower the accuracy, and vice versa.
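A minimal PyTorch sketch of the reconstructed sensitivity S = (∂L/∂w) · Δw; the absolute value and the toy one-neuron loss are choices of this sketch, not details fixed by the patent:

```python
import torch

def neuron_sensitivity(loss, weight, pruned_weight):
    """S = (dL/dw) * delta_w: a first-order estimate of the loss change
    caused by clipping the neuron/weight (reconstructed form)."""
    (grad,) = torch.autograd.grad(loss, weight)
    delta = pruned_weight - weight.detach()   # change before/after clipping
    return (grad * delta).abs()               # larger S -> more sensitive

# Tiny example: one linear neuron, loss = (w*x - y)^2, pruning sets w to 0.
w = torch.tensor([0.7], requires_grad=True)
x, y = torch.tensor([2.0]), torch.tensor([1.0])
loss = ((w * x - y) ** 2).sum()
print(neuron_sensitivity(loss, w, torch.zeros(1)))  # tensor([1.1200])
```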
In the disclosed embodiment, the reliability evaluation function is constructed by combining the architecture reliability parameter and the neuron sensitivity parameter, so that the reliability of the clipped neural network model can be conveniently evaluated from both the software level and the system architecture level.
In an embodiment of the present disclosure, the step of obtaining the reliability of the trimmed neural network model according to the reliability evaluation function may specifically include the following steps:
obtaining the reliability R of the clipped neural network model according to

R = ARF_i · S

(reconstructed from the symbol definitions; the published formula is an image), where ARF_i is the architecture reliability parameter, S is the neuron sensitivity parameter, λ_i is the fault rate of the i-th layer of the clipped neural network model, p is the compute-to-memory-access ratio of the i-th layer when run on hardware, C_i and C are the multiply-add computation amounts of the i-th layer and of all layers, and M_i and M are the memory access amounts of the i-th layer and of all layers.
In one embodiment, R can be understood as an evaluation of the reliability of a single convolution kernel in the clipped neural network model. If the reliability of a certain layer of the model is needed, it can be obtained as the ratio of the sum of the reliabilities of all convolution kernels of that layer to the number of convolution kernels of that layer.
In the disclosed embodiment, the reliability of the clipped neural network model can be obtained from information such as the fault rate of each layer, the compute-to-memory-access ratio on hardware, and the multiply-add computation amount, i.e., a reliability result drawn from a comprehensive analysis of the software level and the system architecture level, which improves the accuracy of the reliability analysis.
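Under the reconstructed per-kernel form R = ARF_i · S, the layer-level reliability described above reduces to a mean over the layer's kernels; a sketch:

```python
def kernel_reliability(arf_i, sensitivity):
    """R = ARF_i * S for a single convolution kernel (reconstructed form)."""
    return arf_i * sensitivity

def layer_reliability(kernel_reliabilities):
    """Layer reliability: the sum of the reliabilities of all kernels of
    the layer divided by the number of kernels, as described above."""
    return sum(kernel_reliabilities) / len(kernel_reliabilities)

# Example: three kernels in one layer sharing the layer's ARF.
arf_i = 2.6e-4
scores = [kernel_reliability(arf_i, s) for s in (0.2, 0.5, 1.1)]
print(layer_reliability(scores))
```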
In an embodiment of the present disclosure, before the step of obtaining the energy consumption of the trimmed neural network model according to the energy consumption evaluation function, the method may further include the following steps:
acquiring full-load operation energy consumption and bottleneck operation energy consumption of the cut neural network model;
and constructing the energy consumption evaluation function according to the full-load operation energy consumption and the bottleneck operation energy consumption.
In an embodiment of the present disclosure, in actual operation the neural network model often runs on different hardware platforms whose resources, such as computational capability, storage size, and bandwidth, differ greatly, so the same model also differs greatly in energy consumption across platforms. The energy consumption of the clipped neural network model when running on a hardware system therefore needs to be obtained, namely the full-load operation energy consumption and the bottleneck operation energy consumption.
In the disclosed embodiment, the energy consumption evaluation function is constructed by obtaining the full-load operation energy consumption and the bottleneck operation energy consumption of the trimmed neural network model, so that the energy consumption of the trimmed neural network model can be conveniently evaluated in a multi-aspect mode by combining a model scale level and a hardware characteristic level.
In an embodiment of the present disclosure, the step of obtaining the full-load operating energy consumption of the trimmed neural network model may specifically include the following steps:
acquiring a first calculated amount and a first memory access amount of the trimmed neural network model in a full-load running state;
obtaining a first estimated power according to the first calculated amount and the first memory access amount;
and acquiring full-load operation energy consumption according to the first estimated power and the first operation time of the trimmed neural network model in the full-load operation state.
In an embodiment of the present disclosure, different types of network layers in the neural network model correspond to different computation amounts and memory access amounts, so the computation amount and memory access amount of every layer of the clipped model can be obtained. Table 1 below lists the multiply-add count and memory access amount corresponding to several common layer types; from it, the multiply-add computation and memory access of each layer in the clipped neural network model can be estimated, and hence the computation amount and memory access amount of the whole model.
Table 1: Multiply-add count and memory access amount corresponding to different types of network layers (the table itself is rendered only as an image in the source and is not reproduced here).
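Since Table 1 is not reproduced, the sketch below uses the standard textbook counting formulas for convolution and fully connected layers as a stand-in for what such a table typically lists; these exact expressions are an assumption, not the patent's table:

```python
def conv2d_costs(c_in, h_in, w_in, c_out, h_out, w_out, k_h, k_w):
    """Standard estimates for a convolution layer: multiply-adds and
    memory accesses (in elements), assumed as a stand-in for Table 1."""
    macs = k_h * k_w * c_in * h_out * w_out * c_out
    mem = (k_h * k_w * c_in * c_out      # weights read
           + c_in * h_in * w_in          # input feature map read
           + c_out * h_out * w_out)      # output feature map written
    return macs, mem

def fc_costs(n_in, n_out):
    """Standard estimates for a fully connected layer."""
    macs = n_in * n_out
    mem = n_in * n_out + n_in + n_out    # weights + input + output
    return macs, mem

# Example: a 3x3 conv, 64 -> 128 channels on a 56x56 map (stride 1, pad 1),
# and a 4096 -> 1000 fully connected layer.
print(conv2d_costs(64, 56, 56, 128, 56, 56, 3, 3))
print(fc_costs(4096, 1000))
```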
In an embodiment of the present disclosure, a gaussian regression model may be adopted to perform fitting according to the first calculated amount and the first memory access amount, so as to obtain a first estimated power.
In an embodiment of the present disclosure, the full-load operation energy consumption is obtained as the product of the first estimated power and the first operation time. The first operation time can be understood as the shortest achievable runtime, that is, the time consumed when computation and memory access fully overlap.
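A minimal sketch of these two steps, assuming scikit-learn's GaussianProcessRegressor as the Gaussian regression model; the training samples and the runtime below are hypothetical measurements from a target platform, not values from the disclosure.

```python
# Sketch: fit the first estimated power from the first computation amount
# and first memory access amount, then form energy as power x time.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.array([[1e9, 2e7], [5e9, 8e7], [2e10, 3e8]])  # [multiply-adds, accesses]
y = np.array([1.2, 2.9, 6.5])                        # measured power in watts

gpr = GaussianProcessRegressor().fit(X, y)

first_power = gpr.predict(np.array([[4.2e9, 1.1e8]]))[0]  # first estimated power
first_runtime = 0.02                                      # first operation time (s)
full_load_energy = first_power * first_runtime            # E = P * t
```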
In an embodiment of the present disclosure, when the hardware platform presents a memory bottleneck, the bottleneck operation energy consumption can be understood as the memory-access energy consumption; when the hardware platform presents a computation bottleneck, it can be understood as the energy consumed by computation.
It can be understood that different hardware platforms have different hardware characteristics, and accordingly, performance bottlenecks are different, so that different bottleneck operation energy consumption is obtained.
For example, the energy consumption evaluation function can be expressed by the following formulas, from which the energy consumption of the trimmed neural network model is obtained:

$E_{\mathrm{full}} = P_{\mathrm{full}} \cdot t_{\mathrm{full}}$

$E_{\mathrm{bneck}} = P_{\mathrm{bneck}} \cdot t_{\mathrm{bneck}}$

wherein $E_{\mathrm{full}}$ is the full-load operation energy consumption, $E_{\mathrm{bneck}}$ is the bottleneck operation energy consumption, $P_{\mathrm{full}}$ represents the power at full-load operation (i.e., the first estimated power), $t_{\mathrm{full}}$ represents the run time at full-load operation (i.e., the first operation time), $P_{\mathrm{bneck}}$ represents the power under the bottleneck operation condition, and $t_{\mathrm{bneck}}$ represents the run time of the bottleneck operation.
For the power under the bottleneck operation condition, reference may be made to the specific description of the first estimated power in the foregoing embodiment, which is not described in detail in the embodiments of the present disclosure.
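Putting the pieces together, a hedged Python sketch of the evaluation follows. Each term is power times time, per the formulas above; how the full-load and bottleneck terms are merged into a single score is not fixed by the text, so taking their maximum below is an assumption.

```python
# Sketch of the energy consumption evaluation: each term is power x time,
# per the formulas above. Combining the two terms into one score
# (here: their maximum) is an assumption made for illustration.
def full_load_energy(p_full, t_full):
    return p_full * t_full

def bottleneck_energy(p_bneck, t_bneck):
    return p_bneck * t_bneck

def energy_score(p_full, t_full, p_bneck, t_bneck):
    return max(full_load_energy(p_full, t_full),
               bottleneck_energy(p_bneck, t_bneck))
```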
In the disclosed embodiment, the computation amount and memory access amount of each layer in the trimmed neural network model are calculated at the model-scale level, and the power of the trimmed network is estimated with the Gaussian regression model, which improves the accuracy of the energy consumption estimate.
Fig. 2 illustrates a block diagram of a neural network model clipping device according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of both.
As shown in fig. 2, the neural network model clipping apparatus includes a first obtaining module 201, a second obtaining module 202, a third obtaining module 203, a first executing module 204, and a processing module 205.
A first obtaining module 201, configured to obtain, through reinforcement learning, environment state information of a neural network model to be cut, where the environment state information includes basic feature information and enhanced feature information of the neural network model to be cut;
a second obtaining module 202, configured to obtain a target clipping rate of each layer in the neural network model to be clipped according to the environment state information;
a third obtaining module 203, configured to obtain a target reward value according to the target cutting rate and a reward function, where the reward function is used to indicate a corresponding relationship between the cutting rate and reliability and energy consumption of the neural network model after cutting, and the target reward value is used to indicate a benefit of the target neural network model obtained after cutting the neural network model to be cut according to the target cutting rate;
a first executing module 204, configured to, in response to that the target reward value is greater than or equal to a preset threshold, crop the neural network model to be cropped according to the target cropping rate;
and the processing module 205 is configured to perform image processing on the target image according to the target neural network model obtained after the cropping.
In an embodiment of the present disclosure, the neural network model clipping apparatus further includes:
the fourth obtaining module is configured to obtain the reliability of the neural network model after cutting according to the reliability evaluation function;
the fourth obtaining module is further configured to obtain the energy consumption of the trimmed neural network model according to an energy consumption evaluation function;
and the fifth obtaining module is configured to obtain the return function according to the reliability of the trimmed neural network model and the energy consumption of the trimmed neural network model.
In an embodiment of the present disclosure, the neural network model clipping device further includes:
a sixth obtaining module configured to obtain an architecture reliability parameter and a neuron sensitivity parameter;
a second execution module configured to construct the reliability evaluation function according to the architecture reliability parameter and the neuron sensitivity parameter;
the system architecture reliability parameter is used for evaluating the influence of each layer of faults in the neural network model after cutting on the reliability of the neural network model after cutting, and the neuron sensitivity parameter is used for evaluating the influence of the faults of neurons of the neural network model after cutting on the reliability of the whole neural network model after cutting.
In an embodiment of the present disclosure, the sixth obtaining module is configured to:
obtain the system architecture reliability parameter $\alpha$ according to

$\alpha = \sum_{i=1}^{n} \lambda_i \left( p \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + (1 - p) \cdot \frac{M_i}{\sum_{j=1}^{n} M_j} \right)$

wherein $\lambda_i$ is the fault rate of the i-th layer of the trimmed neural network model, $p$ represents the compute-to-memory-access ratio of the i-th layer of the trimmed neural network model when run on hardware, $C_i$ is the multiply-add computation amount of the i-th layer, $\sum_{j} C_j$ is the multiply-add computation amount of all layers, $M_i$ is the memory access amount of the i-th layer, $\sum_{j} M_j$ is the memory access amount of all layers, and $n$ is the number of layers in the trimmed neural network model.
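A minimal Python sketch of this parameter under the weighted-sum reading given above; the exact way the computation and memory-access shares are combined is reconstructed from the symbol definitions and should be treated as an assumption.

```python
# Sketch: system architecture reliability parameter (alpha). Each layer's
# fault rate is weighted by its share of total computation and of total
# memory access, mixed by the compute-to-memory-access ratio p
# (an assumed form reconstructed from the symbol definitions).
def architecture_reliability(fault_rates, p, macs, mems):
    total_macs, total_mems = sum(macs), sum(mems)
    return sum(
        lam * (p * c / total_macs + (1 - p) * m / total_mems)
        for lam, c, m in zip(fault_rates, macs, mems)
    )

alpha = architecture_reliability([1e-6, 2e-6], p=0.6,
                                 macs=[1e9, 5e8], mems=[2e7, 1e7])
```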
In an embodiment of the present disclosure, the fourth obtaining module is configured to:
obtain the reliability $R$ of the trimmed neural network model according to

$R = \beta \cdot \alpha = \beta \sum_{i=1}^{n} \lambda_i \left( p \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + (1 - p) \cdot \frac{M_i}{\sum_{j=1}^{n} M_j} \right)$

wherein $\alpha$ is the system architecture reliability parameter, $\beta$ is the neuron sensitivity parameter, and $\lambda_i$, $p$, $C_i$, $\sum_{j} C_j$, $M_i$, $\sum_{j} M_j$ and $n$ are as defined above for the system architecture reliability parameter.
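Under the same assumed reading, the reliability value simply scales the architecture parameter by the neuron sensitivity parameter; the sketch below reuses architecture_reliability from the previous sketch.

```python
# Sketch: reliability of the trimmed model under the assumed reading
# R = beta * alpha, with beta the neuron sensitivity parameter.
def model_reliability(beta, fault_rates, p, macs, mems):
    return beta * architecture_reliability(fault_rates, p, macs, mems)
```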
In an embodiment of the present disclosure, the neural network model clipping device further includes:
the seventh obtaining module is configured to obtain full-load operation energy consumption and bottleneck operation energy consumption of the cut neural network model;
a third execution module configured to construct the energy consumption assessment function according to the full load operation energy consumption and the bottleneck operation energy consumption.
In an embodiment of the present disclosure, the seventh obtaining module is configured to:
acquiring a first calculated amount and a first memory access amount of the trimmed neural network model in a full-load running state;
obtaining a first estimated power according to the first calculated amount and the first memory access amount;
and acquiring the full-load operation energy consumption according to the first estimated power and the first operation time of the trimmed neural network model in the full-load operation state.
In an embodiment of the present disclosure, the basic feature information includes at least one of:
the number of layers of the neural network model to be cut, the number of channels of the input feature map, the length of the input feature map, the width of the input feature map, the number of convolution kernels, the length of the convolution kernels, the width of the convolution kernels and the step length of the convolution kernels.
In an embodiment of the present disclosure, the enhanced feature information includes at least one of:
energy consumption of the neural network model after cutting, reliability of the neural network model after cutting and distribution parameters of each layer in the neural network model after cutting.
The embodiment of the disclosure provides a neural network model clipping device, which can acquire environment state information of a neural network model to be clipped through reinforcement learning, wherein the environment state information comprises basic feature information and enhanced feature information of the neural network model to be clipped; acquiring the target cutting rate of each layer in the neural network model to be cut according to the environment state information; obtaining a target reward value according to the target cutting rate and a return function, wherein the return function is used for indicating the corresponding relation between the cutting rate and the reliability and energy consumption of the neural network model after cutting, and the target reward value is used for indicating the income of the target neural network model obtained after the neural network model to be cut is cut according to the target cutting rate; in response to the target reward value being larger than or equal to a preset threshold value, cutting the neural network model to be cut according to the target cutting rate; and carrying out image processing on the target image according to the target neural network model obtained after cutting. The device is used for cutting the neural network model, so that the calculated amount of the neural network model can be reduced, and the influence of errors possibly occurring during the operation of the cut neural network model on the whole system is reduced, thereby ensuring that the reliability and the energy consumption of the neural network model obtained after cutting meet the requirements, and improving the accuracy of image processing of the cut neural network model.
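To make the flow through these modules concrete, here is a high-level, hypothetical Python sketch of the search loop: observe the environment state, propose per-layer clipping rates, score them with the reliability- and energy-based reward, and clip only once the reward clears the preset threshold. Every component below is a placeholder stub, not the disclosure's implementation.

```python
# High-level, hypothetical sketch of the reinforcement-learning clipping
# loop. All components are placeholder stubs standing in for the agent,
# reward function, and clipping step of the disclosure.
import random

def observe_environment(model):
    return model                          # stub: basic + enhanced features

class Agent:                              # stub RL agent
    def propose(self, state):
        return [random.uniform(0.0, 0.5) for _ in range(len(state))]
    def update(self, state, rates, reward):
        pass                              # stub: policy update

def reward_fn(rates):                     # stub: reliability/energy reward
    return 1.0 - max(rates)

def clip(model, rates):
    return [layer * (1 - r) for layer, r in zip(model, rates)]  # toy "clip"

def search_clipping_rates(model, agent, threshold=0.7, max_steps=100):
    for _ in range(max_steps):
        state = observe_environment(model)
        rates = agent.propose(state)      # target clipping rate per layer
        r = reward_fn(rates)
        agent.update(state, rates, r)
        if r >= threshold:                # clip only when reward clears threshold
            return clip(model, rates)
    return model

clipped = search_clipping_rates([64, 128, 256], Agent())
```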
The present disclosure also discloses an electronic device, and fig. 3 shows a block diagram of the electronic device according to an embodiment of the present disclosure.
As shown in fig. 3, the electronic device includes a memory and a processor, where the memory is to store one or more computer instructions, where the one or more computer instructions are executed by the processor to implement a method according to an embodiment of the disclosure.
FIG. 4 shows a schematic block diagram of a computer system suitable for use in implementing a method according to an embodiment of the present disclosure.
As shown in fig. 4, the computer system includes a processing unit that can execute the various methods in the above-described embodiments according to a program stored in a Read Only Memory (ROM) or a program loaded from a storage section into a Random Access Memory (RAM). In the RAM, various programs and data necessary for the operation of the computer system are also stored. The processing unit, the ROM, and the RAM are connected to each other through a bus. An input/output (I/O) interface is also connected to the bus.
The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN card, a modem, or the like. The communication section performs a communication process via a network such as the internet. The drive is also connected to the I/O interface as needed. A removable medium such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive as necessary, so that a computer program read out therefrom is mounted into the storage section as necessary. The processing unit can be realized as a CPU, a GPU, a TPU, an FPGA, an NPU and other processing units.
In particular, the methods described above may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the above-described method. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or by programmable hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the electronic device or the computer system in the above embodiments; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.

Claims (20)

1. A neural network model clipping method, the method comprising:
acquiring environment state information of a neural network model to be cut through reinforcement learning, wherein the environment state information comprises basic characteristic information and enhanced characteristic information of the neural network model to be cut;
acquiring the target clipping rate of each layer in the neural network model to be clipped according to the environment state information;
obtaining a target reward value according to the target cutting rate and a return function, wherein the return function is used for indicating the corresponding relation between the cutting rate and the reliability and energy consumption of the neural network model after cutting, and the target reward value is used for indicating the income of the target neural network model obtained after the neural network model to be cut is cut according to the target cutting rate;
in response to the target reward value being larger than or equal to a preset threshold value, cutting the neural network model to be cut according to the target cutting rate;
and carrying out image processing on the target image according to the target neural network model obtained after cutting.
2. The method according to claim 1, wherein before obtaining a target reward value according to the target clipping rate and a reward function, the method further comprises:
obtaining the reliability of the cut neural network model according to the reliability evaluation function;
acquiring the energy consumption of the cut neural network model according to an energy consumption evaluation function;
and acquiring the return function according to the reliability of the cut neural network model and the energy consumption of the cut neural network model.
3. The method of claim 2, wherein before obtaining the reliability of the pruned neural network model according to the reliability evaluation function, the method further comprises:
acquiring a system architecture reliability parameter and a neuron sensitivity parameter;
constructing the reliability evaluation function according to the system architecture reliability parameter and the neuron sensitivity parameter;
the system architecture reliability parameter is used for evaluating the influence of faults in each layer of the trimmed neural network model on the reliability of the trimmed neural network model, and the neuron sensitivity parameter is used for evaluating the influence of faults in individual neurons of the trimmed neural network model on the reliability of the trimmed neural network model as a whole.
4. The method of claim 3, wherein obtaining the architecture reliability parameter comprises:
according to

$\alpha = \sum_{i=1}^{n} \lambda_i \left( p \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + (1 - p) \cdot \frac{M_i}{\sum_{j=1}^{n} M_j} \right)$

obtaining the system architecture reliability parameter $\alpha$, wherein $\lambda_i$ is the fault rate of the i-th layer of the trimmed neural network model, $p$ represents the compute-to-memory-access ratio of the i-th layer of the trimmed neural network model when run on hardware, $C_i$ is the multiply-add computation amount of the i-th layer of the trimmed neural network model, $\sum_{j} C_j$ is the multiply-add computation amount of all layers of the trimmed neural network model, $M_i$ is the memory access amount of the i-th layer of the trimmed neural network model, $\sum_{j} M_j$ is the memory access amount of all layers of the trimmed neural network model, and $n$ is the number of layers in the trimmed neural network model.
5. The method according to claim 3, wherein obtaining the reliability of the neural network model after the clipping according to the reliability evaluation function comprises:
according to

$R = \beta \cdot \alpha = \beta \sum_{i=1}^{n} \lambda_i \left( p \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + (1 - p) \cdot \frac{M_i}{\sum_{j=1}^{n} M_j} \right)$

obtaining the reliability $R$ of the trimmed neural network model, wherein $\alpha$ is the system architecture reliability parameter, $\beta$ is the neuron sensitivity parameter, $\lambda_i$ is the fault rate of the i-th layer of the trimmed neural network model, $p$ represents the compute-to-memory-access ratio of the i-th layer of the trimmed neural network model when run on hardware, $C_i$ is the multiply-add computation amount of the i-th layer of the trimmed neural network model, $\sum_{j} C_j$ is the multiply-add computation amount of all layers of the trimmed neural network model, $M_i$ is the memory access amount of the i-th layer of the trimmed neural network model, $\sum_{j} M_j$ is the memory access amount of all layers of the trimmed neural network model, and $n$ is the number of layers in the trimmed neural network model.
6. The method of claim 2, wherein before obtaining the energy consumption of the pruned neural network model according to the energy consumption evaluation function, the method further comprises:
acquiring full-load operation energy consumption and bottleneck operation energy consumption of the cut neural network model;
and constructing the energy consumption evaluation function according to the full-load operation energy consumption and the bottleneck operation energy consumption.
7. The method of claim 6, wherein obtaining full-load running energy consumption of the pruned neural network model comprises:
acquiring a first calculated amount and a first memory access amount of the trimmed neural network model in a full-load running state;
obtaining a first estimated power according to the first calculated amount and the first memory access amount;
and acquiring the full-load operation energy consumption according to the first estimated power and the first operation time of the trimmed neural network model in the full-load operation state.
8. The method according to any one of claims 1 to 7, wherein the base feature information comprises at least one of:
the number of layers of the neural network model to be cut, the number of channels of the input feature map, the length of the input feature map, the width of the input feature map, the number of convolution kernels, the length of the convolution kernels, the width of the convolution kernels and the step length of the convolution kernels.
9. The method according to any of claims 1 to 7, wherein the enhanced feature information comprises at least one of:
the energy consumption of the neural network model after cutting, the reliability of the neural network model after cutting and the distribution parameters of each layer in the neural network model after cutting.
10. A neural network model clipping device, characterized in that the neural network model clipping device includes:
the device comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is configured to obtain environment state information of a neural network model to be cut through reinforcement learning, and the environment state information comprises basic characteristic information and enhanced characteristic information of the neural network model to be cut;
the second obtaining module is configured to obtain a target clipping rate of each layer in the neural network model to be clipped according to the environment state information;
a third obtaining module, configured to obtain a target reward value according to the target cutting rate and a reward function, where the reward function is used to indicate a correspondence between the cutting rate and reliability and energy consumption of the neural network model after cutting, and the target reward value is used to indicate a benefit of the target neural network model obtained after cutting the neural network model to be cut according to the target cutting rate;
the first execution module is configured to respond to the target reward value being larger than or equal to a preset threshold value, and cut the neural network model to be cut according to the target cutting rate;
and the processing module is configured to perform image processing on the target image according to the target neural network model obtained after cutting.
11. The apparatus of claim 10, wherein the neural network model clipping means further comprises:
the fourth obtaining module is configured to obtain the reliability of the neural network model after cutting according to the reliability evaluation function;
the fourth obtaining module is further configured to obtain the energy consumption of the trimmed neural network model according to an energy consumption evaluation function;
and the fifth obtaining module is configured to obtain the return function according to the reliability of the trimmed neural network model and the energy consumption of the trimmed neural network model.
12. The apparatus of claim 11, wherein the neural network model clipping means further comprises:
a sixth obtaining module configured to obtain an architecture reliability parameter and a neuron sensitivity parameter;
a second execution module configured to construct the reliability evaluation function according to the architecture reliability parameter and the neuron sensitivity parameter;
the system architecture reliability parameter is used for evaluating the influence of faults in each layer of the trimmed neural network model on the reliability of the trimmed neural network model, and the neuron sensitivity parameter is used for evaluating the influence of faults in individual neurons of the trimmed neural network model on the reliability of the trimmed neural network model as a whole.
13. The apparatus of claim 12, wherein the sixth obtaining module is configured to:
obtain the system architecture reliability parameter $\alpha$ according to

$\alpha = \sum_{i=1}^{n} \lambda_i \left( p \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + (1 - p) \cdot \frac{M_i}{\sum_{j=1}^{n} M_j} \right)$

wherein $\lambda_i$ is the fault rate of the i-th layer of the trimmed neural network model, $p$ represents the compute-to-memory-access ratio of the i-th layer of the trimmed neural network model when run on hardware, $C_i$ is the multiply-add computation amount of the i-th layer of the trimmed neural network model, $\sum_{j} C_j$ is the multiply-add computation amount of all layers of the trimmed neural network model, $M_i$ is the memory access amount of the i-th layer of the trimmed neural network model, $\sum_{j} M_j$ is the memory access amount of all layers of the trimmed neural network model, and $n$ is the number of layers in the trimmed neural network model.
14. The apparatus of claim 12, wherein the fourth obtaining module is configured to:
obtain the reliability $R$ of the trimmed neural network model according to

$R = \beta \cdot \alpha = \beta \sum_{i=1}^{n} \lambda_i \left( p \cdot \frac{C_i}{\sum_{j=1}^{n} C_j} + (1 - p) \cdot \frac{M_i}{\sum_{j=1}^{n} M_j} \right)$

wherein $\alpha$ is the system architecture reliability parameter, $\beta$ is the neuron sensitivity parameter, $\lambda_i$ is the fault rate of the i-th layer of the trimmed neural network model, $p$ represents the compute-to-memory-access ratio of the i-th layer of the trimmed neural network model when run on hardware, $C_i$ is the multiply-add computation amount of the i-th layer of the trimmed neural network model, $\sum_{j} C_j$ is the multiply-add computation amount of all layers of the trimmed neural network model, $M_i$ is the memory access amount of the i-th layer of the trimmed neural network model, $\sum_{j} M_j$ is the memory access amount of all layers of the trimmed neural network model, and $n$ is the number of layers in the trimmed neural network model.
15. The apparatus of claim 11, wherein the neural network model clipping means further comprises:
the seventh obtaining module is configured to obtain full-load operation energy consumption and bottleneck operation energy consumption of the cut neural network model;
a third execution module configured to construct the energy consumption assessment function according to the full load operation energy consumption and the bottleneck operation energy consumption.
16. The apparatus of claim 15, wherein the seventh obtaining module is configured to:
acquiring a first calculated amount and a first memory access amount of a clipped neural network model in a full-load running state;
obtaining a first estimated power according to the first calculated amount and the first memory access amount;
and acquiring the full-load operation energy consumption according to the first estimated power and the first operation time of the trimmed neural network model in the full-load operation state.
17. The apparatus according to any one of claims 10 to 16, wherein the base feature information comprises at least one of:
the number of layers of the neural network model to be cut, the number of channels of the input feature map, the length of the input feature map, the width of the input feature map, the number of convolution kernels, the length of the convolution kernels, the width of the convolution kernels and the step length of the convolution kernels.
18. The apparatus according to any of claims 10 to 16, wherein the enhanced feature information comprises at least one of:
the energy consumption of the neural network model after cutting, the reliability of the neural network model after cutting and the distribution parameters of each layer in the neural network model after cutting.
19. An electronic device comprising a memory and a processor; wherein the memory is to store one or more computer instructions, wherein the one or more computer instructions are to be executed by the processor to implement the method steps of any of claims 1 to 9.
20. A computer-readable storage medium, on which computer instructions are stored, characterized in that the computer instructions, when executed by a processor, implement the method steps of any of claims 1 to 9.
CN202211250546.3A 2022-10-13 2022-10-13 Neural network model clipping method, device, equipment and medium Pending CN115374936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211250546.3A CN115374936A (en) 2022-10-13 2022-10-13 Neural network model clipping method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN115374936A true CN115374936A (en) 2022-11-22

Family

ID=84072612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211250546.3A Pending CN115374936A (en) 2022-10-13 2022-10-13 Neural network model clipping method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115374936A (en)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination