CN113205158A - Pruning and quantization processing method, device, equipment and storage medium for a network model - Google Patents

Pruning and quantization processing method, device, equipment and storage medium for a network model

Info

Publication number
CN113205158A
CN113205158A
Authority
CN
China
Prior art keywords
pruning
neural network
convolutional neural
channel
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110598683.5A
Other languages
Chinese (zh)
Inventor
詹雁
潘柳华
徐麟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN202110598683.5A priority Critical patent/CN113205158A/en
Publication of CN113205158A publication Critical patent/CN113205158A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2133Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on naturality criteria, e.g. with non-negative factorisation or negative correlation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a pruning and quantization processing method, device, equipment and storage medium for a network model, where the method includes the following steps: pruning a convolutional neural network based on a channel attention mechanism and a weight attention mechanism; performing secondary pruning on the pruned convolutional neural network; and performing a quantization operation on the convolutional neural network after the secondary pruning. Through this processing, parameters that contribute little to the convolutional neural network model can be eliminated, ensuring the response speed of the network model while achieving, with a smaller number of parameters, an accuracy close to that of the original model.

Description

Pruning and quantization processing method, device, equipment and storage medium for a network model
Technical Field
The embodiments of the application relate to the field of network model processing, and in particular to a pruning and quantization processing method, device, equipment and storage medium for a network model.
Background
The law enforcement recorder is a portable device integrating real-time recording, photographing, video capture and other functions. It can be widely applied to field law enforcement and other tasks of law enforcement units such as the police (for example, traffic police, public security, fire fighting and criminal investigation), transportation, city management and the judiciary, and plays an important role in real-time recording of law enforcement processes, abnormal event detection, and backtracking and evidence collection for special events. However, how a mobile terminal device with limited computational power can respond to abnormal events quickly and provide rapid feedback is a significant problem to be solved.
Disclosure of Invention
The application provides a pruning and quantization processing method, device, equipment and storage medium for a network model, which can eliminate parameters that contribute little to a convolutional neural network model and ensure the response speed of the network model while achieving, with a smaller number of parameters, an accuracy close to that of the original model.
In a first aspect, an embodiment of the present application provides a pruning and quantization processing method for a network model, where the method includes:
pruning a convolutional neural network based on a channel attention mechanism and a weight attention mechanism;
performing secondary pruning on the pruned convolutional neural network;
and performing a quantization operation on the convolutional neural network after the secondary pruning.
Optionally, pruning the convolutional neural network based on the channel attention mechanism and the weight attention mechanism includes:
pruning the convolutional neural network based on a channel attention mechanism to obtain a first probability matrix;
pruning the convolutional neural network based on a weight attention mechanism and an input image to obtain a second probability matrix;
pruning the convolutional neural network according to the first probability matrix and the second probability matrix.
Optionally, pruning the convolutional neural network based on the channel attention mechanism to obtain a first probability matrix includes:
performing a dimension reduction transformation on the channels of the convolutional neural network to obtain channel weights;
multiplying each channel by its corresponding channel weight to obtain an attention matrix corresponding to the channel;
and acquiring a first probability matrix using a first function and the attention matrix.
Optionally, pruning the convolutional neural network based on the weight attention mechanism and the input image to obtain a second probability matrix includes:
performing linear addition on a feature image output by a current convolutional layer in the convolutional neural network and a feature image output by the previous convolutional layer to obtain a feature image matrix;
performing nonlinear processing on the feature image matrix based on an activation function;
performing a dimension reduction operation on the feature image matrix using a convolution kernel to obtain a dimension-reduced image matrix;
acquiring a weight matrix of the dimension-reduced image matrix by using a second function;
multiplying the weight matrix by the feature image output by the previous convolutional layer to obtain a product matrix;
and acquiring a second probability matrix according to the product matrix and the first function.
Optionally, pruning the convolutional neural network according to the first probability matrix and the second probability matrix includes:
determining the channel values in the first probability matrix that are smaller than a first threshold;
removing the channels corresponding to those channel values from the convolutional neural network, and taking the remaining channels as reserved channels;
determining the same channels in the second probability matrix according to the reserved channels;
and determining, among the weight values corresponding to those same channels in the second probability matrix, the weight values smaller than a second threshold, and removing them from the convolutional neural network.
Optionally, performing secondary pruning on the pruned convolutional neural network includes:
deleting, from the weight parameters of the convolutional neural network model, the weight parameters smaller than a pruning threshold.
Optionally, performing the quantization operation on the convolutional neural network after the secondary pruning includes:
performing quantization processing on the convolutional neural network after the secondary pruning through at least one of a fast convolution algorithm, network layer merging, and multi-threaded execution.
In a second aspect, an embodiment of the present application further provides a pruning and quantization processing device for a network model, where the device includes:
a pruning module, used for pruning the convolutional neural network based on the channel attention mechanism and the weight attention mechanism;
the pruning module is further used for performing secondary pruning on the pruned convolutional neural network;
and a quantization module, used for performing a quantization operation on the convolutional neural network after the secondary pruning.
In a third aspect, an embodiment of the present application further provides an electronic device, where the electronic device includes:
the network model pruning quantization processing method comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, and when the computer program is executed by the processor, the pruning quantization processing method of the network model provided by the embodiment of the application is realized.
In a fourth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the pruning and quantization processing method for a network model provided in the present application is implemented.
The application provides a pruning and quantization processing method, device, equipment and storage medium for a network model, where the method includes: pruning a convolutional neural network based on a channel attention mechanism and a weight attention mechanism; performing secondary pruning on the pruned convolutional neural network; and performing a quantization operation on the convolutional neural network after the secondary pruning. Through this processing, parameters that contribute little to the convolutional neural network model can be eliminated, ensuring the response speed of the network model while achieving, with a smaller number of parameters, an accuracy close to that of the original model.
Drawings
Fig. 1 is a flowchart of a pruning and quantization processing method for a network model in an embodiment of the present application;
FIG. 2 is a flowchart of a method for determining a first probability matrix in an embodiment of the present application;
FIG. 3 is a flowchart of a method for determining a second probability matrix in an embodiment of the present application;
FIG. 4 is a flowchart of a method for pruning according to a first probability matrix and a second probability matrix in an embodiment of the present application;
FIG. 5 is a schematic diagram of a pruning and quantization processing device for a network model in an embodiment of the present application;
FIG. 6 is another schematic diagram of a pruning and quantization processing device for a network model in an embodiment of the present application;
FIG. 7 is another schematic diagram of a pruning and quantization processing device for a network model in an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.
In addition, in the embodiments of the present application, the words "optionally" or "exemplarily" are used to indicate examples, illustrations or explanations. Any embodiment or design described herein as "optional" or "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments or designs; rather, these words are intended to present the relevant concepts in a concrete fashion.
Fig. 1 is a flowchart of a pruning and quantization processing method for a network model according to an embodiment of the present application. The method may be applied to a portable device such as a law enforcement recorder, enabling it to quickly process and respond to the information it acquires. As shown in fig. 1, the method may include, but is not limited to, the following steps:
S101, pruning the convolutional neural network based on the channel attention mechanism and the weight attention mechanism.
The convolutional neural network in the embodiments of the application can be used for processing images or videos. By pruning and optimizing the network model parameters of the convolutional neural network with a combination of a channel attention mechanism and a weight attention mechanism, an accuracy close to that of the original model can be achieved with a smaller number of parameters.
For example, in this embodiment of the present application, pruning the convolutional neural network based on the channel attention mechanism and the weight attention mechanism may be implemented as follows: prune the convolutional neural network based on the channel attention mechanism to obtain a first probability matrix; prune the convolutional neural network based on the weight attention mechanism and the input image to obtain a second probability matrix (when the convolutional neural network processes a video, the input image can be understood as each frame of the input video); and prune the convolutional neural network according to the first probability matrix and the second probability matrix.
S102, performing secondary pruning on the pruned convolutional neural network.
The combination of the channel attention mechanism and the weight attention mechanism in step S101 can be understood as the first pruning of the convolutional neural network; on this basis, the pruned convolutional neural network can be pruned a second time along other dimensions to further reduce the size of the network model.
For example, after pruning the convolutional neural network based on the channel attention mechanism and the weight attention mechanism, the secondary pruning may be performed from the perspective of the weight parameters rather than by convolutional layer, channel and the like: the weight parameters of the convolutional neural network model that are smaller than a pruning threshold are deleted to reduce the computational load of the model.
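As a minimal sketch of such magnitude-based secondary pruning (a PyTorch-style illustration under our own assumptions; the function name and threshold value are not taken from the patent, and weights are zeroed rather than structurally removed):

```python
import torch

def magnitude_prune(model: torch.nn.Module, prune_threshold: float = 1e-3) -> None:
    # Zero out every convolution weight whose magnitude is below the threshold.
    # Zeroing is one common way to realise unstructured weight pruning; the
    # threshold value here is purely illustrative.
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, torch.nn.Conv2d):
                mask = module.weight.abs() >= prune_threshold
                module.weight.mul_(mask)
```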
Optionally, before the second pruning of the model, the network model after the first pruning may be fine-tuned, for example by training with an L1 regularization term so that the weights of the convolutional neural network become sparse. This not only compensates for the model precision lost in the first pruning, but also ensures the effect of the second pruning.
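A sketch of such L1-regularized fine-tuning (the penalty coefficient and the surrounding training step are illustrative assumptions, not details from the patent):

```python
import torch

def l1_penalty(model: torch.nn.Module, lam: float = 1e-4) -> torch.Tensor:
    # Sum of absolute weight values; adding it to the task loss pushes weights
    # toward zero, sparsifying the network before the second pruning.
    return lam * sum(p.abs().sum() for p in model.parameters())

# Inside a fine-tuning step (task_loss, output and target are placeholders):
# loss = task_loss(output, target) + l1_penalty(model)
# loss.backward()
```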
S103, performing a quantization operation on the convolutional neural network after the secondary pruning.
Illustratively, the quantization processing selected in the embodiments of the present application may include at least one of a fast convolution algorithm, network layer merging, and multi-threaded execution.
Among them, the fast convolution algorithm (Winograd) reduces the number of multiplications from the viewpoint of mathematical operations. For the one-dimensional case F(2, 3), let d = (d0, d1, d2, d3) denote the feature matrix (the input tile) and g = (g0, g1, g2) denote the convolution kernel. The two outputs are computed as

y0 = m1 + m2 + m3
y1 = m2 - m3 - m4

where m1 = (d0 - d2)·g0, m2 = (d1 + d2)·(g0 + g1 + g2)/2, m3 = (d2 - d1)·(g0 - g1 + g2)/2 and m4 = (d1 - d3)·g2, so that four multiplications replace the six required by direct computation.
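The following is a minimal numeric sketch of this F(2, 3) computation (NumPy; the function name is ours):

```python
import numpy as np

def winograd_f23(d: np.ndarray, g: np.ndarray) -> np.ndarray:
    # Winograd F(2,3): two outputs of a 1-D convolution with a 3-tap kernel
    # using 4 multiplications instead of the naive 6.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])   # feature matrix (input tile)
g = np.array([0.5, 1.0, 0.25])       # convolution kernel
print(winograd_f23(d, g))            # agrees with np.correlate(d, g, 'valid')
```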
A drawback of the fast convolution algorithm described above is that it provides a quantization benefit only when the channel size is large.
Network layer merging may be performed by merging operators in the convolutional neural network, for example merging a convolutional layer (conv), a batch normalization layer (BatchNorm, bn) and an activation layer (relu) into a single layer, and removing the concatenation layer (concat).
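As an illustration of one common form of such operator merging, the following sketch folds a BatchNorm layer into the preceding convolution (PyTorch; it assumes a plain Conv2d followed by BatchNorm2d in inference mode and is not code from the patent):

```python
import torch

def fuse_conv_bn(conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d) -> torch.nn.Conv2d:
    # Fold BatchNorm statistics into the convolution weights and bias so that
    # inference runs a single conv operator instead of two layers.
    fused = torch.nn.Conv2d(conv.in_channels, conv.out_channels,
                            conv.kernel_size, conv.stride, conv.padding, bias=True)
    with torch.no_grad():
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # gamma / sqrt(var + eps)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
        fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused
```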
Alternatively, the network model may also be quantized by combining hardware and software approaches, such as exploiting hardware architecture characteristics, pipelining (pipeline), caching (cache), memory data rearrangement, NEON assembly instructions, and the like.
It should be noted that the foregoing quantization methods are conventional operations in the prior art. The embodiments of the present application do not describe the specific implementation of each quantization method in detail; those skilled in the art may select one or a combination of the above quantization methods according to actual needs to quantize the pruned network model, which is not limited in the embodiments of the present application.
The embodiment of the application provides a pruning and quantization processing method for a network model, which includes: pruning a convolutional neural network based on a channel attention mechanism and a weight attention mechanism; performing secondary pruning on the pruned convolutional neural network; and performing a quantization operation on the convolutional neural network after the secondary pruning. Through this processing, parameters that contribute little to the convolutional neural network model can be eliminated, ensuring the response speed of the network model while achieving, with a smaller number of parameters, an accuracy close to that of the original model.
As shown in fig. 2, in one example, pruning the convolutional neural network based on the channel attention mechanism in step S101 to obtain the first probability matrix may include, but is not limited to, the following steps:
S201, performing a dimension reduction transformation on the channels of the convolutional neural network to obtain channel weights.
For example, assume the input of the convolutional neural network is H × W × C, where H denotes the image height, W denotes the image width, and C denotes the C channels of the convolutional neural network. Performing the dimension reduction transformation on the channels in this step can be understood as reducing each channel to a scalar, so that the input is reduced to a 1 × C matrix giving the channel weights of the C channels.
S202, multiplying each channel by its corresponding channel weight to obtain the attention matrix corresponding to the channel.
That is, the attention matrix corresponding to each channel is obtained by multiplying each of the C channels by its own channel weight, which enhances the attention paid to important channels.
S203, acquiring a first probability matrix by using the first function and the attention matrix.
For example, the first function may be a normalized exponential function (softmax function); applying the first function to the obtained attention matrix yields the first probability matrix.
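A minimal sketch of steps S201-S203 (PyTorch; using global average pooling as the dimension reduction transform and collapsing the attention matrix to one score per channel before the softmax are our assumptions about one plausible realisation, not details fixed by the patent):

```python
import torch

def first_probability_matrix(feat: torch.Tensor) -> torch.Tensor:
    # feat: (C, H, W) input feature map.
    channel_weight = feat.mean(dim=(1, 2))              # S201: reduce each channel to a 1 x C weight
    attention = feat * channel_weight.view(-1, 1, 1)    # S202: scale channels -> attention matrix
    scores = attention.mean(dim=(1, 2))                 # collapse back to one score per channel
    return torch.softmax(scores, dim=0)                 # S203: first function (softmax) -> 1 x C

probs = first_probability_matrix(torch.rand(16, 8, 8))  # 16 channels, 8 x 8 spatial size
```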
As shown in fig. 3, in one example, pruning the convolutional neural network based on the weight attention mechanism and the input image in step S101 to obtain the second probability matrix may include, but is not limited to, the following steps:
S301, performing linear addition on the feature image output by the current convolutional layer in the convolutional neural network and the feature image output by the previous convolutional layer to obtain a feature image matrix.
S302, performing nonlinear processing on the feature image matrix based on the activation function.
S303, performing a dimension reduction operation on the feature image matrix using a convolution kernel to obtain a dimension-reduced image matrix.
For example, the size of the convolution kernel may be 1 × 1; that is, the nonlinearly processed feature image matrix is computed with a 1 × 1 convolution kernel to obtain the dimension-reduced image matrix.
S304, acquiring a weight matrix of the dimension-reduced image matrix by using a second function.
Optionally, the second function may be a threshold function, for example a sigmoid function, which maps variables to the range 0 to 1. Applying the sigmoid function to the dimension-reduced image matrix therefore yields the corresponding weight matrix.
S305, multiplying the weight matrix by the feature image output by the previous convolutional layer to obtain a product matrix.
S306, acquiring a second probability matrix according to the product matrix and the first function.
Similarly, the first function is the normalized exponential function (softmax function); that is, the second probability matrix is obtained by applying the normalized exponential function to the product matrix.
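A minimal sketch of steps S301-S306 (PyTorch; the choice of ReLU as the activation, a single-output-channel 1 × 1 reduction convolution, and a per-channel softmax are our assumptions):

```python
import torch

def second_probability_matrix(curr_feat: torch.Tensor, prev_feat: torch.Tensor,
                              reduce_conv: torch.nn.Conv2d) -> torch.Tensor:
    # curr_feat, prev_feat: (C, H, W) outputs of the current and previous conv layers.
    feat_matrix = curr_feat + prev_feat                        # S301: linear addition
    feat_matrix = torch.relu(feat_matrix)                      # S302: nonlinear processing
    reduced = reduce_conv(feat_matrix.unsqueeze(0))[0]         # S303: 1x1-conv dimension reduction
    weight_matrix = torch.sigmoid(reduced)                     # S304: second function (sigmoid)
    product = weight_matrix * prev_feat                        # S305: multiply with previous output
    return torch.softmax(product.flatten(1), dim=1).view_as(product)  # S306: softmax -> H x W x C

c, h, w = 16, 8, 8
reduce_conv = torch.nn.Conv2d(c, 1, kernel_size=1)             # reduces C channels to 1
probs = second_probability_matrix(torch.rand(c, h, w), torch.rand(c, h, w), reduce_conv)
```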
As shown in fig. 4, in one example, pruning the convolutional neural network according to the first probability matrix and the second probability matrix in step S101 may include, but is not limited to, the following steps:
S401, determining the channel values in the first probability matrix that are smaller than a first threshold.
The first threshold in this step is used to determine, from the perspective of channels, the pruning targets in the convolutional neural network. If the first probability matrix obtained in the embodiment of fig. 2 has the form 1 × C, the channel values smaller than the first threshold among the C channels in the first probability matrix can be determined based on the first threshold.
S402, removing the channels corresponding to those channel values from the convolutional neural network, and taking the remaining channels as reserved channels.
After the channel values smaller than the first threshold in the first probability matrix are determined in step S401, the channels corresponding to those values are taken as removal targets and removed from the convolutional neural network, and the remaining channels serve as reserved channels.
S403, determining the same channels in the second probability matrix according to the reserved channels.
In the embodiment of the present application, the information input to the network model is processed from the perspective of channels and from the perspective of weights based on the channel attention mechanism and the weight attention mechanism, respectively. Similar to the embodiment of fig. 2, the second probability matrix computed in the embodiment of fig. 3 has the form H × W × C, so the corresponding channels and channel values in the second probability matrix can be obtained from the reserved channels determined in step S402.
S404, determining, among the weight values corresponding to those same channels in the second probability matrix, the weight values smaller than a second threshold, and removing them from the convolutional neural network.
The second threshold in this step is used to determine, from the perspective of weights, the pruning targets in the convolutional neural network. After the corresponding reserved channels in the second probability matrix are determined in step S403, the weight values of the reserved channels are compared with the second threshold; those smaller than the second threshold are determined and removed from the convolutional neural network.
Through the above steps, factors with small contributions can be removed from the network model from the perspectives of both channels and weights, reducing the size of the convolutional neural network model while keeping the precision essentially intact.
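A sketch of the selection logic of steps S401-S404 (PyTorch; the threshold values and the use of a boolean mask to mark removed weights are illustrative):

```python
import torch

def select_pruning(first_probs: torch.Tensor, second_probs: torch.Tensor,
                   t1: float, t2: float):
    # first_probs: (C,) first probability matrix; second_probs: (C, H, W)
    # second probability matrix. Returns the reserved channel indices and,
    # for those channels, a mask marking the weight values to keep.
    reserved = torch.nonzero(first_probs >= t1).flatten()   # S401/S402: reserved channels
    same_channels = second_probs[reserved]                  # S403: same channels in 2nd matrix
    weight_mask = same_channels >= t2                       # S404: weights >= t2 are kept
    return reserved, weight_mask

reserved, mask = select_pruning(torch.rand(16), torch.rand(16, 8, 8), t1=0.02, t2=0.3)
```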
After the network model is pruned and quantized in the above manner, the optimized network model can be ported to the corresponding electronic device for use.
The embodiments of this application provide a preferred implementation, for example using a HiSilicon Hi35XX development board. Taking the HiSilicon Hi3519 chip as an example, after the optimized model is obtained through the above combination of quantization, pruning and low-rank processing, it can be converted via onnx into a wk model supported under the NNIE framework and written into the project for operation. Here, onnx is an open file format designed for machine learning and used to store trained models; it allows different artificial intelligence frameworks to store model information in the same format. NNIE is short for Neural Network Inference Engine, a hardware unit dedicated to accelerated processing of neural networks in HiSilicon media System-on-Chip (SoC) products.
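As a sketch of the model export step (PyTorch; the stand-in model, input shape and file name are illustrative; the subsequent onnx-to-wk conversion relies on HiSilicon's NNIE tooling and is not shown):

```python
import torch

# Stand-in for the pruned and quantized convolutional neural network.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, kernel_size=3), torch.nn.ReLU())
model.eval()

dummy_input = torch.rand(1, 3, 224, 224)                 # illustrative input shape
torch.onnx.export(model, dummy_input, "pruned_model.onnx")
# The .onnx file is then converted to a .wk model with the NNIE toolchain.
```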
Fig. 5 shows a pruning and quantization processing device for a network model according to an embodiment of the present application. As shown in fig. 5, the device includes: a pruning module 501 and a quantization module 502;
the pruning module is used for pruning the convolutional neural network based on the channel attention mechanism and the weight attention mechanism;
the pruning module is further used for performing secondary pruning on the pruned convolutional neural network;
and the quantization module is used for performing a quantization operation on the convolutional neural network after the secondary pruning.
Exemplarily, as shown in fig. 6, the pruning module may further include a first pruning unit 601, a second pruning unit 602, and a third pruning unit 603;
the first pruning unit is used for pruning the convolutional neural network based on a channel attention mechanism to obtain a first probability matrix;
the second pruning unit is used for pruning the convolutional neural network based on the weight attention mechanism and the input image to obtain a second probability matrix;
and the third pruning unit is used for pruning the convolutional neural network according to the first probability matrix and the second probability matrix.
In an example, the first pruning unit is configured to perform a dimension reduction transformation on the channels of the convolutional neural network to obtain channel weights; multiply each channel by its corresponding channel weight to obtain the attention matrix corresponding to the channel; and acquire a first probability matrix using the first function (e.g., a softmax function) and the attention matrix.
In an example, the second pruning unit is configured to perform linear addition on the feature image output by the current convolutional layer in the convolutional neural network and the feature image output by the previous convolutional layer to obtain a feature image matrix; perform nonlinear processing on the feature image matrix based on the activation function; perform a dimension reduction operation on the feature image matrix using a convolution kernel to obtain a dimension-reduced image matrix; acquire a weight matrix of the dimension-reduced image matrix using a second function (e.g., a sigmoid function); multiply the weight matrix by the feature image output by the previous convolutional layer to obtain a product matrix; and acquire a second probability matrix according to the product matrix and the first function.
In one example, the third pruning unit is configured to determine the channel values in the first probability matrix that are smaller than a first threshold; remove the channels corresponding to those channel values from the convolutional neural network and take the remaining channels as reserved channels; determine the same channels in the second probability matrix according to the reserved channels; and determine, among the weight values corresponding to those same channels in the second probability matrix, the weight values smaller than a second threshold and remove them from the convolutional neural network.
Illustratively, as shown in fig. 7, the pruning module may further include a fourth pruning unit 604;
and the fourth pruning unit is used for deleting, from the weight parameters of the convolutional neural network model, the weight parameters smaller than the pruning threshold.
Illustratively, the quantization module may be configured to quantize the convolutional neural network after the secondary pruning through at least one of a fast convolution algorithm, network layer merging, and multi-threaded execution.
The pruning and quantization processing device for a network model provided by the embodiments of the present application can execute the pruning and quantization processing method for a network model provided in the embodiments of figs. 1 to 4 of the present application, and has the corresponding functional modules and the beneficial effects of the executed method.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic device includes a processor 801, a memory 802, an input device 803 and an output device 804. The number of processors 801 in the device may be one or more; one processor 801 is taken as an example in fig. 8. The processor 801, the memory 802, the input device 803 and the output device 804 in the device may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 8.
As a computer-readable storage medium, the memory 802 can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the pruning and quantization processing method of figs. 1-4 in the embodiments of the present application (for example, the pruning module 501 and the quantization module 502 of the pruning and quantization processing device for a network model). The processor 801 executes the various functional applications and data processing of the electronic device by running the software programs, instructions and modules stored in the memory 802, thereby implementing the pruning and quantization processing method for a network model described above.
The memory 802 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function, and the data storage area may store data created according to the use of the cloud server, and the like. Further, the memory 802 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some examples, the memory 802 may further include memory located remotely from the processor 801, which may be connected to the device/terminal/server via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 803 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the device. The output device 804 may include a display device such as a display screen.
Embodiments of the present application also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a pruning and quantization processing method for a network model, the method including:
pruning a convolutional neural network based on a channel attention mechanism and a weight attention mechanism;
performing secondary pruning on the pruned convolutional neural network;
and performing a quantization operation on the convolutional neural network after the secondary pruning.
Of course, the computer-executable instructions contained in the storage medium provided by the embodiments of the present application are not limited to the method operations described above, and may also execute the pruning and quantization processing method for a network model provided in any embodiment of the present application.
From the above description of the embodiments, it will be clear to those skilled in the art that the present application can be implemented by software plus the necessary general-purpose hardware, and certainly also by hardware, although the former is the better embodiment in many cases. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk or an optical disk of a computer, and which includes several instructions for enabling a computer device (which may be a personal computer, a server or a network device) to execute the methods described in the embodiments of the present application.
It should be noted that, in the above embodiment of the pruning and quantization processing device for a network model, the included units and modules are divided only according to functional logic, and the division is not limited to the above as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only used to distinguish them from one another and are not used to limit the protection scope of the application.
It should also be noted that the foregoing describes only the preferred embodiments of the present application and the technical principles employed. Those skilled in the art will understand that the present application is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the application. Therefore, although the present application has been described in detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the spirit of the application; the scope of the application is determined by the scope of the appended claims.

Claims (10)

1. A pruning and quantization processing method for a network model, characterized by comprising:
pruning a convolutional neural network based on a channel attention mechanism and a weight attention mechanism;
performing secondary pruning on the pruned convolutional neural network;
and performing a quantization operation on the convolutional neural network after the secondary pruning.
2. The method of claim 1, wherein pruning the convolutional neural network based on a channel attention mechanism and a weight attention mechanism comprises:
pruning the convolutional neural network based on a channel attention mechanism to obtain a first probability matrix;
pruning the convolutional neural network based on a weight attention mechanism and an input image to obtain a second probability matrix;
pruning the convolutional neural network according to the first probability matrix and the second probability matrix.
3. The method of claim 2, wherein pruning the convolutional neural network based on a channel attention mechanism to obtain a first probability matrix comprises:
performing a dimension reduction transformation on the channels of the convolutional neural network to obtain channel weights;
multiplying each channel by its corresponding channel weight to obtain an attention matrix corresponding to the channel;
and acquiring a first probability matrix using a first function and the attention matrix.
4. The method of claim 2, wherein pruning the convolutional neural network based on a weight attention mechanism and an input image to obtain a second probability matrix comprises:
performing linear addition on a feature image output by a current convolutional layer in the convolutional neural network and a feature image output by the previous convolutional layer to obtain a feature image matrix;
performing nonlinear processing on the feature image matrix based on an activation function;
performing a dimension reduction operation on the feature image matrix using a convolution kernel to obtain a dimension-reduced image matrix;
acquiring a weight matrix of the dimension-reduced image matrix by using a second function;
multiplying the weight matrix by the feature image output by the previous convolutional layer to obtain a product matrix;
and acquiring a second probability matrix according to the product matrix and the first function.
5. The method of claim 2, wherein pruning the convolutional neural network according to the first probability matrix and the second probability matrix comprises:
determining the channel values in the first probability matrix that are smaller than a first threshold;
removing the channels corresponding to those channel values from the convolutional neural network, and taking the remaining channels as reserved channels;
determining the same channels in the second probability matrix according to the reserved channels;
and determining, among the weight values corresponding to those same channels in the second probability matrix, the weight values smaller than a second threshold, and removing them from the convolutional neural network.
6. The method of claim 1, wherein performing secondary pruning on the pruned convolutional neural network comprises:
deleting, from the weight parameters of the convolutional neural network model, the weight parameters smaller than a pruning threshold.
7. The method according to any one of claims 1 to 6, wherein performing the quantization operation on the convolutional neural network after the secondary pruning comprises:
performing quantization processing on the convolutional neural network after the secondary pruning through at least one of a fast convolution algorithm, network layer merging, and multi-threaded execution.
8. A pruning and quantization processing device for a network model, characterized by comprising:
a pruning module, used for pruning the convolutional neural network based on the channel attention mechanism and the weight attention mechanism;
the pruning module is further used for performing secondary pruning on the pruned convolutional neural network;
and a quantization module, used for performing a quantization operation on the convolutional neural network after the secondary pruning.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the pruning and quantization processing method for a network model according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the pruning and quantization processing method for a network model according to any one of claims 1 to 7.
CN202110598683.5A 2021-05-31 2021-05-31 Pruning quantification processing method, device, equipment and storage medium of network model Pending CN113205158A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110598683.5A CN113205158A (en) 2021-05-31 2021-05-31 Pruning quantification processing method, device, equipment and storage medium of network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110598683.5A CN113205158A (en) 2021-05-31 2021-05-31 Pruning quantification processing method, device, equipment and storage medium of network model

Publications (1)

Publication Number Publication Date
CN113205158A true CN113205158A (en) 2021-08-03

Family

ID=77023783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110598683.5A Pending CN113205158A (en) 2021-05-31 2021-05-31 Pruning quantification processing method, device, equipment and storage medium of network model

Country Status (1)

Country Link
CN (1) CN113205158A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116739050A (en) * 2022-09-30 2023-09-12 荣耀终端有限公司 Cross-layer equalization optimization method, device and storage medium
CN116992946A (en) * 2023-09-27 2023-11-03 荣耀终端有限公司 Model compression method, apparatus, storage medium, and program product
CN116992946B (en) * 2023-09-27 2024-05-17 荣耀终端有限公司 Model compression method, apparatus, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination