CN110363279B - Image processing method and device based on convolutional neural network model - Google Patents

Image processing method and device based on convolutional neural network model

Info

Publication number
CN110363279B
Authority
CN
China
Prior art keywords: weight parameters, weight, image, parameters, neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810250867.0A
Other languages
Chinese (zh)
Other versions
CN110363279A (en)
Inventor
胡慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201810250867.0A
Priority to PCT/CN2019/079281
Publication of CN110363279A
Application granted
Publication of CN110363279B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution

Abstract

The application provides an image processing method and device based on a convolutional neural network model. The method includes: acquiring a first weight parameter set corresponding to a neural network layer, the first weight parameter set including N1 first weight parameters, where N1 is an integer greater than or equal to 1; dividing each of the N1 first weight parameters by a first value m to obtain N1 second weight parameters, where |W_max| ≤ m ≤ 2|W_max| and W_max is the weight parameter with the largest absolute value in the first weight parameter set; quantizing each of the N1 second weight parameters into a sum of at least two terms of the form 2^Q, where Q is an integer less than or equal to 0, to obtain N1 third weight parameters; acquiring an image to be processed; and processing the image to be processed according to the N1 third weight parameters to obtain an output image. According to the method and device, the error caused by weight quantization can be reduced, and therefore the loss of precision is reduced.

Description

Image processing method and device based on convolutional neural network model
Technical Field
The present application relates to the field of image processing, and in particular, to an image processing method and apparatus based on a convolutional neural network model.
Background
In recent years, neural networks, and convolutional neural networks in particular, have achieved great success in image processing and image recognition. A typical convolutional neural network contains multiple convolutional layers, fully-connected layers, and so on. From a computational point of view, multiplication is the main bottleneck, and storing and transferring the model parameters also consumes considerable energy because of the storage space required. Many researchers therefore study methods for compressing and accelerating neural networks, i.e., methods that reduce the model storage space and greatly reduce the amount of computation (the number of multiplications).
Quantization is a common compression and acceleration method; in particular, quantizing parameters to a power-of-2 representation converts multiplications into shift operations, which greatly reduces the computational complexity and yields acceleration.
In practical applications, however, for example in a convolutional neural network for image super-resolution, a large compression and acceleration gain must be obtained while avoiding excessive quantization error, which would otherwise degrade performance, for example the peak signal-to-noise ratio (PSNR).
Disclosure of Invention
The application provides an image processing method and device based on a convolutional neural network model, which can reduce errors caused by quantization and reduce precision loss.
In a first aspect, an image processing method based on a convolutional neural network model is provided, where a neural network layer of the convolutional neural network model includes at least one convolutional layer and/or at least one fully-connected layer. The method includes: acquiring a first weight parameter set corresponding to the neural network layer, the first weight parameter set including N1 first weight parameters, where N1 is an integer greater than or equal to 1; dividing each of the N1 first weight parameters by a first value m to obtain N1 second weight parameters, where |W_max| ≤ m ≤ 2|W_max| and W_max is the weight parameter with the largest absolute value in the first weight parameter set; quantizing each of the N1 second weight parameters into a sum of at least two terms of the form 2^Q, where Q is an integer less than or equal to 0, to obtain N1 third weight parameters; acquiring an image to be processed; and processing the image to be processed according to the N1 third weight parameters to obtain an output image.
Therefore, in this embodiment of the application, the weight parameters are approximated using the value m. Because of the range of m, this approximation preserves the distribution of the weight parameters in the convolutional neural network and also reduces the error of the subsequent quantization. The approximated weight parameters are then quantized into sums of at least two powers of 2, which not only reduces the amount of multiplication but also reduces the error caused by quantization and thus the resulting loss of precision.
With reference to the first aspect, in certain implementations of the first aspect, the first weight parameter set further includes N2 fourth weight parameters, where N2 is an integer greater than or equal to 1. After the N1 second weight parameters are quantized into sums of at least two terms of the form 2^Q to obtain the N1 third weight parameters, the method includes: training the N2 fourth weight parameters according to the quantization result to obtain N2 fifth weight parameters; dividing each of the N2 fifth weight parameters by the first value m to obtain N2 sixth weight parameters; and quantizing each of the N2 sixth weight parameters into a sum of at least two terms of the form 2^P, where P is an integer less than or equal to 0, to obtain N2 seventh weight parameters. Processing the image to be processed according to the N1 third weight parameters to obtain an output image then includes: processing the image to be processed according to the N1 third weight parameters and the N2 seventh weight parameters to obtain the output image.
According to this embodiment of the application, after the weight parameters of group N1 are approximated and quantized, and before the weight parameters of group N2 are approximated and quantized, the weight parameters of group N2 are retrained according to the error introduced by the approximation and quantization. The retraining compensates the precision loss caused by approximating and quantizing the weight parameters, so the overall error is reduced and the precision is improved.
With reference to the first aspect, in certain implementations of the first aspect, the method further includes: i W1min|≥|W2maxL, wherein W1minIs the weight parameter with the smallest absolute value in the N1 first weight parameters, W2maxThe weight parameter with the largest absolute value is the weight parameter with the largest absolute value in the N2 fourth weight parameters.
In this embodiment of the application, the larger the absolute value of a weight parameter, the larger its contribution to the neural network. The weight parameters of the more important group are approximated and quantized first, and the weight parameters of the less important group are then retrained to compensate the precision loss caused by that approximation and quantization. This further limits the overall error in the convolutional neural network and reduces the loss of precision.
With reference to the first aspect, in certain implementations of the first aspect, the first value m is the product of the absolute value of W_max and a second value c, where c is determined from the errors between the N1 first weight parameters and the N1 third weight parameters.
The value c is determined according to the error between the first weight parameters and the third weight parameters, and m is then determined from c and W_max. This gives priority to controlling the error of the weight parameter with the largest absolute value, and determining c from the error reduces the error introduced by the approximation.
With reference to the first aspect, in certain implementations of the first aspect, after the N1 second weight parameters are quantized into sums of at least two terms of the form 2^Q to obtain the N1 third weight parameters, the method further includes determining a weight index value list, where the weight index value list characterizes the N1 binary numbers corresponding to the N1 third weight parameters and the sign bit of each third weight parameter.
With reference to the first aspect, in certain implementations of the first aspect, processing the image to be processed according to the N1 third weight parameters to obtain an output image includes: performing shift and add operations on the image to be processed according to the N1 binary numbers in the weight index value list, and multiplying the result by the first value m to obtain the output image.
In a second aspect, there is provided an image processing apparatus based on a convolutional neural network model, the apparatus comprising means for performing the method of the first aspect or any one of the possible implementations of the first aspect.
In a third aspect, an image processing apparatus based on a convolutional neural network model is provided, including a transceiver, a processor, and a memory. The processor is configured to control the transceiver to transmit and receive data, the memory is configured to store a computer program, and the processor is configured to call and run the computer program from the memory, so that the apparatus performs the method of the first aspect or any of its possible implementations.
In a fourth aspect, there is provided a computer readable medium having stored thereon a computer program which, when executed by a computer, implements the method of the first aspect or any of its possible implementations.
In a fifth aspect, a computer program product is provided, which when executed by a computer implements the method of the first aspect or any of its possible implementations.
In a sixth aspect, a processing apparatus is provided that includes a processor and an interface;
the processor is configured to perform the method of the first aspect or any of its possible implementations, where the related data interaction (for example, acquiring an image, acquiring a convolutional neural network model, or acquiring weight parameters) is performed through the interface. In a specific implementation, the interface may complete the data interaction through a transceiver.
It should be understood that the processing device in the above sixth aspect may be a chip, the processor may be implemented by hardware or may be implemented by software, and when implemented by hardware, the processor may be a logic circuit, an integrated circuit, or the like; when implemented in software, the processor may be a general-purpose processor implemented by reading software code stored in a memory, which may be integrated with the processor, located external to the processor, or stand-alone.
Drawings
Fig. 1 is a basic framework diagram of a convolutional neural network.
Fig. 2 is a schematic diagram of the height, width and depth of a three-dimensional image.
Fig. 3 is a schematic diagram of a convolution operation performed by a convolutional layer.
Fig. 4 is a schematic structural view of a fully connected layer.
Fig. 5 is a schematic diagram of weight quantization by the INQ technique.
Fig. 6 is a schematic view of an application scenario provided in an embodiment of the present application.
Fig. 7 is a schematic flowchart of an image processing method based on a convolutional neural network model provided in an embodiment of the present application.
Fig. 8 is a schematic flowchart of a processing procedure of a weight parameter provided in an embodiment of the present application.
Fig. 9 is a schematic diagram of a weight index value structure provided in an embodiment of the present application.
Fig. 10 is a schematic flow chart of a processing procedure of the weight parameter provided in the embodiment of the present application.
Fig. 11 is a schematic diagram comparing an enlarged image obtained by the image processing method based on a convolutional neural network model according to an embodiment of the present application with an enlarged image obtained by the first network.
Fig. 12 is another schematic diagram comparing an enlarged image obtained by the image processing method based on a convolutional neural network model according to an embodiment of the present application with an enlarged image obtained by the first network.
Fig. 13 is a schematic calculation diagram of a shift and add calculation unit of an image processing method based on a convolutional neural network model according to an embodiment of the present application.
FIG. 14 is a schematic block diagram of an image processing apparatus based on a convolutional neural network model according to one embodiment of the present application.
Fig. 15 is a schematic block diagram of an image processing apparatus based on a convolutional neural network model according to another embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
For ease of understanding, neural networks are first described in detail. A neural network typically includes multiple neural network layers, and each layer may implement a different operation. Common neural network layers include convolutional layers, pooling layers, fully-connected layers, and the like.
Fig. 1 is a basic framework diagram of a convolutional neural network (CNN). Referring to Fig. 1, the convolutional neural network includes convolutional layers, pooling layers, and fully-connected layers. Convolutional layers and pooling layers are arranged alternately; a convolutional layer may be followed by another convolutional layer or by a pooling layer.
A convolutional layer mainly performs convolution operations on its input matrix, and a pooling layer mainly performs pooling operations on its input matrix. Both convolution and pooling operations correspond to a kernel; the kernel of a convolution operation is called a convolution kernel. Convolution and pooling operations are described in detail below.
Convolution operations are mainly used in image processing, where the input matrix may also be called a feature map. A convolution operation corresponds to a convolution kernel, which may also be called a weight matrix; each element of the weight matrix is a weight. During convolution, a sliding window divides the input matrix into sub-matrices of the same size as the weight matrix, each sub-matrix is multiplied element-wise with the weight matrix, and the result is a weighted sum of the data elements in that sub-matrix.
For clarity, the terminology used in the present application is explained first.
1. Pixel
A pixel is the most basic element that makes up an image, being a logical unit of size.
2. Size and resolution of image
The size of the image may include a plurality of dimensions, and when the dimension of the image is two-dimensional, the size of the image includes a height and a width; when the dimension of the image is three-dimensional, the size of the image includes width, height, and depth.
It is to be understood that the height of an image may be understood as the number of pixels the image comprises in the height direction; the width of an image may be understood as the number of pixels the image comprises in the width direction; the depth of an image may be understood as the number of channels of the image.
In the convolutional neural network model, the depth of an image can be understood as the number of feature maps included in the image, where the width and height of any one feature map of the image are the same as those of the other feature maps of the image.
That is, one image is a three-dimensional image, and it can be understood that the three-dimensional image is composed of a plurality of two-dimensional feature maps, and the plurality of two-dimensional feature maps have the same size.
It should be understood that an image includes M feature maps, each of the M feature maps having a height of H pixels and a width of W pixels, and that the image is a three-dimensional image having a size of H × W × M, that is, the three-dimensional image includes M H × W two-dimensional feature maps. Wherein H, W is an integer greater than 1, and M is an integer greater than 0.
Fig. 2 shows a 5 × 5 × 3 image, which includes 3 feature maps (e.g., red (R), green (G), and blue (B)), each having a size of 5 × 5.
It should be understood that the feature maps of different colors can be understood as different channels of the image, and different channels can be considered as different feature maps in the convolutional neural network.
It should be further understood that fig. 2 only illustrates an image with a depth of 3, and the depth of the image may also be other values, for example, the depth of a grayscale image is 1, the depth of an RGB-depth (depth, D) image is 4, and the like, which is not limited in this embodiment of the application.
It is also understood that the resolution of an image (or feature map) can be understood as the product of the width and height of the image (or feature map), i.e., if the height of the image (or feature map) is H pixels and the width of the image (or feature map) is W pixels, then the resolution of the image (or feature map) is H × W.
It should be understood that the to-be-processed image or the input image mentioned in the embodiment of the present application may be an input feature image, but the embodiment of the present application is not limited thereto.
3. Convolutional layer
In convolutional neural networks, convolutional layers mainly play a role in extracting features. The convolution operation is mainly carried out on the input image according to a set convolution kernel. For ease of understanding, the process of the convolution operation is illustrated below in conjunction with FIG. 3.
As shown in Fig. 3, the input matrix is a 3 × 3 matrix. To keep the dimensions of the input matrix and the output matrix the same, 2 rows and 2 columns of zero elements (one on each edge) are padded around the input matrix before the convolution operation, converting it into a 5 × 5 matrix. The size of the sliding window equals the size of the convolution kernel; Fig. 3 uses a 3 × 3 weight matrix as the convolution kernel. The sliding window starts at the upper-left corner of the input matrix and slides with a certain stride; Fig. 3 uses a stride of 1. The output matrix is obtained by performing 9 convolution operations in the manner shown in Fig. 3: the first convolution operation produces element (1,1) of the output matrix, the second produces element (1,2), and so on.
It should be understood that the convolution operation generally requires the input matrix and the output matrix to have the same dimension, but the embodiment of the present application is not limited thereto, and may not require the input matrix and the output matrix to have the same dimension. If the convolution operation does not require the input matrix and output matrix dimensions to be consistent, then the input matrix may not be complemented by 0 before performing the convolution operation.
It should also be understood that the above is described by taking the example that the sliding step size of the convolution operation is 1, but the embodiment of the present application is not limited thereto, and the sliding step size of the convolution operation may also be greater than 1.
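To make the sliding-window computation above concrete, the following Python sketch (using NumPy; the 3 × 3 input, 3 × 3 kernel, zero padding of one element on each edge, and stride of 1 are assumptions chosen to mirror the Fig. 3 example) computes the output matrix:

```python
import numpy as np

def conv2d(x, k, pad=1, stride=1):
    """Slide a kernel k over a zero-padded 2-D input x and return the matrix
    of weighted sums, as in the Fig. 3 example."""
    x_pad = np.pad(x, pad)                       # add `pad` rows/columns of zeros on each edge
    kh, kw = k.shape
    oh = (x_pad.shape[0] - kh) // stride + 1
    ow = (x_pad.shape[1] - kw) // stride + 1
    y = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x_pad[i * stride:i * stride + kh, j * stride:j * stride + kw]
            y[i, j] = np.sum(window * k)         # element-wise multiply, then sum
    return y

x = np.arange(9).reshape(3, 3)                   # hypothetical 3x3 input matrix
k = np.ones((3, 3)) / 9.0                        # hypothetical 3x3 weight matrix (convolution kernel)
print(conv2d(x, k))                              # 3x3 output, same size as the input
```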
4. Pooling layer
The role of the pooling layer is to reduce the width and height of the feature map, thereby reducing the amount of data in the feature layer and the computational complexity of the convolutional neural network; it also compresses the features and extracts the main features.
A pooling operation is typically used to reduce the dimensions of the input matrix, i.e., to down-sample it. Like a convolution operation, a pooling operation is computed from a kernel and the input matrix, so there is also a sliding window, and the stride of a pooling operation is usually greater than 1 (it may also be equal to 1). There are several types of pooling, such as average pooling and maximum pooling. Average pooling computes the average of all elements in the sliding window; maximum pooling computes the maximum of all elements in the sliding window. The pooling process is otherwise similar to the convolution process, except for the operation applied to the data elements in the sliding window, and is not described in detail here.
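A pooling operation in the same style can be sketched as follows; the 2 × 2 window and stride of 2 are illustrative assumptions, not values prescribed by this application:

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Down-sample a 2-D input by taking the max (or mean) of each sliding window."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    y = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            y[i, j] = window.max() if mode == "max" else window.mean()
    return y

x = np.arange(16).reshape(4, 4).astype(float)
print(pool2d(x, mode="max"))    # 2x2 result of maximum pooling
print(pool2d(x, mode="avg"))    # 2x2 result of average pooling
```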
5. Full connection layer
Several fully-connected layers are connected at the end of the convolutional neural network to integrate the extracted features and finally output the processing result for the image to be processed. In a fully-connected layer, every node of the previous layer is connected to every node of the next layer, and each connection has a weight parameter. Fig. 4 is a schematic diagram of a simple fully-connected layer.
As shown in Fig. 4, layer L1 has three common nodes and one bias node; the circle labeled "+1" is the bias node. Each common node of layer L2 is connected to all nodes of layer L1, and each connection has a weight parameter. The output of the first node of layer L2, denoted a_1^(2), is:

a_1^(2) = f( W_11^(1) x_1 + W_12^(1) x_2 + W_13^(1) x_3 + b_1^(1) )

where W_11^(1) is the weight parameter between the first node of layer L1 and the first node of layer L2, W_12^(1) is the weight parameter between the second node of layer L1 and the first node of layer L2, W_13^(1) is the weight parameter between the third node of layer L1 and the first node of layer L2, and b_1^(1) is the weight parameter between the bias node of layer L1 and the first node of layer L2. f(·) denotes the activation function, as in a convolutional layer, for example the ReLU activation function. By analogy, the outputs a_2^(2) and a_3^(2) of the second and third nodes of layer L2 are obtained:

a_2^(2) = f( W_21^(1) x_1 + W_22^(1) x_2 + W_23^(1) x_3 + b_2^(1) )
a_3^(2) = f( W_31^(1) x_1 + W_32^(1) x_2 + W_33^(1) x_3 + b_3^(1) )

Similarly, the output h_{w,b}(x) of layer L3 can be obtained.
It should be understood that the fully connected layer shown in fig. 4 is only an exemplary illustration and is not intended to limit the present application.
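A minimal sketch of the fully-connected layer of Fig. 4 follows (three input nodes plus a bias feeding each node of the next layer; the ReLU activation and the example weight values are assumptions):

```python
import numpy as np

def fc_forward(x, W, b, act=lambda z: np.maximum(z, 0.0)):
    """One fully-connected layer: each output node is the activation of a
    weighted sum of all input nodes plus a bias term."""
    return act(W @ x + b)

x = np.array([0.5, -1.2, 0.3])          # outputs of the three common nodes of layer L1
W = np.random.randn(3, 3) * 0.1         # weight parameters between L1 and L2 (assumed values)
b = np.zeros(3)                         # bias-node contribution
a2 = fc_forward(x, W, b)                # outputs of the three nodes of layer L2
print(a2)
```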
6. Quantization
Quantization means describing the sample distribution with a discrete codebook. Neural network parameters generally follow a non-uniform distribution, so the quantization codebook is a non-uniformly distributed codebook.
In a computer, the fractional value of a 32-bit floating-point number is decomposed into a sum of several powers of 2 for storage. Because the number of fractional bits is limited, a precision deviation occurs. In addition, the power-of-2 decomposition consumes many clock cycles, which is one reason floating-point operations are resource-intensive in hardware.
One approach (ShiftCNN) uses this idea together with the properties of the neural network itself and approximates each floating-point weight as a sum of N power-of-2 terms. The complete codebook consists of N sub-codebooks, and each sub-codebook contains M = 2^B − 1 entries, i.e., each sub-codebook can be represented with B bits. Each sub-codebook is defined as follows:
C_n = {0, ±2^(-n+1), ±2^(-n), ±2^(-n-1), …, ±2^(-n-⌊M/2⌋+2)}
Assuming N = 2 and B = 4, the codebooks are:
C_1 = {0, ±2^(-1), ±2^(-2), ±2^(-3), ±2^(-4), ±2^(-5), ±2^(-6)}
C_2 = {0, ±2^(-2), ±2^(-3), ±2^(-4), ±2^(-5), ±2^(-6), ±2^(-7)}
thus, after the quantization of the parameters, the multiplication of the convolution calculation is converted into the shift and addition, thereby reducing the amount of calculation.
In image recognition applications, the final result needed is only the probability that the image belongs to each category; as long as the ordering is unchanged, the probabilities have a large tolerance, so with N = 2 the accuracy loss can be kept below 1%, and with N = 3 the compression of the network is essentially lossless. ShiftCNN can therefore achieve a high compression gain in such applications.
A convolutional neural network for super-resolution, by contrast, transforms every pixel of the image, and the final result is the recovered value of every pixel, so high precision is required throughout the computation. In image processing, the peak signal-to-noise ratio (PSNR) is often computed as an objective evaluation of an image. PSNR objectively measures the level of image distortion or noise; the larger the PSNR between two images, the more similar they are.
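For reference, PSNR between two images can be computed as in the sketch below (the standard definition, assuming 8-bit pixel values; it is not specific to this application):

```python
import numpy as np

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same size:
    10 * log10(MAX^2 / MSE). A larger PSNR means the images are more similar."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")        # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.random.randint(0, 256, (64, 64), dtype=np.uint8)      # hypothetical reference image
b = np.clip(a + np.random.randint(-3, 4, (64, 64)), 0, 255)  # slightly distorted copy
print(psnr(a, b))
```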
Super resolution techniques refer to the reconstruction of a corresponding high resolution image from an observed low resolution image. A commonly used index for evaluating the quality of super-resolution networks is PSNR. Table 1 shows the result of applying the above method to a super-resolution convolutional neural network.
TABLE 1 Application results on a super-resolution convolutional neural network

                 | First network | N=2, B=4 | N=3, B=4  | N=4, B=4  | N=5, B=4  | N=6, B=4
PSNR (Set5)      | 32.79774      | 25.955   | 31.559861 | 32.592048 | 32.691628 | 32.778849
Flops (multiply) | 2436          | 81       | 81        | 81        | 81        | 81
ADD (times)      | 0             | 1*2436   | 2*2436    | 3*2436    | 4*2463    | 5*2463
The first network in Table 1 is a super-resolution convolutional neural network; the PSNR of its model is assumed to be 32.79774. Each convolutional layer of the first network is compressed in the manner described above. It can be seen that at least 4 sub-codebooks are needed to control the loss of precision, i.e., each weight must be represented as a sum of 4 power-of-2 terms. With N = 6 the loss of accuracy is negligible, and with N ≥ 4 the PSNR loss can be reduced to 0.2. The number of adders used by this method is proportional to the number of sub-codebooks: when N = 4 the number of adders is 4 times the original, and the computational bottleneck shifts from the multipliers to the adders. When N ≥ 3, the gain of the method is small because of the large increase in additions; with N = 3 and B = 4 the PSNR loss is 1.2, which is not feasible in practice. Therefore, the above method performs poorly for compressing and accelerating convolutional neural networks in super-resolution applications.
It should be noted that, in this embodiment of the application, the first network is only an example of a super-resolution network, used to show the different precision losses caused by different compression methods applied to the same super-resolution network.
The loss of precision in the quantization process is not negligible, and incremental network quantization (INQ) was therefore proposed. INQ is a lossless low-bit quantization technique for neural networks: given a full-precision floating-point neural network model of any structure, INQ can efficiently convert it into a lossless low-bit binary model, addressing the shortcomings of neural network quantization compression methods. INQ proposes a progressive neural network quantization idea whose core is to introduce three operations: parameter grouping, quantization, and retraining. In implementation, the parameters of each layer of the full-precision floating-point network model are first divided into two groups; the parameters in the first group are directly quantized and fixed, and the parameters in the other group are retrained to compensate the precision loss that quantization causes to the model. The three operations are then applied iteratively to the retrained full-precision floating-point parameters until the model is completely quantized. Fig. 5 shows a schematic diagram of weight quantization by the INQ technique. By coupling the parameter grouping, quantization and retraining operations, the technique suppresses the performance loss caused by model quantization and is therefore applicable in practice to neural network models of any structure.
INQ achieves lossless quantization on ResNet-18 at 4 bits (with a slight accuracy improvement) and at 3 bits (0.2% loss). Combined with the dynamic neural network compression (DNS) technique, INQ realized, taking AlexNet as an example, a nearly lossless, roughly hundredfold-compressed binary neural network model for the first time. Furthermore, the INQ technique has been extended from quantizing only the model parameters to quantizing both the model parameters and the input and output of each layer, realizing for the first time a lossless, low-bit, fully quantized neural network model on VGG-16 (5-bit parameter quantization and 4-bit input/output quantization).
Table 2 shows the results of INQ on the super-resolution network (first network). Each convolutional layer is compressed by the INQ technique, and the network weight parameters are quantized to 5 bits, 8 bits, and 10 bits, respectively, and the obtained results are shown in table 2 below.
TABLE 2 INQ results on the super-resolution network

      | 5 bit   | 8 bit | 10 bit | First network
PSNR  | 31.6657 | 31.96 | 31.69  | 32.7977
As can be seen from Table 2, the INQ method causes a PSNR loss of about 1 dB and is therefore not practically usable; it also suffers from long retraining time and slow convergence.
A convolutional neural network for image super-resolution has a large number of weight parameters and a large amount of computation (multiplications). Such a network keeps the feature image size constant (equal to the input image size) from input to output, and the total amount of computation is proportional to the size of the input image, so the need to reduce the number of multiplications per pixel is even more acute. When the two methods above are applied to a super-resolution convolutional neural network, the PSNR drops noticeably.
This application combines the idea of decomposing numbers into sums of powers of 2, as used in computers, with the grouped parameter quantization idea of INQ, and provides an image processing method based on a convolutional neural network model that achieves large compression and acceleration gains while ensuring that the PSNR of a super-resolution network does not drop significantly.
It should be understood that the technical solution provided in the embodiment of the present application may be applied to various scenes that need to perform image processing on an input image to obtain a corresponding output image, and the embodiment of the present application does not limit this.
For example, as shown in fig. 6, the technical solution of the embodiment of the present application may be applied to a terminal device, which may be mobile or fixed, for example, the terminal device may be a mobile phone with an image processing function, a Tablet Personal Computer (TPC), a media player, a smart television, a notebook computer (LC), a Personal Digital Assistant (PDA), a Personal Computer (PC), a camera, a video camera, a smart watch, a Wearable Device (WD), and the like, which is not limited in the embodiment of the present application.
Fig. 7 is a schematic diagram of an image processing method based on a convolutional neural network model according to an embodiment of the present application. A neural network layer of the convolutional neural network model comprises at least one convolutional layer and/or at least one fully-connected layer, and the method 100 comprises:
110, obtaining a first weight parameter set corresponding to a neural network layer, where the first weight parameter set includes N1 first weight parameters, where N1 is an integer greater than or equal to 1;
120, dividing each of the N1 first weight parameters by a first value m to obtain N1 second weight parameters, where |W_max| ≤ m ≤ 2|W_max| and W_max is the weight parameter with the largest absolute value in the first weight parameter set;
130, quantizing each of the N1 second weight parameters into a sum of at least two terms of the form 2^Q, where Q is an integer less than or equal to 0, to obtain N1 third weight parameters;
140, acquiring an image to be processed;
150, processing the image to be processed according to the N1 third weight parameters to obtain an output image.
According to this embodiment of the application, the weight parameters are approximated with a first value m before being quantized into powers of 2 (an example of 2^Q). On the premise that the distribution of the weight parameters in the neural network is preserved, this minimizes the precision loss caused by the error. Here each weight is quantized into at least two terms 2^Q with Q = 0, -1, -2, …. For example, if a weight parameter w is quantized into two power-of-2 terms, then w can be quantized as (2^q1 + 2^q2), where q1 and q2 are integers not exceeding 0 and may be equal or different.
The convolutional neural network includes at least one convolutional layer and/or at least one fully-connected layer, and each layer has multiple weight parameters. These weight parameters are positive or negative and have different magnitudes. For clarity, the embodiments of the application take the weight parameters in a convolutional layer as an example. Assume the convolutional layer includes a weight parameter set W = {w1, w2, …, wn}, where each w is a weight parameter. The N1 first weight parameters are any N1 weight parameters in W, and N1 ≤ n. In this embodiment, "first weight parameter" is merely a collective name for the N1 weight parameters and does not mean that the N1 weight parameters are equal.
The absolute value of a weight parameter represents its importance to the neural network. The absolute value of a weight parameter is called its magnitude; a weight parameter with larger magnitude is more important, i.e., contributes more to the neural network. Weight parameters of different magnitudes are distributed throughout the neural network. The maximum magnitude is w_max = max(abs(w)), and W_max is the weight parameter with the largest magnitude among the weight parameters.
This embodiment of the application uses a constant Wconst (an example of the first value m) to approximate the weight parameters, where the constant is greater than or equal to |W_max| and less than or equal to 2|W_max|. When Wconst = |W_max|, the maximum magnitude of the approximated weight parameters is 1; when Wconst = 2|W_max|, the maximum magnitude is 0.5. The constant Wconst therefore preserves the distribution of the weight parameters in W, and Wconst ≥ |W_max| means the parameters can be approximated more closely by powers of 2, further reducing the quantization error and the loss of precision.
The weight parameters are approximated before being quantized into powers of 2. Specifically, each weight w is divided by Wconst, giving n new weight parameters (an example of the second weight parameters). For convenience and clarity, the embodiments of the application refer to dividing a weight parameter by the constant as approximating the weight parameter, and the resulting value is called the approximated weight parameter.
Each approximated weight parameter is quantized into a sum of at least two powers of 2. This reduces the amount of multiplication and saves storage space and, compared with quantizing to a single power of 2, also reduces the error and the loss of precision.
Optionally, the first weight parameter set further includes N2 fourth weight parameters, where N2 is an integer greater than or equal to 1. After the N1 second weight parameters are quantized into sums of at least two powers of 2 to obtain the N1 third weight parameters, the method includes: training the N2 fourth weight parameters according to the quantization result to obtain N2 fifth weight parameters; dividing each of the N2 fifth weight parameters by the first value m to obtain N2 sixth weight parameters; and quantizing each of the N2 sixth weight parameters into a sum of at least two terms of the form 2^P, where P is an integer less than or equal to 0, to obtain N2 seventh weight parameters. Processing the image to be processed according to the N1 third weight parameters to obtain an output image then includes: processing the image to be processed according to the N1 third weight parameters and the N2 seventh weight parameters to obtain the output image.
Specifically, after the N1 first weight parameters are approximated and quantized, the quantized weight parameters are fixed. Before the N2 fourth weight parameters are approximated and quantized, they are trained according to the quantization result of the N1 first weight parameters to make up the precision loss that approximation and quantization cause to the model, thereby improving the overall precision. In other words, the two parts of the parameter grouping play complementary roles: one establishes the basis of the low-precision model, and the other compensates the loss of precision by retraining, so that iteration eventually yields progressive quantization and improved precision.
For example, the weight parameter set W in a convolutional layer or fully-connected layer is divided into two groups, denoted N1 and N2. The N1 group contains the N1 first weight parameters, W_N1 = {w_1, w_2, w_3, …, w_N1}; the N2 group contains the N2 fourth weight parameters, W_N2 = {w'_1, w'_2, w'_3, …, w'_N2}. The weight parameters of the N1 group are approximated and quantized first. Then the weight parameters of the N2 group are retrained against the result of approximating and quantizing all weight parameters of the N1 group, and the retrained N2 group is in turn approximated and quantized. In this way the error caused by approximation and quantization, and the resulting loss of precision, can be compensated. Because the initial error caused by quantizing the N1 group is small, the N2 group converges quickly during retraining.
Retraining means that after the weight parameters of the N1 group are approximated and quantized, the weight parameters of the N2 group are trained according to the precision loss that approximating and quantizing all the N1-group weight parameters causes to the model, so as to make up the lost precision.
Quantizing the approximated weight parameters of the N1 group and then the N2 group into sums of at least two powers of 2 therefore reduces the quantization error; moreover, compared with quantizing to a single power of 2, the embodiment of the application starts retraining from an initial value with a smaller error, so the retraining converges faster and the lost precision is easier to compensate.
It should be noted that dividing the weight parameter set W into an N1 group and an N2 group is merely an example, and the embodiments of the application are not limited thereto. For example, the weight parameter set W may include more than two groups: after the weight parameters of the first group are approximated and quantized, they are fixed, and the weight parameters of the remaining groups are retrained to compensate the precision loss caused by the approximation and quantization. Then the weight parameters of the second group are approximated and quantized and fixed, and the weight parameters of the later groups are retrained to compensate the precision loss, and so on. Fig. 8 shows a processing flow of the weight parameters; a sketch of this iteration is given below. The condition for stopping the iteration may be set in advance; for example, 10% of the weight parameters in the weight parameter set may be approximated and quantized first, then 30% of the remaining weight parameters, and so on until the iteration ends.
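The grouping, quantization and retraining iteration of Fig. 8 can be summarized with the sketch below. The group fractions, the single-power-of-2 placeholder quantizer and the no-op retraining stub are assumptions for illustration; in the real method the remaining weights are retrained by back-propagation and each weight is quantized into a sum of at least two power-of-2 terms:

```python
import math

def quantize_pow2(w):
    """Placeholder quantizer: snap a weight to the nearest single power of 2
    (this application actually uses a sum of at least two power-of-2 terms)."""
    if w == 0:
        return 0.0
    e = round(math.log2(abs(w)))
    return math.copysign(2.0 ** e, w)

def retrain(weights, frozen, remaining):
    """Placeholder: in the real method, the still-floating weights are retrained by
    back-propagation to compensate the error of the frozen, quantized group."""
    pass

def incremental_quantize(weights, fractions=(0.1, 0.3, 0.3, 0.3)):
    # Sort indices by magnitude so the most important weights are quantized first.
    order = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
    frozen, start = {}, 0
    for frac in fractions:
        end = min(len(order), start + round(frac * len(order)))
        for i in order[start:end]:          # approximate + quantize this group, then fix it
            frozen[i] = quantize_pow2(weights[i])
        if order[end:]:                     # retrain the remaining floating-point weights
            retrain(weights, frozen, order[end:])
        start = end
    return frozen

w = [0.65, -0.02, 0.33, 0.11, -0.48, 0.07, -0.21, 0.9, 0.005, -0.3]
print(incremental_quantize(w))
```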
Optionally, |W1_min| ≥ |W2_max|, where W1_min is the weight parameter with the smallest absolute value among the N1 first weight parameters, and W2_max is the weight parameter with the largest absolute value among the N2 fourth weight parameters.
Specifically, when the weight parameter set W is grouped, the grouping may be done according to the importance of the weight parameters. As mentioned above, the importance of a weight parameter is measured by its magnitude: a larger magnitude means the weight parameter contributes more to the neural network, i.e., is more important. The maximum magnitude is w_max = max(abs(w)). W1_min is the weight parameter with the smallest magnitude among the N1 first weight parameters, and W2_max is the weight parameter with the largest magnitude among the N2 fourth weight parameters; that is, the weight parameters of the N1 group are more important than those of the N2 group. Therefore, when grouping, the weight parameters with the largest magnitudes can be placed in the first group, and so on, with the last group containing the weight parameters with the smallest magnitudes. This grouping controls the precision loss caused by quantizing the weight parameters that contribute most to the neural network.
Optionally, the first value m is the product of |W_max| and a second value c, where c is determined from the errors between the N1 first weight parameters and the N1 third weight parameters.
The second value c acts as a fine-tuning factor that adjusts the precision loss introduced during the approximation of the weight parameters.
Specifically, the parameters A and B are set in advance. A is the number of power-of-2 addition terms, and A is an integer not less than 2; in other words, a weight parameter is approximated as the product of a constant (an example of the first value m) and a sum of A power-of-2 terms. B is the bit width of the weight magnitude. Optionally, 1 < A ≤ 3: A cannot be too large, because a large A greatly increases the number of adders required, which then becomes the computational bottleneck. Optionally, B = 4: if B is too large, the bit width of the weight parameters increases and the goal of reducing the storage occupied by the weights is not met. Typically A(B + 1) < 16, which keeps the weight bit width below the half-precision floating-point bit width (16 bits). Each power-of-2 term is represented by a B-bit index value and a 1-bit sign bit; because a weight is quantized into A addition terms, A(B + 1) bits are required in total, and the smaller A(B + 1) is, the smaller the required storage space. When B > 4 the approximation error is smaller, but the overall accuracy does not improve much; when B < 4 the approximation error is too large. Thus, when a weight parameter is quantized into A addition terms, this embodiment of the application stores it in A(B + 1) bits and can represent more power-of-2 terms than an AB-bit representation, thereby reducing the error.
The constant is denoted Wconst (an example of m), and Wconst = c × w_max, where c is a factor for fine-tuning the approximation error (an example of the second value c). As described above, when c = 1.0 the maximum magnitude of the approximated weight parameters is about 1; when c = 2.0 the maximum magnitude is about 2^(-1). To keep a reasonable distribution of the weight parameters, c is generally chosen in [1.0, 2.0].
The value of c can be determined from the error. Specifically, values of c are taken within [1.0, 2.0], for example in steps of 0.1 (c = 1.0, 1.1, 1.2, …, 2.0). For each value of c, the error between w and w' is computed, and the value of c with the smallest average error is chosen. Optionally, empirically, the error is on the order of e−2. Taking the N1 group as an example, w is a first weight parameter and w' is the corresponding third weight parameter, i.e., c is determined from the error between the first and third weight parameters. Optionally, all w in the N1 group may be traversed, or only part of them; this is not limited here.
It should be understood that when a value of c is taken within [1.0, 2.0], it may be taken randomly or with a fixed step; the embodiment of the application is not limited thereto.
In this embodiment of the application, the fine-tuning factor c is used when approximating the weight parameters, so that a large loss of precision is avoided as far as possible. A sketch of this selection procedure is given below.
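One way the fine-tuning factor c might be chosen, sweeping [1.0, 2.0] in steps of 0.1 and keeping the value with the smallest mean quantization error, is sketched below; the quantize_to_terms helper is a stand-in for the A-term power-of-2 quantization described in this application, not its exact implementation:

```python
import numpy as np

def quantize_to_terms(v, A=2, B=4):
    """Greedily approximate v (|v| <= 1) as a sum of A power-of-2 terms whose
    exponent magnitudes are clipped to a B-bit range (illustrative stand-in)."""
    approx, residual = 0.0, v
    for _ in range(A):
        if residual == 0:
            break
        e = int(np.clip(np.round(np.log2(abs(residual))), -(2 ** B - 1), 0))
        term = np.copysign(2.0 ** e, residual)
        approx += term
        residual -= term
    return approx

def choose_c(weights, A=2, B=4):
    """Sweep c in [1.0, 2.0] (step 0.1) and return the value minimizing the
    mean |w - m * quantize(w / m)| error, with m = c * max|w|."""
    w_max = np.max(np.abs(weights))
    best_c, best_err = None, np.inf
    for c in np.arange(1.0, 2.0 + 1e-9, 0.1):
        m = c * w_max
        err = np.mean([abs(w - m * quantize_to_terms(w / m, A, B)) for w in weights])
        if err < best_err:
            best_c, best_err = c, err
    return best_c, best_err

weights = np.random.uniform(-0.7, 0.7, size=256)      # hypothetical layer weights
print(choose_c(weights))
```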
Optionally, after the N1 second weight parameters are quantized into sums of at least two terms of the form 2^Q to obtain the N1 third weight parameters, the method further includes determining a weight index value list, where the weight index value list characterizes the N1 binary numbers corresponding to the N1 third weight parameters and the sign bit of each third weight parameter.
In this embodiment of the application, all addition terms share one codebook, and the codebook does not contain the value 0. A value that is essentially 0 is approximated as 2^(-M), where M = 2^B − 1.
The process described above is illustrated next with a specific example. Assuming Wconst = 1.04, A = 2, B = 4, and the weight parameter w = 0.650728, the processing is as follows:
First, approximate the weight parameter: w* = w / Wconst = 0.6257, where w* is the approximated value of w.
Then w* is approximated as the sum of 2 power-of-2 addition terms. From the logarithm of w*, the exponent of the first addition term is -1; the remainder w* − 2^(-1) = 0.1257 gives a second addition term with exponent -3. The list of index values of the two power-of-2 terms is therefore list1 = {1, 3}, and each index value in list1 can be represented by a B-bit binary number. The sign-bit list is list2 = {1, 1}, where a sign bit of 1 represents a positive number and 0 a negative number, i.e., both power-of-2 terms are positive.
After the weight approximation, the error after quantization is: 0.6257 − (2^(-1) + 2^(-3)) = 0.0007.
Thus, with Wconst = 1.04, A = 2 and B = 4, the weight parameter w = 0.650728 is approximated by a 10-bit binary number whose data structure is shown in Fig. 9. Fig. 9 contains a first addition term and a second addition term. The first (non-bold) bit of each addition term is its sign bit: 1 represents a positive number and 0 a negative number. The bold part of each addition term (the last four bits) is its index value: the index value of the first addition term is 1 (0001), corresponding to 2^(-1), and the index value of the second addition term is 3 (0011), corresponding to 2^(-3).
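The worked example above can be reproduced with a short sketch; the helper name, the greedy floor-of-logarithm decomposition and the bit packing are illustrative assumptions that follow the structure of Fig. 9:

```python
import math

def decompose(w, Wconst=1.04, A=2, B=4):
    """Divide w by the constant, then greedily peel off A power-of-2 terms.
    Returns per-term (sign, index) pairs, where index = -exponent fits in B bits."""
    w_approx = w / Wconst
    pairs, residual = [], w_approx
    for _ in range(A):
        if residual == 0:
            break
        e = math.floor(math.log2(abs(residual)))   # largest power of 2 not exceeding |residual|
        sign = 1 if residual > 0 else 0             # 1 -> positive, 0 -> negative
        pairs.append((sign, -e))                    # store the index value (exponent magnitude)
        residual -= math.copysign(2.0 ** e, residual)
    return pairs, abs(residual)

pairs, error = decompose(0.650728)
print(pairs)            # [(1, 1), (1, 3)]  -> list1 = {1, 3}, list2 = {1, 1}
print(round(error, 4))  # 0.0007

# Pack as A*(B+1) = 10 bits: for each term, a 1-bit sign followed by a B-bit index.
bits = "".join(f"{s}{idx:04b}" for s, idx in pairs)
print(bits)             # '10001' + '10011' -> 1000110011
```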
In this embodiment of the application, one codebook is shared and the sign bits are represented separately, which indirectly reduces the error.
The embodiment of the application is applied to image processing; in particular, for a convolutional neural network for image super-resolution it can keep the precision essentially lossless. Table 3 compares the PSNR obtained when the image processing method based on the convolutional neural network model proposed in this embodiment and ShiftCNN are each applied to the first network.
TABLE 3 comparison of PSNR between the technical solution of the present application and shiftCNN
As can be seen from Table 3, when N = 2 or N = 3 and B = 4, the computational complexity of the convolutional neural network is significantly reduced; the smaller N, the lower the computational complexity and the smaller the bit width needed to represent the weight parameters.
Table 3 also shows that with the weight-parameter processing method of this application, the PSNR is essentially lossless while the computational complexity of the super-resolution network is reduced. With N = 2 and B = 4, the computational complexity is lower, the stored weight bit width is smaller (10 bits), and the PSNR is essentially lossless. As mentioned above, the first network is assumed to be a convolutional neural network for image super-resolution. The last four columns of Table 3 compare the PSNR and the computational resources after the first network is compressed and accelerated with the scheme of this application and with ShiftCNN, respectively. The PSNR here is computed over the 3× super-resolution results on the Set5 data set used for testing image super-resolution algorithms.
Fig. 11 and Fig. 12 show the effect of two images from Set5 after 3× super-resolution by the first network and by the image processing method based on the convolutional neural network model of this application with N = 2, B = 4 and with N = 3, B = 4. The subjective quality of the enlargements obtained with the compression method of this application (N = 2, B = 4 and N = 3, B = 4) is comparable to that of the enlargements from the original, uncompressed first network.
Therefore, with the method of this embodiment, large compression and acceleration gains are obtained while the performance degradation on the super-resolution network remains insignificant.
Optionally, processing the image to be processed according to the N1 third weight parameters to obtain an output image includes: performing shift and add operations on the image to be processed according to the N1 binary numbers in the weight index value list, and multiplying the result by the first value m to obtain the output image.
That is, shift and add operations are performed according to the binary numbers representing the weight parameters in the weight index value list, and the result is multiplied by the first value m to obtain the output image. The forward inference process is described next. The parameters of each layer of the convolutional neural network model include a constant factor (an example of the first value m), the weight parameter index values, an index table, and the offsets. The data structure of the weight parameter index values is shown in Fig. 9 and is not repeated here. The index table is shown in Table 4.
Table 4 index table
The last entry of the index table is the power-of-2 term 2^(-M); during quantization of the weight parameters, any power-of-2 term smaller than 2^(-M) is approximated as 2^(-M).
The following describes a specific calculation procedure with reference to fig. 12, taking the convolutional layer as an example.
First, the feature image is input, shifted and added according to the index values, and multiplied by the constant factor. The input feature image (an example of the image to be processed) is the original input image.
The result is then added to the offset to obtain the output feature image of the layer.
The specific calculation formula is:

Y = WX + b

w · x ≈ s_1·(x >> b_1) + s_2·(x >> b_2) + … + s_A·(x >> b_A) = s_1·2^(-b_1)·x + s_2·2^(-b_2)·x + … + s_A·2^(-b_A)·x

where w and x are individual elements of W and X respectively, W is the two-dimensional matrix formed by all the weight values of the layer, and X is the two-dimensional matrix formed by the input feature image. According to the binary numbers in the weight index value list, the shift-and-add processing of the image to be processed is the computation of the sum above: x is shifted right by b_i bits for each addition term and the A terms are added, where b_i is the i-th index value and s_i ∈ {+1, -1} is given by the i-th sign bit; the accumulated result is then multiplied by the constant factor m.
Note that, in fig. 12, the shift and addition calculation may be performed after multiplying by a constant factor. Specifically, the shift and add operations are shown in fig. 13.
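Purely as a sketch of the shift-and-add evaluation just described (computing the shifts and additions first and multiplying by the constant factor afterwards, as in fig. 12), one output value could be obtained as follows. The per-weight data layout and the function name are assumptions.

    def shift_add_output(x_patch, weight_terms, m, bias):
        """Compute one output value y = m * sum_k w_k * x_k + bias, where every
        quantized weight w_k = s_k * sum_i 2**(-b_{k,i}) so that each multiply is
        replaced by right shifts and additions.
        x_patch:       list of integer input samples (the flattened receptive field)
        weight_terms:  list of (sign, [b_1, b_2, ...]) pairs, one per weight
        m, bias:       the constant factor and the offset of the layer"""
        acc = 0
        for x, (sign, shifts) in zip(x_patch, weight_terms):
            for b in shifts:
                acc += sign * (int(x) >> b)  # integer approximation of x * 2**(-b)
        return m * acc + bias

For example, a single weight stored as (sign +1, shifts [1, 3]) multiplies x by approximately 0.625 using only two shifts and one addition.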
The embodiment of the application combines the idea of decomposing fractions into powers of 2, as used in computers, with the grouped parameter quantization idea of incremental network quantization (INQ) to provide an image processing method based on a convolutional neural network model. The weight parameters are first approximated using a constant, with the approximation designed so that not too much precision is lost, and the approximated weight parameters are then quantized into the sum of at least two powers of 2, which reduces the amount of multiplication. Furthermore, by grouping the weight parameters and retraining the later groups to compensate for the small initial error introduced by quantizing the first group, the error can be compensated while converging quickly. This further reduces the error and the precision loss, and ensures that the performance degradation on the super-resolution network is not significant while large compression and acceleration gains are obtained.
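A minimal sketch of this grouped procedure is given below, assuming two groups split at the median absolute value, a factor c = 1.5 for choosing m, and a caller-supplied retraining callback retrain_fn that fine-tunes only the not-yet-quantized weights; all of these are assumptions, and quantize_to_powers_of_two refers to the earlier sketch.

    def compress_layer(weights, retrain_fn, num_terms=2, min_exp=-8, c=1.5):
        """Quantize the large-magnitude half of the weights first, retrain the
        remaining half to compensate, then quantize that half as well."""
        w_max = max(abs(w) for w in weights)
        m = c * w_max                                     # |Wmax| <= m <= 2|Wmax|
        order = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
        group1 = order[: len(weights) // 2]               # important group: largest |w|
        group2 = order[len(weights) // 2:]

        quantized = list(weights)
        for i in group1:
            quantized[i] = m * quantize_to_powers_of_two(weights[i] / m,
                                                         num_terms, min_exp)[2]
        # retrain only the not-yet-quantized weights to absorb the quantization error
        quantized = retrain_fn(quantized, trainable_indices=group2)
        for i in group2:
            quantized[i] = m * quantize_to_powers_of_two(quantized[i] / m,
                                                         num_terms, min_exp)[2]
        return quantized, m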
It should be understood that the above examples of fig. 1 to 13 are only for assisting the skilled person in understanding the embodiments of the present application, and are not intended to limit the embodiments of the present application to the specific values or specific scenarios illustrated. It will be apparent to those skilled in the art that various equivalent modifications or variations are possible in light of the examples given in fig. 1-13, and such modifications or variations are intended to be included within the scope of the embodiments of the present application.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The image processing method based on the convolutional neural network model according to the embodiment of the present application is described in detail above with reference to fig. 1 to 13, and the image processing apparatus based on the convolutional neural network model according to the embodiment of the present application is described below with reference to fig. 14 and 15.
FIG. 14 is a schematic block diagram of an image processing apparatus based on a convolutional neural network model according to one embodiment of the present application. It should be understood that the image processing apparatus based on the convolutional neural network model (hereinafter referred to as the "image processing apparatus") in the embodiments of the present application may be any device requiring an image super-resolution function, for example in the fields of medical image processing, cloud services, security inspection, satellite remote-sensing image analysis, surveying and mapping, surveillance, television, film, consumer electronics, and so on, including but not limited to a medical image processing device, a cloud service or cloud computing device (e.g., a cloud service platform), a security inspection device, a satellite remote-sensing image analysis device, a surveying and mapping device, a monitoring device, a television (e.g., a high-definition television (HDTV)), a movie playback device, a smart phone, a camera, a video camera, a computing device, a vehicle-mounted device, a wearable device, an unmanned aerial vehicle, and the like. The embodiments of the present application are not limited in this respect, as long as the image processing apparatus can carry out the image super-resolution method. Alternatively, the image processing apparatus may also be a chip.
The image processing apparatus 1400 shown in fig. 14 includes: a processing unit 1410 and a storage unit 1420.
One neural network layer of the convolutional neural network model comprises at least one convolutional layer and/or at least one fully-connected layer. Specifically, the storage unit 1420 is configured to store the convolutional neural network model, and the processing unit 1410 is configured to:
acquire a first weight parameter set corresponding to the neural network layer, wherein the first weight parameter set comprises N1 first weight parameters, and N1 is an integer greater than or equal to 1; respectively calculate the ratios of the N1 first weight parameters to a first value m to obtain N1 second weight parameters, wherein |Wmax| ≤ m ≤ 2|Wmax|, and Wmax is the weight parameter with the largest absolute value in the first weight parameter set; respectively quantize the N1 second weight parameters into the sum of at least two Q powers of 2 to obtain N1 third weight parameters, wherein Q is less than or equal to 0 and is an integer; acquire an image to be processed; and process the image to be processed according to the N1 third weight parameters to obtain an output image.
Optionally, as another embodiment, the first weight parameter set further includes N2 fourth weight parameters, where N2 is an integer greater than or equal to 1; the processing unit 1410 is specifically configured to: train the N2 fourth weight parameters according to the quantization result to obtain N2 fifth weight parameters; respectively calculate the ratios of the N2 fifth weight parameters to the first value m to obtain N2 sixth weight parameters; and respectively quantize the N2 sixth weight parameters into the sum of at least two P powers of 2 to obtain N2 seventh weight parameters, wherein P is less than or equal to 0 and is an integer. Processing the image to be processed according to the N1 third weight parameters to obtain an output image then includes: processing the image to be processed according to the N1 third weight parameters and the N2 seventh weight parameters to obtain the output image.
According to the embodiment of the application, after the group of N1 weight parameters is approximated and quantized, and before the group of N2 weight parameters is approximated and quantized, the N2 weight parameters are retrained to compensate for the precision loss caused by approximating and quantizing the first group, so that the overall error can be reduced and the precision improved.
Alternatively, as another embodiment, |W1min| ≥ |W2max|, where W1min is the weight parameter with the smallest absolute value among the N1 first weight parameters, and W2max is the weight parameter with the largest absolute value among the N2 fourth weight parameters.
In the embodiment of the present application, the larger the absolute value of a weight parameter, the larger its contribution to the neural network. The weight parameters of the more important group are approximated and quantized first, and the weight parameters of the less important group are then retrained to compensate for the precision loss caused by quantizing the important group, which further limits the overall error in the convolutional neural network and reduces the precision loss.
Optionally, as another embodiment, the first value m is the absolute value of Wmax multiplied by a second value c, where c is determined according to the errors between the N1 first weight parameters and the N1 third weight parameters.
The second value c is determined based on the errors between the first weight parameters and the third weight parameters, and m is then determined from c and Wmax. In this way, the error of the weight parameter with the largest absolute value is preferentially controlled, and determining c from the errors also reduces the error introduced by the approximation.
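One possible way to determine c, shown only as a sketch, is a small grid search over [1, 2] that minimizes the squared error between the original first weight parameters and their quantized approximations. The grid, the squared-error criterion, and the reuse of quantize_to_powers_of_two from the earlier sketch are assumptions.

    import numpy as np

    def choose_m(weights, num_terms=2, min_exp=-8, num_candidates=21):
        """Pick m = c * |Wmax| with 1 <= c <= 2 by minimizing the quantization error
        between the first weight parameters and their power-of-two approximations."""
        weights = np.asarray(weights, dtype=np.float64)
        w_max = np.max(np.abs(weights))
        best_m, best_err = w_max, float("inf")
        for c in np.linspace(1.0, 2.0, num_candidates):
            m = c * w_max
            approx = np.array([m * quantize_to_powers_of_two(w / m, num_terms, min_exp)[2]
                               for w in weights])
            err = float(np.sum((weights - approx) ** 2))
            if err < best_err:
                best_m, best_err = m, err
        return best_m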
Optionally, as another embodiment, after the quantizing the N1 second weight parameters into a sum of at least two powers Q of 2, respectively, to obtain N1 third weight parameters, the processing unit 1410 is further configured to: determining a weight index value list, wherein the weight index value list is used for representing N1 binary numbers corresponding to the N1 third weight parameters respectively, and the sign bit of each third weight parameter.
Optionally, as another embodiment, the processing unit 1410 is specifically configured to: and carrying out shifting and adding operation on the image to be processed according to the N1 binary numbers in the weight index value list, and multiplying the image to be processed by the first numerical value m to obtain an output image.
Optionally, the image processing apparatus shown in fig. 14 may further include a transceiver unit, which may also be referred to as a communication unit, for example, a network interface, and the processing unit may obtain the image to be processed through the transceiver unit, which is not limited in this embodiment of the application.
It should be understood that the image processing apparatus shown in fig. 14 can implement the processes in the method embodiments of fig. 1 to 13, and the operations and/or functions of the modules in the image processing apparatus 1400 may specifically refer to the descriptions in the method embodiments described above in order to implement the corresponding flows in the method embodiments of fig. 1 to 13, respectively, and the detailed descriptions are appropriately omitted here to avoid repetition.
Fig. 15 shows a schematic block diagram of an image processing apparatus 1500 based on a convolutional neural network model according to an embodiment of the present application. Specifically, as shown in fig. 15, the image processing apparatus 1500 includes a processor 1510 and a memory 1520. Optionally, the image processing apparatus 1500 further includes a transceiver 1530 connected to the processor 1510. The processor 1510, the transceiver 1530, and the memory 1520 communicate with one another and transfer control and/or data signals through internal connection paths.
The transceiver 1530 may be a component of an input unit, a communication unit, a network interface, etc., and the processor 1510 may acquire an image through the transceiver 1530. The memory 1520 may be used for storing data, for example, a convolutional neural network model, an index value, and the memory 1520 may also be used for storing instructions, the processor 1510 is used for executing the instructions stored in the memory 1520 and controlling the transceiver 1530 to transmit and receive information or signals, and the processor 1510 can execute the instructions in the memory 1520 to implement the processes in the embodiments of the methods in fig. 1 to 13. To avoid repetition, further description is omitted here.
It is to be understood that the apparatus 1500 for image processing may correspond to the apparatus 1400 for image processing in fig. 14 described above, that the functions of the processing unit 1410 in the apparatus 1400 for image processing may be implemented by the processor 1510, the functions of the transceiving unit may be implemented by the transceiver 1530, and the functions of the storage unit 1420 may be implemented by the memory 1520. To avoid repetition, detailed description is appropriately omitted here.
It is to be understood that the image processing apparatus 1500 may also comprise other components, for example a display unit such as a display, which may present the super-resolution image to a user. It will be understood by those skilled in the art that the structure of the image processing apparatus shown in the figures does not limit the present application: the apparatus may adopt a bus or star topology, may combine some components, may include more or fewer components than shown, or may arrange the components differently.
The input unit is used to enable interaction between a user and the image processing apparatus and/or to input information into the apparatus. For example, the input unit may receive numeric or character information input by a user and generate signal inputs related to user settings or function control. In the embodiments of the present application, the input unit may be a touch panel, another human-computer interaction interface such as a physical input key or a microphone, or an external information-capturing device such as a camera. A touch panel, also referred to as a touch screen, can collect touch or proximity operations performed on or near it by a user, for example an operation performed with a finger, a stylus, or any other suitable object or accessory, and drives the corresponding connection device according to a preset program. Optionally, the touch panel may include two parts: a touch detection device and a touch controller. The touch detection device detects a touch operation of the user, converts the detected touch operation into an electrical signal, and transmits the electrical signal to the touch controller; the touch controller receives the electrical signal from the touch detection device, converts it into touch point coordinates, and sends the coordinates to the processing unit. The touch controller can also receive and execute commands sent by the processing unit. The touch panel may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In other embodiments of the present application, the physical input keys employed by the input unit may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like. An input unit in the form of a microphone may collect speech from the user or the environment and convert it, in the form of electrical signals, into commands executable by the processing unit.
The display may also be referred to as an output unit, which may include, but is not limited to, an image output unit, a sound output unit, and a tactile output unit. The image output unit is used to output characters, pictures and/or videos. The image output unit may include a display panel, such as a display panel configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, a field emission display (FED), and the like. Alternatively, the image output unit may include a reflective display, such as an electrophoretic display, or a display using interferometric modulation of light. The image output unit may include a single display or a plurality of displays of different sizes. In an embodiment of the present application, the touch panel used by the input unit may also serve as the display panel of the output unit. For example, when the touch panel detects a touch or proximity gesture operation on it, the operation is transmitted to the processing unit to determine the type of the touch event, and the processing unit then provides a corresponding visual output on the display panel according to the type of the touch event. Although the input unit and the output unit may be two independent components implementing the input and output functions of the apparatus, in some embodiments the touch panel and the display panel may be integrated to implement these input and output functions. For example, the image output unit may display various graphical user interfaces (GUIs), such as windows, scroll bars, icons, and clipboards, as virtual control components for the user to operate by touch.
The processor is a control center of the apparatus for super-resolution image reconstruction, connects various parts of the entire apparatus for super-resolution image reconstruction using various interfaces and lines, and performs various functions of the apparatus for super-resolution image reconstruction and/or processes data by running or executing software programs and/or modules stored in the memory and calling data stored in the memory.
It should be noted that the processor (e.g., the processor in fig. 15) in the embodiments of the present application may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed by such a processor. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable or electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
It will be appreciated that the memory in the embodiments of the present application (e.g., the memory in fig. 15) can be a volatile memory or a nonvolatile memory, or can include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate SDRAM, enhanced SDRAM, synchronous link DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memories of the systems and methods described herein are intended to comprise, without being limited to, these and any other suitable types of memory.
The program includes any application program installed on the image processing apparatus, including but not limited to an application that can perform image super-resolution, or an application associated with the image super-resolution application, such as an application that receives images for the image super-resolution application or an application that processes such images.
The embodiment of the application also provides a processing device, which comprises a processor and an interface; the processor is used for executing the image processing method based on the convolutional neural network model in any method embodiment.
It should be understood that the processing means may be a chip. For example, the processing device may be a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a Central Processing Unit (CPU), a Network Processor (NP), a digital signal processing circuit (DSP), a Microcontroller (MCU), a Programmable Logic Device (PLD), or other integrated chips.
The embodiment of the present application further provides a platform system, which includes the aforementioned image processing apparatus.
The embodiments of the present application also provide a computer-readable medium, on which a computer program is stored, which, when executed by a computer, implements the method of any of the above-mentioned method embodiments.
The embodiment of the present application further provides a computer program product, and the computer program product implements the method of any one of the above method embodiments when executed by a computer.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The processes or functions described in accordance with the embodiments of the present application occur in whole or in part when the computer instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Video Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
It should be understood that the reference herein to first, second, third, fourth, and various numerical designations is merely for ease of description and distinction and is not intended to limit the scope of the embodiments of the present application.
As used in this specification, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between 2 or more computers. In addition, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from two components interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Additionally, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
It should be understood that in the embodiment of the present application, "B corresponding to a" means that B is associated with a, from which B can be determined. It should also be understood that determining B from a does not mean determining B from a alone, but may be determined from a and/or other information.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. An image processing method based on a convolutional neural network model, wherein one neural network layer of the convolutional neural network model comprises at least one convolutional layer and/or at least one fully-connected layer, the method comprising:
acquiring a first weight parameter set corresponding to the neural network layer, wherein the first weight parameter set comprises N1 first weight parameters, and N1 is a positive integer;
respectively calculating the ratios of the N1 first weight parameters to a first value m to obtain N1 second weight parameters, wherein |Wmax| ≤ m ≤ 2|Wmax|, and Wmax is the weight parameter with the largest absolute value in the first weight parameter set;
respectively quantizing the N1 second weight parameters into the sum of at least two Q powers of 2 to obtain N1 third weight parameters, wherein Q is less than or equal to 0 and is an integer;
acquiring an image to be processed;
and processing the image to be processed according to the N1 third weight parameters to obtain an output image.
2. The method of claim 1, wherein the first set of weight parameters further comprises N2 fourth weight parameters, wherein N2 is an integer greater than or equal to 1;
after the quantizing the N1 second weight parameters into the sum of at least two powers Q of 2, respectively, to obtain N1 third weight parameters, the method includes:
training the N2 fourth weight parameters according to the quantization result to obtain N2 fifth weight parameters;
respectively calculating the ratios of the N2 fifth weight parameters to the first value to obtain N2 sixth weight parameters;
respectively quantizing the N2 sixth weight parameters into the sum of at least two P powers of 2 to obtain N2 seventh weight parameters, wherein P is less than or equal to 0 and is an integer;
the processing the image to be processed according to the N1 third weight parameters to obtain an output image, including:
and processing the image to be processed according to the N1 third weight parameters and the N2 seventh weight parameters to obtain an output image.
3. The method of claim 2, wherein |W1min| ≥ |W2max|, wherein W1min is the weight parameter with the smallest absolute value among the N1 first weight parameters, and W2max is the weight parameter with the largest absolute value among the N2 fourth weight parameters.
4. The method of any one of claims 1 to 3, wherein the first value m is the absolute value of Wmax multiplied by a second value c, the second value c being determined according to the errors between the N1 first weight parameters and the N1 third weight parameters.
5. The method according to any one of claims 1 to 3, wherein after the quantizing the N1 second weight parameters to a sum of at least two powers of Q of 2 respectively to obtain N1 third weight parameters, the method further comprises determining a weight index value list for characterizing N1 binary numbers corresponding to the N1 third weight parameters respectively and a sign bit of each of the third weight parameters.
6. The method according to claim 5, wherein processing the image to be processed according to the N1 third weight parameters to obtain an output image comprises:
and carrying out shifting and adding operation on the image to be processed according to the N1 binary numbers in the weight index value list, and multiplying the image to be processed by the first numerical value m to obtain an output image.
7. An image processing apparatus based on a convolutional neural network model, wherein one neural network layer of the convolutional neural network model includes at least one convolutional layer and/or at least one fully-connected layer, the apparatus comprising:
a processing unit and a storage unit, wherein,
the storage unit is used for storing a convolutional neural network model, and the processing unit is used for:
acquiring a first weight parameter set corresponding to the neural network layer, wherein the first weight parameter set comprises N1 first weight parameters, and N1 is an integer greater than or equal to 1;
respectively calculating the ratios of the N1 first weight parameters to a first value m to obtain N1 second weight parameters, wherein |Wmax| ≤ m ≤ 2|Wmax|, and Wmax is the weight parameter with the largest absolute value in the first weight parameter set;
respectively quantizing the N1 second weight parameters into the sum of at least two Q powers of 2 to obtain N1 third weight parameters, wherein Q is less than or equal to 0 and is an integer;
acquiring an image to be processed;
and processing the image to be processed according to the N1 third weight parameters to obtain an output image.
8. The apparatus of claim 7, wherein the first set of weight parameters further comprises N2 fourth weight parameters, wherein N2 is an integer greater than or equal to 1;
the processing unit is specifically configured to:
training the N2 fourth weight parameters according to the quantization result to obtain N2 fifth weight parameters;
respectively calculating the ratios of the N2 fifth weight parameters to the first value m to obtain N2 sixth weight parameters;
respectively quantizing the N2 sixth weight parameters into the sum of at least two P powers of 2 to obtain N2 seventh weight parameters, wherein P is less than or equal to 0 and is an integer;
the processing the image to be processed according to the N1 third weight parameters to obtain an output image, including:
and processing the image to be processed according to the N1 third weight parameters and the N2 seventh weight parameters to obtain an output image.
9. The apparatus of claim 8, wherein:
|W1min| ≥ |W2max|, wherein W1min is the weight parameter with the smallest absolute value among the N1 first weight parameters, and W2max is the weight parameter with the largest absolute value among the N2 fourth weight parameters.
10. The apparatus according to any one of claims 7 to 9, wherein the first value m is the absolute value of Wmax multiplied by a second value c, the second value c being determined according to the errors between the N1 first weight parameters and the N1 third weight parameters.
11. The apparatus according to any of claims 7 to 9, wherein after said quantizing said N1 second weight parameters to a sum of at least two powers Q of 2, respectively, to obtain N1 third weight parameters, said processing unit is further configured to: determining a weight index value list, wherein the weight index value list is used for representing N1 binary numbers corresponding to the N1 third weight parameters respectively, and the sign bit of each third weight parameter.
12. The apparatus according to claim 11, wherein the processing unit is specifically configured to:
and carrying out shifting and adding operation on the image to be processed according to the N1 binary numbers in the weight index value list, and multiplying the image to be processed by the first numerical value m to obtain an output image.
13. A computer-readable storage medium, comprising a computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 6.
CN201810250867.0A 2018-03-26 2018-03-26 Image processing method and device based on convolutional neural network model Active CN110363279B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810250867.0A CN110363279B (en) 2018-03-26 2018-03-26 Image processing method and device based on convolutional neural network model
PCT/CN2019/079281 WO2019184823A1 (en) 2018-03-26 2019-03-22 Convolutional neural network model-based image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810250867.0A CN110363279B (en) 2018-03-26 2018-03-26 Image processing method and device based on convolutional neural network model

Publications (2)

Publication Number Publication Date
CN110363279A CN110363279A (en) 2019-10-22
CN110363279B true CN110363279B (en) 2021-09-21

Family

ID=68059262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810250867.0A Active CN110363279B (en) 2018-03-26 2018-03-26 Image processing method and device based on convolutional neural network model

Country Status (2)

Country Link
CN (1) CN110363279B (en)
WO (1) WO2019184823A1 (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230273826A1 (en) * 2019-10-12 2023-08-31 Shenzhen Corerain Technologies Co., Ltd. Neural network scheduling method and apparatus, computer device, and readable storage medium
CN110852348B (en) * 2019-10-18 2022-09-30 北京迈格威科技有限公司 Feature map processing method, image processing method and device
CN110782021B (en) * 2019-10-25 2023-07-14 浪潮电子信息产业股份有限公司 Image classification method, device, equipment and computer readable storage medium
CN110852361B (en) * 2019-10-30 2022-10-25 清华大学 Image classification method and device based on improved deep neural network and electronic equipment
CN111144511B (en) * 2019-12-31 2020-10-20 上海云从汇临人工智能科技有限公司 Image processing method, system, medium and electronic terminal based on neural network
CN113112009B (en) * 2020-01-13 2023-04-18 中科寒武纪科技股份有限公司 Method, apparatus and computer-readable storage medium for neural network data quantization
CN113111997B (en) * 2020-01-13 2024-03-22 中科寒武纪科技股份有限公司 Method, apparatus and related products for neural network data quantization
CN111401518A (en) * 2020-03-04 2020-07-10 杭州嘉楠耘智信息科技有限公司 Neural network quantization method and device and computer readable storage medium
CN111488476B (en) * 2020-04-03 2023-06-27 北京爱芯科技有限公司 Image pushing method, model training method and corresponding devices
CN111401477B (en) * 2020-04-17 2023-11-14 Oppo广东移动通信有限公司 Image processing method, apparatus, electronic device, and computer-readable storage medium
CN113468935B (en) * 2020-05-08 2024-04-02 上海齐感电子信息科技有限公司 Face recognition method
CN111914996A (en) * 2020-06-30 2020-11-10 华为技术有限公司 Method for extracting data features and related device
CN113919405B (en) * 2020-07-07 2024-01-19 华为技术有限公司 Data processing method and device and related equipment
CN111832719A (en) * 2020-07-28 2020-10-27 电子科技大学 Fixed point quantization convolution neural network accelerator calculation circuit
CN112101543A (en) * 2020-07-29 2020-12-18 北京迈格威科技有限公司 Neural network model determination method and device, electronic equipment and readable storage medium
US20220114413A1 (en) * 2020-10-12 2022-04-14 Black Sesame International Holding Limited Integer-based fused convolutional layer in a convolutional neural network
CN114554205B (en) * 2020-11-26 2023-03-10 华为技术有限公司 Image encoding and decoding method and device
CN112598093A (en) * 2020-12-18 2021-04-02 湖南特能博世科技有限公司 Legend complexity ordering method, legend matching method, device and computer equipment
CN112633477A (en) * 2020-12-28 2021-04-09 电子科技大学 Quantitative neural network acceleration method based on field programmable array
CN112784885B (en) * 2021-01-11 2022-05-24 腾讯科技(深圳)有限公司 Automatic driving method, device, equipment, medium and vehicle based on artificial intelligence
CN113284201B (en) * 2021-05-27 2022-08-26 杭州睿影科技有限公司 Security check image generation method, security check system and storage medium
CN113469326B (en) * 2021-06-24 2024-04-02 上海寒武纪信息科技有限公司 Integrated circuit device and board for executing pruning optimization in neural network model
CN113949592B (en) * 2021-12-22 2022-03-22 湖南大学 Anti-attack defense system and method based on FPGA

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778242A (en) * 2015-04-09 2015-07-15 复旦大学 Hand-drawn sketch image retrieval method and system on basis of image dynamic partitioning
CN105184362A (en) * 2015-08-21 2015-12-23 中国科学院自动化研究所 Depth convolution neural network acceleration and compression method based on parameter quantification
CN107211133A (en) * 2015-11-06 2017-09-26 华为技术有限公司 Method, device and the decoding device of inverse quantization conversion coefficient
CN107644254A (en) * 2017-09-09 2018-01-30 复旦大学 A kind of convolutional neural networks weight parameter quantifies training method and system
US9916531B1 (en) * 2017-06-22 2018-03-13 Intel Corporation Accumulator constrained quantization of convolutional neural networks
CN107832835A (en) * 2017-11-14 2018-03-23 贵阳海信网络科技有限公司 The light weight method and device of a kind of convolutional neural networks

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9886663B2 (en) * 2013-10-08 2018-02-06 Qualcomm Incorporated Compiling network descriptions to multiple platforms
CN106066783A (en) * 2016-06-02 2016-11-02 华为技术有限公司 The neutral net forward direction arithmetic hardware structure quantified based on power weight
CN109284130B (en) * 2017-07-20 2021-03-23 上海寒武纪信息科技有限公司 Neural network operation device and method
CN107395211B (en) * 2017-09-12 2020-12-01 苏州浪潮智能科技有限公司 Data processing method and device based on convolutional neural network model
CN107832832A (en) * 2017-10-19 2018-03-23 珠海格力电器股份有限公司 The pond operation method and device of convolutional neural networks

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778242A (en) * 2015-04-09 2015-07-15 复旦大学 Hand-drawn sketch image retrieval method and system on basis of image dynamic partitioning
CN105184362A (en) * 2015-08-21 2015-12-23 中国科学院自动化研究所 Depth convolution neural network acceleration and compression method based on parameter quantification
CN107211133A (en) * 2015-11-06 2017-09-26 华为技术有限公司 Method, device and the decoding device of inverse quantization conversion coefficient
US9916531B1 (en) * 2017-06-22 2018-03-13 Intel Corporation Accumulator constrained quantization of convolutional neural networks
CN107644254A (en) * 2017-09-09 2018-01-30 复旦大学 A kind of convolutional neural networks weight parameter quantifies training method and system
CN107832835A (en) * 2017-11-14 2018-03-23 贵阳海信网络科技有限公司 The light weight method and device of a kind of convolutional neural networks

Also Published As

Publication number Publication date
CN110363279A (en) 2019-10-22
WO2019184823A1 (en) 2019-10-03

Similar Documents

Publication Publication Date Title
CN110363279B (en) Image processing method and device based on convolutional neural network model
US20210125070A1 (en) Generating a compressed representation of a neural network with proficient inference speed and power consumption
CN108022212B (en) High-resolution picture generation method, generation device and storage medium
US11755901B2 (en) Dynamic quantization of neural networks
CN109002889B (en) Adaptive iterative convolution neural network model compression method
US11249721B2 (en) Multiplication circuit, system on chip, and electronic device
WO2019238029A1 (en) Convolutional neural network system, and method for quantifying convolutional neural network
US20220329807A1 (en) Image compression method and apparatus thereof
CN110663048A (en) Execution method, execution device, learning method, learning device, and program for deep neural network
CN112508125A (en) Efficient full-integer quantization method of image detection model
US7379500B2 (en) Low-complexity 2-power transform for image/video compression
CN114418121A (en) Model training method, object processing method and device, electronic device and medium
CN114612996A (en) Method for operating neural network model, medium, program product, and electronic device
CN112748899A (en) Data processing method and related equipment
CN114730367A (en) Model training method, device, storage medium and program product
CN110651273B (en) Data processing method and equipment
US10291911B2 (en) Classes of tables for use in image compression
US11861452B1 (en) Quantized softmax layer for neural networks
CN114841325A (en) Data processing method and medium of neural network model and electronic device
CN113313253A (en) Neural network compression method, data processing device and computer equipment
Madadum et al. A resource-efficient convolutional neural network accelerator using fine-grained logarithmic quantization
CN110135568B (en) Full-integer neural network method applying bounded linear rectification unit
CN113255576B (en) Face recognition method and device
CN112766472B (en) Data processing method, device, computer equipment and storage medium
CN116149597A (en) Data processing method, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant