CN114065904A - Neural network model quantification method and device - Google Patents

Neural network model quantification method and device Download PDF

Info

Publication number
CN114065904A
CN114065904A
Authority
CN
China
Prior art keywords
data
statistical information
quantization parameter
nth
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010787932.0A
Other languages
Chinese (zh)
Inventor
徐兵
张楠赓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sisheng Technology Co.,Ltd.
Original Assignee
Canaan Bright Sight Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canaan Bright Sight Co Ltd filed Critical Canaan Bright Sight Co Ltd
Priority to CN202010787932.0A priority Critical patent/CN114065904A/en
Priority to PCT/CN2020/129505 priority patent/WO2022027862A1/en
Publication of CN114065904A publication Critical patent/CN114065904A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The application discloses a neural network model quantization method and device. The method comprises the following steps: processing multiple frames of images using a convolutional layer in the neural network model to obtain Nth data statistical information and (N+1)th data statistical information; obtaining an Nth quantization parameter using the Nth data statistical information; obtaining an (N+1)th quantization parameter using the (N+1)th data statistical information or the set consisting of the Nth and (N+1)th data statistical information; obtaining the Nth updated total quantization parameter from the Nth quantization parameter and the (N+1)th quantization parameter; when the number of updates N of the total quantization parameter reaches the corresponding threshold, processing images acquired in the current period with the convolutional layer to obtain activation output data; and performing quantization calculation on the activation output data based on the Nth updated total quantization parameter to obtain quantized data. Quantization performance and scene adaptability are improved, cache space is saved, and the computational complexity of the next convolutional layer is reduced.

Description

Neural network model quantification method and device
Technical Field
The application relates to the field of artificial intelligence, in particular to the field of neural network model quantification.
Background
A primary approach in artificial intelligence is deep learning: a deep neural network is trained on a large amount of data and then used for analysis, prediction and other tasks. A deep neural network often consists of dozens or even hundreds of convolutional layers, and the feature maps generated during computation occupy a large amount of storage space. Reducing the network size through compression, encoding and similar techniques is therefore very important for reducing feature-map storage and increasing the practical value of deep learning. Among these techniques, quantization is one of the most widely adopted compression methods. Quantization converts the floating-point arithmetic of a neural network into fixed-point arithmetic.
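To illustrate what such a floating-point-to-fixed-point conversion looks like in practice, the following is a minimal Python sketch with an assumed symmetric scale; it is an illustration only, not the method disclosed below, and the names and shapes are hypothetical.

    import numpy as np

    def quantize_to_int8(x, scale):
        # Map floating-point values onto 8-bit integers with the given scale and
        # clamp to the representable range [-128, 127].
        return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

    def dequantize(q, scale):
        # Recover an approximate floating-point value from the fixed-point data.
        return q.astype(np.float32) * scale

    feature_map = np.random.randn(1, 16, 32, 32).astype(np.float32)  # hypothetical feature map
    scale = np.abs(feature_map).max() / 127.0                        # assumed symmetric scale
    q_map = quantize_to_int8(feature_map, scale)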
Edge computing chips typically employ fixed-point quantization. The intermediate data produced by the activation function must be stored and read back each time a convolutional layer is computed; only by quantizing this intermediate data can storage space and data read/write bandwidth be reduced, thereby lowering the cost and power consumption of the edge computing chip. Currently there are two methods for quantizing the intermediate data of a convolutional layer. The first counts the distribution range of each convolutional layer's intermediate data in real time and quantizes according to that range. The second uses a fixed set of sample data to count the distribution of each convolutional layer's intermediate data and determines the quantization scheme from that distribution. However, with the first method, in order to determine the distribution range of each convolutional layer's intermediate data, the wide-bit-width intermediate data must be buffered and quantization can only be performed after the range has been counted, so a large data buffer is still required; alternatively, if the buffer is reduced, the computational complexity increases when the intermediate data is fed into the next convolutional layer. The second method depends on the sample image set, whose data distribution may differ greatly from that of the actual working scene, so the quantization parameters computed from the sample set are not well suited to the actual scene and the quantization performance is poor.
Disclosure of Invention
The embodiments of the present application provide a neural network model quantization method and device to solve the problems in the related art, with the following technical solutions:
in a first aspect, an embodiment of the present application provides a neural network model quantization method, including:
processing multiple frames of images using a convolutional layer in the neural network model to obtain Nth data statistical information and (N+1)th data statistical information, where N is greater than or equal to 1;
obtaining an Nth quantization parameter using the Nth data statistical information; obtaining an (N+1)th quantization parameter using the (N+1)th data statistical information or the set consisting of the Nth data statistical information and the (N+1)th data statistical information;
obtaining the Nth updated total quantization parameter from the Nth quantization parameter and the (N+1)th quantization parameter;
when the number of updates N of the total quantization parameter reaches the corresponding threshold, processing images acquired in the current period using the convolutional layer to obtain activation output data;
and performing quantization calculation on the activation output data based on the Nth updated total quantization parameter to obtain quantized data.
In one embodiment, the method further comprises:
and returning to the step of processing the multiple frames of images using the convolutional layer in the neural network model when the number of updates of the total quantization parameter has not reached the corresponding threshold.
In one embodiment, the method further comprises:
processing preselected multiple frames of images using the convolutional layer to obtain (N-1)th data statistical information;
and obtaining an (N-1)th quantization parameter using the (N-1)th data statistical information.
In one embodiment, processing the multiple frames of images using the convolutional layer in the neural network model to obtain the Nth data statistical information and the (N+1)th data statistical information includes:
sequentially processing multiple frames of images acquired in the historical period using the convolutional layer to obtain the Nth data statistical information and the (N+1)th data statistical information.
In one embodiment, the method further comprises:
and obtaining the (N-1)th updated total quantization parameter from the (N-1)th quantization parameter and the Nth quantization parameter.
In one embodiment, the method further comprises:
and processing the data statistical information by adopting a sliding window to obtain the data statistical information corresponding to one frame of image or the data statistical information corresponding to multiple frames of images.
In one embodiment, the method further comprises:
and processing the data statistical information by adopting an alpha filter to obtain a weighted average value of the data statistical information corresponding to the multi-frame image.
In a second aspect, an embodiment of the present application provides a neural network model quantization apparatus, including:
the first data processing module is used for processing the multi-frame image by utilizing the convolutional layer in the neural network model to obtain the statistical information of the nth data and the statistical information of the (N + 1) th data, wherein N is greater than or equal to 1;
the first quantization parameter calculation module is used for obtaining an Nth quantization parameter by using the Nth data statistical information; obtaining an (N + 1) th quantization parameter by using the (N + 1) th data statistical information or a set consisting of the (N) th data statistical information and the (N + 1) th data statistical information;
the first total quantization parameter updating module is used for obtaining the total quantization parameter updated for the Nth time according to the quantization parameter for the Nth time and the quantization parameter for the (N + 1) th time;
the second data processing module is used for processing images acquired in the current period using the convolutional layer when the number of updates N of the total quantization parameter reaches the corresponding threshold, to obtain activation output data;
and the quantization calculation module is used for performing quantization calculation on the activation output data based on the total quantization parameter updated for the Nth time to obtain quantized data.
In one embodiment, the method further comprises:
and the data processing triggering module is used for returning to and executing the step of processing the multiple frames of images using the convolutional layer in the neural network model when the number of updates of the total quantization parameter has not reached the corresponding threshold.
In one embodiment, the method further comprises:
the third data processing module is used for processing the preselected multi-frame image by using the convolutional layer to obtain the statistical information of the (N-1) th data;
and the second quantization parameter calculation module is used for obtaining the quantization parameter of the (N-1) th time by utilizing the statistical information of the (N-1) th time data.
In one embodiment, the first data processing module comprises:
and the data processing submodule is used for sequentially processing the multi-frame images acquired in the historical period by using the convolutional layer to obtain the nth data statistical information and the (N + 1) th data statistical information.
In one embodiment, the method further comprises:
and the second total quantization parameter updating module is used for obtaining the total quantization parameter updated for the N-1 th time according to the quantization parameter for the N-1 th time and the quantization parameter for the Nth time.
In one embodiment, the method further comprises:
and the sliding window processing module is used for processing the data statistical information by adopting a sliding window to obtain the data statistical information corresponding to one frame of image or the data statistical information corresponding to multiple frames of images.
In one embodiment, the method further comprises:
and the filter processing module is used for processing the data statistical information by adopting an alpha filter to obtain a weighted average value of the data statistical information corresponding to the multi-frame image.
In a third aspect, an electronic device is provided, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above.
In a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions, wherein the computer instructions are for causing a computer to perform the method of any of the above.
One embodiment in the above application has the following advantages or benefits: because the quantization parameters have a large influence on data quantization performance, updating the total quantization parameter in real time improves both quantization performance and adaptability to the application scene. Meanwhile, the updated total quantization parameter is used to quantize the activation output data corresponding to the current frame image; since only the updated total quantization parameter is retained, no large amount of additional data needs to be cached when the activation output data of the current frame is quantized, which saves cache space and effectively reduces the computational complexity of the next convolutional layer.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram of a neural network model quantification method according to an embodiment of the present application;
FIG. 2 is a diagram of a neural network model quantification method according to another embodiment of the present application;
FIG. 3 is a schematic diagram of a neural network model quantification method according to another embodiment of the present application;
FIG. 4 is a schematic diagram of a neural network model quantification method according to another embodiment of the present application;
FIG. 5 is a schematic diagram of a neural network model quantification method according to another embodiment of the present application;
FIG. 6 is a diagram illustrating an apparatus for quantizing a neural network model according to an embodiment of the present application;
FIG. 7 is a diagram of an apparatus for neural network model quantization, according to another embodiment of the present application;
FIG. 8 is a schematic diagram of an apparatus for quantizing a neural network model according to another embodiment of the present application;
FIG. 9 is a diagram of an apparatus for neural network model quantization, according to another embodiment of the present application;
fig. 10 is a block diagram of an electronic device for implementing a neural network model quantization method according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in fig. 1, the present embodiment provides a neural network model quantization method, including the following steps:
S110: processing multiple frames of images using a convolutional layer in the neural network model to obtain Nth data statistical information and (N+1)th data statistical information, where N is greater than or equal to 1;
S120: obtaining an Nth quantization parameter using the Nth data statistical information; obtaining an (N+1)th quantization parameter using the (N+1)th data statistical information or the set consisting of the Nth data statistical information and the (N+1)th data statistical information;
S130: obtaining the Nth updated total quantization parameter from the Nth quantization parameter and the (N+1)th quantization parameter;
S140: when the number of updates N of the total quantization parameter reaches the corresponding threshold, processing images acquired in the current period using the convolutional layer to obtain activation output data;
S150: performing quantization calculation on the activation output data based on the Nth updated total quantization parameter to obtain quantized data.
In one example, as shown in fig. 2, in various practical application scenarios multiple frames of images are collected in real time and input into the neural network model, and activation output data is obtained through the convolution computation of the convolutional layer, standardization or normalization (Batch Normalization), and the activation function operation. The activation output data may, for example, correspond to one frame of image input into the neural network model or to multiple frames of images. The number of frames processed each time can be set as required and falls within the protection scope of this embodiment. For example, the first to fifth frames are processed by the convolutional layer to obtain first activation output data, and the first activation output data is counted to obtain first data statistical information. The sixth to fifteenth frames are processed by the convolutional layer to obtain second activation output data, and the second activation output data is counted to obtain second data statistical information. The data statistical information may include the maximum and minimum values of the activation output data, a distribution graph (histogram) of the activation output data, the set of the several largest values and the set of the several smallest values when the activation output data is sorted in descending order, and so on.
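As a concrete illustration, such statistics could be gathered as in the following minimal sketch; the function name, the choice of k and the bin count are assumptions, not taken from the filing.

    import numpy as np

    def collect_data_statistics(activation_outputs, k=100, bins=2048):
        # activation_outputs: activation tensors produced by the convolutional layer
        # for several frames.  Returns the statistics named in the description.
        data = np.concatenate([np.ravel(a) for a in activation_outputs])
        ordered = np.sort(data)[::-1]          # descending order
        counts, edges = np.histogram(data, bins=bins)
        return {
            "max": float(data.max()),
            "min": float(data.min()),
            "histogram": (counts, edges),
            "largest_k": ordered[:k],          # first several values
            "smallest_k": ordered[-k:],        # last several values
        }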
The first data statistical information may then be processed using relative entropy to obtain the first quantization parameter. Alternatively, the maximum and minimum values may be selected from the first activation output data as the first quantization parameter. Or, the first data statistical information may be peak-clipped to obtain the first quantization parameter; the clipping threshold may be set to remove the top a% and bottom b% of values, with a and b typically ranging from 0.5 to 4. The second data statistical information may be processed in any of these three ways to obtain the second quantization parameter; the set consisting of the first and second data statistical information may also be processed in any of these three ways to obtain the second quantization parameter. Since the first data statistical information is taken into account when calculating the quantization parameter, the accuracy of the second quantization parameter is effectively improved.
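A minimal sketch of the peak-clipping variant follows, assuming the quantization parameter is simply the pair of extreme values remaining after the top a% and bottom b% of the sorted activations are discarded; the filing does not fix the exact realization.

    import numpy as np

    def clipped_quantization_parameter(activations, a=1.0, b=1.0):
        # Discard the top a% and bottom b% of the activation values (a, b typically
        # in the 0.5-4 range per the description) and return the remaining extremes.
        sorted_vals = np.sort(np.ravel(activations))
        lo = int(len(sorted_vals) * b / 100.0)
        hi = max(lo, int(np.ceil(len(sorted_vals) * (1.0 - a / 100.0))) - 1)
        return sorted_vals[lo], sorted_vals[hi]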
The first updated total quantization parameter is obtained from the first quantization parameter and the second quantization parameter. Whether the number of updates of the total quantization parameter, 1, has reached the corresponding threshold is then judged; if so, the convolutional layer processes images acquired in the current period to obtain activation output data, and quantization calculation is performed on the activation output data based on the first updated total quantization parameter to obtain quantized data. Because multiple frames of images are input into the neural network model, a large amount of multiply-accumulate computation occurs during convolution, standardization or normalization, and activation-function operations; to maintain calculation accuracy, the activation output data is quantized, that is, the activation output data represented in floating point is converted into data represented in fixed point. For example, activation output data with a wide bit width (e.g. 24-32 bits) is quantized according to the pre-computed quantization parameter to obtain quantized data (e.g. with a bit width of 8 bits or less).
The above process describes one round of quantization calculation for each convolutional layer with N equal to 1. Of course, with N equal to 2, 3 and so on, the total quantization parameter can continue to be updated until the number of updates N reaches the corresponding threshold, at which point updating stops. The number of quantization-parameter updates can be adjusted adaptively according to actual requirements and falls within the protection scope of this embodiment.
In this embodiment, since the quantization parameters have a large influence on data quantization performance, updating the total quantization parameter in real time improves both quantization performance and adaptability to the application scene. Meanwhile, the updated total quantization parameter is used to quantize the activation output data corresponding to the current frame image; since only the updated total quantization parameter is retained, no large amount of additional data needs to be cached when the activation output data of the current frame is quantized, which saves cache space and effectively reduces the computational complexity of the next convolutional layer.
In one embodiment, as shown in fig. 3, the method further includes:
and returning to the step of processing the multiple frames of images using the convolutional layer in the neural network model when the number of updates of the total quantization parameter has not reached the corresponding threshold.
In one example, when N is 1, the first updated total quantization parameter is obtained from the first quantization parameter and the second quantization parameter. Whether the number of updates of the total quantization parameter, 1, has reached the corresponding threshold is judged; if not, the following steps are executed again: when N is 2, the sixteenth to twentieth frames of images are processed using the convolutional layer in the neural network model to obtain third activation output data. The third activation output data is counted to obtain third data statistical information. A third quantization parameter is obtained using the third data statistical information or the set consisting of the second and third data statistical information. The second updated total quantization parameter is obtained from the second and third quantization parameters. Whether the number of updates of the total quantization parameter, 2, has reached the corresponding threshold is judged; if not, steps S110 to S130 continue to be repeated until the number of updates N of the total quantization parameter reaches the corresponding threshold.
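For orientation, a minimal sketch of this update loop is given below; the callables compute_stats, compute_qp and merge_qp are hypothetical stand-ins for the statistics, per-round parameter and merging steps described above.

    def update_total_quantization_parameter(frames_by_round, threshold,
                                            compute_stats, compute_qp, merge_qp):
        # Hypothetical driver loop for the iterative update described above.
        prev_stats = compute_stats(frames_by_round[0])     # Nth data statistics
        prev_qp = compute_qp([prev_stats])                 # Nth quantization parameter
        total_qp = None
        for n, frames in enumerate(frames_by_round[1:], start=1):
            stats = compute_stats(frames)                  # (N+1)th data statistics
            qp = compute_qp([prev_stats, stats])           # may use both statistics sets
            total_qp = merge_qp(prev_qp, qp)               # Nth update of the total parameter
            if n >= threshold:                             # stop once the update count reaches the threshold
                break
            prev_stats, prev_qp = stats, qp
        return total_qp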
In one embodiment, as shown in fig. 4, the method further includes:
step S111: processing preselected multiple frames of images using the convolutional layer to obtain (N-1)th data statistical information;
step S112: obtaining an (N-1)th quantization parameter using the (N-1)th data statistical information.
In one example, preselected multiple frames of images from the calibration set are input into the neural network model, and the 0th activation output data, i.e. the initial activation output data, is output after the convolution computation of the convolutional layer, standardization or normalization, and the activation function operation. The 0th activation output data is counted to obtain 0th data statistical information, i.e. initial data statistical information. The 0th data statistical information may include the maximum and minimum values of the 0th activation output data, a distribution graph (histogram) of the 0th activation output data, the set of the several largest values and the set of the several smallest values when the 0th activation output data is sorted in descending order, and so on.
The 0th data statistical information is processed using relative entropy (also called Kullback-Leibler divergence or information divergence) to obtain the 0th quantization parameter, i.e. the initial quantization parameter. Alternatively, the maximum and minimum values are taken from the 0th activation output data as the 0th quantization parameter. Or, the 0th data statistical information is peak-clipped to obtain the 0th quantization parameter; the clipping threshold may be set to remove the top a% and bottom b% of values, with a and b typically ranging from 0.5 to 4.
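For orientation only, the following simplified sketch shows one common way a clipping boundary can be chosen by relative entropy during post-training calibration; it is an assumed realization, not the procedure claimed in the filing.

    import numpy as np

    def kl_threshold(hist, bin_edges, num_quant_bins=128):
        # For each candidate clipping point, fold the clipped tail into the last
        # retained bin, coarsen the distribution to num_quant_bins levels, and keep
        # the clipping point whose coarsened distribution is closest (in KL
        # divergence) to the original.
        def kl(p, q):
            p, q = p / p.sum(), q / q.sum()
            mask = (p > 0) & (q > 0)
            return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

        best_edge, best_div = bin_edges[-1], float("inf")
        for i in range(num_quant_bins, len(hist) + 1):
            p = hist[:i].astype(np.float64)
            p[-1] += hist[i:].sum()                      # clipped tail folded in
            groups = np.array_split(p, num_quant_bins)   # coarsen to the quantized resolution
            q = np.concatenate([np.full(len(g), g.sum() / len(g)) for g in groups])
            divergence = kl(p, q)
            if divergence < best_div:
                best_edge, best_div = bin_edges[i], divergence
        return best_edge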
In one embodiment, as shown in fig. 4, step S110 includes:
step S113: and sequentially processing the multi-frame images acquired in the historical period by using the convolutional layer to obtain the nth data statistical information and the (N + 1) th data statistical information.
In one example, because images acquired in real time during different time periods of the same application scene may change, or images acquired in different application scenes differ, the multiple frames of images acquired in real time during the historical period are input into the neural network model, and the activation output data corresponding to each time period of the historical period differs after convolution computation, standardization or normalization, and the activation function operation of the convolutional layer. For example, the first activation output data and the second activation output data are not the same.
In one embodiment, as shown in fig. 4, the method further includes:
step S114: and obtaining the total quantization parameter updated for the (N-1) th time according to the quantization parameter for the (N-1) th time and the quantization parameter for the Nth time.
In one example, the 0th quantization parameter may include an upper boundary point Xtop and a lower boundary point Xdown, and the first quantization parameter may include a maximum value Qmax and a minimum value Qmin. The 0th updated total quantization parameter, i.e. the initial total quantization parameter, is calculated from the 0th quantization parameter and the first quantization parameter. Specifically, the slope s and the bias b of the linear quantization are calculated according to the following formulas:
[The formulas for the slope s and the bias b appear only as images in the published text and are not reproduced here.]
The initial total quantization parameter includes the slope s and the bias b. The calculation of the first through Nth updated total quantization parameters proceeds in the same way as the calculation of the initial total quantization parameter.
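Because the formulas themselves appear only as images, the following sketch is a plausible reconstruction that maps the clipped activation range [Xdown, Xtop] onto the quantized range [Qmin, Qmax]; the actual formulas in the filing may differ.

    def linear_quantization_parameters(x_top, x_down, q_max, q_min):
        # Hypothetical reconstruction: the slope rescales the activation range onto
        # the quantized range, and the bias aligns the lower boundary.
        s = (q_max - q_min) / (x_top - x_down)
        b = q_min - s * x_down
        return s, b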
As shown in fig. 4, when the number of updates N of the total quantization parameter reaches the corresponding threshold, step S140 is executed, that is, the convolutional layer processes images acquired in the current period to obtain activation output data;
step S115: performing quantization calculation on the activation output data based on the (N-1)th updated total quantization parameter to obtain quantized data.
In one example, the acquired current image is input into the neural network model, and the activation output data x is output after convolution computation, standardization or normalization, and the activation function operation of the convolutional layer. The quantized data y is obtained using the linear formula y = s·x + b. Of course, quantization is not limited to this linear formula; other quantization calculation methods are also possible and fall within the scope of this embodiment.
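A minimal sketch of this step, assuming the result is clamped to a signed 8-bit range (the clamping detail is an assumption, not stated in the filing):

    import numpy as np

    def quantize_activation(x, s, b):
        # Apply the linear quantization y = s * x + b to the activation output data x
        # and clamp the result to the signed 8-bit range (assumed target bit width).
        y = np.round(s * x + b)
        return np.clip(y, -128, 127).astype(np.int8)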
In this embodiment, the initial quantization parameter, i.e. the 0th quantization parameter, is updated to obtain the first quantization parameter; the initial total quantization parameter is obtained from the initial quantization parameter and the first quantization parameter, and the total quantization parameter is then updated continuously until the number of updates reaches the threshold. This gives better adaptability to a variety of practical application scenarios, further improves the quantization performance in those scenarios, reduces the performance loss caused by quantization error, and effectively mitigates the poor generalization of an initial quantization parameter generated offline directly from the calibration set. Meanwhile, the total quantization parameter is updated using the Nth and (N+1)th quantization parameters corresponding to images acquired in the historical period, and the Nth updated total quantization parameter is used not to process the activation output data corresponding to those historical images but to process the activation output data corresponding to images acquired in the current period. As a result, no large amount of additional data needs to be cached, which saves cache space and effectively reduces the computational complexity of the next convolutional layer.
In one embodiment, as shown in fig. 5, the method further includes:
step S170: and processing the data statistical information by adopting a sliding window to obtain the data statistical information corresponding to one frame of image or the data statistical information corresponding to multiple frames of images.
In one example, the sliding window may process the data statistical information according to a preset size to obtain data statistical information corresponding to one frame of image or to multiple frames of images. For example, after M1 frames of images are processed by the convolutional layer, the Nth activation output data corresponding to the M1 frames is obtained; this activation output data is counted to obtain the data statistical information corresponding to the M1 frames, and the sliding window then processes that statistical information to obtain data statistical information corresponding to one frame of image or to M2 frames of images, where M1 is greater than or equal to M2.
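A minimal sketch of such a window over per-frame statistics, assuming the merged statistic is simply the extremes over the retained frames (the layout of frame_stats is hypothetical):

    from collections import deque

    class SlidingWindowStatistics:
        # Retain the statistics of only the most recent `window` frames; the window
        # length can be changed to track the environment more or less aggressively.
        def __init__(self, window):
            self.buffer = deque(maxlen=window)

        def push(self, frame_stats):
            # frame_stats: dict with "min" and "max" keys (hypothetical layout).
            self.buffer.append(frame_stats)

        def merged(self):
            return (min(s["min"] for s in self.buffer),
                    max(s["max"] for s in self.buffer))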
In this embodiment, the length of the sliding window can be adjusted dynamically according to the actual situation, which further improves the accuracy of the data statistical information and allows changes in the external environment to be tracked in a timely manner.
In one embodiment, as shown in fig. 5, the method further includes:
step S180: and processing the data statistical information by adopting an alpha filter to obtain a weighted average value of the data statistical information corresponding to the multi-frame image.
In one example, the data statistical information is processed by an alpha filter to obtain a weighted average of the data statistical information corresponding to multiple frames of images. For example, the Nth data statistical information, obtained by processing multiple frames of images with the convolutional layer in the neural network model, is processed with the alpha filter, and the Nth quantization parameter is then obtained from the weighted average.
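A minimal sketch of an alpha (exponentially weighted) filter over the statistics stream; the weighting constant and update rule are assumptions, since the filing names only the filter type.

    class AlphaFilter:
        # Exponentially weighted running average: only one value per statistic has to
        # be stored, which keeps the memory footprint small while still tracking
        # recent changes in the data distribution.
        def __init__(self, alpha=0.1):
            self.alpha = alpha
            self.value = None

        def update(self, new_stat):
            if self.value is None:
                self.value = new_stat
            else:
                self.value = self.alpha * new_stat + (1.0 - self.alpha) * self.value
            return self.value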
In this embodiment, the alpha filter is used to process the data statistics information, so as to track the external environment change in time and further reduce the storage space required by the actual statistics information.
As shown in fig. 6, the present embodiment provides a neural network model quantization apparatus including:
the first data processing module 110 is configured to process a plurality of frames of images by using a convolutional layer in a neural network model to obtain nth data statistical information and (N + 1) th data statistical information, where N is greater than or equal to 1;
a first quantization parameter calculation module 120, configured to obtain an nth quantization parameter by using the nth data statistics information; obtaining an (N + 1) th quantization parameter by using the (N + 1) th data statistical information or a set consisting of the (N) th data statistical information and the (N + 1) th data statistical information;
a first total quantization parameter updating module 130, configured to obtain a total quantization parameter updated for the nth time according to the nth quantization parameter and the (N + 1) th quantization parameter;
the second data processing module 140 is configured to process images acquired in the current period using the convolutional layer when the number of updates N of the total quantization parameter reaches the corresponding threshold, to obtain activation output data;
and the quantization calculation module 150 is configured to perform quantization calculation on the activation output data based on the total quantization parameter updated for the nth time, so as to obtain quantized data.
In one embodiment, as shown in fig. 7, the method further includes:
and the data processing triggering module 160 is configured to return to and execute the step of processing the multiple frames of images using the convolutional layer in the neural network model when the number of updates of the total quantization parameter has not reached the corresponding threshold.
In one embodiment, as shown in fig. 8, the method further includes:
the third data processing module 111 is configured to process the preselected multi-frame image by using the convolutional layer to obtain statistical information of the (N-1) th data;
and a second quantization parameter calculation module 112, configured to obtain an nth-1 th quantization parameter by using the nth-1 th data statistical information.
In one embodiment, as shown in fig. 8, the first data processing module 110 includes:
and the data processing submodule 113 is configured to sequentially process multiple frames of images acquired in a historical period by using the convolutional layer to obtain nth data statistical information and (N + 1) th data statistical information.
In one embodiment, as shown in fig. 8, the method further includes:
and the second total quantization parameter updating module 114 is configured to obtain the (N-1)th updated total quantization parameter from the (N-1)th quantization parameter and the Nth quantization parameter.
In one embodiment, when the number of updates of the total quantization parameter has not reached the corresponding threshold, the quantization calculation module 150 is further configured to perform quantization calculation on the activation output data based on the (N-1)th updated total quantization parameter to obtain quantized data.
In one embodiment, as shown in fig. 9, the method further includes:
and the sliding window processing module 170 is configured to process the data statistics information by using a sliding window to obtain data statistics information corresponding to one frame of image or data statistics information corresponding to multiple frames of images.
In one embodiment, as shown in fig. 9, the method further includes:
and the filter processing module 180 is configured to process the data statistics information by using an alpha filter to obtain a weighted average of the data statistics information corresponding to the multiple frames of images.
The functions of each module in each apparatus in the embodiment of the present application may refer to corresponding descriptions in the above method, and are not described herein again.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 10 is a block diagram of an electronic device for a neural network model quantization method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 10, the electronic apparatus includes: one or more processors 1001, memory 1002, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information for a Graphical User Interface (GUI) on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 10 illustrates an example of one processor 1001.
The memory 1002 is a non-transitory computer readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform a neural network model quantization method provided by the present application. A non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform a neural network model quantization method provided by the present application.
The memory 1002 may be used as a non-transitory computer readable storage medium for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to a neural network model quantization method in the embodiments of the present application (for example, a neural network model quantization apparatus shown in fig. 6 includes the first data processing module 110, the first quantization parameter calculation module 120, the total quantization parameter update module 130, the second data processing module 140, and the quantization calculation module 150). The processor 1001 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 1002, that is, implements a neural network model quantization method in the above method embodiments.
The memory 1002 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by use of an electronic device according to a neural network model quantization method, or the like. Further, the memory 1002 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1002 may optionally include memory located remotely from the processor 1001, which may be connected to the electronic devices via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device 1003 and an output device 1004. The processor 1001, the memory 1002, the input device 1003, and the output device 1004 may be connected by a bus or other means, and the bus connection is exemplified in fig. 10.
The input device 1003 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output device 1004 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, Integrated circuitry, Application Specific Integrated Circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (Cathode Ray Tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved, and the present invention is not limited herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. A neural network model quantization method, comprising:
processing multiple frames of images using a convolutional layer in the neural network model to obtain Nth data statistical information and (N+1)th data statistical information, where N is greater than or equal to 1;
obtaining an Nth quantization parameter using the Nth data statistical information; obtaining an (N+1)th quantization parameter using the (N+1)th data statistical information or the set consisting of the Nth data statistical information and the (N+1)th data statistical information;
obtaining the Nth updated total quantization parameter from the Nth quantization parameter and the (N+1)th quantization parameter;
when the number of updates N of the total quantization parameter reaches the corresponding threshold, processing images acquired in the current period using the convolutional layer to obtain activation output data;
and performing quantization calculation on the activation output data based on the Nth updated total quantization parameter to obtain quantized data.
2. The method of claim 1, further comprising:
and returning to the step of processing the multiple frames of images using the convolutional layer in the neural network model when the number of updates of the total quantization parameter has not reached the corresponding threshold.
3. The method of claim 1, further comprising:
processing a preselected multi-frame image by using the convolutional layer to obtain statistical information of the (N-1) th data;
and obtaining the quantization parameter of the (N-1) th time by utilizing the statistical information of the (N-1) th time data.
4. The method of claim 3, wherein processing the plurality of frames of images using convolutional layers in the neural network model to obtain the nth data statistic and the (N + 1) th data statistic comprises:
and sequentially processing the multi-frame images acquired in the historical period by using the convolutional layer to obtain the Nth data statistical information and the (N + 1) th data statistical information.
5. The method of claim 4, further comprising:
and obtaining the total quantization parameter updated for the (N-1) th time according to the (N-1) th quantization parameter and the (N) th quantization parameter.
6. The method of claim 1, further comprising:
and processing the data statistical information by adopting a sliding window to obtain the data statistical information corresponding to one frame of image or the data statistical information corresponding to multiple frames of images.
7. The method of claim 1, further comprising:
and processing the data statistical information by adopting an alpha filter to obtain a weighted average value of the data statistical information corresponding to the multi-frame image.
8. An apparatus for quantizing a neural network model, comprising:
the first data processing module is used for processing the multi-frame image by utilizing the convolutional layer in the neural network model to obtain the statistical information of the nth data and the statistical information of the (N + 1) th data, wherein N is greater than or equal to 1;
the first quantization parameter calculation module is used for obtaining an Nth quantization parameter by utilizing the Nth data statistical information; obtaining an (N + 1) th quantization parameter by using the (N + 1) th data statistical information or a set consisting of the (N) th data statistical information and the (N + 1) th data statistical information;
the first total quantization parameter updating module is used for obtaining the total quantization parameter updated for the Nth time according to the quantization parameter for the Nth time and the quantization parameter for the (N + 1) th time;
the second data processing module is used for processing the image acquired in the current period by using the convolutional layer under the condition that the updating times N of the total quantization parameter reach a corresponding threshold value to obtain activation output data;
and the quantization calculation module is used for performing quantization calculation on the activation output data based on the total quantization parameter updated for the Nth time to obtain quantized data.
9. The apparatus of claim 8, further comprising:
and the data processing triggering module is used for returning to execute the step of processing the multi-frame images by utilizing the convolutional layer in the neural network model under the condition that the updating times of the total quantization parameter do not reach the corresponding threshold value.
10. The apparatus of claim 8, further comprising:
the third data processing module is used for processing the preselected multi-frame image by using the convolutional layer to obtain the statistical information of the (N-1) th data;
and the second quantization parameter calculation module is used for obtaining the quantization parameter of the (N-1) th time by utilizing the statistical information of the (N-1) th time data.
11. The apparatus of claim 10, wherein the first data processing module comprises:
and the data processing submodule is used for sequentially processing the multi-frame images acquired in the historical period by utilizing the convolutional layer to obtain the Nth data statistical information and the (N + 1) th data statistical information.
12. The apparatus of claim 11, further comprising:
and the second total quantization parameter updating module is used for obtaining the total quantization parameter updated for the N-1 th time according to the quantization parameter for the N-1 th time and the quantization parameter for the Nth time.
13. The apparatus of claim 8, further comprising:
and the sliding window processing module is used for processing the data statistical information by adopting a sliding window to obtain the data statistical information corresponding to one frame of image or the data statistical information corresponding to multiple frames of images.
14. The apparatus of claim 8, further comprising:
and the filter processing module is used for processing the data statistical information by adopting an alpha filter to obtain a weighted average value of the data statistical information corresponding to the multi-frame image.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
CN202010787932.0A 2020-08-07 2020-08-07 Neural network model quantification method and device Pending CN114065904A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010787932.0A CN114065904A (en) 2020-08-07 2020-08-07 Neural network model quantification method and device
PCT/CN2020/129505 WO2022027862A1 (en) 2020-08-07 2020-11-17 Method and device for quantifying neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010787932.0A CN114065904A (en) 2020-08-07 2020-08-07 Neural network model quantification method and device

Publications (1)

Publication Number Publication Date
CN114065904A true CN114065904A (en) 2022-02-18

Family

ID=80116938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010787932.0A Pending CN114065904A (en) 2020-08-07 2020-08-07 Neural network model quantification method and device

Country Status (2)

Country Link
CN (1) CN114065904A (en)
WO (1) WO2022027862A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108896B (en) * 2023-04-11 2023-07-07 上海登临科技有限公司 Model quantization method, device, medium and electronic equipment

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10373050B2 (en) * 2015-05-08 2019-08-06 Qualcomm Incorporated Fixed point neural network based on floating point neural network quantization
CN110874625B (en) * 2018-08-31 2023-10-27 杭州海康威视数字技术股份有限公司 Data processing method and device
CN109754074A (en) * 2018-12-29 2019-05-14 北京中科寒武纪科技有限公司 A kind of neural network quantization method, device and Related product
CN110610237A (en) * 2019-09-17 2019-12-24 普联技术有限公司 Quantitative training method and device of model and storage medium
CN111008701A (en) * 2019-12-03 2020-04-14 杭州嘉楠耘智信息科技有限公司 Data quantization method and device based on neural network and computer readable storage medium
CN111091184B (en) * 2019-12-19 2022-03-22 浪潮(北京)电子信息产业有限公司 Deep neural network quantification method and device, electronic equipment and medium
CN111178514A (en) * 2019-12-31 2020-05-19 翱捷智能科技(上海)有限公司 Neural network quantification method and system
CN111176853A (en) * 2020-02-19 2020-05-19 珠海市杰理科技股份有限公司 Data quantization method and device, computer equipment and storage medium
CN111401518A (en) * 2020-03-04 2020-07-10 杭州嘉楠耘智信息科技有限公司 Neural network quantization method and device and computer readable storage medium

Also Published As

Publication number Publication date
WO2022027862A1 (en) 2022-02-10

Similar Documents

Publication Publication Date Title
CN111667054B (en) Method, device, electronic equipment and storage medium for generating neural network model
KR102528748B1 (en) Method, apparatus, device and storage medium for constructing knowledge graph
US11700373B2 (en) Method for coding video and related device
CN111563593B (en) Training method and device for neural network model
CN111738419B (en) Quantification method and device for neural network model
US11582449B2 (en) Method for image encoding, electronic device and storage medium
CN113765873A (en) Method and apparatus for detecting abnormal access traffic
CN112446574B (en) Product evaluation method, device, electronic equipment and storage medium
CN113312578A (en) Data index fluctuation attribution method, device, equipment and medium
CN114065904A (en) Neural network model quantification method and device
CN111726554B (en) Image processing method, device, equipment and storage medium
CN112580723A (en) Multi-model fusion method and device, electronic equipment and storage medium
CN112085103B (en) Data enhancement method, device, equipment and storage medium based on historical behaviors
CN110647936A (en) Training method and device for video super-resolution reconstruction model and electronic equipment
CN113132757B (en) Data processing method and device
CN113382258B (en) Video encoding method, apparatus, device, and medium
CN111461340B (en) Weight matrix updating method and device and electronic equipment
CN111783872B (en) Method, device, electronic equipment and computer readable storage medium for training model
CN114881227A (en) Model compression method, image processing method, device and electronic equipment
CN111753758A (en) Model generation method and device, electronic equipment and storage medium
CN112149829B (en) Method, device, equipment and storage medium for determining pruning strategy of network model
CN111860572A (en) Data set distillation method, device, electronic equipment and storage medium
CN112291559A (en) Video encoding method, apparatus, device, and medium
CN113065011B (en) Picture determination method and device
CN111836051A (en) Desktop image coding and decoding methods and related devices

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240430

Address after: Room 101, 20th Floor, Building 1, Zone 1, No. 81 Beiqing Road, Haidian District, Beijing, 100094

Applicant after: Beijing Sisheng Technology Co.,Ltd.

Country or region after: China

Address before: Room 206, 2 / F, building C, phase I, Zhongguancun Software Park, No. 8, Dongbei Wangxi Road, Haidian District, Beijing 100094

Applicant before: Canaan Bright Sight Co.,Ltd.

Country or region before: China