CN114528924B - Image classification model reasoning method, device, equipment and medium - Google Patents

Image classification model reasoning method, device, equipment and medium

Info

Publication number
CN114528924B
CN114528924B CN202210099282A
Authority
CN
China
Prior art keywords
convolution layer
value
classification model
quantization factor
image classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210099282.XA
Other languages
Chinese (zh)
Other versions
CN114528924A (en)
Inventor
陈其宾
李锐
张晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Science Research Institute Co Ltd
Original Assignee
Shandong Inspur Science Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Science Research Institute Co Ltd filed Critical Shandong Inspur Science Research Institute Co Ltd
Priority to CN202210099282.XA priority Critical patent/CN114528924B/en
Publication of CN114528924A publication Critical patent/CN114528924A/en
Application granted granted Critical
Publication of CN114528924B publication Critical patent/CN114528924B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image classification model reasoning method, device, equipment and medium, wherein the method comprises the following steps: training an initial image classification model based on a convolutional neural network; calculating a weight quantization factor of a convolution layer in the initial image classification model, and determining an activation value quantization factor of the convolution layer by minimizing a mean square error; quantizing the weight value of the convolution layer through the weight quantization factor; calculating an inverse quantization factor of the initial image classification model, and calculating a shift representation of the activation value quantization factor and the inverse quantization factor; during reasoning, quantizing the activation value output by the convolution layer through the activation value quantization factor; calculating the convolution layer through the quantized weight value and the quantized activation value to obtain an output result of the convolution layer; and inversely quantizing the output result through the shifted inverse quantization factor and the shifted activation value quantization factor. The reasoning efficiency of the image classification model is improved.

Description

Image classification model reasoning method, device, equipment and medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a medium for reasoning an image classification model.
Background
In recent years, neural network models have been widely used in many fields and have achieved very good effects, especially in the field of image classification.
However, because of its high complexity and large size, an image classification neural network model is inefficient at reasoning and its reasoning time is long, especially when it runs on low-performance mobile devices or low-power embedded devices.
Therefore, there is a need for more efficient methods of reasoning for image classification models.
Disclosure of Invention
The embodiment of the application provides an image classification model reasoning method, device, equipment and medium, which are used to address the need for a more efficient image classification model reasoning method.
The embodiment of the application adopts the following technical scheme:
In one aspect, an embodiment of the present application provides a method for reasoning about an image classification model, where the method includes: training an initial image classification model based on a convolutional neural network; calculating a weight quantization factor of a convolution layer in the initial image classification model, and determining an activation value quantization factor of the convolution layer by minimizing a mean square error; the weight value and the activation value of the convolution layer are floating point data types; quantizing the weight value of the convolution layer through the weight quantization factor, and quantizing the weight value of the convolution layer into an INT8 data type; calculating an inverse quantization factor of the initial image classification model and calculating a shifted representation of the activation value quantization factor and the inverse quantization factor; during reasoning, quantizing the activation value output by the convolution layer by the activation value quantization factor, and quantizing the activation value output by the convolution layer into a UINT8 data type; calculating the convolution layer through the quantized weight value and the quantized activation value to obtain an output result of the convolution layer; and inversely quantizing the output result into the floating point data type through the shifted inverse quantization factor and the shifted activation value quantization factor to generate an image classification model.
In one example, the determining the activation value quantization factor of the convolution layer by minimizing the mean square error specifically includes: determining quantized test activation values and unquantized test activation values through a calibration data set and the initial image classification model; calculating a mean square error between the quantized test activation value and the unquantized test activation value; and determining an activation value quantization factor of the convolution layer when the mean square error is the minimum value.
In one example, the calculating the mean square error between the quantized test activation value and the unquantized test activation value specifically includes: calculating the mean square error between the quantized test activation value and the unquantized test activation value by the following formula: MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, wherein MSE is the mean square error, y_i is the quantized test activation value, \hat{y}_i is the unquantized test activation value, and n is the number of activation values output by the convolution layer.
In one example, the calculating the weight quantization factor of the convolution layer in the initial image classification model specifically includes: respectively calculating absolute values of a plurality of weight parameters in the convolution layer; determining a weight quantization range of the convolution layer through the maximum value of the absolute value; and calculating the weight quantization factor of the convolution layer in the initial image classification model according to the weight quantization range of the convolution layer.
In one example, after calculating the weight quantization factor of the convolution layer in the initial image classification model according to the weight quantization range of the convolution layer, the method further includes: judging whether the weight quantization factor is in a preset range or not; if not, the weight quantization factor is optimized by adjusting the weight value of the convolution layer in the initial image classification model.
In one example, the training is based on an initial image classification model of a convolutional neural network, and specifically includes: acquiring a monitoring sample image; carrying out gray scale processing on the monitored sample image to obtain a gray scale image of the monitored sample image; training the convolutional neural network through the gray level image to obtain an initial image classification model based on the convolutional neural network.
In one example, after said inversely quantizing said output result to said floating point type data type to generate an image classification model, said method further comprises: the embedded equipment with low power consumption runs the image classification model; preprocessing the monitoring image uploaded by the monitoring equipment, and inputting the preprocessed monitoring image into the image classification model; and classifying the preprocessed monitored images through the image classification model to obtain classification results.
In another aspect, an embodiment of the present application provides an inference apparatus for an image classification model, including: the training module is used for training an initial image classification model based on the convolutional neural network; the first calculation module is used for calculating the weight quantization factor of the convolution layer in the initial image classification model and determining the activation value quantization factor of the convolution layer by minimizing the mean square error; the weight value and the activation value of the convolution layer are floating point data types; the quantization module quantizes the weight value of the convolution layer through the weight quantization factor and quantizes the weight value of the convolution layer into INT8 data type; a second calculation module that calculates an inverse quantization factor of the initial image classification model and calculates a shifted representation of the activation value quantization factor and the inverse quantization factor; the reasoning module is used for quantizing the activation value output by the convolution layer through the activation value quantization factor during reasoning, and quantizing the activation value output by the convolution layer into a UINT8 data type; the third calculation module calculates the convolution layer through the quantized weight value and the quantized activation value to obtain an output result of the convolution layer; and the inverse quantization module is used for inversely quantizing the output result into the floating point data type through the shifted inverse quantization factor and the shifted activation value quantization factor so as to generate an image classification model.
In another aspect, an embodiment of the present application provides an inference apparatus of an image classification model, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to: training an initial image classification model based on a convolutional neural network; calculating a weight quantization factor of a convolution layer in the initial image classification model, and determining an activation value quantization factor of the convolution layer by minimizing a mean square error; the weight value and the activation value of the convolution layer are floating point data types; quantizing the weight value of the convolution layer through the weight quantization factor, and quantizing the weight value of the convolution layer into an INT8 data type; calculating an inverse quantization factor of the initial image classification model and calculating a shifted representation of the activation value quantization factor and the inverse quantization factor; during reasoning, quantizing the activation value output by the convolution layer by the activation value quantization factor, and quantizing the activation value output by the convolution layer into a UINT8 data type; calculating the convolution layer through the quantized weight value and the quantized activation value to obtain an output result of the convolution layer; and inversely quantizing the output result into the floating point data type through the shifted inverse quantization factor and the shifted activation value quantization factor to generate an image classification model.
In another aspect, embodiments of the present application provide an inferential non-volatile computer storage medium of an image classification model, storing computer-executable instructions, the computer-executable instructions being configured to: training an initial image classification model based on a convolutional neural network; calculating a weight quantization factor of a convolution layer in the initial image classification model, and determining an activation value quantization factor of the convolution layer by minimizing a mean square error; the weight value and the activation value of the convolution layer are floating point data types; quantizing the weight value of the convolution layer through the weight quantization factor, and quantizing the weight value of the convolution layer into an INT8 data type; calculating an inverse quantization factor of the initial image classification model and calculating a shifted representation of the activation value quantization factor and the inverse quantization factor; during reasoning, quantizing the activation value output by the convolution layer by the activation value quantization factor, and quantizing the activation value output by the convolution layer into a UINT8 data type; calculating the convolution layer through the quantized weight value and the quantized activation value to obtain an output result of the convolution layer; and inversely quantizing the output result into the floating point data type through the shifted inverse quantization factor and the shifted activation value quantization factor to generate an image classification model.
The above at least one technical scheme adopted by the embodiment of the application can achieve the following beneficial effects:
the activation value quantization factor of the convolution layer is determined by minimizing the mean square error, so it can be calculated before the image classification model performs reasoning, which improves the reasoning speed of the model; the convolution layer is calculated with the quantized weight values and the quantized activation values, which effectively reduces the size of the initial image classification model, improves its reasoning speed, and increases the types of embedded devices supported.
Drawings
In order to more clearly illustrate the technical solution of the present application, some embodiments of the present application will be described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flow chart of a reasoning method of an image classification model according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an inference apparatus of an image classification model according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an inference device of an image classification model according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a reasoning method of an image classification model according to an embodiment of the present application. Some of the input parameters or intermediate results in the flow may be adjusted by manual intervention, which helps to improve accuracy.
The method according to the embodiment of the present application may be executed by a terminal device or a server, which is not particularly limited in the present application. For ease of understanding and description, the following embodiments are described in detail by taking a terminal device as an example.
Low-power devices such as MCUs require models with low resource consumption; in addition, many MCUs do not support floating point operations, which limits the application of such models. Model quantization is effective in coping with these problems: quantizing the model from a floating point type to a fixed point type can effectively reduce the model size while improving the reasoning speed of the image classification model and increasing the types of embedded devices supported.
For example, the terminal device is an embedded device with low power consumption, namely an edge deep learning device based on the RISC-V architecture, which uses a Xilinx MCU development board as the hardware platform together with an on-board serial port, an HDMI interface and a camera device. The T-Head (Pingtouge) Wujian open-source IP can be used as the MCU core, and an OV5640 camera is adopted as the camera device.
The image classification model runs on the MCU, and the classification result is finally output to the peripheral equipment through the on-board serial port and the HDMI interface.
The flow in fig. 1 may include the steps of:
s101: an initial image classification model based on a convolutional neural network is trained.
In some embodiments of the present application, when the initial image classification model is constructed, monitoring sample images are obtained in advance, grayscale processing is performed on the monitoring sample images to obtain their grayscale images, and a convolutional neural network is trained on the grayscale images to obtain an initial image classification model based on the convolutional neural network. That is, the convolutional neural network is trained using the monitoring sample images as training data.
It should be noted that performing grayscale processing on the monitoring sample images reduces the size of the initial image classification model, thereby reducing the resource consumption of the edge deep learning device.
Real-time image data can be captured through the camera device, so that the monitoring sample images uploaded by the camera are obtained.
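As an illustration only (not part of the original application), the following minimal Python/NumPy sketch shows one common way such grayscale preprocessing could be performed before training; the ITU-R BT.601 luma weights and the 32 x 32 image size are assumptions.

import numpy as np

def to_grayscale(rgb_image: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB monitoring sample image into a single-channel
    grayscale image using the common ITU-R BT.601 luma weights (an assumption)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = rgb_image.astype(np.float32) @ weights   # H x W
    return gray[..., np.newaxis]                    # H x W x 1, single-channel CNN input

# Example: a 32 x 32 RGB sample becomes a 32 x 32 x 1 grayscale training sample.
sample = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(to_grayscale(sample).shape)  # (32, 32, 1)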
S102: calculating a weight quantization factor of a convolution layer in the initial image classification model, and determining an activation value quantization factor of the convolution layer by minimizing a mean square error; the weight value and the activation value of the convolution layer are floating point data types.
In some embodiments of the present application, when calculating the weight quantization factor of the convolution layer in the initial image classification model, the terminal device first calculates the absolute values of a plurality of weight parameters in the convolution layer, respectively. Then, the weight quantization range of the convolution layer is determined by the maximum of these absolute values. For example, if the maximum absolute value is 100, the weight quantization range is [-100, 100]. Finally, the weight quantization factor of the convolution layer in the initial image classification model is calculated according to the weight quantization range of the convolution layer.
Because the weight values may contain errors, it is judged whether the weight quantization factor is within a preset range; if not, the weight quantization factor is optimized by adjusting the weight values of the convolution layer in the initial image classification model. If so, step S103 is executed.
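For illustration, a hedged sketch of how the per-layer weight quantization factor and the preset-range check described above might be computed; the symmetric mapping onto the INT8 range [-127, 127] and the preset range bounds are assumptions, not values taken from the application.

import numpy as np

def weight_quantization_factor(weights: np.ndarray,
                               preset_range=(1e-6, 1e3)) -> float:
    """Compute a per-layer weight quantization factor from the maximum absolute
    weight value, then check it against a preset range. The mapping onto
    [-127, 127] and the preset range bounds are illustrative assumptions."""
    max_abs = float(np.abs(weights).max())     # e.g. 100 -> quantization range [-100, 100]
    factor = max_abs / 127.0                   # step size mapping the range onto INT8
    if not (preset_range[0] <= factor <= preset_range[1]):
        # Corresponds to "if not, optimize the factor by adjusting the weight values".
        raise ValueError("weight quantization factor outside preset range; "
                         "adjust the convolution layer's weight values")
    return factor

layer_weights = np.random.randn(3, 3, 16, 32).astype(np.float32)
print(weight_quantization_factor(layer_weights))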
When determining the activation value quantization factor of the convolution layer by minimizing the mean square error, that is, when calculating the activation value quantization factor of each layer by minimizing the mean square error, the quantized test activation values and the unquantized test activation values are first determined through a calibration data set and the initial image classification model. For example, the calibration data set is input into the initial image classification model, and the quantized test activation values and unquantized test activation values of each convolution layer are calculated.
Then, the mean square error between the quantized test activation value and the unquantized test activation value is calculated, and when the mean square error is the minimum value, the activation value quantization factor of the convolution layer is determined. That is, by minimizing the mean square error, the activation value quantization factor is obtained.
The mean square error between the quantized and unquantized test activation values is calculated by the following formula:
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
where MSE is the mean square error, y_i is the quantized test activation value, \hat{y}_i is the unquantized test activation value, and n is the number of activation values output by the convolution layer.
That is, the MSE formula serves as the objective function; the objective function is optimized, and when the mean square error reaches its minimum value, an activation value quantization range is obtained, from which the activation value quantization factor is derived. Thus, the activation value quantization factor of each layer can be calculated in advance by minimizing the mean square error.
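The following sketch illustrates one possible way to minimize the MSE objective: a simple grid search over candidate clipping ranges of the calibration activations. The search strategy and the asymmetric UINT8 mapping are illustrative assumptions rather than the optimizer prescribed by the application.

import numpy as np

def fake_quantize_uint8(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Quantize to UINT8 and immediately de-quantize, so the result can be
    compared against the original floating-point activations."""
    q = np.clip(np.round(x / scale) + zero_point, 0, 255)
    return (q - zero_point) * scale

def activation_factor_by_mse(calib_activations: np.ndarray, num_steps: int = 100):
    """Grid-search the clipping range that minimizes the mean square error
    between quantized and unquantized calibration activations, and return the
    corresponding (scale, zero_point)."""
    x = calib_activations.astype(np.float32)
    lo = float(x.min())
    best_params, best_mse = None, np.inf
    for step in range(1, num_steps + 1):
        hi = float(x.max()) * step / num_steps           # candidate upper clipping bound
        scale = max((hi - lo) / 255.0, 1e-8)
        zero_point = int(round(-lo / scale))
        mse = float(np.mean((x - fake_quantize_uint8(x, scale, zero_point)) ** 2))
        if mse < best_mse:
            best_params, best_mse = (scale, zero_point), mse
    return best_params

calib = np.abs(np.random.randn(10000)).astype(np.float32)  # stand-in calibration activations
print(activation_factor_by_mse(calib))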
S103: and quantizing the weight value of the convolution layer by the weight quantization factor, and quantizing the weight value of the convolution layer into an INT8 data type.
S104: an inverse quantization factor of the initial image classification model is calculated, and a shifted representation of the activation value quantization factor and the inverse quantization factor is calculated.
In some embodiments of the present application, the weight values of the convolution layer are quantized by the weight quantization factor in an asymmetric quantization manner; the weight values, originally of the floating point data type, are thereby quantized to the INT8 data type.
That is, the activation value quantization factor and the inverse quantization factor participate in the calculation in the form of shifts, which avoids floating point calculation.
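A minimal sketch of what a shifted (fixed-point) representation of a quantization or inverse quantization factor could look like; the 16-bit shift amount is an assumed precision, not a value specified in the application.

def to_shift_representation(factor: float, shift_bits: int = 16):
    """Approximate a floating-point (inverse) quantization factor as an integer
    multiplier plus a right shift, so that
        x * factor  ~  (x * multiplier) >> shift_bits
    and the MCU never has to perform floating-point multiplication.
    shift_bits = 16 is an assumed fixed-point precision."""
    multiplier = int(round(factor * (1 << shift_bits)))
    return multiplier, shift_bits

def apply_factor(x_int: int, multiplier: int, shift_bits: int) -> int:
    return (x_int * multiplier) >> shift_bits

m, s = to_shift_representation(0.0123)
print(m, s)                                       # e.g. 806 16
print(apply_factor(1000, m, s), 1000 * 0.0123)    # integer result close to the float product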
S105: and during reasoning, quantizing the activation value output by the convolution layer by the activation value quantization factor, and quantizing the activation value output by the convolution layer into a UINT8 data type.
Meanwhile, an asymmetric quantization mode is adopted: the activation value output by the convolution layer is quantized through the activation value quantization factor, and the activation value, which is of the floating point data type, is quantized into the UINT8 data type.
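For illustration, a small sketch of asymmetric quantization of floating-point activation values into the UINT8 data type; the scale and zero point stand for the activation value quantization factor determined above, and the sample values are invented.

import numpy as np

def quantize_activations_uint8(activations: np.ndarray,
                               scale: float, zero_point: int) -> np.ndarray:
    """Asymmetric quantization of floating-point activation values into UINT8:
    q = round(x / scale) + zero_point, clipped to [0, 255]."""
    q = np.round(activations / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

relu_out = np.random.rand(2, 3).astype(np.float32) * 6.0   # stand-in ReLU outputs
print(quantize_activations_uint8(relu_out, scale=6.0 / 255.0, zero_point=0))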
It should be noted that training learns a certain capability from existing data, whereas reasoning simplifies and applies that capability so that it can operate quickly and efficiently on unknown data to obtain the desired results.
S106: and calculating the convolution layer through the quantized weight value and the quantized activation value to obtain an output result of the convolution layer.
S107: and inversely quantizing the output result into the floating point type data type through the shifted inverse quantization factor and the shifted activation value quantization factor to generate an image classification model. The reasoning results are inversely quantized to the int32 data type.
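The sketch below illustrates, under simplifying assumptions (a single dot product standing in for the convolution, and invented multiplier/shift values), how the quantized weight values and activation values could be combined in integer arithmetic and the result then dequantized with the shift-represented factors.

import numpy as np

def quantized_dot_and_dequantize(q_acts: np.ndarray, q_weights: np.ndarray,
                                 act_zero_point: int,
                                 deq_multiplier: int, deq_shift: int) -> int:
    """Accumulate UINT8 activations x INT8 weights in INT32-style arithmetic
    (a single dot product stands in for the full convolution), then de-quantize
    the accumulated result with the shift-represented factor instead of a
    floating-point scale."""
    acc = 0
    for a, w in zip(q_acts.astype(np.int32), q_weights.astype(np.int32)):
        acc += int(a - act_zero_point) * int(w)     # integer accumulation
    return (acc * deq_multiplier) >> deq_shift      # integer multiply + right shift

q_acts = np.array([12, 200, 45, 7], dtype=np.uint8)
q_weights = np.array([3, -5, 20, -1], dtype=np.int8)
# deq_multiplier / deq_shift are illustrative values only.
print(quantized_dot_and_dequantize(q_acts, q_weights, act_zero_point=0,
                                   deq_multiplier=403, deq_shift=16))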
In some embodiments of the application, to reduce model power consumption, an image classification model is run on the MCU development board. That is, the embedded device with low power consumption runs the image classification model.
Then, the MCU development board preprocesses the monitoring image uploaded by the monitoring device, for example by grayscale processing. The preprocessed monitoring image is input into the image classification model, which classifies it to obtain a classification result. Finally, the classification result is output to the peripheral equipment through the serial port and the HDMI interface.
It should be noted that, although the embodiment of the present application is described with reference to fig. 1 to sequentially describe steps S101 to S107, this does not represent that steps S101 to S107 must be performed in strict order. The steps S101 to S107 are sequentially described according to the sequence shown in fig. 1 in order to facilitate the understanding of the technical solution of the embodiment of the present application by those skilled in the art. In other words, in the embodiment of the present application, the sequence between the steps S101 to S107 may be appropriately adjusted according to the actual needs.
By the method of FIG. 1, the activation value quantization factor of the convolution layer is determined by minimizing the mean square error, so it can be calculated before the image classification model performs reasoning, which improves the reasoning speed of the model; the convolution layer is calculated with the quantized weight values and the quantized activation values, which effectively reduces the size of the initial image classification model, improves its reasoning speed, and increases the types of embedded devices supported.
Based on the same thought, some embodiments of the present application further provide an apparatus, a device, and a non-volatile computer storage medium corresponding to the above method.
Fig. 2 is a schematic structural diagram of an inference apparatus of an image classification model according to an embodiment of the present application, where the apparatus includes:
the training module 201 trains an initial image classification model based on a convolutional neural network;
A first calculation module 202 for calculating a weight quantization factor of a convolution layer in the initial image classification model, and determining an activation value quantization factor of the convolution layer by minimizing a mean square error; the weight value and the activation value of the convolution layer are floating point data types;
the quantization module 203 quantizes the weight value of the convolution layer through the weight quantization factor, and quantizes the weight value of the convolution layer into an INT8 data type;
A second calculation module 204 that calculates an inverse quantization factor of the initial image classification model and calculates a shifted representation of the activation value quantization factor and the inverse quantization factor;
the reasoning module 205 is used for quantizing the activation value output by the convolution layer through the activation value quantization factor during reasoning, and quantizing the activation value output by the convolution layer into a UINT8 data type;
a third calculation module 206, configured to calculate the convolution layer according to the quantized weight value and the quantized activation value, so as to obtain an output result of the convolution layer;
the inverse quantization module 207 inversely quantizes the output result into the floating point type data type by the shifted inverse quantization factor and the shifted activation value quantization factor to generate an image classification model.
Fig. 3 is a schematic structural diagram of an inference device of an image classification model according to an embodiment of the present application, where the device includes:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to:
training an initial image classification model based on a convolutional neural network;
calculating a weight quantization factor of a convolution layer in the initial image classification model, and determining an activation value quantization factor of the convolution layer by minimizing a mean square error; the weight value and the activation value of the convolution layer are floating point data types;
Quantizing the weight value of the convolution layer through the weight quantization factor, and quantizing the weight value of the convolution layer into an INT8 data type;
Calculating an inverse quantization factor of the initial image classification model and calculating a shifted representation of the activation value quantization factor and the inverse quantization factor;
During reasoning, quantizing the activation value output by the convolution layer by the activation value quantization factor, and quantizing the activation value output by the convolution layer into a UINT8 data type;
Calculating the convolution layer through the quantized weight value and the quantized activation value to obtain an output result of the convolution layer;
and inversely quantizing the output result into the floating point type data type through the shifted inverse quantization factor and the shifted activation value quantization factor to generate an image classification model.
Some embodiments of the present application provide an inferential non-volatile computer storage medium of an image classification model, storing computer-executable instructions, the computer-executable instructions being configured to:
training an initial image classification model based on a convolutional neural network;
calculating a weight quantization factor of a convolution layer in the initial image classification model, and determining an activation value quantization factor of the convolution layer by minimizing a mean square error; the weight value and the activation value of the convolution layer are floating point data types;
Quantizing the weight value of the convolution layer through the weight quantization factor, and quantizing the weight value of the convolution layer into an INT8 data type;
Calculating an inverse quantization factor of the initial image classification model and calculating a shifted representation of the activation value quantization factor and the inverse quantization factor;
During reasoning, quantizing the activation value output by the convolution layer by the activation value quantization factor, and quantizing the activation value output by the convolution layer into a UINT8 data type;
Calculating the convolution layer through the quantized weight value and the quantized activation value to obtain an output result of the convolution layer;
and inversely quantizing the output result into the floating point type data type through the shifted inverse quantization factor and the shifted activation value quantization factor to generate an image classification model.
The embodiments of the present application are described in a progressive manner; the same and similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, device and medium embodiments are described relatively simply because they are substantially similar to the method embodiments; for the relevant parts, reference may be made to the description of the method embodiments.
The apparatuses, devices and media provided by the embodiments of the present application correspond one-to-one to the methods, and therefore also have beneficial technical effects similar to those of the corresponding methods. Since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the apparatuses, devices and media are not described again here.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the technical principle of the present application should fall within the protection scope of the present application.

Claims (7)

1. A method of reasoning about an image classification model, the method comprising:
training an initial image classification model based on a convolutional neural network;
calculating a weight quantization factor of a convolution layer in the initial image classification model, and determining an activation value quantization factor of the convolution layer by minimizing a mean square error; the weight value and the activation value of the convolution layer are floating point data types;
Quantizing the weight value of the convolution layer through the weight quantization factor, and quantizing the weight value of the convolution layer into an INT8 data type;
Calculating an inverse quantization factor of the initial image classification model and calculating a shifted representation of the activation value quantization factor and the inverse quantization factor;
During reasoning, quantizing the activation value output by the convolution layer by the activation value quantization factor, and quantizing the activation value output by the convolution layer into a UINT8 data type;
Calculating the convolution layer through the quantized weight value and the quantized activation value to obtain an output result of the convolution layer;
Inversely quantizing the output result into the floating point type data type through the shifted inverse quantization factor and the shifted activation value quantization factor to generate an image classification model;
the determining the activation value quantization factor of the convolution layer by minimizing the mean square error specifically comprises the following steps:
Determining quantized test activation values and unquantized test activation values through a calibration data set and the initial image classification model;
Calculating a mean square error between the quantized test activation value and the unquantized test activation value;
When the mean square error is the minimum value, determining an activation value quantization factor of the convolution layer;
the calculating the mean square error between the quantized test activation value and the unquantized test activation value specifically includes:
calculating a mean square error between the quantized test activation value and the unquantized test activation value by the following formula:
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
wherein MSE is the mean square error, y_i is the quantized test activation value, \hat{y}_i is the unquantized test activation value, and n is the number of activation values output by the convolution layer;
the calculating the weight quantization factor of the convolution layer in the initial image classification model specifically comprises the following steps:
respectively calculating absolute values of a plurality of weight parameters in the convolution layer;
Determining a weight quantization range of the convolution layer through the maximum value of the absolute value;
and calculating the weight quantization factor of the convolution layer in the initial image classification model according to the weight quantization range of the convolution layer.
2. The method of claim 1, wherein after calculating the weight quantization factor of the convolution layer in the initial image classification model according to the weight quantization range of the convolution layer, the method further comprises:
Judging whether the weight quantization factor is in a preset range or not;
If not, the weight quantization factor is optimized by adjusting the weight value of the convolution layer in the initial image classification model.
3. The method according to claim 1, wherein the training is based on an initial image classification model of a convolutional neural network, in particular comprising:
acquiring a monitoring sample image;
Carrying out gray scale processing on the monitored sample image to obtain a gray scale image of the monitored sample image;
Training the convolutional neural network through the gray level image to obtain an initial image classification model based on the convolutional neural network.
4. The method of claim 1, wherein after said inversely quantizing said output result to said floating point type data type to generate an image classification model, said method further comprises:
the embedded equipment with low power consumption runs the image classification model;
preprocessing the monitoring image uploaded by the monitoring equipment, and inputting the preprocessed monitoring image into the image classification model;
and classifying the preprocessed monitored images through the image classification model to obtain classification results.
5. An inference apparatus of an image classification model, the apparatus comprising:
the training module is used for training an initial image classification model based on the convolutional neural network;
the first calculation module is used for calculating the weight quantization factor of the convolution layer in the initial image classification model and determining the activation value quantization factor of the convolution layer by minimizing the mean square error; the weight value and the activation value of the convolution layer are floating point data types;
The quantization module quantizes the weight value of the convolution layer through the weight quantization factor and quantizes the weight value of the convolution layer into INT8 data type;
A second calculation module that calculates an inverse quantization factor of the initial image classification model and calculates a shifted representation of the activation value quantization factor and the inverse quantization factor;
the reasoning module is used for quantizing the activation value output by the convolution layer through the activation value quantization factor during reasoning, and quantizing the activation value output by the convolution layer into a UINT8 data type;
The third calculation module calculates the convolution layer through the quantized weight value and the quantized activation value to obtain an output result of the convolution layer;
the inverse quantization module is used for inversely quantizing the output result into the floating point type data type through the shifted inverse quantization factor and the shifted activation value quantization factor so as to generate an image classification model;
the determining the activation value quantization factor of the convolution layer by minimizing the mean square error specifically comprises the following steps:
Determining quantized test activation values and unquantized test activation values through a calibration data set and the initial image classification model;
Calculating a mean square error between the quantized test activation value and the unquantized test activation value;
When the mean square error is the minimum value, determining an activation value quantization factor of the convolution layer;
the calculating the mean square error between the quantized test activation value and the unquantized test activation value specifically includes:
calculating a mean square error between the quantized test activation value and the unquantized test activation value by the following formula:
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
wherein MSE is the mean square error, y_i is the quantized test activation value, \hat{y}_i is the unquantized test activation value, and n is the number of activation values output by the convolution layer;
the calculating the weight quantization factor of the convolution layer in the initial image classification model specifically comprises the following steps:
respectively calculating absolute values of a plurality of weight parameters in the convolution layer;
Determining a weight quantization range of the convolution layer through the maximum value of the absolute value;
and calculating the weight quantization factor of the convolution layer in the initial image classification model according to the weight quantization range of the convolution layer.
6. An inference apparatus of an image classification model, the apparatus comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to:
training an initial image classification model based on a convolutional neural network;
calculating a weight quantization factor of a convolution layer in the initial image classification model, and determining an activation value quantization factor of the convolution layer by minimizing a mean square error; the weight value and the activation value of the convolution layer are floating point data types;
Quantizing the weight value of the convolution layer through the weight quantization factor, and quantizing the weight value of the convolution layer into an INT8 data type;
Calculating an inverse quantization factor of the initial image classification model and calculating a shifted representation of the activation value quantization factor and the inverse quantization factor;
During reasoning, quantizing the activation value output by the convolution layer by the activation value quantization factor, and quantizing the activation value output by the convolution layer into a UINT8 data type;
Calculating the convolution layer through the quantized weight value and the quantized activation value to obtain an output result of the convolution layer;
Inversely quantizing the output result into the floating point type data type through the shifted inverse quantization factor and the shifted activation value quantization factor to generate an image classification model;
the determining the activation value quantization factor of the convolution layer by minimizing the mean square error specifically comprises the following steps:
Determining quantized test activation values and unquantized test activation values through a calibration data set and the initial image classification model;
Calculating a mean square error between the quantized test activation value and the unquantized test activation value;
When the mean square error is the minimum value, determining an activation value quantization factor of the convolution layer;
the calculating the mean square error between the quantized test activation value and the unquantized test activation value specifically includes:
calculating a mean square error between the quantized test activation value and the unquantized test activation value by the following formula:
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
wherein MSE is the mean square error, y_i is the quantized test activation value, \hat{y}_i is the unquantized test activation value, and n is the number of activation values output by the convolution layer;
the calculating the weight quantization factor of the convolution layer in the initial image classification model specifically comprises the following steps:
respectively calculating absolute values of a plurality of weight parameters in the convolution layer;
Determining a weight quantization range of the convolution layer through the maximum value of the absolute value;
and calculating the weight quantization factor of the convolution layer in the initial image classification model according to the weight quantization range of the convolution layer.
7. An inferential non-transitory computer storage medium storing computer executable instructions for an image classification model, the computer executable instructions configured to:
training an initial image classification model based on a convolutional neural network;
calculating a weight quantization factor of a convolution layer in the initial image classification model, and determining an activation value quantization factor of the convolution layer by minimizing a mean square error; the weight value and the activation value of the convolution layer are floating point data types;
Quantizing the weight value of the convolution layer through the weight quantization factor, and quantizing the weight value of the convolution layer into an INT8 data type;
Calculating an inverse quantization factor of the initial image classification model and calculating a shifted representation of the activation value quantization factor and the inverse quantization factor;
During reasoning, quantizing the activation value output by the convolution layer by the activation value quantization factor, and quantizing the activation value output by the convolution layer into a UINT8 data type;
Calculating the convolution layer through the quantized weight value and the quantized activation value to obtain an output result of the convolution layer;
Inversely quantizing the output result into the floating point type data type through the shifted inverse quantization factor and the shifted activation value quantization factor to generate an image classification model;
the determining the activation value quantization factor of the convolution layer by minimizing the mean square error specifically comprises the following steps:
Determining quantized test activation values and unquantized test activation values through a calibration data set and the initial image classification model;
Calculating a mean square error between the quantized test activation value and the unquantized test activation value;
When the mean square error is the minimum value, determining an activation value quantization factor of the convolution layer;
the calculating the mean square error between the quantized test activation value and the unquantized test activation value specifically includes:
calculating a mean square error between the quantized test activation value and the unquantized test activation value by the following formula:
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
wherein MSE is the mean square error, y_i is the quantized test activation value, \hat{y}_i is the unquantized test activation value, and n is the number of activation values output by the convolution layer;
the calculating the weight quantization factor of the convolution layer in the initial image classification model specifically comprises the following steps:
respectively calculating absolute values of a plurality of weight parameters in the convolution layer;
Determining a weight quantization range of the convolution layer through the maximum value of the absolute value;
and calculating the weight quantization factor of the convolution layer in the initial image classification model according to the weight quantization range of the convolution layer.
CN202210099282.XA 2022-01-27 2022-01-27 Image classification model reasoning method, device, equipment and medium Active CN114528924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210099282.XA CN114528924B (en) 2022-01-27 2022-01-27 Image classification model reasoning method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210099282.XA CN114528924B (en) 2022-01-27 2022-01-27 Image classification model reasoning method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN114528924A CN114528924A (en) 2022-05-24
CN114528924B true CN114528924B (en) 2024-05-10

Family

ID=81622285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210099282.XA Active CN114528924B (en) 2022-01-27 2022-01-27 Image classification model reasoning method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114528924B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992032B (en) * 2023-09-25 2024-01-09 之江实验室 Text classification method, system and storage medium based on model automatic quantization
CN117035123B (en) * 2023-10-09 2024-01-09 之江实验室 Node communication method, storage medium and device in parallel training
CN117911794A (en) * 2024-03-15 2024-04-19 广州中科智巡科技有限公司 Model obtaining method and device for image classification, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178258A (en) * 2019-12-29 2020-05-19 浪潮(北京)电子信息产业有限公司 Image identification method, system, equipment and readable storage medium
CN111582229A (en) * 2020-05-21 2020-08-25 中国科学院空天信息创新研究院 Network self-adaptive semi-precision quantized image processing method and system
CN112200296A (en) * 2020-07-31 2021-01-08 厦门星宸科技有限公司 Network model quantification method and device, storage medium and electronic equipment
WO2021197562A1 (en) * 2020-03-30 2021-10-07 Huawei Technologies Co., Ltd. Efficient initialization of quantized neural networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178258A (en) * 2019-12-29 2020-05-19 浪潮(北京)电子信息产业有限公司 Image identification method, system, equipment and readable storage medium
WO2021197562A1 (en) * 2020-03-30 2021-10-07 Huawei Technologies Co., Ltd. Efficient initialization of quantized neural networks
CN111582229A (en) * 2020-05-21 2020-08-25 中国科学院空天信息创新研究院 Network self-adaptive semi-precision quantized image processing method and system
CN112200296A (en) * 2020-07-31 2021-01-08 厦门星宸科技有限公司 Network model quantification method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN114528924A (en) 2022-05-24

Similar Documents

Publication Publication Date Title
CN114528924B (en) Image classification model reasoning method, device, equipment and medium
CN111652367B (en) Data processing method and related product
US20230196744A1 (en) Vehicle recognition method and apparatus, device, and storage medium
CN111126558B (en) Convolutional neural network calculation acceleration method and device, equipment and medium
CN112634209A (en) Product defect detection method and device
US11163269B2 (en) Adaptive control of negative learning for limited reconstruction capability auto encoder
CN111160531B (en) Distributed training method and device for neural network model and electronic equipment
CN111985495A (en) Model deployment method, device, system and storage medium
CN112200296B (en) Network model quantization method and device, storage medium and electronic equipment
CN112948937B (en) Intelligent pre-judging method and device for concrete strength
CN111191783A (en) Self-adaptive quantization method, device, equipment and medium
CN113627545A (en) Image classification method and system based on isomorphic multi-teacher guidance knowledge distillation
CN111242176B (en) Method and device for processing computer vision task and electronic system
CN117217280A (en) Neural network model optimization method and device and computing equipment
CN112766397B (en) Classification network and implementation method and device thereof
JP2024043504A (en) Methods, devices, electronic devices and media for accelerating neural network model inference
CN116957024A (en) Method and device for reasoning by using neural network model
CN112418388A (en) Method and device for realizing deep convolutional neural network processing
CN112561050B (en) Neural network model training method and device
EP3619654A1 (en) Continuous parametrizations of neural network layer weights
CN116071375B (en) Image segmentation method and device, storage medium and electronic equipment
CN112800813B (en) Target identification method and device
CN113139466A (en) Image identification method based on single hidden layer neural network and related equipment
CN110634521B (en) Nonvolatile memory processing method and device
CN117215744A (en) Neural network processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant