CN111738419A - Quantification method and device of neural network model


Info

Publication number
CN111738419A
Authority
CN
China
Prior art keywords
neural network
network model
quantized
current
parameters
Prior art date
Legal status
Granted
Application number
CN202010568260.4A
Other languages
Chinese (zh)
Other versions
CN111738419B (en)
Inventor
希滕
张刚
温圣召
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010568260.4A
Publication of CN111738419A
Application granted
Publication of CN111738419B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a quantization method and device for a neural network model, relating to the technical fields of artificial intelligence, deep learning, and image processing. One embodiment of the method comprises: quantizing a neural network model to be quantized based on a current quantization mapping function, and testing the performance of the current quantized neural network model obtained by quantizing with the current quantization mapping function, where the quantization mapping function is a preset function; iteratively adjusting the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function based on the performance of the current quantized neural network model; and determining the current quantized neural network model to be the target neural network model in response to determining that it satisfies a preset convergence condition. The embodiment improves the accuracy of the quantized neural network model.

Description

Quantification method and device of neural network model
Technical Field
Embodiments of the present application relate to the field of computer technology, in particular to the technical fields of artificial intelligence, deep learning, and image processing, and specifically to a quantization method and device for a neural network model.
Background
In recent years, deep learning techniques have achieved great success in many application fields. In deep learning, the quality of the neural network structure has a very important influence on the effectiveness of a model. In practice, higher performance demands a more complex network structure and, correspondingly, a huge number of network parameters. Storing the parameters of such a neural network requires a large amount of memory, and running it places high demands on the processor, since the parameters are numerous and of high precision.
To ensure real-time operation of the neural network, reduce the computational load on the processor, and preserve the performance of the neural network, the neural network model needs to be quantized.
Disclosure of Invention
A quantization method of a neural network model, a quantization apparatus, an electronic device, and a computer-readable medium are provided.
According to a first aspect, there is provided a method of quantifying a neural network model, the method comprising: based on the current quantization mapping function, quantizing the neural network model to be quantized and testing the performance of the current quantized neural network model obtained based on the quantization of the current quantization mapping function, wherein the quantization mapping function is a preset function; iteratively adjusting parameters of the neural network model to be quantized and parameters of the current quantization mapping function based on the performance of the current quantized neural network model; and determining the current quantized neural network model as the target neural network model in response to determining that the current quantized neural network model meets the preset convergence condition.
According to a second aspect, there is provided an apparatus for quantizing a neural network model, the apparatus comprising: the performance testing module is configured to quantize the neural network model to be quantized based on the current quantization mapping function and test the performance of the current quantized neural network model quantized based on the current quantization mapping function, wherein the quantization mapping function is a preset function; a parameter adjustment module configured to iteratively adjust a parameter of the neural network model to be quantized and a parameter of the current quantization mapping function based on a performance of the current quantized neural network model; a target determination module configured to determine the current quantized neural network model as a target neural network model in response to determining that the current quantized neural network model satisfies a preset convergence condition.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method as described in any one of the implementations of the first aspect.
With the quantization method and device for a neural network model provided by embodiments of the application, first, the neural network model to be quantized is quantized based on the current quantization mapping function, and the performance of the current quantized neural network model is tested; second, the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function are iteratively adjusted based on the performance of the current quantized neural network model; finally, in response to determining that the current quantized neural network model satisfies a preset convergence condition, the current quantized neural network model is determined to be the target neural network model. In each iteration, the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function are adjusted based on the performance of the currently quantized neural network model, so that after the iterations complete, a target neural network model with minimal quantization loss has been learned online.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flow diagram of one embodiment of a method for quantifying a neural network model according to the present application;
FIG. 2 is a flow diagram of another embodiment of a method for quantifying a neural network model according to the present application;
FIG. 3 is a flow chart of a third embodiment of a method of quantifying a neural network model according to the present application;
FIG. 4 is a schematic diagram of an embodiment of a quantization apparatus for a neural network model according to the present application;
FIG. 5 is a block diagram of an electronic device for implementing the quantization method of a neural network model according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
FIG. 1 illustrates a flow 100 of one embodiment of the quantization method for a neural network model according to the present application. The quantization method of the neural network model comprises the following steps:
Step 101, quantizing the neural network model to be quantized based on the current quantization mapping function, and testing the performance of the current quantized neural network model quantized based on the current quantization mapping function, wherein the quantization mapping function is a preset function.
In this embodiment, the neural network model is a model built on an artificial neural network (ANN). An artificial neural network, also simply called a neural network, is a network formed by a large number of widely interconnected processing units (neurons); it is an abstraction, simplification, and simulation of the human brain and reflects the brain's basic characteristics.
The neural network model to be quantized may be a trained model with higher-precision parameters, for example a model whose parameters are stored as fp32 (32-bit single-precision floating point) values.
Generally, a neural network model contains many parameters, and the large number of matrix operations over high-precision parameters occupies considerable computing resources during training, making training inefficient. After training, running the neural network model likewise consumes significant resources and usually introduces a certain latency, which cannot meet real-time requirements. Therefore, the parameters of the neural network model need to be quantized, reducing their bit width and speeding up the operation of the model.
Model quantization belongs to the category of model compression, whose purpose is to reduce the memory footprint of a model and accelerate model inference. In this embodiment, quantizing the neural network model to be quantized mainly means compressing its parameters: floating-point numbers (for example, 32-bit) are represented by lower-bit numbers that occupy less memory (for example, 8-bit), so that the model computes internally with simpler numeric types without materially affecting its accuracy. This greatly increases computation speed and greatly reduces the computing resources consumed.
In this embodiment, the neural network model to be quantized may be quantized into a quantized neural network model based on a quantization mapping function. The quantization mapping function is a preset function that characterizes the mathematical transformation relationship between the parameters of the neural network model to be quantized and the parameters of the quantized neural network model; after quantization, the parameter bit width of the neural network model is reduced. By setting a quantization mapping function, the parameters of the neural network model to be quantized can be conveniently and quickly quantized into the parameters of the quantized neural network model.
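For concreteness, a widely used kind of preset quantization mapping function is an affine mapping from 32-bit floats to 8-bit integers. The sketch below is only an illustration of such a mapping, not the function claimed by this application; the names quantize_tensor, dequantize_tensor, scale, and zero_point are assumptions.

```python
import numpy as np

def quantize_tensor(x: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Affine quantization mapping: q = round(x / scale) + zero_point, clipped to int8."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_tensor(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Inverse mapping, used to simulate the quantization loss with float arithmetic."""
    return scale * (q.astype(np.float32) - zero_point)

# Round-trip example: fp32 weights -> int8 -> fp32, then measure the loss.
w = np.random.randn(4, 4).astype(np.float32)
scale = float(np.abs(w).max() / 127.0)
q = quantize_tensor(w, scale, zero_point=0)
w_hat = dequantize_tensor(q, scale, zero_point=0)
print("max quantization error:", float(np.abs(w - w_hat).max()))
```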
Further, the preset function may be a function with variable parameters; for example, the parameters may update themselves based on changes in the performance of the currently quantized neural network model. To improve the accuracy of the quantized neural network model, the execution subject of the quantization method needs to iteratively adjust the parameters of the neural network model to be quantized. By setting the preset function to be a function whose parameters change with the performance of the currently quantized neural network model, the function's parameters can learn by themselves during model quantization, so that the parameters of the quantization mapping function are continuously optimized, which helps improve the accuracy of the quantized neural network model.
In this embodiment, the execution main body of the quantization method obtains the performance of the currently quantized neural network model in each iteration operation. The indices characterizing this performance may include at least one of the following: the latency on the processor (CPU, GPU, etc.) on which the current quantized neural network model runs, the precision of the current quantized neural network model, and the size or computational complexity of the current quantized neural network model. At least one of these indices can be compared with pre-prepared evaluation data for that index (such as a prepared processor latency or a prepared precision value for the quantized network), so as to determine the post-quantization error of the quantized neural network model.
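As an illustration of comparing such indices against pre-prepared evaluation data, the following sketch measures average latency and precision of a quantized model and returns the errors relative to the prepared targets; the function and argument names are assumptions, and `model` is assumed to be a plain callable that returns a class prediction.

```python
import time

def test_quantized_performance(model, eval_inputs, eval_labels,
                               prepared_latency, prepared_precision):
    """Measure the current quantized model's latency and precision and
    compare them with pre-prepared evaluation data for those indices."""
    start = time.perf_counter()
    predictions = [model(x) for x in eval_inputs]
    latency = (time.perf_counter() - start) / len(eval_inputs)
    precision = sum(int(p == y) for p, y in zip(predictions, eval_labels)) / len(eval_labels)
    return {
        "latency_error": latency - prepared_latency,        # positive: slower than target
        "precision_error": prepared_precision - precision,  # positive: less precise than target
    }
```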
It should be noted that the neural network model to be quantized may be a neural network model that performs an image processing task, with the quantization mapping function self-updating based on changes in the image processing performance of the current quantized neural network model. The indices of that image processing performance may include at least one of the following: the image processing latency of the current quantized neural network model, the distortion of the images it outputs, the latency of the processor on which it runs, and so on.
Step 102, iteratively adjusting the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function based on the performance of the current quantized neural network model.
After the post-quantization error of the current quantized neural network model is determined, the error can be used to adjust the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function, through an error back propagation (BP) algorithm and a stochastic parallel gradient descent (SPGD) algorithm.
In this embodiment, the initial values of the parameters of the neural network model to be quantized may be the model's parameter values after initialization. In the first iteration operation, the initialized parameter values of the neural network model to be quantized are adjusted to obtain the model parameters for the current iteration; in subsequent iterations, the model parameters produced by the previous iteration are adjusted to obtain the model parameters for the current iteration.

In this embodiment, the initial values of the parameters of the quantization mapping function may be preset values. In the first iteration operation, these initial values are adjusted to obtain the function parameters for the current iteration; in subsequent iterations, the function parameters left by the previous iteration are adjusted, based on error back propagation and gradient descent, to obtain the function parameters for the current iteration.
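The application names back propagation and SPGD but gives no code. The PyTorch sketch below shows one assumed way to back-propagate the task error into both the model parameters and a learnable quantization-mapping parameter, using a straight-through (LSQ-style) estimator so the rounding step remains differentiable; the class LearnableQuantizer and its single scale parameter are illustrative, not this application's construction.

```python
import torch

class LearnableQuantizer(torch.nn.Module):
    """Quantization mapping function with a learnable scale parameter."""
    def __init__(self, scale: float = 0.1):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.tensor(scale))

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        w_s = w / self.scale
        # Straight-through estimator: round in the forward pass, identity gradient.
        q = torch.clamp(w_s + (torch.round(w_s) - w_s).detach(), -128, 127)
        return q * self.scale  # dequantized weights used for the forward computation

# One iteration: quantize the weights, compute the task loss, and back-propagate
# into both the model parameters and the mapping-function parameter.
model = torch.nn.Linear(8, 2)
quantizer = LearnableQuantizer()
optimizer = torch.optim.SGD(
    list(model.parameters()) + list(quantizer.parameters()), lr=1e-3)

x, y = torch.randn(16, 8), torch.randint(0, 2, (16,))
w_q = quantizer(model.weight)
logits = torch.nn.functional.linear(x, w_q, model.bias)
loss = torch.nn.functional.cross_entropy(logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```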
In some optional implementations of this embodiment, the parameters of the quantization mapping function include mapping transformation parameters, i.e., parameters characterizing the mathematical transformation relationship between the parameters of the neural network model to be quantized and the parameters of the quantized neural network model.

In this optional implementation, because the parameters of the quantization mapping function include mapping transformation parameters, the function's parameters can be conveniently and quickly adjusted in each iteration operation through those mapping transformation parameters.
For example, suppose the quantization mapping function is f(x) = ax - x^2 - b, where x represents a parameter of the neural network model to be quantized, f(x) represents the corresponding parameter of the quantized neural network model, and a and b are parameters of the quantization mapping function. When a and b can be learned online based on the performance of the current quantized neural network model, a and b are also the mapping transformation parameters of the quantization mapping function.
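Taken literally, this example can be written as a one-line Python function, with a and b as the mapping transformation parameters (the specific form is only the text's illustration):

```python
def quantization_mapping(x: float, a: float, b: float) -> float:
    """Example mapping f(x) = a*x - x**2 - b: x is a parameter of the model to
    be quantized, f(x) the corresponding quantized-model parameter; a and b can
    be updated online from the quantized model's performance."""
    return a * x - x ** 2 - b
```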
In some optional implementations of this embodiment, the parameters of the quantization mapping function include mapping transformation parameters and a mapping interval threshold. The mapping transformation parameters characterize the mathematical transformation relationship between the parameters of the neural network model to be quantized and the parameters of the quantized neural network model. The mapping interval threshold characterizes the mapping relationship between the interval to which a parameter of the neural network model to be quantized belongs and the interval to which the corresponding parameter of the quantized neural network model belongs; it may be a boundary value of a parameter interval of the neural network model to be quantized, and all values within that interval are mapped to the same parameter value after quantization.

In this optional implementation, because the parameters of the quantization mapping function include mapping transformation parameters and a mapping interval threshold, the function's parameters can be conveniently and quickly adjusted in each iteration operation through these two kinds of parameters, for example as sketched below.
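A minimal sketch, under assumed names, of how a mapping interval threshold can be realized: every parameter falling inside an interval bounded by the thresholds is mapped to the same quantized value.

```python
import numpy as np

def quantize_by_intervals(x: np.ndarray, thresholds: np.ndarray,
                          levels: np.ndarray) -> np.ndarray:
    """thresholds: sorted interval boundaries (the mapping interval thresholds);
    levels: one quantized value per interval, len(levels) == len(thresholds) + 1."""
    return levels[np.digitize(x, thresholds)]

# Three intervals split at the thresholds -0.5 and 0.5.
thresholds = np.array([-0.5, 0.5])
levels = np.array([-1, 0, 1], dtype=np.int8)
print(quantize_by_intervals(np.array([-0.7, 0.1, 2.0]), thresholds, levels))  # [-1 0 1]
```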
Step 103, in response to determining that the current quantized neural network model satisfies the preset convergence condition, determining the current quantized neural network model to be the target neural network model.
In this embodiment, it may be determined whether the current quantized neural network model satisfies a preset convergence condition, for example, whether the difference between the output of the quantized neural network model and the preset evaluation data has stayed below a preset error value over the last several consecutive iteration operations; if so, the iteration may stop and the current quantized neural network model is taken as the target neural network model. The preset convergence condition is a preset condition under which the iteration can terminate; for example, it may include: the number of iterative adjustments of the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds a preset count threshold, or the performance of the currently quantized neural network model reaches a preset performance threshold.
Optionally, determining whether the current quantized neural network model satisfies the preset convergence condition may include: determining whether the number of iterative adjustments of the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds a preset count threshold. The count threshold may be set according to the structure of the neural network model to be quantized and the quantization mapping function; for example, the preset count threshold may be 100,000.
Optionally, determining whether the current quantized neural network model satisfies the preset convergence condition may include: determining whether the performance of the current quantized neural network model reaches a preset performance threshold, where the performance threshold can be set according to the performance required of the quantized neural network model.
In some optional implementations of this embodiment, the preset convergence condition includes: the number of iterative adjustments of the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds the preset count threshold, and the performance of the current quantized neural network model reaches the preset performance threshold. Determining whether the current quantized neural network model satisfies this condition may proceed as follows: first determine whether the number of iterative adjustments exceeds the preset count threshold; if it does, then determine whether the performance of the currently quantized neural network model reaches the preset performance threshold. In this optional implementation, the current quantized neural network model is judged to satisfy the preset convergence condition only when the number of iterative adjustments exceeds the preset count threshold and the performance of the current quantized neural network model also reaches the preset performance threshold, as the sketch below expresses.
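The compound condition just described can be stated directly in code; the threshold values here are assumptions for illustration only.

```python
def satisfies_convergence(num_adjustments: int, performance: float,
                          count_threshold: int = 100_000,
                          performance_threshold: float = 0.9) -> bool:
    """Preset convergence condition: the number of iterative adjustments exceeds
    the preset count threshold AND the current quantized model's performance
    reaches the preset performance threshold."""
    return num_adjustments > count_threshold and performance >= performance_threshold
```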
In some optional implementations of this embodiment, in response to determining that the number of iterative adjustments of the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds the preset count threshold while the performance of the current quantized neural network model has not reached the preset performance threshold, a quantization mapping function is determined from a preset quantization mapping function search space, based on the performance of the current quantized neural network model, so as to update the current quantization mapping function.
In this embodiment, corresponding quantization mapping function search spaces may be constructed in advance for different quantization requirements. A preset search space contains at least one quantization mapping function, and the quantization mapping relations of the functions differ from one another. A quantization mapping relation includes a bit-width mapping relation; for example, one quantization mapping function's bit-width mapping relation is a mathematical transformation between float32 and int8, while another's is a mathematical transformation between float32 and float16. The difference may also take this form: two quantization mapping functions in the search space have the same bit-width mapping relation but different parameters.
In this embodiment, when the number of iterative adjustments of the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds the preset count threshold but the performance of the current quantized neural network model has not reached the preset performance threshold, it can be concluded that the current quantized neural network model still fails the preset convergence condition after at least the threshold number of adjustments. It is therefore necessary to determine a quantization mapping function anew from the preset search space and update the current quantization mapping function. In this way, the parameters of the quantization mapping function can be learned during model quantization and a more suitable quantization mapping function can be searched for, further reducing the loss of quantization precision.
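A sketch of this fallback, with an assumed search-space representation: when the adjustment budget is exhausted but performance still falls short, a new quantization mapping function is drawn from the preset search space.

```python
# Preset search space: candidates differing in bit-width mapping and/or parameters.
QUANT_FUNCTION_SEARCH_SPACE = [
    {"mapping": "float32->int8", "a": 1.0, "b": 0.0},
    {"mapping": "float32->float16"},
    {"mapping": "float32->int8", "a": 0.5, "b": 0.1},
]

def maybe_update_mapping_function(num_adjustments, performance,
                                  count_threshold=100_000,
                                  performance_threshold=0.9):
    """If the count threshold is exceeded but performance is still below the
    performance threshold, select a new quantization mapping function from the
    search space (here simply the first candidate; a real selection would be
    guided by the measured performance)."""
    if num_adjustments > count_threshold and performance < performance_threshold:
        return QUANT_FUNCTION_SEARCH_SPACE[0]
    return None
```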
In this embodiment, the target neural network model is the model obtained by this quantization method and may be deployed in a terminal device to implement functions such as speech recognition, speech synthesis, text translation, natural language understanding, image understanding, trend prediction, and target detection and tracking. Compared with the neural network model to be quantized, the computing resources the target neural network model consumes on the device terminal can be greatly reduced.
In the quantization method of the neural network model provided in this embodiment, first, the neural network model to be quantized is quantized based on the current quantization mapping function, and the performance of the current quantized neural network model is tested; second, the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function are iteratively adjusted based on the performance of the current quantized neural network model; finally, in response to determining that the current quantized neural network model satisfies the preset convergence condition, the current quantized neural network model is determined to be the target neural network model. In each iteration, the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function are adjusted based on the performance of the current quantized neural network model; after the iterations complete, a target neural network model with minimal quantization loss has been learned online, and automatically searching for and learning the parameters of the quantization mapping function during model quantization improves the precision of the quantized neural network model.
The neural network model to be quantized may be a neural network model that performs an image processing task, with the quantization mapping function self-updating based on changes in the image processing performance of the current quantized neural network model. In each iteration, the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function are adjusted based on that image processing performance; after the iterations complete, a target neural network model with minimal quantization loss has been learned online. Because the bit width of the target model's parameters is reduced, the storage and computing resources occupied when running it are reduced as well, which effectively lowers the hardware latency of the image processing task and saves hardware resources.
With continuing reference to FIG. 2, FIG. 2 illustrates a flow 200 of another embodiment of the quantization method for a neural network model of the present application; the method comprises the following steps:
step 201, based on the current quantization mapping function, quantizing the neural network model to be quantized and testing the performance of the current quantized neural network model obtained based on the quantization of the current quantization mapping function, wherein the quantization mapping function is a preset function.
Step 202, iteratively adjusting the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function based on the performance of the current quantized neural network model.
Step 203, in response to determining that the current quantized neural network model meets the preset convergence condition, determining that the current quantized neural network model is the target neural network model.
Optionally, the preset convergence condition includes: the number of iterative adjustments of the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds a preset count threshold, or the performance of the current quantized neural network model reaches a preset performance threshold.
In some optional implementations of this embodiment, the preset convergence condition includes: the number of iterative adjustments of the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds a preset count threshold, and the performance of the current quantized neural network model does not reach the preset performance threshold.
Step 204, sending the target neural network model to the task execution end, so as to deploy the target neural network model at the task execution end side and execute the corresponding media data processing task.
In this embodiment, the task execution end may be an execution main body for executing a task, where the execution main body includes a server and/or a terminal device. When the task execution end comprises a server, the server can be a server running various services, such as a server running a neural network structure search task or a server running model optimization and deployment. When the task execution end comprises a terminal device, the terminal device may be a user-side device on which various client applications are installed, such as image processing applications and information analysis applications.
The tasks executed by the execution main body include, but are not limited to: image processing tasks, information analysis tasks, speech recognition and processing tasks, financial security tasks, and the like.
In this embodiment, after the task execution end deploys the target neural network model, it may execute the corresponding media data processing task through the target neural network model, where the media data includes video, text, voice, and similar data.
In some optional implementations of this embodiment, in response to determining that the number of iterative adjustments of the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds the preset count threshold while the performance of the current quantized neural network model has not reached the preset performance threshold, a quantization mapping function is determined from a preset quantization mapping function search space, based on the performance of the current quantized neural network model, so as to update the current quantization mapping function.
In the quantization method of the neural network model provided by this embodiment, after the target neural network model is determined, it is sent to the task execution end; the task execution end deploys the target neural network model and has it execute the corresponding media data processing task, which improves the fluency and reliability of executing that task.
With continuing reference to FIG. 3, FIG. 3 illustrates a flow 300 of a third embodiment of the quantization method of a neural network model according to the present application; the method comprises the following steps:
Step 301, determining a quantization bit width based on the hardware operating environment information of the task execution end, and determining the current quantization mapping function according to the quantization bit width.
In this embodiment, the task execution end may be an execution main body for executing a task, where the execution main body includes a server and/or a terminal device. The hardware running environment of the task execution end refers to the hardware devices in the server or terminal device that support the software development environment, and the hardware running environment information includes, but is not limited to, the CPU type, graphics card type, memory size, display resolution, and the like.
The bit width of the task execution end is determined by the amount of data that its memory or video memory can transfer at one time. The bit widths the task execution end can apply can be determined based on its hardware running environment, and one of these bit widths is selected as the quantization bit width.
A quantization bit width suitable for the task execution end can be selected according to the applicability relation between quantization bit widths and different quantization mapping functions. In this embodiment, the current quantization mapping function is determined according to the quantization bit width, and the bit width of the parameters of the neural network model to be quantized can be quantized to that bit width through the current quantization mapping function, which ensures that the bit widths of the task execution end and of the quantized neural network model are consistent.
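A minimal sketch of this step under assumed data structures: the hardware running environment information is a dict, and the mapping-function choice is keyed by the selected bit width; both representations are illustrative assumptions.

```python
def choose_quantization_bit_width(hardware_env: dict) -> int:
    """Pick a quantization bit width the task execution end can apply,
    e.g. the smallest supported width to minimize memory and latency."""
    supported = sorted(hardware_env.get("supported_bit_widths", [8]))
    return supported[0]

def current_mapping_function_for(bit_width: int) -> dict:
    """Choose a current quantization mapping function whose output bit width
    matches the task execution end."""
    return {"mapping": f"float32->int{bit_width}", "bit_width": bit_width}

env = {"cpu": "ARM Cortex-A76", "supported_bit_widths": [8, 16]}
print(current_mapping_function_for(choose_quantization_bit_width(env)))
```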
Step 302, quantizing the neural network model to be quantized based on the current quantization mapping function, and testing the performance of the current quantized neural network model quantized based on the current quantization mapping function, wherein the quantization mapping function is a preset function.
Step 303, iteratively adjusting the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function based on the performance of the current quantized neural network model.
Step 304, in response to determining that the current quantized neural network model satisfies the preset convergence condition, determining the current quantized neural network model to be the target neural network model.
Optionally, the preset convergence condition includes: the number of iterative adjustments of the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds a preset count threshold, or the performance of the current quantized neural network model reaches a preset performance threshold.
In some optional implementations of this embodiment, the preset convergence condition includes: the number of iterative adjustments of the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds a preset count threshold, and the performance of the current quantized neural network model does not reach the preset performance threshold.
Step 305, sending the target neural network model to the task execution end, so as to deploy the target neural network model at the task execution end side and execute the corresponding media data processing task.
In some optional implementations of this embodiment, in response to determining that the number of iterative adjustments of the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds the preset count threshold while the performance of the current quantized neural network model has not reached the preset performance threshold, a quantization mapping function is determined from a preset quantization mapping function search space, based on the performance of the current quantized neural network model, so as to update the current quantization mapping function.
In the quantization method of the neural network model provided in this embodiment, before the neural network model to be quantized is quantized based on the current quantization mapping function, the quantization bit width is determined based on the hardware operating environment information of the task execution end, and the current quantization mapping function is determined according to that bit width. The quantization bit width of the quantization mapping function is thus consistent with the bit width of the task execution end, which improves the degree of matching between the target neural network model and the task execution end and, in turn, the reliability of task execution at the task execution end.
With further reference to fig. 4, as an implementation of the method shown in the above figures, the present application provides an embodiment of a quantization apparatus of a neural network model, which corresponds to the embodiment of the method shown in fig. 1, and which is particularly applicable to various electronic devices.
As shown in FIG. 4, the quantization apparatus 400 of the neural network model provided in this embodiment includes: a quantization unit 401, an adjusting unit 402, and a determining unit 403. The quantization unit 401 may be configured to quantize the neural network model to be quantized based on a current quantization mapping function, and to test the performance of the current quantized neural network model quantized based on the current quantization mapping function, where the quantization mapping function is a preset function. The adjusting unit 402 may be configured to iteratively adjust the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function based on the performance of the current quantized neural network model. The determining unit 403 may be configured to determine the current quantized neural network model as the target neural network model in response to determining that the current quantized neural network model satisfies the preset convergence condition.
In the present embodiment, in the quantization apparatus 400 of the neural network model, the detailed processing and technical effects of the quantization unit 401, the adjusting unit 402, and the determining unit 403 can refer to the related descriptions of steps 101, 102, and 103 in the embodiment corresponding to FIG. 1, which are not repeated here.
In some optional implementations of this embodiment, the parameters of the quantization mapping function include: mapping transformation parameters, or parameters of the quantization mapping function comprise mapping transformation parameters and mapping interval thresholds; the mapping transformation parameters comprise parameters representing mathematical transformation relations between parameters of the neural network model to be quantized and parameters of the quantized neural network model, and the mapping interval threshold represents mapping relations between intervals to which the parameters of the neural network model to be quantized belong and intervals to which the parameters of the quantized neural network model belong.
In some optional implementations of this embodiment, the quantization apparatus of the neural network model further includes: a sending unit (not shown in the figure). The sending unit may be configured to send the target neural network model to the task execution end, so as to deploy the target neural network model at the task execution end side and execute the corresponding media data processing task.
In some optional implementations of this embodiment, the quantization apparatus of the neural network model further includes: a bit width determination unit (not shown in the figure). The bit width determining unit may be configured to determine a quantization bit width based on the hardware operating environment information of the task execution end, and determine a current quantization mapping function according to the quantization bit width.
In some optional implementations of this embodiment, the quantization apparatus of the neural network model further includes an updating unit (not shown in the figure). The updating unit may be configured to determine, in response to determining that the number of iterative adjustments of the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds the preset count threshold and that the performance of the current quantized neural network model does not reach the preset performance threshold, a quantization mapping function from a preset quantization mapping function search space based on the performance of the current quantized neural network model, so as to update the current quantization mapping function.
With the quantization apparatus for a neural network model provided by this embodiment of the application, first, the quantization unit 401 quantizes the neural network model to be quantized based on the current quantization mapping function and tests the performance of the current quantized neural network model; second, the adjusting unit 402 iteratively adjusts the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function based on the performance of the current quantized neural network model; finally, the determining unit 403 determines the current quantized neural network model as the target neural network model in response to determining that it satisfies the preset convergence condition. In each iteration, the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function are adjusted based on the performance of the current quantized neural network model; after the iterations complete, a target neural network model with minimal quantization loss has been learned online, and automatically searching for and learning the parameters of the quantization mapping function during model quantization improves the precision of the quantized neural network model.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
FIG. 5 is a block diagram of an electronic device for the quantization method of a neural network model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in FIG. 5, the electronic apparatus includes: one or more processors 501, a memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing a portion of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In FIG. 5, one processor 501 is taken as an example.
Memory 502 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for quantifying neural network models provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of quantifying a neural network model provided herein.
The memory 502, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the quantization method of the neural network model in the embodiment of the present application (for example, the quantization unit 401, the adjustment unit 402, and the determination unit 403 shown in fig. 4). The processor 501 executes various functional applications of the server and data processing, i.e., implementing the quantization method of the neural network model in the above method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 502.
The memory 502 may include a storage program area and a storage data area, where the storage program area may store an operating system and the application program required for at least one function, and the storage data area may store data created through use of the electronic device of the quantization method of the neural network model, and the like. Further, the memory 502 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 optionally includes memory located remotely from the processor 501; these remote memories may be connected over a network to the electronic device of the quantization method of the neural network model. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the quantization method of the neural network model may further include an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503, and the output device 504 may be connected by a bus 505 or by other means; FIG. 5 illustrates connection by the bus 505 as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device of the quantization method of the neural network model, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 504 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; this is not limited here, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method of quantifying a neural network model, comprising:
based on a current quantization mapping function, quantizing a neural network model to be quantized and testing the performance of the current quantized neural network model obtained based on the quantization of the current quantization mapping function, wherein the quantization mapping function is a preset function;
iteratively adjusting the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function based on the performance of the currently quantized neural network model;
and in response to determining that the current quantized neural network model meets a preset convergence condition, determining that the current quantized neural network model is a target neural network model.
2. The method of claim 1, wherein the parameters of the quantization mapping function comprise: mapping transformation parameters; or the parameters of the quantization mapping function comprise mapping transformation parameters and mapping interval thresholds;
the mapping transformation parameters comprise parameters representing mathematical transformation relations between parameters of the neural network model to be quantized and parameters of the quantized neural network model, and the mapping interval threshold represents mapping relations between intervals to which the parameters of the neural network model to be quantized belong and intervals to which the parameters of the quantized neural network model belong.
3. The method of claim 1, further comprising:
sending the target neural network model to a task execution end, so as to deploy the target neural network model at the task execution end and execute a corresponding media data processing task.
4. The method of claim 3, wherein the method further comprises:
determining a quantization bit width based on hardware operating environment information of the task execution end, and determining the current quantization mapping function according to the quantization bit width.
5. The method of any of claims 1-4, further comprising:
in response to determining that the number of iterative adjustments to the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds a preset number threshold and that the performance of the current quantized neural network model does not reach a preset performance threshold, determining a quantization mapping function from a preset quantization mapping function search space based on the performance of the current quantized neural network model, so as to update the current quantization mapping function.
6. An apparatus for quantizing a neural network model, comprising:
a quantization unit configured to quantize a neural network model to be quantized based on a current quantization mapping function, and test performance of a current quantized neural network model obtained by quantization based on the current quantization mapping function, wherein the quantization mapping function is a preset function;
an adjusting unit configured to iteratively adjust parameters of the neural network model to be quantized and parameters of the current quantization mapping function based on performance of the current quantized neural network model;
a determining unit configured to determine the current quantized neural network model as a target neural network model in response to determining that the current quantized neural network model satisfies a preset convergence condition.
7. The apparatus of claim 6, wherein the parameters of the quantization mapping function comprise mapping transformation parameters; or the parameters of the quantization mapping function comprise mapping transformation parameters and a mapping interval threshold;
wherein the mapping transformation parameters comprise parameters characterizing a mathematical transformation relationship between the parameters of the neural network model to be quantized and parameters of the quantized neural network model, and the mapping interval threshold characterizes a mapping relationship between intervals to which the parameters of the neural network model to be quantized belong and intervals to which the parameters of the quantized neural network model belong.
8. The apparatus of claim 6, further comprising:
a sending unit configured to send the target neural network model to a task execution end, so as to deploy the target neural network model at the task execution end and execute a corresponding media data processing task.
9. The apparatus of claim 8, further comprising:
a bit width determining unit configured to determine a quantization bit width based on hardware operating environment information of the task execution end, and determine the current quantization mapping function according to the quantization bit width.
10. The apparatus of any of claims 6-9, further comprising:
an updating unit configured to, in response to determining that the number of iterative adjustments to the parameters of the neural network model to be quantized and the parameters of the current quantization mapping function exceeds a preset number threshold and that the performance of the current quantized neural network model does not reach a preset performance threshold, determine a quantization mapping function from a preset quantization mapping function search space based on the performance of the current quantized neural network model, so as to update the current quantization mapping function.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
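
To make the iterative procedure of claim 1 concrete, the following is a minimal, illustrative sketch and not part of the claims. It assumes a toy linear model, a simple affine quantization mapping function with a single learnable scale parameter, and a finite-difference update for that scale; all names and numeric choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))                 # held-out data used to test performance
y = X @ rng.normal(size=(8, 1))               # targets from a hidden "true" model

w = rng.normal(size=(8, 1))                   # parameters of the model to be quantized
scale = 0.1                                   # parameter of the quantization mapping function

def quantize(weights, s, bits=8):
    """Preset mapping function: symmetric uniform quantization with scale s."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(weights / s), -qmax, qmax) * s

def performance(w_q):
    """Performance of the current quantized model: negative MSE on the test data."""
    return -float(np.mean((X @ w_q - y) ** 2))

prev = -np.inf
for step in range(1000):
    w_q = quantize(w, scale)
    perf = performance(w_q)
    if abs(perf - prev) < 1e-10:              # preset convergence condition
        break
    prev = perf
    # Iteratively adjust the model parameters (straight-through gradient) ...
    grad_w = 2.0 * X.T @ (X @ w_q - y) / len(X)
    w -= 0.05 * grad_w
    # ... and the mapping-function parameter (finite-difference estimate).
    eps = 1e-4
    scale += 1e-3 * (performance(quantize(w, scale + eps)) - perf) / eps

print(f"stopped at step {step}: performance={perf:.6f}, scale={scale:.4f}")
```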
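Claim 2 distinguishes mapping transformation parameters from a mapping interval threshold. The sketch below, again purely illustrative, uses an affine scale/zero-point pair as the transformation parameters and an explicit array of interval thresholds that assigns each original parameter to a quantized interval; the uniform 4-bit layout is an assumption.

```python
import numpy as np

def make_mapping(scale, zero_point, thresholds):
    """Build a quantization mapping function.

    scale, zero_point -- mapping transformation parameters: the mathematical
        transformation between original and quantized parameter values.
    thresholds -- mapping interval thresholds: thresholds[i] is the boundary
        between quantized interval i and interval i + 1.
    """
    def mapping(weights):
        idx = np.searchsorted(thresholds, weights)    # interval of each parameter
        return (idx - zero_point) * scale             # map interval index back to a value
    return mapping

levels = 16                                           # 4-bit example
thresholds = np.linspace(-1.0, 1.0, levels - 1)       # uniform interval boundaries
mapping = make_mapping(scale=2.0 / levels, zero_point=levels // 2, thresholds=thresholds)
print(mapping(np.array([-0.9, -0.1, 0.0, 0.4, 0.95])))
```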
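Claim 4 derives the quantization bit width from the hardware operating environment of the task execution end. One plausible, purely hypothetical reading: look the environment up in a capability table and build the mapping function for that width. The table entries and helper names below are invented for illustration.

```python
import numpy as np

# Hypothetical capability table: lowest bit width each environment runs efficiently.
HARDWARE_BIT_WIDTH = {
    "gpu_tensor_core": 8,
    "cpu_avx512_vnni": 8,
    "npu_int4": 4,
    "generic_cpu": 16,
}

def select_bit_width(hardware_env):
    """Determine the quantization bit width from hardware environment information."""
    return HARDWARE_BIT_WIDTH.get(hardware_env, 8)    # default to int8

def mapping_for_bits(bits, scale=0.05):
    """Determine the current quantization mapping function from the bit width."""
    qmax = 2 ** (bits - 1) - 1
    return lambda w: np.clip(np.round(w / scale), -qmax, qmax) * scale

bits = select_bit_width("npu_int4")
mapping = mapping_for_bits(bits)
print(bits, mapping(np.array([0.12, -0.83, 0.31])))
```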
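Claim 5 adds a fallback: if the iteration budget is exhausted and performance still misses the target, a new mapping function is drawn from a preset search space. The sketch below scores each candidate by weight-reconstruction error; the search space, the proxy score, and the thresholds are all assumptions rather than the claimed method.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(scale=0.5, size=1000)            # parameters of the model to be quantized

# Preset quantization mapping function search space: candidate configurations.
SEARCH_SPACE = (
    [{"bits": 8, "scale": s} for s in (0.01, 0.02, 0.05, 0.10)]
    + [{"bits": 4, "scale": s} for s in (0.05, 0.10, 0.20)]
)

def apply_mapping(w, bits, scale):
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def proxy_performance(cfg):
    """Score a candidate by negative reconstruction error on the weights."""
    return -float(np.mean((apply_mapping(weights, **cfg) - weights) ** 2))

MAX_ADJUSTMENTS, PERFORMANCE_TARGET = 100, -1e-5      # preset thresholds
adjustments_done, current_performance = 150, -3e-4    # pretend the inner loop stalled

if adjustments_done > MAX_ADJUSTMENTS and current_performance < PERFORMANCE_TARGET:
    best = max(SEARCH_SPACE, key=proxy_performance)   # update the current mapping function
    print("switching quantization mapping function to", best)
```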
CN202010568260.4A 2020-06-19 2020-06-19 Quantification method and device for neural network model Active CN111738419B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010568260.4A CN111738419B (en) 2020-06-19 2020-06-19 Quantification method and device for neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010568260.4A CN111738419B (en) 2020-06-19 2020-06-19 Quantification method and device for neural network model

Publications (2)

Publication Number Publication Date
CN111738419A true CN111738419A (en) 2020-10-02
CN111738419B CN111738419B (en) 2024-01-12

Family

ID=72651860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010568260.4A Active CN111738419B (en) 2020-06-19 2020-06-19 Quantification method and device for neural network model

Country Status (1)

Country Link
CN (1) CN111738419B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734287A (en) * 2017-04-21 2018-11-02 展讯通信(上海)有限公司 Compression method and device, terminal, the storage medium of deep neural network model
CN108734268A (en) * 2017-04-21 2018-11-02 展讯通信(上海)有限公司 Compression method and device, terminal, the storage medium of deep neural network model
US20190042945A1 (en) * 2017-12-12 2019-02-07 Somdeb Majumdar Methods and arrangements to quantize a neural network with machine learning
US20190286980A1 (en) * 2018-03-13 2019-09-19 Recogni Inc. Cluster compression for compressing weights in neural networks
EP3543917A1 (en) * 2018-03-19 2019-09-25 SRI International Inc. Dynamic adaptation of deep neural networks
CN108805257A (en) * 2018-04-26 2018-11-13 北京大学 A kind of neural network quantization method based on parameter norm
US20200193273A1 (en) * 2018-12-14 2020-06-18 Microsoft Technology Licensing, Llc Residual quantization for neural networks
CN110363297A (en) * 2019-07-05 2019-10-22 上海商汤临港智能科技有限公司 Neural metwork training and image processing method, device, equipment and medium
CN110443165A (en) * 2019-07-23 2019-11-12 北京迈格威科技有限公司 Neural network quantization method, image-recognizing method, device and computer equipment
CN110516677A (en) * 2019-08-23 2019-11-29 上海云绅智能科技有限公司 A kind of neural network recognization model, target identification method and system
CN110852438A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Model generation method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YU Weiwei; YAN Jie; C. SABOURIN; K. MADANI: "Research on Structure Parameters of the CMAC Neural Network and Its Structure Optimization", Journal of Northwestern Polytechnical University, no. 06 *
ZHU Huming; LI Pei; JIAO Licheng; YANG Shuyuan; HOU Biao: "A Survey of Parallelization of Deep Neural Networks", Chinese Journal of Computers, no. 08 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114363921A (en) * 2020-10-13 2022-04-15 维沃移动通信有限公司 AI network parameter configuration method and equipment
WO2022078276A1 (en) * 2020-10-13 2022-04-21 维沃移动通信有限公司 Configuration method for ai network parameter and device
CN114363921B (en) * 2020-10-13 2024-05-10 维沃移动通信有限公司 AI network parameter configuration method and device
CN112686031A (en) * 2020-12-24 2021-04-20 北京有竹居网络技术有限公司 Text feature extraction model quantification method, device, equipment and storage medium
WO2022135174A1 (en) * 2020-12-24 2022-06-30 北京有竹居网络技术有限公司 Quantization method and apparatus for text feature extraction model, and device and storage medium
CN112686031B (en) * 2020-12-24 2023-09-08 北京有竹居网络技术有限公司 Quantization method, device, equipment and storage medium of text feature extraction model
WO2023151657A1 (en) * 2022-02-10 2023-08-17 维沃移动通信有限公司 Information processing method and communication device
WO2023160290A1 (en) * 2022-02-23 2023-08-31 京东方科技集团股份有限公司 Neural network inference acceleration method, target detection method, device, and storage medium
WO2023185865A1 (en) * 2022-03-29 2023-10-05 维沃移动通信有限公司 Model validation feedback method and apparatus, terminal, and network side device

Also Published As

Publication number Publication date
CN111738419B (en) 2024-01-12

Similar Documents

Publication Publication Date Title
CN111667054B (en) Method, device, electronic equipment and storage medium for generating neural network model
CN111738419B (en) Quantification method and device for neural network model
CN111539223B (en) Language model training method and device, electronic equipment and readable storage medium
CN111539514B (en) Method and apparatus for generating a structure of a neural network
CN111737994B (en) Method, device, equipment and storage medium for obtaining word vector based on language model
CN111737995B (en) Method, device, equipment and medium for training language model based on multiple word vectors
CN111737996B (en) Method, device, equipment and storage medium for obtaining word vector based on language model
CN111667056B (en) Method and apparatus for searching model structures
CN111563593B (en) Training method and device for neural network model
CN111709252B (en) Model improvement method and device based on pre-trained semantic model
CN113792854A (en) Model training and word stock establishing method, device, equipment and storage medium
EP3876166A2 (en) Method and apparatus for determining network model pruning strategy, device and storage medium
JP7044839B2 (en) End-to-end model training methods and equipment
CN113792855A (en) Model training and word stock establishing method, device, equipment and storage medium
CN111311321A (en) User consumption behavior prediction model training method, device, equipment and storage medium
CN114492788A (en) Method and device for training deep learning model, electronic equipment and storage medium
CN110807331A (en) Polyphone pronunciation prediction method and device and electronic equipment
CN111914994A (en) Method and device for generating multilayer perceptron, electronic equipment and storage medium
CN111783949A (en) Deep neural network training method and device based on transfer learning
CN114494814A (en) Attention-based model training method and device and electronic equipment
EP3855341A1 (en) Language generation method and apparatus, electronic device and storage medium
CN111311000B (en) User consumption behavior prediction model training method, device, equipment and storage medium
CN112529189A (en) Model compression method and device, electronic equipment and storage medium
CN111738325A (en) Image recognition method, device, equipment and storage medium
CN114937478B (en) Method for training a model, method and apparatus for generating molecules

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant