WO2021164750A1 - Method and apparatus for convolutional layer quantization - Google Patents

Method and apparatus for convolutional layer quantization

Info

Publication number
WO2021164750A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
neural network
convolutional neural
weight value
probability
Prior art date
Application number
PCT/CN2021/076983
Other languages
French (fr)
Chinese (zh)
Inventor
Kai HAN (韩凯)
Zhaohui YANG (杨朝晖)
Yunhe WANG (王云鹤)
Chunjing XU (许春景)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021164750A1 publication Critical patent/WO2021164750A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a method and device for quantization of a convolutional layer.
  • A deep convolutional neural network has millions or even tens of millions of parameters after training, for example, the weight parameters and bias parameters included in the convolutional neural network model parameters, as well as the feature map parameters of each convolutional layer. Model parameters and feature map parameters are typically stored as 32-bit values. Because of the large number of parameters and the large amount of data, the entire convolution calculation process consumes a large amount of storage and computing resources.
  • The development of deep convolutional neural networks is moving in the direction of "deeper, larger, and more complex". In terms of model size alone, a deep convolutional neural network often cannot be ported to mobile phones or embedded chips at all; even if it is delivered through network transmission, the high bandwidth occupancy often becomes a difficult engineering problem.
  • The mainstream solution for reducing the complexity of a convolutional neural network without reducing its accuracy is to quantize the parameters of the convolutional neural network.
  • The current quantization method uses a straight-through estimator (STE) to approximate the gradient of the network parameters. This approximation is inaccurate, which affects the update accuracy of the network parameters.
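  • For illustration only, the following is a minimal sketch of straight-through-estimator quantization (hypothetical PyTorch-style code; the integer rounding scheme is an illustrative assumption, not the method of this application), showing where the gradient approximation occurs:

```python
import torch

class STEQuantize(torch.autograd.Function):
    """Round weights in the forward pass; treat rounding as identity in the backward pass."""

    @staticmethod
    def forward(ctx, w):
        return torch.round(w)  # non-differentiable quantization step

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient through unchanged,
        # ignoring that d(round)/dw is zero almost everywhere. This mismatch
        # between the true and approximated gradient is the inaccuracy noted above.
        return grad_output

w = torch.randn(4, requires_grad=True)
STEQuantize.apply(w).sum().backward()
print(w.grad)  # all ones: the rounding step was simply "passed through"
```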
  • this application provides a convolutional layer quantization method, the method includes:
  • the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, and the weight value corresponds to N probability values, where each of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value corresponds to the candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values;
  • the image data is processed by the first convolutional neural network to obtain a detection result and a target loss, and the weight value is iteratively updated according to the target loss function until the difference between the detection result and the annotated value satisfies a preset condition, so as to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to updated N probability values;
  • weight quantization is performed on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables
  • each probability value in the N probability values corresponds to a hidden variable
  • each probability value is calculated based on the corresponding hidden variable
  • the iteratively updating the weight value according to the target loss function includes:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function, and the preset function includes a temperature coefficient,
  • the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1; the processing of the image data through the first convolutional neural network includes performing multiple feedforward processes on the first convolutional neural network:
  • the multiple feedforward processes include a first feedforward process and a second feedforward process, where the second feedforward process is performed after the first feedforward process
  • when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient; when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient, and the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
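  • As an illustrative sketch of this condition (assumed Python; the softmax form of the preset function is introduced later in this summary, and a preset value of 0 with a geometric decay are illustrative assumptions): as the temperature coefficient approaches the preset value across successive feedforward processes, one of the N probability values approaches 1:

```python
import numpy as np

def probability_values(hidden, tau):
    """Map N hidden variables to N probability values with temperature tau."""
    z = hidden / tau
    z -= z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

hidden = np.array([0.3, 1.2, -0.5])   # N = 3 hidden variables for one weight value
for tau in [1.0, 0.5, 0.1, 0.01]:     # the second temperature is closer to the preset value 0
    print(tau, probability_values(hidden, tau).round(3))
# As tau -> 0, the output approaches the one-hot distribution [0, 1, 0].
```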
  • the first convolutional neural network further includes a first batch normalization (BN) layer, where the first BN layer is connected to the target convolutional layer, and the first BN layer is used to perform a BN operation on the output feature of the target convolutional layer according to a first mean value and a first standard deviation of the output feature of the target convolutional layer.
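  • A minimal sketch of the BN operation described above (assumed Python; the epsilon term and per-channel reduction axes are standard BN conventions rather than details stated in this application):

```python
import numpy as np

def bn_operation(x, eps=1e-5):
    """Normalize a conv output of shape (batch, channels, H, W) per channel."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)  # the "first mean value"
    std = x.std(axis=(0, 2, 3), keepdims=True)    # the "first standard deviation"
    return (x - mean) / (std + eps)

features = np.random.randn(8, 16, 28, 28)  # example output feature of the target convolutional layer
normalized = bn_operation(features)
print(round(normalized.mean(), 4), round(normalized.std(), 4))  # roughly 0 and 1
```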
  • M fourth convolutional neural networks are obtained, where each of the M fourth convolutional neural networks includes an updated weight value, and the updated weight value corresponds to updated N probability values, and the method further includes:
  • the preset function is, for example, a softmax over the N hidden variables with the temperature coefficient τ: P_i = exp(w_i / τ) / ∑_{j=1}^{N} exp(w_j / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, w_i is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient
  • the weight value is calculated based on the following method: W_q = ∑_{i=1}^{N} P_i · V_i, where W_q is the weight value, V_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
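  • Putting the two formulas above together, a minimal sketch of the training-time quantization expected value and the final hard quantization (assumed Python; the variable names and example values are illustrative):

```python
import numpy as np

def soft_weight(hidden, candidates, tau):
    """Return W_q = sum_i P_i * V_i, with P = softmax(hidden / tau)."""
    e = np.exp((hidden - hidden.max()) / tau)  # max-shift cancels after normalization
    p = e / e.sum()                            # P_i: one probability value per candidate
    return (p * candidates).sum(), p

candidates = np.array([-1.0, 0.0, 1.0])  # N = 3 candidate quantized values V_i
hidden = np.array([0.1, -0.4, 1.3])      # hidden variables for one weight value

w_q, p = soft_weight(hidden, candidates, tau=0.5)
target_quantized = candidates[p.argmax()]  # candidate with the largest probability value
print(w_q, p.round(3), target_quantized)
```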
  • the present application provides a method for quantizing a convolutional layer, the method including:
  • the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, and the weight value corresponds to N probability values, where each probability value of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value corresponds to the candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values;
  • the first convolutional neural network is fed forward, and the weight value is iteratively updated according to the target loss function until the target loss meets a preset condition, so as to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to updated N probability values;
  • weight quantization is performed on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables
  • each probability value in the N probability values corresponds to a hidden variable
  • each probability value is calculated based on the corresponding hidden variable
  • the iteratively updating the weight value according to the target loss function includes:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function, and the preset function includes a temperature coefficient,
  • the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1; feeding forward the first convolutional neural network includes performing multiple feedforward processes on the first convolutional neural network:
  • the multiple feedforwards include a first feedforward process and a second feedforward process
  • the second feedforward process is performed after the first feedforward process.
  • the preset function includes a first temperature coefficient
  • the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer, where the first BN layer is connected to the target convolutional layer, and the first BN layer is used to perform a BN operation on the output feature of the target convolutional layer according to a first mean value and a first standard deviation of the output feature of the target convolutional layer.
  • M fourth convolutional neural networks are obtained, where each of the M fourth convolutional neural networks includes an updated weight value, and the updated weight value corresponds to updated N probability values, and the method further includes:
  • the preset function is, for example, a softmax over the N hidden variables with the temperature coefficient τ: P_i = exp(w_i / τ) / ∑_{j=1}^{N} exp(w_j / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, w_i is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient
  • the weight value is calculated based on the following method: W_q = ∑_{i=1}^{N} P_i · V_i, where W_q is the weight value, V_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • the present application provides a convolutional layer quantization device, the device includes:
  • the acquisition module is used to acquire image data, annotated values, a first convolutional neural network and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer, and the target convolutional layer includes a weight value.
  • the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value corresponds to the candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values;
  • the training module is used to process the image data through the first convolutional neural network to obtain a detection result and a target loss, and to iteratively update the weight value according to the target loss function until the difference between the detection result and the annotated value satisfies a preset condition, so as to obtain a second convolutional neural network.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
  • the weight value quantization module is configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, and the third convolutional neural network includes a target quantization value corresponding to the updated weight value,
  • the target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables
  • each probability value of the N probability values corresponds to a hidden variable
  • each probability value is calculated based on the corresponding hidden variable
  • the training module is specifically used for:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function, and the preset function includes a temperature coefficient,
  • the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1; the training module is specifically used to perform multiple feedforward processes on the first convolutional neural network:
  • the multiple feedforward processes include a first feedforward process and a second feedforward process, where the second feedforward process is performed after the first feedforward process
  • when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient; when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient, and the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer, where the first BN layer is connected to the target convolutional layer, and the first BN layer is used to perform a BN operation on the output feature of the target convolutional layer according to a first mean value and a first standard deviation of the output feature of the target convolutional layer.
  • M fourth convolutional neural networks are obtained, where each of the M fourth convolutional neural networks includes an updated weight value, the updated weight value corresponds to updated N probability values, and the weight value quantization module is also used for:
  • the preset function is, for example, a softmax over the N hidden variables with the temperature coefficient τ: P_i = exp(w_i / τ) / ∑_{j=1}^{N} exp(w_j / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, w_i is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient
  • the weight value is calculated based on the following method: W_q = ∑_{i=1}^{N} P_i · V_i, where W_q is the weight value, V_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • the present application provides a convolutional layer quantization device, the device includes:
  • An acquisition module, used to acquire a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, and the weight value corresponds to N probability values;
  • each probability value of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value corresponds to the candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values;
  • the training module is used to feed forward the first convolutional neural network, and to iteratively update the weight value according to the target loss function until the target loss meets a preset condition, so as to obtain a second convolutional neural network.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
  • the weight value quantization module is configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, and the third convolutional neural network includes a target quantization value corresponding to the updated weight value,
  • the target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables
  • each probability value of the N probability values corresponds to a hidden variable
  • each probability value is calculated based on the corresponding hidden variable
  • the training module is specifically used for:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function, and the preset function includes a temperature coefficient,
  • the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1; the training module is specifically used to perform multiple feedforward processes on the first convolutional neural network:
  • the multiple feedforwards include a first feedforward process and a second feedforward process
  • the second feedforward process is performed after the first feedforward process.
  • the preset function includes a first temperature coefficient
  • the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer, where the first BN layer is connected to the target convolutional layer, and the first BN layer is used to perform a BN operation on the output feature of the target convolutional layer according to a first mean value and a first standard deviation of the output feature of the target convolutional layer.
  • M fourth convolutional neural networks are obtained, where each of the M fourth convolutional neural networks includes an updated weight value, the updated weight value corresponds to updated N probability values, and the weight value quantization module is also used for:
  • the preset function is, for example, a softmax over the N hidden variables with the temperature coefficient τ: P_i = exp(w_i / τ) / ∑_{j=1}^{N} exp(w_j / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, w_i is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient
  • the weight value is calculated based on the following method: W_q = ∑_{i=1}^{N} P_i · V_i, where W_q is the weight value, V_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • An embodiment of the present application provides a neural network structure search device, which may include a memory, a processor, and a bus system, where the memory is used to store programs, and the processor is used to execute the programs in the memory to perform the method of the above first aspect and any optional method thereof, or of the above second aspect and any optional method thereof.
  • The embodiments of the present application provide a computer-readable storage medium in which a computer program is stored; when it runs on a computer, the computer executes the method of the above first aspect and any optional method thereof, or of the above second aspect and any optional method thereof.
  • The embodiments of the present application provide a computer program that, when run on a computer, causes the computer to execute the method of the above first aspect and any optional method thereof, or of the above second aspect and any optional method thereof.
  • The present application provides a chip system that includes a processor, configured to support an execution device or a training device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory for storing program instructions and data necessary for the execution device or the training device.
  • The chip system may consist of a chip, or may include a chip and other discrete devices.
  • An embodiment of the present application provides a method for quantizing a convolutional layer.
  • The method includes acquiring image data, annotated values, a first convolutional neural network, and N candidate quantized values.
  • The first convolutional neural network includes a target convolutional layer; the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value corresponds to the candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values.
  • The image data is processed by the first convolutional neural network to obtain a detection result and a target loss, and the weight value is iteratively updated according to the target loss function until the difference between the detection result and the annotated value meets a preset condition, so as to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to updated N probability values.
  • Weight quantization is performed on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
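  • As a consolidated illustration of the above flow, the following is a minimal training-loop sketch (assumed PyTorch-style code; the dummy loss, optimizer, learning rate, and temperature decay are illustrative assumptions, not the exact procedure of this application):

```python
import torch
import torch.nn.functional as F

candidates = torch.tensor([-1.0, 0.0, 1.0])      # N candidate quantized values
hidden = torch.randn(64, 3, requires_grad=True)  # N hidden variables per weight value
optimizer = torch.optim.SGD([hidden], lr=0.1)

def expected_weights(tau):
    p = F.softmax(hidden / tau, dim=-1)          # N probability values per weight value
    return (p * candidates).sum(dim=-1)          # quantization expected value, differentiable

def dummy_target_loss(w):
    # Placeholder for the target loss computed from image data and annotated values.
    return ((w - 0.5) ** 2).mean()

tau = 1.0
for step in range(100):                          # multiple feedforward processes
    loss = dummy_target_loss(expected_weights(tau))
    optimizer.zero_grad()
    loss.backward()                              # gradients flow through the expectation; no STE needed
    optimizer.step()
    tau *= 0.95                                  # move the temperature coefficient toward the preset value

final = candidates[F.softmax(hidden / tau, dim=-1).argmax(dim=-1)]  # hard quantization
print(final[:8])
```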
  • Figure 1 is a schematic diagram of the structure of the main framework of artificial intelligence;
  • Figure 2 is a schematic diagram of a scenario of this application.
  • Figure 3 is a schematic diagram of a scenario of this application.
  • Figure 4 is a system architecture provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of a convolutional neural network provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of a convolutional neural network provided by an embodiment of the application.
  • FIG. 7 is a hardware structure of a chip provided by an embodiment of the application.
  • FIG. 8 is a schematic flowchart of a method for quantizing a convolutional layer provided by an embodiment of this application.
  • FIG. 9 is a schematic diagram of the structure of a convolutional layer in training in an embodiment of the application.
  • FIG. 10 is a schematic diagram of the structure of a convolutional layer in an application in an embodiment of this application.
  • FIG. 11 is a schematic diagram of the structure of a convolutional layer in an application in an embodiment of this application.
  • FIG. 12 is a schematic flowchart of a method for quantizing a convolutional layer provided by an embodiment of this application.
  • FIG. 13 is a schematic diagram of a structure of a convolutional layer quantization device provided by an embodiment of the application.
  • FIG. 14 is a schematic structural diagram of a training device provided by an embodiment of the application.
  • FIG. 15 is a schematic diagram of a structure of a chip provided by an embodiment of the application.
  • Figure 1 shows a schematic diagram of the main framework of artificial intelligence.
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensing process of "data-information-knowledge-wisdom".
  • the "IT value chain” from the underlying infrastructure of human intelligence, information (providing and processing technology realization) to the industrial ecological process of the system, reflects the value that artificial intelligence brings to the information technology industry.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • Smart chips: hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs.
  • Basic platforms: distributed computing frameworks, networks, and related platform guarantees and support, which can include cloud storage and computing, interconnection networks, etc.
  • Sensors communicate with the outside world to obtain data, and these data are provided for calculation to the smart chips in the distributed computing system provided by the basic platform.
  • Data at the layer above the infrastructure indicates the data sources in the field of artificial intelligence.
  • The data involve graphics, images, voice, and text, as well as Internet-of-Things data from traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formal information to conduct machine thinking and solving problems based on reasoning control strategies.
  • the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, and usually provides functions such as classification, ranking, and prediction.
  • Some general capabilities can be formed based on the results of the data processing, such as algorithms or general systems, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent terminals, intelligent transportation, smart medical care, autonomous driving, safe city, and so on.
  • the embodiments of this application are mainly applied in fields such as driving assistance, automatic driving, and mobile phone terminals.
  • Application scenario 1: ADAS/ADS visual perception system
  • Application scenario 2: mobile phone beauty function
  • In this scenario, the mask and key points of the human body are detected through the neural network provided by the embodiments of the present application, and the corresponding parts of the human body can be zoomed in and out, such as slimming the waist and enlarging the hips, so as to output a beautified picture.
  • Application scenario 3 Image classification scenario:
  • After obtaining the image to be classified, the object recognition device adopts the object recognition method of this application to obtain the category of the object in the image to be classified;
  • the image to be classified can then be classified according to the category of the object in it.
  • For photographers, many photos are taken every day, including photos of animals, people, and plants.
  • Photos can be quickly classified according to their content, for example into photos containing animals, photos containing people, and photos containing plants.
  • The manual classification method is relatively inefficient, and people are prone to fatigue when dealing with the same thing for a long time, in which case the classification results will have large errors.
  • After the object recognition device obtains the image of a product, it uses the object recognition method of the present application to obtain the category of the product in the image, and then classifies the product according to that category. For the wide variety of commodities in large shopping malls or supermarkets, the object recognition method of the present application can quickly complete the classification of commodities, reducing time and labor costs.
  • the neural network model trained in the embodiment of the present application can realize the above-mentioned functions.
  • a neural network can be composed of neural units.
  • A neural unit can refer to an arithmetic unit that takes x_s and an intercept of 1 as inputs.
  • The output of the arithmetic unit can be: h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s x_s + b), where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
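  • A minimal sketch of such a neural unit (assumed Python; the example weights, inputs, and bias are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(x, w, b):
    """Compute f(sum_s W_s * x_s + b), with the sigmoid as the activation function f."""
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs x_s
w = np.array([0.2, 0.4, -0.1])   # weights W_s
print(neural_unit(x, w, b=0.3))  # output signal, usable as input to the next layer
```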
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN) can be understood as a neural network with many hidden layers; there is no special metric for "many" here. The multi-layer neural networks and deep neural networks we often speak of are essentially the same thing. Dividing a DNN by the positions of different layers, the layers inside the DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer. Although a DNN looks complicated, it is not complicated as far as the work of each layer is concerned.
  • For example, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as W_{24}^3.
  • The superscript 3 represents the layer number of the coefficient W, and the subscript corresponds to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as W_{jk}^L. Note that the input layer has no W parameter.
  • more hidden layers make the network more capable of portraying complex situations in the real world.
  • a model with more parameters is more complex and has a greater "capacity", which means that it can complete more complex learning tasks.
  • A Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or convolution feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • In a convolutional layer, a neuron can be connected to only some of the neurons in adjacent layers.
  • A convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted has nothing to do with location. The underlying principle is that the statistics of one part of an image are the same as those of other parts, which means that image information learned in one part can also be used in another part, so the same learned image information can be used for all positions on the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information. Generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size.
  • the convolution kernel can obtain reasonable weights through learning.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
  • Convolutional neural networks can use the backpropagation (BP) algorithm to modify the parameters in the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward passing the input signal to the output causes an error loss, and the parameters in the initial super-resolution model are updated by backpropagating the error loss information, so that the error loss converges.
  • The backpropagation algorithm is a backpropagation process dominated by the error loss, and aims to obtain the optimal parameters of the super-resolution model, such as a weight matrix.
  • Recurrent neural networks (RNN, Recurrent Neural Networks) can process sequence data of any length.
  • The training of an RNN is the same as the training of a traditional CNN or DNN.
  • A neural network can use the back propagation (BP) algorithm to modify the parameters in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward passing the input signal to the output causes an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges.
  • The back-propagation algorithm is a back-propagation process dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrix.
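  • A minimal numeric sketch of this error-loss-driven update (plain Python; the toy model y = w * x, the learning rate, and the target are illustrative assumptions):

```python
# Minimal backpropagation on a single linear unit y = w * x with squared error loss.
w, x, target, lr = 0.5, 2.0, 3.0, 0.1
for step in range(20):
    y = w * x                    # forward pass of the input signal
    loss = 0.5 * (y - target) ** 2
    grad_w = (y - target) * x    # error loss backpropagated to the parameter w
    w -= lr * grad_w             # update so that the error loss converges
print(w, w * x)                  # w approaches 1.5, and the output approaches the target 3.0
```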
  • an embodiment of the present application provides a system architecture 100.
  • the data collection device 160 is used to collect training data.
  • The training data includes the image or image block and the category of the object; the training data is stored in the database 130, and the training device 120 trains based on the training data maintained in the database 130 to obtain a CNN feature extraction model (the feature extraction model here is the model trained in the training phase described above, and may be a neural network for feature extraction, etc.).
  • The CNN feature extraction model can be used to implement the neural network provided by the embodiment of this application: after relevant preprocessing, the image or image block to be recognized is input into the CNN feature extraction model to obtain information such as 2D, 3D, mask, and key points of the object of interest in the image or image block to be recognized.
  • The CNN feature extraction model in the embodiment of the present application may specifically be a convolutional neural network (CNN).
  • the training data maintained in the database 130 may not all come from the collection of the data collection device 160, and may also be received from other devices.
  • The training device 120 does not necessarily train the CNN feature extraction model entirely based on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of this application.
  • the target model/rule trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 4.
  • The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, and may also be a server or a cloud.
  • the execution device 110 is configured with an input/output (input/output, I/O) interface 112 for data interaction with external devices.
  • the user can input data to the I/O interface 112 through the client device 140.
  • The input data in this embodiment of the present application may include an image to be recognized, an image block, or a picture.
  • The execution device 120 may call data, codes, and the like in the data storage system 150 for corresponding processing, and may also store the data, instructions, and the like obtained by the corresponding processing in the data storage system 150.
  • Finally, the I/O interface 112 returns the processing result, such as the 2D, 3D, mask, key point and other information of the object of interest in the image, image block, or picture obtained above, to the client device 140 to provide it to the user.
  • the client device 140 may be a planning control unit in an automatic driving system or a beauty algorithm module in a mobile phone terminal.
  • The training device 120 can generate corresponding target models/rules based on different training data for different goals or different tasks, and the corresponding target models/rules can be used to achieve the above goals or complete the above tasks, so as to provide the user with the desired result.
  • the user can manually set input data, and the manual setting can be operated through the interface provided by the I/O interface 112.
  • In another case, the client device 140 can automatically send input data to the I/O interface 112. If the user's authorization is required for the client device 140 to automatically send the input data, the user can set the corresponding permission in the client device 140.
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be a specific manner such as display, sound, and action.
  • The client device 140 can also be used as a data collection terminal to collect, as new sample data, the input data input to the I/O interface 112 and the output result output from the I/O interface 112, and store them in the database 130, as shown in the figure.
  • Alternatively, the I/O interface 112 can directly store, as new sample data, the input data input to the I/O interface 112 and the output result output from the I/O interface 112 into the database 130, as shown in the figure.
  • FIG. 4 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation.
  • For example, in FIG. 4, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • As shown in FIG. 4, a CNN feature extraction model is obtained through training by the training device 120.
  • The CNN feature extraction model may be a convolutional neural network in this embodiment of the application, or may be a neural network that will be introduced in the following embodiment.
  • CNN is a very common neural network
  • the structure of CNN will be introduced in detail below in conjunction with Figure 5.
  • A convolutional neural network is a deep neural network with a convolutional structure; it is a deep learning architecture.
  • The deep learning architecture refers to performing multiple levels of learning at different abstraction levels through machine learning algorithms.
  • As a deep learning architecture, a CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
  • a convolutional neural network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a neural network layer 230.
  • the input layer 210 can obtain the image to be processed, and pass the obtained image to be processed to the convolutional layer/pooling layer 220 and the subsequent neural network layer 230 for processing, and the processing result of the image can be obtained.
  • The convolutional layer/pooling layer 220 may include layers 221 to 226. For example: in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 221 can include many convolution operators.
  • the convolution operator is also called a kernel. Its function in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • The convolution operator can essentially be a weight matrix, which is usually pre-defined. In the process of performing a convolution operation on an image, the weight matrix is usually processed along the horizontal direction of the input image one pixel after another (or two pixels after two pixels, depending on the value of the stride), so as to complete the work of extracting specific features from the image.
  • The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolution output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple homogeneous matrices, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "multiple" mentioned above.
  • Different weight matrices can be used to extract different features from the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image, and so on.
  • The multiple weight matrices have the same size (rows × columns), so the convolution feature maps extracted by the multiple weight matrices of the same size also have the same size, and the multiple extracted convolution feature maps of the same size are merged to form the output of the convolution operation.
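  • A minimal sketch of this multi-kernel convolution and the resulting depth dimension (assumed Python; the kernel count, sizes, and stride are illustrative):

```python
import numpy as np

def conv2d(image, kernels, stride=1):
    """Valid convolution of an (H, W, C) image with K weight matrices of shape (kh, kw, C)."""
    kh, kw, _ = kernels[0].shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w, len(kernels)))   # K stacked feature maps
    for k, kernel in enumerate(kernels):           # one output channel per weight matrix
        for i in range(out_h):
            for j in range(out_w):
                patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
                out[i, j, k] = (patch * kernel).sum()
    return out

image = np.random.randn(28, 28, 3)                      # input image with depth 3
kernels = [np.random.randn(5, 5, 3) for _ in range(8)]  # 8 weight matrices of the same size
print(conv2d(image, kernels).shape)                     # (24, 24, 8): depth set by the kernel count
```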
  • The weight values in these weight matrices need to be obtained through a large amount of training in practical applications.
  • Each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions.
  • The initial convolutional layer (such as 221) often extracts more general features, which can also be called low-level features; as the depth of the convolutional neural network increases, the features extracted by the subsequent convolutional layers (such as 226) become more and more complex, such as high-level semantic features, and features with higher semantics are more suitable for the problem to be solved.
  • A convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
  • In the image processing process, the sole purpose of the pooling layer is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image with a smaller size.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of the average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image of the input pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
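  • A minimal sketch of 2x2 average and max pooling (assumed Python; the window size is illustrative):

```python
import numpy as np

def pool2x2(x, mode="max"):
    """Downsample an (H, W) image by a factor of 2 with max or average pooling."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

image = np.arange(16.0).reshape(4, 4)
print(pool2x2(image, "max"))   # each output pixel is the maximum of a 2x2 sub-region
print(pool2x2(image, "mean"))  # each output pixel is the average of a 2x2 sub-region
```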
  • After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is still not able to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. However, in order to generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the neural network layer 230 to generate the output of one or a group of required classes. Therefore, the neural network layer 230 can include multiple hidden layers (231, 232 to 23n as shown in FIG. 5) and an output layer 240. The parameters contained in the multiple hidden layers can be obtained by pre-training based on the relevant training data of specific task types; for example, the task types can include image recognition, image classification, image super-resolution reconstruction, and so on.
  • After the multiple hidden layers in the neural network layer 230, the final layer of the entire convolutional neural network 200 is the output layer 240.
  • The output layer 240 has a loss function similar to categorical cross entropy, which is specifically used to calculate the prediction error.
  • a convolutional neural network (CNN) 200 may include an input layer 110, a convolutional layer/pooling layer 120 (the pooling layer is optional), and a neural network layer 130.
  • Compared with FIG. 5, the multiple convolutional layers/pooling layers in the convolutional layer/pooling layer 120 in FIG. 6 are parallel, and the separately extracted features are all input to the neural network layer 130 for processing.
  • It should be noted that the convolutional neural networks shown in FIG. 5 and FIG. 6 are only examples of two possible convolutional neural networks used in the image processing method of the embodiments of this application. In specific applications, the convolutional neural network used in the image processing method of the embodiments of this application can also exist in the form of other network models.
  • The structure of the convolutional neural network obtained by the neural network structure search method of the embodiment of the present application may be as shown in the convolutional neural network structures in FIG. 5 and FIG. 6.
  • FIG. 7 is a hardware structure of a chip provided by an embodiment of the application, and the chip includes a neural network processor 50.
  • the chip can be set in the execution device 110 as shown in FIG. 4 to complete the calculation work of the calculation module 111.
  • the chip can also be set in the training device 120 as shown in FIG. 4 to complete the training work of the training device 120 and output the target model/rule.
  • the algorithms of each layer in the convolutional neural network as shown in FIG. 5 and FIG. 6 can all be implemented in the chip as shown in FIG. 7.
  • The neural network processor (NPU) 50 is mounted as a coprocessor on a main central processing unit (central processing unit, CPU) (host CPU), and the main CPU distributes tasks.
  • the core part of the NPU is the arithmetic circuit 503.
  • the controller 504 controls the arithmetic circuit 503 to extract data from the memory (weight memory or input memory) and perform calculations.
  • The arithmetic circuit 503 includes multiple processing units (process engine, PE). In some implementations, the arithmetic circuit 503 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 503 is a general-purpose matrix processor.
  • The arithmetic circuit fetches the data corresponding to a weight matrix B from the weight memory 502 and caches it on each PE in the arithmetic circuit.
  • The arithmetic circuit fetches the data of an input matrix A from the input memory 501, performs a matrix operation with matrix B, and stores the obtained partial result or final result of the matrix in the accumulator 508.
  • the vector calculation unit 507 can perform further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • In some implementations, the vector calculation unit 507 can be used for network calculations in non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, local response normalization, and so on.
  • the vector calculation unit 507 can store the processed output vector to the unified buffer 506.
  • the vector calculation unit 507 may apply a nonlinear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 507 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 503, for example for use in a subsequent layer in a neural network.
  • the unified memory 506 is used to store input data and output data.
  • The storage unit access controller 505 (direct memory access controller, DMAC) transfers the input data in the external memory to the input memory 501 and/or the unified memory 506, stores the weight data in the external memory into the weight memory 502, and stores the data in the unified memory 506 into the external memory.
  • the bus interface unit (BIU) 510 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 509 through the bus.
  • An instruction fetch buffer 509 connected to the controller 504 is used to store instructions used by the controller 504;
  • The controller 504 is used to call the instructions cached in the instruction fetch memory 509 to control the working process of the computing accelerator.
  • the input data here in this application is a picture
  • the output data is information such as 2D, 3D, Mask, and key points of the object of interest in the picture.
  • the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch memory 509 are all on-chip (On-Chip) memories.
  • the external memory is a memory external to the NPU.
  • The external memory can be a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), a high bandwidth memory (high bandwidth memory, HBM), or another readable and writable memory.
  • The execution device 110 in FIG. 4 introduced above can execute each step of the image processing method of the embodiment of the present application.
  • The CNN models shown in FIG. 5 and FIG. 6 and the chip shown in FIG. 7 can also be used to perform each step of the image processing method of the embodiment of the present application.
  • The image processing method of the embodiment of the present application is described in detail below with reference to the accompanying drawings.
  • the embodiment of the present application provides a system architecture.
  • The system architecture includes two local devices, an execution device, and a data storage system, where the local devices are connected to the execution device through a communication network.
  • the execution device can be implemented by one or more servers.
  • the execution device can be used in conjunction with other computing devices, such as data storage devices, routers, and load balancers.
  • the execution device can be arranged on one physical site or distributed across multiple physical sites.
  • the execution device may use the data in the data storage system, or call the program code in the data storage system, to implement the convolutional layer quantization method of the embodiment of the present application.
  • each local device can represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car, or another type of cellular phone, media consumption device, wearable device, set-top box, or game console.
  • Each user's local device can interact with the execution device through any communication mechanism/communication standard communication network.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination of them.
  • the local devices obtain the relevant parameters of the target neural network from the execution device, deploy the target neural network on the local devices, and use the target neural network for image classification, image processing, or the like.
  • the target neural network can be directly deployed on the execution device, and the execution device obtains the image to be processed from the local devices, and classifies or performs other types of image processing on the image to be processed according to the target neural network.
  • the foregoing execution device may also be referred to as a cloud device. At this time, the execution device is generally deployed in the cloud.
  • FIG. 8 is a schematic flowchart of a convolutional layer quantization method provided by an embodiment of this application.
  • the convolutional layer quantization method provided by this application includes:
  • obtain image data, annotated values, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values.
  • the training device can obtain image data, annotated values, a first convolutional neural network, and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, and each probability value represents the probability that the weight value takes the corresponding candidate quantized value,
  • the weight value is a quantization expected value determined according to the N probability values and the N candidate quantization values.
  • a first convolutional neural network and N candidate quantized values {v_1, v_2, ..., v_N} can be obtained.
  • the first convolutional neural network includes multiple convolutional layers, and the target convolutional layer is one of the multiple convolutional layers.
  • the weight matrix W corresponding to the target convolutional layer can include multiple weight values. Suppose the weight values are to be quantized into the N candidate quantized values {v_1, v_2, ..., v_N}; the probabilities that a target weight value takes the N candidate quantized values are:
  • P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  • the preset function satisfies the following condition: when the feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and the preset value, the smaller the absolute value of the difference between one of the N probability values and 1. Taking the above probability as an example, in the iterative training process, the closer τ is to 0, the closer one of the N probability values will be to 1.
  • the expected quantization value determined according to the N probability values and the N candidate quantization values may be used as the weight value to perform a convolution operation with the input feature, and the weight value is calculated as follows:
  • W_q = Σ_{i=1}^{N} v_i · P_i
  • W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
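  • To make this calculation concrete, the following is a minimal NumPy sketch of the soft quantization step, assuming the softmax form of the preset function described in this application; the variable names (hidden_vars, candidates, tau) are illustrative and not taken from the patent.

```python
import numpy as np

def soft_quantize(hidden_vars, candidates, tau):
    """Map the N hidden variables of one weight to its quantization expected value.

    hidden_vars: shape (N,), one latent variable W_pi per candidate value
    candidates:  shape (N,), the candidate quantized values v_i
    tau:         temperature coefficient of the preset (softmax) function
    """
    # P_i = exp(W_pi / tau) / sum_j exp(W_pj / tau), computed stably
    logits = hidden_vars / tau
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    # W_q = sum_i v_i * P_i: the expectation used as the weight in the forward pass
    w_q = float((candidates * probs).sum())
    return w_q, probs

candidates = np.array([-1.0, 0.0, 1.0])   # e.g. a ternary candidate set
hidden = np.array([0.2, 0.1, 0.9])
for tau in (1.0, 0.1, 0.01):              # smaller tau -> probs approach one-hot
    w_q, probs = soft_quantize(hidden, candidates, tau)
    print(tau, np.round(probs, 3), round(w_q, 3))
```

  • Running the loop with decreasing tau shows the behavior described above: the probability vector approaches one-hot and W_q approaches a single candidate value.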
  • the weight value will be used to perform a convolution calculation with the input feature to obtain the output feature y_q;
  • the parameter to be trained in the existing quantization method is W
  • the parameter to be trained in the embodiment of the present application is W_pi.
  • the weight value quantization process in the embodiment of the present application is a mapping from W_pi to W_q.
  • the mapping process is differentiable, which solves the problem that the mapping from the weight value to be trained to the quantized value in the traditional quantization process is not differentiable.
  • the gradient of W_q can be directly obtained through the back-propagation algorithm, and the parameter W_pi can then be trained.
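  • As a quick illustration that this mapping admits exact gradients (so no straight-through estimator is needed), the sketch below compares the analytic derivative of W_q with respect to each hidden variable against a finite-difference estimate; this check is illustrative and not part of the patent.

```python
import numpy as np

def soft_quantize(hidden, v, tau):
    probs = np.exp(hidden / tau)
    probs = probs / probs.sum()
    return float((v * probs).sum()), probs

v = np.array([-1.0, 0.0, 1.0])
h = np.array([0.2, 0.1, 0.9])
tau = 0.5

w_q, p = soft_quantize(h, v, tau)
# Differentiating W_q = sum_i v_i * softmax(h / tau)_i gives
# dW_q/dh_i = (P_i / tau) * (v_i - W_q)
analytic = p / tau * (v - w_q)

eps = 1e-6  # finite-difference estimate of the same derivative
numeric = np.array([
    (soft_quantize(h + eps * np.eye(3)[i], v, tau)[0] - w_q) / eps
    for i in range(3)
])
print(np.allclose(analytic, numeric, atol=1e-4))  # True: the gradient is exact
```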
  • process the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively update the weight value according to the target loss function until the difference between the detection result and the annotated value meets the preset condition, to obtain a second convolutional neural network.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values.
  • the training device may process the image data through the first convolutional neural network to obtain the detection result and the target loss, and iteratively update the weight value according to the target loss function until the difference between the detection result and the annotated value meets a preset condition, to obtain a second convolutional neural network.
  • the second convolutional neural network includes updated weight values, and the updated weight values correspond to the updated N probability values.
  • the first convolutional neural network may be fed forward, and the weight value may be iteratively updated according to the target loss function, until the target loss satisfies a preset condition, and the second convolutional neural network is obtained.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values.
  • the N hidden variables may be updated based on the loss function, and then the weight value may be updated.
  • the value of the temperature coefficient can be updated to make the temperature coefficient close to the preset value.
  • the temperature coefficient ⁇ can be gradually attenuated from a larger value (pre-set) to close to 0, so that N The probability value P i tends to 0 or 1, so that the candidate quantization value corresponding to P i close to 1 is used as the value to be quantized into the weight value.
  • weight quantization is performed on the updated weight value to obtain a third convolutional neural network; the third convolutional neural network includes a target quantization value corresponding to the updated weight value, and the target quantization value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • the value in {v_1, v_2, ..., v_N} corresponding to the maximum probability value can be used as the quantized weight value, namely W_d = v_k, where k = argmax_i P_i.
  • W_d can be used to perform a convolution calculation with the input feature to get the output feature y_d
  • each weight value in the weight matrix can be processed in the above-mentioned manner, and weight quantization is performed on the updated weight values to obtain the third convolutional neural network.
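  • A minimal sketch of this hard-quantization step follows. Because the softmax is monotonic, the candidate with the largest probability is also the candidate with the largest hidden variable; the names (hidden, candidates) are illustrative.

```python
import numpy as np

def hard_quantize(hidden, candidates):
    """Collapse each weight to the candidate with the largest probability value.

    hidden:     shape (num_weights, N), latent variables W_pi for each weight
    candidates: shape (N,), the candidate quantized values v_i
    """
    idx = hidden.argmax(axis=-1)   # argmax of P_i equals argmax of W_pi
    return candidates[idx]         # the target quantization value W_d per weight

candidates = np.array([-1.0, 0.0, 1.0])
hidden = np.array([[0.2, 0.1, 0.9],    # largest probability on v_3 = 1.0
                   [2.0, 0.3, 0.1]])   # largest probability on v_1 = -1.0
print(hard_quantize(hidden, candidates))  # [ 1. -1.]
```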
  • FIG. 9 is a schematic diagram of the structure of a convolutional layer in training in an embodiment of the application. As shown in FIG. 9, by updating the value of the hidden variable, the probability value and the weight value are updated. The weight value is used to perform a convolution operation with the input feature to obtain the output feature.
  • FIG. 10 is a schematic diagram of the structure of the convolutional layer in an application in an embodiment of the application.
  • the quantized weight value obtained through training can be used to perform the convolution with the input feature.
  • the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer and is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of that output feature. That is, during training, the BN layer performs the BN operation based on the mean and standard deviation of the output features of the convolutional layer in the current feedforward pass.
  • after the weight values are iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained, and each fourth convolutional neural network in the M fourth convolutional neural networks includes updated weight values; weight quantization is performed on the updated weight values included in each fourth convolutional neural network to obtain M fifth convolutional neural networks.
  • each fifth convolutional neural network in the M fifth convolutional neural networks is fed forward to obtain M output features, and the second BN layer is used to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to the second mean and second standard deviation of the M output features. That is, during training, the convolutional neural network obtained after each parameter update can be quantized, and the BN statistics are derived from the output features of these quantized networks.
  • it should be noted that the BN operation also needs to be based on the affine coefficients obtained during training. For details of how to perform the BN operation, reference can be made to the prior art, which is not repeated here.
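  • The recalibration just described can be sketched as follows: each of the M updated networks is quantized, a feedforward pass is run on each, and the statistics of the M output features replace the single-pass statistics. Here quantized_forwards is a hypothetical stand-in for the feedforward functions of the M fifth convolutional neural networks, and per-channel statistics are collapsed to scalars for brevity.

```python
import numpy as np

def recalibrated_bn_stats(quantized_forwards, x):
    """Derive the second mean and second standard deviation from M output features."""
    outputs = np.stack([f(x) for f in quantized_forwards])  # the M output features
    return outputs.mean(), outputs.std()

def bn(feature, mean, std, gamma, beta, eps=1e-5):
    # BN operation using the recalibrated statistics and the trained affine
    # coefficients gamma (scale) and beta (shift)
    return gamma * (feature - mean) / (std + eps) + beta

# Toy usage: three stand-in "quantized networks" that scale their input differently
forwards = [lambda x, s=s: s * x for s in (0.9, 1.0, 1.1)]
mean, std = recalibrated_bn_stats(forwards, np.array([1.0, 2.0, 3.0]))
print(bn(np.array([1.0, 2.0, 3.0]), mean, std, gamma=1.0, beta=0.0))
```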
  • FIG. 11 is a schematic diagram of the structure of a convolutional layer in an application in an embodiment of the application. As shown in FIG. 11, the mean, standard deviation, and affine coefficients obtained through training can be used to perform the BN operation on the input feature to obtain the output feature.
  • An embodiment of the present application provides a method for quantizing a convolutional layer.
  • the method includes acquiring image data, annotated values, a first convolutional neural network, and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer; the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values.
  • the image data is processed through the first convolutional neural network to obtain the detection result and the target loss, and the weight value is iteratively updated according to the target loss function until the difference between the detection result and the annotated value meets a preset condition, and a second convolutional neural network is obtained.
  • the second convolutional neural network includes updated weight values, and the updated weight values correspond to the updated N probability values; weight quantization is performed on the updated weight values to obtain a third convolutional neural network.
  • the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • FIG. 12 is a schematic flowchart of a convolutional layer quantization method provided by an embodiment of this application.
  • the convolutional layer quantization method provided by this application includes:
  • the first convolutional neural network includes a target convolutional layer
  • the target convolutional layer includes a weight value
  • the weight value corresponds to the N probability values
  • Each probability value in the N probability values corresponds to a candidate quantized value
  • each probability value represents the probability of the weight value corresponding to the candidate quantized value
  • the weight value is the quantization expected value determined according to the N probability values and the N candidate quantization values.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values.
  • the third convolutional neural network includes a target quantization value corresponding to the updated weight value, and the target quantization value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • the weight value may be updated by updating the N hidden variables according to a target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function
  • the preset function includes a temperature coefficient
  • the preset function satisfies the following conditions:
  • the multiple feedforwards include a first feedforward process and a second feedforward process, and the second feedforward process is performed after the first feedforward process
  • when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient
  • when the second feedforward process is performed on the first convolutional neural network
  • the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer and is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of that output feature.
  • after the weight value is iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained, and each fourth convolutional neural network of the M fourth convolutional neural networks includes updated weight values
  • the updated weight values correspond to the updated N probability values
  • the updated weight values included in each fourth convolutional neural network may also be quantized to obtain M fifth convolutional neural networks; each fifth convolutional neural network in the M fifth convolutional neural networks is fed forward to obtain M output features, and the second BN layer is used to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
  • the preset function is the following function:
  • P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  • the weight value is calculated based on the following method:
  • W_q = Σ_{i=1}^{N} v_i · P_i, where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • An embodiment of the present application provides a method for quantizing a convolutional layer.
  • the method includes obtaining a first convolutional neural network and N candidate quantized values; the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to a candidate quantized value, each probability value indicates the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values.
  • the first convolutional neural network is fed forward, and the weight value is iteratively updated according to the target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network; the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values.
  • weight quantization is performed on the updated weight value to obtain a third convolutional neural network; the third convolutional neural network includes a target quantization value corresponding to the updated weight value, and the target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
  • FIG. 13 is a schematic structural diagram of a convolutional layer quantization apparatus 1300 according to an embodiment of the application.
  • the convolutional layer quantization apparatus 1300 may be a server, and the convolutional layer quantization apparatus 1300 includes:
  • the obtaining module 1301 is configured to obtain image data, annotated values, a first convolutional neural network, and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer, and the target convolutional layer includes a weight value,
  • the weight value corresponds to N probability values, each of the N probability values corresponds to a candidate quantized value, and each probability value represents the probability of the weight value corresponding to the candidate quantized value, and
  • the weight value is a quantization expected value determined according to the N probability values and the N candidate quantization values;
  • the training module 1302 is configured to process the image data through the first convolutional neural network to obtain the detection result and the target loss, and to iteratively update the weight value according to the target loss function until the difference between the detection result and the annotated value satisfies a preset condition, to obtain a second convolutional neural network.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
  • the weight value quantization module 1303 is configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, and the third convolutional neural network includes a target quantization value corresponding to the updated weight value ,
  • the target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables
  • each probability value in the N probability values corresponds to a hidden variable
  • each probability value is calculated based on the corresponding hidden variable
  • the training module 1302 is specifically configured to:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function
  • the preset function includes a temperature coefficient
  • the preset function satisfies the following conditions:
  • the multiple feedforwards include a first feedforward process and a second feedforward process
  • the second feedforward process is performed after the first feedforward process; when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient, and when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer and is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of that output feature.
  • after the weight value is iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained, and each fourth convolutional neural network of the M fourth convolutional neural networks includes updated weight values
  • the updated weight values correspond to the updated N probability values
  • the weight value quantization module 1303 is further configured to: quantize the updated weight values included in each fourth convolutional neural network to obtain M fifth convolutional neural networks, and feed forward each fifth convolutional neural network in the M fifth convolutional neural networks to obtain M output features, where the second BN layer is used to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
  • the preset function is the following function:
  • P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  • the weight value is calculated based on the following method:
  • W_q = Σ_{i=1}^{N} v_i · P_i, where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • the embodiment of the present application provides a convolutional layer quantization device 1300.
  • the acquisition module 1301 acquires image data, annotated values, a first convolutional neural network, and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values;
  • the training module 1302 processes the image data through the first convolutional neural network to obtain the detection result and the target loss, and iteratively updates the weight value according to the target loss function until the difference between the detection result and the annotated value meets the preset condition, to obtain the second convolutional neural network,
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values.
  • in this way, the expectation of the candidate quantized values is used as the weight value, and the probability distribution over the quantized values is learned.
  • the quantization process is differentiable, so there is no need to use the STE to approximate the gradients of the network parameters, which improves the update accuracy of the network parameters.
  • the convolutional layer quantization apparatus 1300 may further include:
  • the obtaining module 1301 is configured to obtain a first convolutional neural network and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values;
  • the training module 1302 is used to feed forward the first convolutional neural network and iteratively update the weight value according to the target loss function until the target loss meets a preset condition, to obtain a second convolutional neural network;
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
  • the weight value quantization module 1303 is configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, and the third convolutional neural network includes a target quantization value corresponding to the updated weight value ,
  • the target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables, each probability value in the N probability values corresponds to a hidden variable, and each probability value is calculated based on the corresponding hidden variable; the training module is specifically configured to:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function
  • the preset function includes a temperature coefficient
  • the preset function satisfies the following conditions:
  • the multiple feedforwards include a first feedforward process and a second feedforward process
  • the second feedforward process is performed after the first feedforward process.
  • when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient
  • when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer and is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of that output feature.
  • after the weight value is iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained; each fourth convolutional neural network of the M fourth convolutional neural networks includes updated weight values, the updated weight values correspond to the updated N probability values, and the weight value quantization module is further configured to quantize the updated weight values included in each fourth convolutional neural network to obtain M fifth convolutional neural networks:
  • the preset function is the following function:
  • P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  • the weight value is calculated based on the following method:
  • W_q = Σ_{i=1}^{N} v_i · P_i, where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • the embodiment of the present application provides a convolutional layer quantization device 1300.
  • the acquisition module 1301 acquires a first convolutional neural network and N candidate quantization values.
  • the first convolutional neural network includes a target convolutional layer.
  • the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, and each probability value indicates the probability that the weight value takes the corresponding candidate quantized value.
  • the weight value is the quantization expected value determined according to the N probability values and the N candidate quantization values; the training module 1302 feeds forward the first convolutional neural network and, according to the target loss function,
  • iteratively updates the weight value until the target loss satisfies a preset condition, and a second convolutional neural network is obtained.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values.
  • the weight value quantization module 1303 performs weight quantization on the updated weight value to obtain a third convolutional neural network, and the third convolutional neural network includes a target quantization value corresponding to the updated weight value
  • the target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
  • FIG. 14 is a schematic structural diagram of the training device provided in an embodiment of the present application.
  • the training device is used to implement the function of the convolutional layer quantization device in the embodiment corresponding to FIG. 13.
  • the training device 1400 is implemented by one or more servers, and the training device 1400 may vary greatly with configuration or performance. It may include one or more central processing units (CPU) 1414 (for example, one or more processors), memory 1432, and one or more storage media 1430 (for example, one or more mass storage devices).
  • the memory 1432 and the storage medium 1430 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device. Furthermore, the central processing unit 1414 may be configured to communicate with the storage medium 1430, and execute a series of instruction operations in the storage medium 1430 on the training device 1400.
  • the training device 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, and one or more input and output interfaces 1458; or, one or more operating systems 1441, such as Windows ServerTM, Mac OS XTM , UnixTM, LinuxTM, FreeBSDTM and so on.
  • the central processing unit 1414 is configured to execute the data processing method executed by the convolutional layer quantization device in the embodiment corresponding to FIG. 12.
  • the central processing unit 1414 can obtain image data, annotated values, a first convolutional neural network, and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, and each probability value represents the probability that the weight value takes the corresponding candidate quantized value,
  • the weight value is a quantization expected value determined according to the N probability values and the N candidate quantization values;
  • the image data is processed through the first convolutional neural network to obtain the detection result and the target loss, and the weight value is iteratively updated according to the target loss function until the difference between the detection result and the annotated value satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
  • weight quantization is performed on the updated weight value to obtain a third convolutional neural network; the third convolutional neural network includes a target quantization value corresponding to the updated weight value, and the target quantization value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables
  • each probability value of the N probability values corresponds to a hidden variable
  • the central processing unit 1414 may execute:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function
  • the preset function includes a temperature coefficient
  • the preset function satisfies the following conditions:
  • the multiple feedforwards include a first feedforward process and a second feedforward process
  • the second feedforward process is performed after the first feedforward process; when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient, and when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer and is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of that output feature.
  • after the weight value is iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained; each fourth convolutional neural network of the M fourth convolutional neural networks includes updated weight values, the updated weight values correspond to the updated N probability values, and the method further includes: performing weight value quantization on the updated weight values included in each fourth convolutional neural network to obtain M fifth convolutional neural networks, and feeding forward each fifth convolutional neural network in the M fifth convolutional neural networks to obtain M output features, where the second BN layer is used to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
  • the preset function is the following function:
  • P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  • the weight value is calculated based on the following method:
  • W_q = Σ_{i=1}^{N} v_i · P_i, where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • the embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to execute the steps performed by the aforementioned training device.
  • the embodiments of the present application also provide a computer-readable storage medium storing a program for signal processing; when the program runs on a computer, the computer is caused to execute the following steps:
  • obtain a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values;
  • feed forward the first convolutional neural network, and iteratively update the weight value according to the target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
  • perform weight quantization on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantization value corresponding to the updated weight value, and the target quantization value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables, each probability value in the N probability values corresponds to a hidden variable, and each probability value is calculated based on the corresponding hidden variable; iteratively updating the weight value according to the target loss function includes:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function
  • the preset function includes a temperature coefficient
  • the preset function satisfies the following conditions:
  • the multiple feedforwards include a first feedforward process and a second feedforward process
  • the second feedforward process is performed after the first feedforward process.
  • when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient
  • when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer and is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of that output feature.
  • after the weight value is iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained; each fourth convolutional neural network of the M fourth convolutional neural networks includes updated weight values, the updated weight values correspond to the updated N probability values, and the method further includes: performing weight value quantization on the updated weight values included in each fourth convolutional neural network to obtain M fifth convolutional neural networks, and feeding forward each fifth convolutional neural network in the M fifth convolutional neural networks to obtain M output features, where the second BN layer is used to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
  • the preset function is the following function:
  • P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  • the weight value is calculated based on the following method:
  • W_q = Σ_{i=1}^{N} v_i · P_i, where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • the execution device, training device, or terminal device provided by the embodiments of the present application may specifically be a chip.
  • the chip includes a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins, or circuits.
  • the processing unit can execute the computer-executable instructions stored in the storage unit to make the chip in the execution device execute the data processing method described in the foregoing embodiment, or to make the chip in the training device execute the data processing method described in the foregoing embodiment.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • Fig. 15 is a schematic diagram of a structure of a chip provided by an embodiment of the application.
  • the Host CPU assigns tasks.
  • the core part of the NPU is the arithmetic circuit 1503.
  • the arithmetic circuit 1503 is controlled by the controller 1504 to extract matrix data from the memory and perform multiplication operations.
  • the arithmetic circuit 1503 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 1503 is a two-dimensional systolic array. The arithmetic circuit 1503 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1503 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the corresponding data of matrix B from the weight memory 1502 and caches it on each PE in the arithmetic circuit.
  • the arithmetic circuit fetches the data of matrix A from the input memory 1501, performs a matrix operation with matrix B, and stores the partial result or final result of the matrix in the accumulator 1508.
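  • Functionally, this accumulation corresponds to an ordinary tiled matrix multiply in which partial results are summed in an accumulator. The plain-Python sketch below only illustrates that behavior and is not a model of the actual hardware.

```python
import numpy as np

def matmul_with_accumulator(A, B, tile=2):
    """Compute A @ B by accumulating partial results, one tile of K at a time."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N))                # plays the role of accumulator 1508
    for k0 in range(0, K, tile):          # stream successive tiles of A and B
        acc += A[:, k0:k0 + tile] @ B[k0:k0 + tile, :]   # partial result
    return acc

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
print(np.allclose(matmul_with_accumulator(A, B), A @ B))  # True
```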
  • the unified memory 1506 is used to store input data and output data.
  • the weight data is transferred to the weight memory 1502 directly through the direct memory access controller (DMAC) 1505.
  • the input data is also transferred to the unified memory 1506 through the DMAC.
  • the BIU is the Bus Interface Unit, that is, the bus interface unit 1510, which is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 1509.
  • the bus interface unit 1510 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1509 to obtain instructions from the external memory, and is also used for the storage unit access controller 1505 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1506 or to transfer the weight data to the weight memory 1502 or to transfer the input data to the input memory 1501.
  • the vector calculation unit 1507 includes multiple arithmetic processing units and, if necessary, performs further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, and magnitude comparison. It is mainly used for the calculation of non-convolutional/non-fully-connected layers in the neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.
  • the vector calculation unit 1507 can store the processed output vector to the unified memory 1506.
  • the vector calculation unit 1507 may apply a linear function or a nonlinear function to the output of the arithmetic circuit 1503, for example, to perform linear interpolation on the feature plane extracted by the convolutional layer, or to apply a nonlinear function to a vector of accumulated values to generate activation values.
  • the vector calculation unit 1507 generates normalized values, pixel-level summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1503, for example for use in subsequent layers in a neural network.
  • the instruction fetch buffer 1509 connected to the controller 1504 is used to store instructions used by the controller 1504;
  • the unified memory 1506, the input memory 1501, the weight memory 1502, and the fetch memory 1509 are all On-Chip memories.
  • the external memory is private to the NPU hardware architecture.
  • the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above-mentioned programs.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physically separate.
  • the physical unit can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • this application can be implemented by means of software plus necessary general hardware.
  • it can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and so on.
  • generally, all functions completed by a computer program can be easily implemented with corresponding hardware.
  • moreover, the specific hardware structures used to achieve the same function can be diverse, such as analog circuits, digital circuits, or dedicated circuits.
  • software program implementation is a better implementation in more cases.
  • the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions to make a computer device (which can be a personal computer, training device, or network device, etc.) execute the methods of the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website site, computer, training device, or data center to another website site, computer, training device, or data center through wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or a data center, that integrates one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

Provided is a method for convolutional layer quantization, applied to the field of artificial intelligence, comprising: obtaining image data, annotated values, a first convolutional neural network, and N candidate quantization values, the first convolutional neural network comprising a target convolutional layer, the target convolutional layer comprising weight values, the weight values corresponding to N probability values, each of the N probability values corresponding to a candidate quantization value, the weight values being quantization expected values determined according to the N probability values and the N candidate quantization values; processing the image data by means of the first convolutional neural network to obtain a second convolutional neural network, the second convolutional neural network comprising the updated weight values; performing weight quantization on the updated weight values to obtain a third convolutional neural network. The method can improve the update accuracy of network parameters.

Description

Method and apparatus for convolutional layer quantization
This application claims priority to Chinese Patent Application No. 202010109185.5, filed with the China National Intellectual Property Administration on February 21, 2020 and entitled "Method and apparatus for convolutional layer quantization", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a convolutional layer quantization method and apparatus.
Background
A deep convolutional neural network has hundreds or even tens of millions of parameters after training, for example, the weight parameters and bias parameters included in the model parameters of the convolutional neural network, as well as the feature map parameters of each convolutional layer. Moreover, the model parameters and feature map parameters are stored on a 32-bit basis. Because of the large number of parameters and the large amount of data, the entire convolution calculation process consumes a large amount of storage and computing resources. Deep convolutional neural networks are developing in a "deeper, larger, and more complex" direction; in terms of model size, a deep convolutional neural network cannot be ported to a mobile phone or an embedded chip at all, and even if it is to be transmitted over a network, the high bandwidth occupancy often becomes a difficult problem for engineering implementation.
At present, solutions for reducing the complexity of a convolutional neural network without reducing its accuracy are mainly implemented by quantizing the parameters of the convolutional neural network. However, current quantization methods use a straight-through estimator (STE) to approximate the gradients of the network parameters; this gradient is inaccurate, which in turn affects the update accuracy of the network parameters.
Summary of the Invention
In a first aspect, this application provides a convolutional layer quantization method, and the method includes:
acquiring image data, annotated values, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values;
processing the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively updating the weight value according to the target loss function until the difference between the detection result and the annotated value satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
performing weight quantization on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantization value corresponding to the updated weight value, and the target quantization value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
Optionally, in a design of the first aspect, the weight value corresponds to N hidden variables, each probability value in the N probability values corresponds to a hidden variable, and each probability value is calculated based on the corresponding hidden variable; the iteratively updating the weight value according to the target loss function includes:
updating the weight value by updating the N hidden variables according to the target loss function.
Optionally, in a design of the first aspect, each of the N probability values is obtained by mapping the corresponding hidden variable based on a preset function, the preset function includes a temperature coefficient, and the preset function satisfies the following condition: when the feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and the preset value, the smaller the absolute value of the difference between one of the N probability values and 1; the processing the image data through the first convolutional neural network includes:
performing multiple feedforward processes on the image data through the first convolutional neural network, where the multiple feedforwards include a first feedforward process and a second feedforward process, and the second feedforward process is performed after the first feedforward process; when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient, and when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient, where the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
Optionally, in a design of the first aspect, the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer, and the first BN layer is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of the output feature of the target convolutional layer.
Optionally, in a design of the first aspect, after the weight value is iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained, each fourth convolutional neural network in the M fourth convolutional neural networks includes updated weight values, the updated weight values correspond to the updated N probability values, and the method further includes:
对第四卷积神经网络包括的更新后的权重值进行权重值量化,得到M个第五卷积神经网络;Perform weight value quantization on the updated weight value included in the fourth convolutional neural network to obtain M fifth convolutional neural networks;
对所述M个所述第五神经网络中的每个第五卷积神经网络进行前馈,得到M个输出特征,所述第二BN层用于根据所述M个输出特征的第二均值和第二标准差对所述第三卷积神经网络包括的更新后的目标卷积层的输出特征进行BN运算。Feed forward each fifth convolutional neural network in the M fifth neural networks to obtain M output features, and the second BN layer is used to obtain the second mean value of the M output features And the second standard deviation to perform a BN operation on the updated output feature of the target convolutional layer included in the third convolutional neural network.
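One possible reading of this statistics reset, sketched below under explicit assumptions: the M fifth networks are fed the same input batch, the second mean and second standard deviation are pooled per channel across all M outputs, and the helper name is invented for illustration; the text does not commit to these details:

```python
import torch

def second_bn_statistics(conv_outputs):
    # conv_outputs: list of M tensors of shape [B, C, H, W], one per fifth
    # network, all produced from the same input batch.
    stacked = torch.stack(conv_outputs)           # [M, B, C, H, W]
    second_mean = stacked.mean(dim=(0, 1, 3, 4))  # per-channel second mean
    second_std = stacked.std(dim=(0, 1, 3, 4))    # per-channel second std
    return second_mean, second_std
```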
Optionally, in one design of the first aspect, the preset function is the following function:

$P_i = \dfrac{\exp(W_{pi}/\tau)}{\sum_{j=1}^{N}\exp(W_{pj}/\tau)}$

where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
Optionally, in one design of the first aspect, the weight value is calculated as follows:

$W_q = \sum_{i=1}^{N} P_i v_i$

where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value. A sketch combining the two formulas above follows.
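Read together, the two formulas give a differentiable forward pass for the soft weight: the hidden variables produce probabilities through the temperature softmax, and the stored weight is the expectation over candidate values. A minimal PyTorch sketch (the shapes, candidate values, and temperature are illustrative assumptions):

```python
import torch

v = torch.tensor([-1.0, 0.0, 1.0])           # N = 3 candidate quantized values
w_p = torch.randn(8, 3, requires_grad=True)  # hidden variables: 8 weights x N
tau = 1.0                                    # temperature coefficient

p = torch.softmax(w_p / tau, dim=-1)  # P_i = exp(W_pi/tau) / sum_j exp(W_pj/tau)
w_q = (p * v).sum(dim=-1)             # W_q = sum_i P_i * v_i, the expectation

# w_q is an ordinary differentiable function of w_p, so the loss gradient
# reaches the hidden variables without a straight-through estimator.
w_q.sum().backward()
print(w_p.grad.shape)  # torch.Size([8, 3])
```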
In a second aspect, this application provides a convolutional layer quantization method, the method including:
obtaining a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expectation determined from the N probability values and the N candidate quantized values;
feeding forward the first convolutional neural network and iteratively updating the weight value according to a target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, the second convolutional neural network including an updated weight value that corresponds to updated N probability values;
performing weight quantization on the updated weight value to obtain a third convolutional neural network, the third convolutional neural network including a target quantized value corresponding to the updated weight value, the target quantized value being the candidate quantized value corresponding to the largest of the updated N probability values (this hardening step is sketched below).
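The final weight quantization named here is a plain argmax over the learned distribution. A small sketch with illustrative tensors:

```python
import torch

v = torch.tensor([-1.0, 0.0, 1.0])  # candidate quantized values (illustrative)
p_updated = torch.softmax(torch.randn(8, 3), dim=-1)  # stand-in for updated probabilities

idx = p_updated.argmax(dim=-1)  # index of the largest updated probability per weight
w_target = v[idx]               # target quantized value stored in the third network
print(w_target)
```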
Optionally, in one design of the second aspect, the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, and each probability value is computed from its corresponding hidden variable; the iteratively updating the weight value according to the target loss function includes:
updating the weight value by updating the N hidden variables according to the target loss function.
Optionally, in one design of the second aspect, each of the N probability values is obtained by mapping its corresponding hidden variable through a preset function, the preset function includes a temperature coefficient, and the preset function satisfies the following condition: during feedforward of the first convolutional neural network, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1; the feeding forward the first convolutional neural network includes:
performing multiple feedforward passes on the first convolutional neural network, where the multiple feedforward passes include a first feedforward process and a second feedforward process, the second feedforward process taking place after the first feedforward process; during the first feedforward process the preset function includes a first temperature coefficient, and during the second feedforward process the preset function includes a second temperature coefficient, the absolute difference between the second temperature coefficient and the preset value being smaller than the absolute difference between the first temperature coefficient and the preset value.
Optionally, in one design of the second aspect, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer, the first BN layer being used to perform a BN operation on the output features of the target convolutional layer according to a first mean and a first standard deviation of those output features.
Optionally, in one design of the second aspect, M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, each of the M fourth convolutional neural networks including an updated weight value that corresponds to updated N probability values, and the method further includes:
performing weight value quantization on the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks;
feeding forward each of the M fifth convolutional neural networks to obtain M output features, where a second BN layer is used to perform the BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to a second mean and a second standard deviation of the M output features.
Optionally, in one design of the second aspect, the preset function is the following function:

$P_i = \dfrac{\exp(W_{pi}/\tau)}{\sum_{j=1}^{N}\exp(W_{pj}/\tau)}$

where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
Optionally, in one design of the second aspect, the weight value is calculated as follows:

$W_q = \sum_{i=1}^{N} P_i v_i$

where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
In a third aspect, this application provides a convolutional layer quantization apparatus, the apparatus including:
an obtaining module, configured to obtain image data, a label value, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expectation determined from the N probability values and the N candidate quantized values;
a training module, configured to process the image data through the first convolutional neural network to obtain a detection result and a target loss, and to iteratively update the weight value according to the target loss function until the difference between the detection result and the label value satisfies a preset condition, to obtain a second convolutional neural network, the second convolutional neural network including an updated weight value that corresponds to updated N probability values;
a weight value quantization module, configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, the third convolutional neural network including a target quantized value corresponding to the updated weight value, the target quantized value being the candidate quantized value corresponding to the largest of the updated N probability values.
Optionally, in one design of the third aspect, the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, and each probability value is computed from its corresponding hidden variable; the training module is specifically configured to:
update the weight value by updating the N hidden variables according to the target loss function.
Optionally, in one design of the third aspect, each of the N probability values is obtained by mapping its corresponding hidden variable through a preset function, the preset function includes a temperature coefficient, and the preset function satisfies the following condition: during feedforward of the first convolutional neural network, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1; the training module is specifically configured to:
perform multiple feedforward passes on the image data through the first convolutional neural network, where the multiple feedforward passes include a first feedforward process and a second feedforward process, the second feedforward process taking place after the first feedforward process; during the first feedforward process the preset function includes a first temperature coefficient, and during the second feedforward process the preset function includes a second temperature coefficient, the absolute difference between the second temperature coefficient and the preset value being smaller than the absolute difference between the first temperature coefficient and the preset value.
Optionally, in one design of the third aspect, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer, the first BN layer being used to perform a BN operation on the output features of the target convolutional layer according to a first mean and a first standard deviation of those output features.
Optionally, in one design of the third aspect, M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, each of the M fourth convolutional neural networks including an updated weight value that corresponds to updated N probability values; the weight value quantization module is further configured to:
perform weight value quantization on the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks;
feed forward each of the M fifth convolutional neural networks to obtain M output features, where a second BN layer is used to perform the BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to a second mean and a second standard deviation of the M output features.
Optionally, in one design of the third aspect, the preset function is the following function:

$P_i = \dfrac{\exp(W_{pi}/\tau)}{\sum_{j=1}^{N}\exp(W_{pj}/\tau)}$

where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
Optionally, in one design of the third aspect, the weight value is calculated as follows:

$W_q = \sum_{i=1}^{N} P_i v_i$

where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
In a fourth aspect, this application provides a convolutional layer quantization apparatus, the apparatus including:
an obtaining module, configured to obtain a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expectation determined from the N probability values and the N candidate quantized values;
a training module, configured to feed forward the first convolutional neural network and iteratively update the weight value according to a target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, the second convolutional neural network including an updated weight value that corresponds to updated N probability values;
a weight value quantization module, configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, the third convolutional neural network including a target quantized value corresponding to the updated weight value, the target quantized value being the candidate quantized value corresponding to the largest of the updated N probability values.
Optionally, in one design of the fourth aspect, the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, and each probability value is computed from its corresponding hidden variable; the training module is specifically configured to:
update the weight value by updating the N hidden variables according to the target loss function.
Optionally, in one design of the fourth aspect, each of the N probability values is obtained by mapping its corresponding hidden variable through a preset function, the preset function includes a temperature coefficient, and the preset function satisfies the following condition: during feedforward of the first convolutional neural network, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1; the training module is specifically configured to:
perform multiple feedforward passes on the first convolutional neural network, where the multiple feedforward passes include a first feedforward process and a second feedforward process, the second feedforward process taking place after the first feedforward process; during the first feedforward process the preset function includes a first temperature coefficient, and during the second feedforward process the preset function includes a second temperature coefficient, the absolute difference between the second temperature coefficient and the preset value being smaller than the absolute difference between the first temperature coefficient and the preset value.
Optionally, in one design of the fourth aspect, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer, the first BN layer being used to perform a BN operation on the output features of the target convolutional layer according to a first mean and a first standard deviation of those output features.
Optionally, in one design of the fourth aspect, M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, each of the M fourth convolutional neural networks including an updated weight value that corresponds to updated N probability values; the weight value quantization module is further configured to:
perform weight value quantization on the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and feed forward each of the M fifth convolutional neural networks to obtain M output features, where a second BN layer is used to perform the BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to a second mean and a second standard deviation of the M output features.
Optionally, in one design of the fourth aspect, the preset function is the following function:

$P_i = \dfrac{\exp(W_{pi}/\tau)}{\sum_{j=1}^{N}\exp(W_{pj}/\tau)}$

where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
Optionally, in one design of the fourth aspect, the weight value is calculated as follows:

$W_q = \sum_{i=1}^{N} P_i v_i$

where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
In a fifth aspect, an embodiment of this application provides a neural network structure search apparatus, which may include a memory, a processor, and a bus system, where the memory is configured to store a program and the processor is configured to execute the program in the memory, so as to perform the first aspect and any optional method thereof, or the second aspect and any optional method thereof.
In a sixth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the first aspect and any optional method thereof, or the second aspect and any optional method thereof.
In a seventh aspect, an embodiment of this application provides a computer program that, when run on a computer, causes the computer to perform the first aspect and any optional method thereof, or the second aspect and any optional method thereof.
In an eighth aspect, this application provides a chip system including a processor, configured to support an execution device or a training device in implementing the functions involved in the foregoing aspects, for example, sending or processing the data and/or information involved in the foregoing methods. In one possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the execution device or the training device. The chip system may consist of a chip, or may include a chip and other discrete devices.
An embodiment of this application provides a convolutional layer quantization method. The method includes: obtaining image data, a label value, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expectation determined from the N probability values and the N candidate quantized values; processing the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively updating the weight value according to the target loss function until the difference between the detection result and the label value satisfies a preset condition, to obtain a second convolutional neural network that includes an updated weight value corresponding to updated N probability values; and performing weight quantization on the updated weight value to obtain a third convolutional neural network that includes a target quantized value corresponding to the updated weight value, the target quantized value being the candidate quantized value corresponding to the largest of the updated N probability values. In this way, the expectation over the candidate quantized values is used as the weight value, and the probability distribution over the quantized values is learned directly. Because this quantization process is differentiable, there is no need to approximate the derivatives of the network parameters with a straight-through estimator (STE), which improves the precision of the parameter updates (the gradient contrast is sketched below).
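To make the closing claim concrete, here is a hedged, illustrative comparison (not the patent's implementation): hard rounding has a derivative of zero almost everywhere, which is why STE-based training must approximate it, whereas the expectation-based weight is smooth in the hidden variables:

```python
import torch

x = torch.randn(4, requires_grad=True)

# Hard rounding: the gradient is zero almost everywhere, so training such a
# quantizer normally relies on a straight-through estimator (STE).
g_hard, = torch.autograd.grad(torch.round(x).sum(), x)
print(g_hard)  # tensor([0., 0., 0., 0.])

# Expectation over candidate values: smooth in the hidden variables,
# so the gradient is exact and no STE approximation is needed.
v = torch.tensor([-1.0, 0.0, 1.0])
w_p = torch.randn(4, 3, requires_grad=True)
w_q = (torch.softmax(w_p, dim=-1) * v).sum(dim=-1)
g_soft, = torch.autograd.grad(w_q.sum(), w_p)
print(g_soft)  # generally nonzero
```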
Description of the drawings
Figure 1 is a schematic structural diagram of the main framework of artificial intelligence;
Figure 2 is a schematic diagram of an application scenario of this application;
Figure 3 is a schematic diagram of an application scenario of this application;
Figure 4 is a system architecture provided by an embodiment of this application;
Figure 5 is a schematic diagram of a convolutional neural network provided by an embodiment of this application;
Figure 6 is a schematic diagram of a convolutional neural network provided by an embodiment of this application;
Figure 7 is a hardware structure of a chip provided by an embodiment of this application;
Figure 8 is a schematic flowchart of a convolutional layer quantization method provided by an example of this application;
Figure 9 is a schematic structural diagram of a convolutional layer during training in an embodiment of this application;
Figure 10 is a schematic structural diagram of a convolutional layer in application in an embodiment of this application;
Figure 11 is a schematic structural diagram of a convolutional layer in application in an embodiment of this application;
Figure 12 is a schematic flowchart of a convolutional layer quantization method provided by an example of this application;
Figure 13 is a schematic structural diagram of a convolutional layer quantization apparatus provided by an embodiment of this application;
Figure 14 is a schematic structural diagram of a training device provided by an embodiment of this application;
Figure 15 is a schematic structural diagram of a chip provided by an embodiment of this application.
Detailed description
The embodiments of the present invention are described below with reference to the drawings of the embodiments. The terms used in the implementation sections of the present invention are only intended to explain specific embodiments and are not intended to limit the present invention.
The embodiments of this application are described below with reference to the drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application remain applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the way objects with the same attributes are distinguished when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device including a series of units is not necessarily limited to those units, but may include other units that are not explicitly listed or that are inherent to the process, method, product, or device.
First, the overall workflow of an artificial intelligence system is described with reference to Figure 1, which shows a schematic structural diagram of the main framework of artificial intelligence. The framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The intelligent information chain reflects the series of processes from data acquisition to processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output; along this process, data undergoes a refinement from "data" to "information" to "knowledge" to "wisdom". The IT value chain, spanning from the underlying infrastructure of artificial intelligence and information (technologies for providing and processing it) up to the industrial ecology of the system, reflects the value that artificial intelligence brings to the information technology industry.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and provides support through a base platform. Communication with the outside is through sensors; computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the base platform includes distributed computing frameworks, networks, and related platform guarantees and support, and can include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to obtain data, and the data is provided to smart chips in the distributed computing system offered by the base platform for computation.
(2) Data
Data at the layer above the infrastructure represents the data sources of the artificial intelligence field. The data involves graphics, images, speech, and text, as well as Internet-of-Things data from traditional devices, including business data from existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and similar methods.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, and training on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to carry out machine thinking and problem solving according to a reasoning control strategy; typical functions are searching and matching.
Decision-making refers to the process of making decisions after intelligent information has been reasoned about, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data processing mentioned above, some general capabilities can further be formed based on the results, for example an algorithm or a general-purpose system, such as translation, text analysis, computer-vision processing, speech recognition, image recognition, and so on.
(5) Intelligent products and industry applications
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and realize practical applications. Application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, safe cities, and the like.
The embodiments of this application are mainly applied in fields such as driving assistance, autonomous driving, and mobile phone terminals.
Several application scenarios are introduced below:
Application scenario 1: ADAS/ADS visual perception system
As shown in Figure 2, ADAS and ADS require real-time detection of multiple types of 2D targets, including dynamic obstacles (Pedestrian, Cyclist, Tricycle, Car, Truck, Bus), static obstacles (TrafficCone, TrafficStick, FireHydrant, Motocycle, Bicycle), and traffic signs (TrafficSign, GuideSign, Billboard, TrafficLight_Red/TrafficLight_Yellow/TrafficLight_Green/TrafficLight_Black, RoadSign). In addition, to accurately determine the region a dynamic obstacle occupies in 3D space, a 3D estimate of the dynamic obstacle must also be produced and a 3D box output. To fuse with lidar data, the mask of the dynamic obstacle is needed so that the laser point cloud hitting the obstacle can be filtered out; for precise parking, the four key points of a parking space must be detected simultaneously; for composition-based localization, the key points of static targets must be detected. This is a semantic segmentation problem: the camera of the autonomous vehicle captures a road image, and the image must be segmented into road surface, roadbed, vehicles, pedestrians, and other objects so that the vehicle stays in the correct area. For safety-critical autonomous driving, the image must be understood in real time, so a convolutional neural network that can run semantic segmentation in real time is essential.
Application scenario 2: mobile phone beauty function
As shown in Figure 3, in a mobile phone, the mask and key points of the human body are detected by the neural network provided in the embodiments of this application, and the corresponding parts of the body can be enlarged or reduced, for example waist-slimming and hip-shaping operations, to output a beautified image.
Application scenario 3: image classification
After obtaining an image to be classified, an object recognition apparatus uses the object recognition method of this application to obtain the category of the object in the image, and the image can then be classified according to that object category. Photographers take many photos every day, of animals, of people, of plants. Using the method of this application, photos can be quickly sorted by content into photos containing animals, photos containing people, and photos containing plants.
When the number of images is very large, manual classification is inefficient, and a person handling the same task for a long time easily becomes fatigued, so the classification results would then contain large errors.
Application scenario 4: commodity classification
After the object recognition apparatus obtains an image of a commodity, it uses the object recognition method of this application to obtain the category of the commodity in the image, and then classifies the commodity by category. For the wide variety of commodities in large shopping malls or supermarkets, the object recognition method of this application can complete classification quickly, reducing time and labor costs.
Application scenario 5: face verification at entrance gates
This is an image similarity comparison problem. At the gates of high-speed rail stations, airports, and similar entrances, when a passenger performs face authentication, a camera captures a face image, a convolutional neural network extracts features, and similarity is computed against the image features of the identity document stored in the system; if the similarity is high, verification succeeds. Feature extraction by the convolutional neural network is the most time-consuming step, so fast face verification requires an efficient convolutional neural network for feature extraction.
Application scenario 6: simultaneous interpretation by a translator
This is a speech recognition and machine translation problem. Convolutional neural networks are also a common recognition model for speech recognition and machine translation. In scenarios requiring simultaneous interpretation, real-time speech recognition and translation must be achieved, and an efficient convolutional neural network gives the translator a better experience.
The neural network model trained in the embodiments of this application can realize the above functions.
Since the embodiments of this application involve extensive use of neural networks, for ease of understanding, the related terms and concepts such as neural networks involved in the embodiments are introduced first.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit may be:

$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinear characteristics into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of this activation function may serve as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of that local receptive field; the local receptive field may be a region composed of several neural units (a toy example follows).
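As a toy numeric illustration of a single neural unit (the values and the choice of sigmoid are arbitrary):

```python
import torch

xs = torch.tensor([0.5, -1.2, 3.0])  # inputs x_s, s = 1..n (illustrative)
ws = torch.tensor([0.1, 0.4, -0.2])  # weights W_s
b = 0.3                              # bias of the neural unit

h = torch.sigmoid((ws * xs).sum() + b)  # f(sum_s W_s * x_s + b), with f = sigmoid
print(h)
```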
(2) Deep neural network
A deep neural network (DNN) can be understood as a neural network with many hidden layers; "many" has no particular threshold, and the multi-layer neural networks commonly spoken of are essentially the same thing as deep neural networks. Divided by the position of the layers, the network inside a DNN falls into three categories: the input layer, the hidden layers, and the output layer. Generally the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron of the i-th layer is necessarily connected to any neuron of the (i+1)-th layer. Although a DNN looks complicated, the work of each individual layer is not complicated; it is simply the following linear relationship expression:

$\vec{y} = \alpha(W\vec{x} + \vec{b})$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer merely performs this simple operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Since a DNN has many layers, there are correspondingly many coefficient matrices W and offset vectors $\vec{b}$. How, then, are these parameters defined in a DNN? First consider the definition of the coefficient W. Taking a three-layer DNN as an example, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$: the superscript 3 is the number of the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^{L}_{jk}$. Note that the input layer has no W parameters. In a deep neural network, more hidden layers make the network better able to portray complex situations in the real world. In theory, a model with more parameters has higher complexity and larger "capacity", meaning it can complete more complex learning tasks (see the sketch below).
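A minimal sketch of one DNN layer and of the $W^{L}_{jk}$ indexing convention (sizes are arbitrary and sigmoid stands in for α):

```python
import torch

x = torch.randn(4)     # input vector (layer L-1 has 4 neurons)
W = torch.randn(2, 4)  # weight matrix of layer L: W[j-1, k-1] stores W^L_jk
b = torch.randn(2)     # offset vector of layer L

y = torch.sigmoid(W @ x + b)  # y = alpha(W x + b), with alpha = sigmoid

w_3_24 = W[1, 3]  # e.g. the coefficient from the 4th neuron of the previous
                  # layer to the 2nd neuron of this layer
```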
(3) A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers. The feature extractor may be regarded as a filter, and the convolution process may be regarded as convolving a trainable filter with an input image or a convolutional feature map. A convolutional layer is the layer of neurons in a convolutional neural network that performs convolution processing on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons of the neighboring layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units of the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as the way image information is extracted being independent of position. The principle implied here is that the statistics of one part of an image are the same as those of the other parts, which means that image information learned in one part can also be used in another part; the same learned image information can therefore be used for all positions on the image. In the same convolutional layer, multiple convolution kernels may be used to extract different image information; generally, the greater the number of kernels, the richer the image information reflected by the convolution operation.
A convolution kernel may be initialized as a matrix of random size, and during the training of the convolutional neural network the kernel can learn reasonable weights. In addition, a direct benefit of weight sharing is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting (weight sharing is sketched below).
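A small sketch of weight sharing: one 3×3 kernel is slid over the whole input, so the parameter count does not grow with image size (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

image = torch.randn(1, 1, 28, 28)  # one single-channel input feature map
kernel = torch.randn(1, 1, 3, 3)   # a single trainable 3x3 convolution kernel

# The same nine weights are applied at every spatial position, so the number
# of parameters is independent of the image size.
feature_map = F.conv2d(image, kernel, padding=1)
print(feature_map.shape)  # torch.Size([1, 1, 28, 28])
```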
(4) Back-propagation algorithm
A convolutional neural network can use the error back propagation (BP) algorithm to correct the parameter values in the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward propagation of the input signal through to the output produces an error loss, and the parameters of the initial super-resolution model are updated by back-propagating the error-loss information, making the error loss converge. The back-propagation algorithm is a back-propagation movement dominated by the error loss, aiming to obtain the optimal parameters of the super-resolution model, for example the weight matrix.
(5) Recurrent neural networks (RNN) are used to process sequence data. In a traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. Although such an ordinary neural network solves many problems, it is still powerless for many others. For example, to predict the next word of a sentence, the preceding words are generally needed, because the words of a sentence are not independent of one another. An RNN is called a recurrent neural network because the current output of a sequence also depends on the earlier outputs. Concretely, the network memorizes previous information and applies it to the computation of the current output; that is, the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. Training an RNN is the same as training a traditional CNN or DNN.
Why is a recurrent neural network needed when convolutional neural networks already exist? The reason is simple: a convolutional neural network presupposes that elements are independent of one another, and that inputs and outputs are independent too, like cats and dogs. But in the real world many elements are interconnected, for example stock prices changing over time, or someone saying: "I like traveling, and my favorite place is Yunnan; I will definitely go when I have the chance." To fill in the blank, humans all know the answer is "Yunnan", because humans infer from the context. But how can a machine be made to do this? RNNs arose for exactly this: they are designed to give machines memory like humans have. The output of an RNN therefore needs to depend on the current input information and on historical memory information (a minimal recurrent cell is sketched below).
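A minimal Elman-style recurrence illustrating how the hidden state carries historical memory into the current output (the sizes and the tanh nonlinearity are illustrative, not any particular library cell):

```python
import torch

W_x = torch.randn(16, 8)   # input-to-hidden weights (illustrative sizes)
W_h = torch.randn(16, 16)  # hidden-to-hidden weights: the memory path
b = torch.randn(16)

h = torch.zeros(16)            # initial hidden state
for x_t in torch.randn(5, 8):  # a length-5 input sequence
    # The current state depends on the current input and the remembered state.
    h = torch.tanh(W_x @ x_t + W_h @ h + b)
```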
(6) Loss function
In the process of training a deep neural network, because we want the output of the deep neural network to be as close as possible to the value we actually wish to predict, we can compare the current predicted value of the network with the truly desired target value and then update the weight vector of each layer of the network according to the difference between the two (of course, there is usually an initialization process before the first update, namely pre-configuring parameters for each layer of the deep neural network). For example, if the network's predicted value is too high, the weight vectors are adjusted to make it predict lower, and adjustment continues until the deep neural network can predict the truly desired target value or a value very close to it. For this, "how to compare the difference between the predicted value and the target value" must be defined in advance, which leads to the loss function (also called objective function): an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes the process of reducing this loss as much as possible (a toy example follows).
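A toy squared-error example of measuring the prediction-target difference (the document does not commit to a particular loss function):

```python
import torch

pred = torch.tensor([2.5, 0.1])    # current network predictions (illustrative)
target = torch.tensor([3.0, 0.0])  # truly desired target values

loss = ((pred - target) ** 2).mean()  # larger loss means larger difference
print(loss)  # tensor(0.1300)
```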
(7) Back-propagation algorithm
A neural network can use the error back propagation (BP) algorithm to correct the parameter values in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal through to the output produces an error loss, and the parameters of the initial neural network model are updated by back-propagating the error-loss information, making the error loss converge. The back-propagation algorithm is a back-propagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, for example the weight matrix.
The system architecture provided by the embodiments of this application is described below.
Referring to Figure 4, an embodiment of this application provides a system architecture 100. As shown in the system architecture 100, a data collection device 160 is used to collect training data; in this embodiment the training data includes images or image blocks of objects and the categories of the objects. The training data is stored in a database 130, and a training device 120 trains a CNN feature extraction model based on the training data maintained in the database 130 (note: the feature extraction model here is the model obtained in the training phase described earlier, and may be a neural network for feature extraction, etc.). Embodiment 1 below describes in more detail how the training device 120 obtains the CNN feature extraction model from the training data. The CNN feature extraction model can be used to implement the neural network provided in the embodiments of this application: after related preprocessing, an image or image block to be recognized is input into the CNN feature extraction model, and the 2D, 3D, mask, key-point, and other information of the object of interest in the image or image block is obtained. The CNN feature extraction model in this embodiment may specifically be a convolutional neural network (CNN). It should be noted that in practical applications, the training data maintained in the database 130 does not necessarily all come from the data collection device 160 and may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the CNN feature extraction model entirely on the training data maintained by the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of this application.
根据训练设备120训练得到的目标模型/规则可以应用于不同的系统或设备中,如应用于图4所示的执行设备110,所述执行设备110可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)AR/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端等。在图4中,执行设备110配置输入/输出(input/output,I/O)接口112,用于与外部设备进行数据交互,用户可以通过客户设备140向I/O接口112输入数据,所述输入数据在本申请实施例中可以包括:待识别图像或者图像块或者图片。The target model/rule trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 4. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, or a notebook. Computers, augmented reality (AR) AR/virtual reality (VR), vehicle-mounted terminals, etc., can also be servers or clouds. In FIG. 4, the execution device 110 is configured with an input/output (input/output, I/O) interface 112 for data interaction with external devices. The user can input data to the I/O interface 112 through the client device 140. The input data in the embodiment of the present application may include: an image to be recognized or an image block or a picture.
在执行设备120对输入数据进行预处理,或者在执行设备120的计算模块111执行计算等相关的处理(比如进行本申请中神经网络的功能实现)过程中,执行设备120可以调用数据存储系统150中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统150中。When the execution device 120 preprocesses the input data, or when the calculation module 111 of the execution device 120 executes calculations and other related processing (such as performing the function realization of the neural network in this application), the execution device 120 may call the data storage system 150 The data, codes, etc. are used for corresponding processing, and the data, instructions, etc. obtained by corresponding processing can also be stored in the data storage system 150.
最后,I/O接口112将处理结果,如上述得到的图像或图像块或者图片中感兴趣物体的2D、3D、Mask、关键点等信息返回给客户设备140,从而提供给用户。Finally, the I/O interface 112 returns the processing result, such as the 2D, 3D, Mask, key points and other information of the image or image block obtained above or the object of interest in the picture, to the client device 140 to provide it to the user.
可选地,客户设备140,可以是自动驾驶系统中的规划控制单元、手机终端中的美颜算法模块。Optionally, the client device 140 may be a planning control unit in an automatic driving system or a beauty algorithm module in a mobile phone terminal.
值得说明的是,训练设备120可以针对不同的目标或称不同的任务,基于不同的训练数据生成相应的目标模型/规则,该相应的目标模型/规则即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。It is worth noting that the training device 120 can generate corresponding target models/rules based on different training data for different goals or different tasks, and the corresponding target models/rules can be used to achieve the above goals or complete the above tasks. , So as to provide users with the desired results.
在图4中所示情况下,用户可以手动给定输入数据,该手动给定可以通过I/O接口112提供的界面进行操作。另一种情况下,客户设备140可以自动地向I/O接口112发送输入数据,如果要求客户设备140自动发送输入数据需要获得用户的授权,则用户可以在客户设备140中设置相应权限。用户可以在客户设备140查看执行设备110输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备140也可以作为数据采集端,采集如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果作为新的样本数据,并存入数据库130。当然,也可以不经过客户设备140进行采集,而是由I/O接口112直接将如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果,作为新的样本数据存入数据库130。In the case shown in FIG. 4, the user can manually set input data, and the manual setting can be operated through the interface provided by the I/O interface 112. In another case, the client device 140 can automatically send input data to the I/O interface 112. If the client device 140 is required to automatically send the input data and the user's authorization is required, the user can set the corresponding authority in the client device 140. The user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be a specific manner such as display, sound, and action. The client device 140 can also be used as a data collection terminal to collect the input data of the input I/O interface 112 and the output result of the output I/O interface 112 as new sample data, and store it in the database 130 as shown in the figure. Of course, it is also possible not to collect through the client device 140, but the I/O interface 112 directly uses the input data input to the I/O interface 112 and the output result of the output I/O interface 112 as a new sample as shown in the figure. The data is stored in the database 130.
值得注意的是,图4仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图4中,数据存储系统150相对执行设备110是外部存储器,在其它情况下,也可以将数据存储系统150置于执行设备110中。It is worth noting that FIG. 4 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation. For example, in FIG. 4, the data The storage system 150 is an external memory relative to the execution device 110. In other cases, the data storage system 150 may also be placed in the execution device 110.
如图4所示,根据训练设备120训练得到CNN特征提取模型,该CNN特征提取模型在本申请实施例中可以是CNN卷积神经网络也可以是下面实施例即将介绍的的神经网络。As shown in FIG. 4, a CNN feature extraction model is obtained through training according to the training device 120. The CNN feature extraction model may be a CNN convolutional neural network in this embodiment of the application, or may be a neural network that will be introduced in the following embodiment.
Since a CNN is a very common neural network, the structure of a CNN is described in detail below with reference to FIG. 5. As described in the introduction to the basic concepts above, a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning architecture. A deep learning architecture refers to performing learning at multiple levels of abstraction through machine learning algorithms. As a deep learning architecture, a CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
The structure of the neural network specifically used in the image processing method of this embodiment of the application may be as shown in FIG. 5. In FIG. 5, a convolutional neural network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a neural network layer 230. The input layer 210 can obtain an image to be processed and pass it to the convolutional layer/pooling layer 220 and the subsequent neural network layer 230 for processing, so as to obtain a processing result of the image. The internal layer structure of the CNN 200 in FIG. 5 is described in detail below.
Convolutional layer/pooling layer 220:
Convolutional layer:
As shown in FIG. 5, the convolutional layer/pooling layer 220 may include, for example, layers 221 to 226. In one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer. In another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can serve as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
The following uses the convolutional layer 221 as an example to describe the internal working principle of one convolutional layer.
The convolutional layer 221 can include many convolution operators. A convolution operator is also called a kernel, and its role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator can essentially be a weight matrix, which is usually predefined. In the process of performing a convolution operation on an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to extract a specific feature from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends through the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension. In most cases, however, a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolutional image, where this dimension can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features from the image: for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), so the convolutional feature maps extracted by them also have the same size, and the extracted convolutional feature maps of the same size are then combined to form the output of the convolution operation.
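For illustration only, the following sketch (assuming PyTorch; the shapes are hypothetical) shows how multiple weight matrices of the same size, each spanning the full input depth, produce the stacked depth dimension described above.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 32, 32)   # input image: depth 3, 32 x 32 pixels

    # 16 kernels, each 5 x 5 and spanning the full input depth of 3;
    # stride controls how many pixels the kernel moves at each step.
    conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, stride=1)

    y = conv(x)                     # the 16 outputs are stacked along the depth dimension
    print(y.shape)                  # torch.Size([1, 16, 28, 28])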
In practical applications, the weight values in these weight matrices need to be obtained through a large amount of training. The weight matrices formed by the trained weight values can be used to extract information from the input image, enabling the convolutional neural network 200 to make correct predictions.
When the convolutional neural network 200 has multiple convolutional layers, an initial convolutional layer (for example, layer 221) often extracts more general features, which may also be referred to as low-level features. As the depth of the convolutional neural network 200 increases, the features extracted by later convolutional layers (for example, layer 226) become more and more complex, for example, high-level semantic features; features with higher-level semantics are more applicable to the problem to be solved.
Pooling layer:
Since it is often necessary to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer. For the layers 221 to 226 illustrated by 220 in FIG. 5, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. In image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator, used to sample the input image to obtain an image of a smaller size. The average pooling operator computes the average of the pixel values within a specific range of the image as the result of average pooling, and the maximum pooling operator takes the pixel with the largest value within a specific range as the result of maximum pooling. In addition, just as the size of the weight matrix in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer can be smaller than the size of the image input into the pooling layer, and each pixel in the output image represents the average or maximum value of the corresponding sub-region of the input image.
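A minimal sketch of the pooling behavior described above (assuming PyTorch; sizes are illustrative): each output pixel is the maximum, or the average, of the corresponding sub-region of the input.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 16, 28, 28)
    max_pool = nn.MaxPool2d(kernel_size=2)   # each output pixel = max of a 2 x 2 sub-region
    avg_pool = nn.AvgPool2d(kernel_size=2)   # each output pixel = average of a 2 x 2 sub-region
    print(max_pool(x).shape)                 # torch.Size([1, 16, 14, 14]): spatial size halved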
Neural network layer 230:
After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is still not able to output the required output information. As described above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters introduced by the input image. However, to generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the neural network layer 230 to generate one output, or a group of outputs, of the required number of classes. Therefore, the neural network layer 230 can include multiple hidden layers (layers 231, 232 to 23n shown in FIG. 5) and an output layer 240. The parameters contained in the multiple hidden layers can be obtained through pre-training on training data related to a specific task type; for example, the task type can include image recognition, image classification, image super-resolution reconstruction, and so on.
After the multiple hidden layers in the neural network layer 230, the final layer of the entire convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to categorical cross-entropy, which is specifically used to compute the prediction error. Once the forward propagation of the entire convolutional neural network 200 (in FIG. 5, propagation in the direction from 210 to 240) is completed, back propagation (in FIG. 5, propagation in the direction from 240 to 210) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
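Putting the pieces together, the following hypothetical sketch (assuming PyTorch; the layer sizes are invented for illustration and are not the architecture of this application) mirrors the input layer 210 → convolutional/pooling layers 220 → hidden layers 230 → output layer 240 structure, with a cross-entropy-style loss at the output.

    import torch
    import torch.nn as nn

    class SmallCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(       # convolutional/pooling layers (220)
                nn.Conv2d(3, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(     # hidden layers and output layer (230/240)
                nn.Flatten(), nn.Linear(32 * 5 * 5, 64), nn.ReLU(),
                nn.Linear(64, num_classes),
            )

        def forward(self, x):                    # forward propagation: 210 -> 240
            return self.classifier(self.features(x))

    model = SmallCNN()
    loss_fn = nn.CrossEntropyLoss()              # cross-entropy-style loss at the output layer
    out = model(torch.randn(2, 3, 32, 32))
    print(out.shape)                             # torch.Size([2, 10])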
The structure of the neural network specifically used in the image processing method of this embodiment of the application may also be as shown in FIG. 6. In FIG. 6, a convolutional neural network (CNN) 200 may include an input layer 110, a convolutional layer/pooling layer 120 (where the pooling layer is optional), and a neural network layer 130. Compared with FIG. 5, the multiple convolutional layers/pooling layers in the convolutional layer/pooling layer 120 in FIG. 6 are arranged in parallel, and the separately extracted features are all input into the neural network layer 130 for processing.
It should be noted that the convolutional neural networks shown in FIG. 5 and FIG. 6 are merely two examples of possible convolutional neural networks for the image processing method of this embodiment of the application. In specific applications, the convolutional neural network used in the image processing method of this embodiment of the application may also exist in the form of other network models.
In addition, the structure of the convolutional neural network obtained by the neural network structure search method of this embodiment of the application may be as shown by the convolutional neural network structures in FIG. 5 and FIG. 6.
FIG. 7 shows a hardware structure of a chip provided by an embodiment of the application, and the chip includes a neural network processor 50. The chip can be provided in the execution device 110 shown in FIG. 4 to complete the computation work of the computation module 111, or in the training device 120 shown in FIG. 4 to complete the training work of the training device 120 and output the target model/rule. The algorithms of all the layers in the convolutional neural networks shown in FIG. 5 and FIG. 6 can be implemented in the chip shown in FIG. 7.
The neural network processor (NPU) 50 is mounted as a coprocessor onto a host central processing unit (host CPU), and the host CPU assigns tasks to it. The core part of the NPU is an arithmetic circuit 503; a controller 504 controls the arithmetic circuit 503 to fetch data from a memory (a weight memory or an input memory) and perform computation.
In some implementations, the arithmetic circuit 503 internally includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 503 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 502 and caches it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 501, performs matrix operations on it with matrix B, and stores partial or final results of the resulting matrix in an accumulator 508.
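The multiply-accumulate pattern described above can be sketched in plain Python as follows (an informal model of the data flow only, not of the NPU's actual microarchitecture; the sizes are hypothetical).

    def matmul_accumulate(A, B):
        rows, inner, cols = len(A), len(B), len(B[0])
        C = [[0.0] * cols for _ in range(rows)]   # plays the role of the accumulator
        for i in range(rows):
            for j in range(cols):
                for k in range(inner):
                    C[i][j] += A[i][k] * B[k][j]  # partial results accumulate into C
        return C

    A = [[1, 2], [3, 4]]                          # input matrix A
    B = [[5, 6], [7, 8]]                          # weight matrix B
    print(matmul_accumulate(A, B))                # [[19.0, 22.0], [43.0, 50.0]]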
The vector computation unit 507 can further process the output of the arithmetic circuit, for example, through vector multiplication, vector addition, exponential operations, logarithmic operations, and size comparison. For example, the vector computation unit 507 can be used for network computation in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.
In some implementations, the vector computation unit 507 can store a processed output vector into a unified buffer 506. For example, the vector computation unit 507 may apply a nonlinear function to the output of the arithmetic circuit 503, for example, to a vector of accumulated values, to generate activation values. In some implementations, the vector computation unit 507 generates normalized values, combined values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 503, for example, for use in a subsequent layer of the neural network.
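For example, the post-processing performed by the vector computation unit might look like the following sketch (assuming PyTorch; purely illustrative): a nonlinear function is applied to a vector of accumulated values to generate activation values.

    import torch

    acc = torch.tensor([19.0, -22.0, 43.0, -50.0])  # vector of accumulated values
    act = torch.relu(acc)                            # nonlinear function -> activation values
    print(act)                                       # tensor([19., 0., 43., 0.])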
The unified memory 506 is used to store input data and output data.
A direct memory access controller (DMAC) 505 transfers the input data in an external memory to the input memory 501 and/or the unified memory 506, stores the weight data in the external memory into the weight memory 502, and stores the data in the unified memory 506 into the external memory.
A bus interface unit (BIU) 510 is used to implement interaction among the host CPU, the DMAC, and an instruction fetch buffer 509 through a bus.
The instruction fetch buffer 509, connected to the controller 504, is used to store instructions used by the controller 504.
The controller 504 is used to invoke the instructions cached in the instruction fetch buffer 509, so as to control the working process of the computation accelerator.
Optionally, in this application, the input data here is a picture, and the output data is information such as 2D, 3D, Mask, and key points of an object of interest in the picture.
Generally, the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch buffer 509 are all on-chip memories, and the external memory is a memory external to the NPU. The external memory can be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
The execution device 110 in FIG. 4 described above can perform the steps of the image processing method of the embodiments of this application; the CNN models shown in FIG. 5 and FIG. 6 and the chip shown in FIG. 7 can also be used to perform the steps of the image processing method of the embodiments of this application. The image processing methods of the embodiments of this application are described in detail below with reference to the accompanying drawings.
An embodiment of the present application provides a system architecture. The system architecture includes local devices, an execution device, and a data storage system, where the local devices are connected to the execution device through a communication network.
The execution device may be implemented by one or more servers. Optionally, the execution device may work in cooperation with other computing devices, such as data storage devices, routers, and load balancers. The execution device may be arranged on one physical site or distributed across multiple physical sites. The execution device may use the data in the data storage system, or invoke the program code in the data storage system, to implement the neural network structure search method of the embodiments of this application.
Users can operate their respective user devices (for example, the local devices) to interact with the execution device. Each local device can be any computing device, such as a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a smart car or another type of cellular phone, a media consumption device, a wearable device, a set-top box, or a game console.
The local device of each user can interact with the execution device through a communication network of any communication mechanism/communication standard. The communication network may be a wide area network, a local area network, a point-to-point connection, or the like, or any combination thereof.
In one implementation, the local devices obtain the relevant parameters of a target neural network from the execution device, deploy the target neural network on the local devices, and use the target neural network for image classification, image processing, or the like.
In another implementation, the target neural network may be deployed directly on the execution device. The execution device obtains the image to be processed from the local devices, and classifies the image to be processed or performs other types of image processing on it according to the target neural network.
The foregoing execution device may also be referred to as a cloud device; in that case, the execution device is generally deployed in the cloud.
First, the method provided by this application is described from the training side. The method shown in FIG. 8 may be performed by a convolutional layer quantization apparatus, and the convolutional layer quantization apparatus may be a computer, a server, or the like. Referring to FIG. 8, FIG. 8 is a schematic flowchart of a convolutional layer quantization method provided by an embodiment of this application. As shown in FIG. 8, the convolutional layer quantization method provided by this application includes the following steps.
801. Obtain image data, a label value, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value is the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values.
In this embodiment of the application, the training device can obtain the image data, the label value, the first convolutional neural network, and the N candidate quantized values, where the first convolutional neural network includes the target convolutional layer, the target convolutional layer includes the weight value, the weight value corresponds to the N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value is the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values.
In this embodiment of the application, a first convolutional neural network and N candidate quantized values {v_1, v_2, …, v_N} can be obtained. The first convolutional neural network includes multiple convolutional layers, and the target convolutional layer is one of the multiple convolutional layers. The weight matrix W corresponding to the target convolutional layer can include multiple weight values. Suppose the weight values are to be quantized into the N candidate quantized values {v_1, v_2, …, v_N}; the probabilities that a target weight value takes each of the N candidate quantized values are:
P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), i = 1, …, N
where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient. The preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1. Taking the above probability as an example, during iterative training, the closer τ is to 0, the closer one of the N probability values is to 1.
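Assuming the softmax form reconstructed above, the following short sketch (PyTorch; the hidden-variable values are invented) illustrates the stated behavior: as τ approaches 0, one of the N probability values approaches 1.

    import torch

    W_p = torch.tensor([0.5, 1.5])           # hidden variables for N = 2 candidate values
    for tau in (1.0, 0.1, 0.01):
        P = torch.softmax(W_p / tau, dim=0)
        print(tau, P)                        # one probability tends to 1 as tau -> 0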
During training, the quantization expected value determined according to the N probability values and the N candidate quantized values is used as the weight value for the convolution operation with the input feature. This weight value is calculated as follows:
W_q = Σ_{i=1}^{N} P_i · v_i
where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
This weight value is used to perform a convolution computation with the input feature to obtain the output feature y_q:
y_q = W_q ⊗ x, where ⊗ denotes the convolution operation and x denotes the input feature.
Taking a binary neural network as an example, the parameter to be trained in existing quantization methods is W, whereas the parameter trained in this embodiment of the application is W_pi. The quantization process of the traditional method is W_q = sign(W); this process is non-differentiable at zero and is therefore difficult to train, so a straight-through estimator (STE) is used to approximately compute the gradients of the network parameters. Such gradients are inaccurate, which in turn degrades the update accuracy of the network parameters. The weight value quantization process in this embodiment of the application is a mapping from W_pi to W_q, and this mapping is differentiable, which solves the problem that the mapping from the weight value to be trained to the quantized value is non-differentiable in the traditional quantization process.
With the quantization method in this embodiment of the application, the derivative of W_q can be obtained directly through the back-propagation algorithm, and the parameters W_pi can then be trained.
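The following sketch (assuming PyTorch and the softmax mapping above; shapes and values are hypothetical) shows why no STE is needed: the expected quantized weight W_q is a differentiable function of the hidden variables W_p, so gradients flow to W_p through ordinary back-propagation.

    import torch

    v = torch.tensor([-1.0, 1.0])                  # N candidate quantized values {v_1, ..., v_N}
    W_p = torch.randn(16, 2, requires_grad=True)   # one hidden variable per candidate, per weight
    tau = 1.0                                      # temperature coefficient

    P = torch.softmax(W_p / tau, dim=-1)           # probability of each candidate value
    W_q = (P * v).sum(dim=-1)                      # quantization expected value: sum_i P_i * v_i

    loss = W_q.pow(2).mean()                       # any loss built on top of W_q
    loss.backward()                                # gradients reach W_p directly, without an STE
    print(W_p.grad.shape)                          # torch.Size([16, 2])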
802. Process the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively update the weight value according to a target loss function until the difference between the detection result and the label value satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to N updated probability values.
In this embodiment of the application, after obtaining the image data, the label value, the first convolutional neural network, and the N candidate quantized values, the training device can process the image data through the first convolutional neural network to obtain the detection result and the target loss, and iteratively update the weight value according to the target loss function until the difference between the detection result and the label value satisfies the preset condition, to obtain the second convolutional neural network, where the second convolutional neural network includes the updated weight value, and the updated weight value corresponds to the N updated probability values.
In this embodiment of the application, feedforward can be performed on the first convolutional neural network, and the weight value can be iteratively updated according to the target loss function until the target loss satisfies a preset condition, to obtain the second convolutional neural network, where the second convolutional neural network includes the updated weight value, and the updated weight value corresponds to the N updated probability values.
In this embodiment of the application, during training, the N hidden variables can be updated based on the loss function, thereby updating the weight value. Moreover, during training, the value of the temperature coefficient can be updated so that it approaches the preset value. For example, the temperature coefficient τ can be gradually decayed from a larger (preset) value to a value close to 0, so that each of the N probability values P_i tends toward 0 or 1; the candidate quantized value whose P_i is close to 1 is then the value into which the weight value is to be quantized.
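A hypothetical annealing schedule for the temperature coefficient might look as follows (illustrative values only; the application does not prescribe a particular schedule):

    # Decay tau from a larger preset value toward 0 over training,
    # pushing each probability P_i toward 0 or 1.
    tau_start, tau_end, num_epochs = 10.0, 0.01, 100
    for epoch in range(num_epochs):
        tau = tau_start * (tau_end / tau_start) ** (epoch / (num_epochs - 1))
        # ... feedforward with P = softmax(W_p / tau), compute loss, update W_p ...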
803. Perform weight quantization on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
In this embodiment of the application, the value in {v_1, v_2, …, v_N} corresponding to the maximum probability value can be used as the quantized weight value, that is:
W_d = Σ_i v_i · 1(P_i = max(P_1, …, P_N));
W_d can be used to perform a convolution computation with the input feature to obtain the output feature y_d:
y_d = W_d ⊗ x
In this embodiment of the application, each weight value in the weight matrix can be processed in the above manner, and weight quantization is performed on the updated weight values to obtain the third convolutional neural network.
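A minimal sketch of this final quantization step (assuming PyTorch; the trained probabilities are invented for illustration): each weight is replaced by the candidate quantized value with the largest probability.

    import torch

    v = torch.tensor([-1.0, 1.0])                 # candidate quantized values
    P = torch.tensor([[0.9, 0.1],                 # trained probabilities per weight
                      [0.2, 0.8]])
    W_d = v[P.argmax(dim=-1)]                     # candidate with the largest probability
    print(W_d)                                    # tensor([-1.,  1.])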
Referring to FIG. 9, FIG. 9 is a schematic structural diagram of a convolutional layer during training in an embodiment of the application. As shown in FIG. 9, updating the values of the hidden variables updates the probability values, which in turn updates the weight value; the weight value is used in a convolution operation with the input feature to obtain the output feature.
Referring to FIG. 10, FIG. 10 is a schematic structural diagram of a convolutional layer in application in an embodiment of the application. As shown in FIG. 10, the quantized weight value obtained through training can be used in a convolution operation with the input feature to obtain the output feature.
In this embodiment of the application, the first convolutional neural network further includes a first batch normalization (BN) layer, where the first BN layer is connected to the target convolutional layer, and the first BN layer is configured to perform a BN operation on the output feature of the target convolutional layer according to a first mean and a first standard deviation of the output feature of the target convolutional layer. That is, during training, the BN layer performs the BN operation based on the mean and standard deviation of the output feature of the convolutional layer in the current feedforward pass.
In this embodiment of the application, M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, where each of the M fourth convolutional neural networks includes an updated weight value. Weight value quantization is performed on the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks. Feedforward is performed on each of the M fifth convolutional neural networks to obtain M output features, and the second BN layer is configured to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to a second mean and a second standard deviation of the M output features. That is, during training, the convolutional neural network obtained after each parameter update can be quantized, and in application, the BN layer performs the BN operation on the input feature based on the mean and standard deviation of the output features of these quantized networks. It should be noted that the BN operation also needs to be based on the affine coefficients obtained during training. For details about how to perform the BN operation, refer to descriptions in the prior art; they are not repeated here.
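An informal sketch of the deployment-time BN computation described above (assuming PyTorch; M, the feature shapes, and the affine coefficients are hypothetical): the second mean and second standard deviation are aggregated over the M output features and then applied together with the affine coefficients learned in training.

    import torch

    outputs = [torch.randn(64, 16) for _ in range(8)]    # M = 8 output features (illustrative)
    stacked = torch.cat(outputs, dim=0)
    mean = stacked.mean(dim=0)                           # second mean
    std = stacked.std(dim=0)                             # second standard deviation
    gamma, beta = torch.ones(16), torch.zeros(16)        # affine coefficients from training

    def bn_apply(x, eps=1e-5):
        return gamma * (x - mean) / (std + eps) + beta   # BN operation at deployment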
Referring to FIG. 11, FIG. 11 is a schematic structural diagram of a convolutional layer in application in an embodiment of the application. As shown in FIG. 11, the mean, standard deviation, and affine coefficients obtained through training can be used to perform a BN operation on the input feature to obtain the output feature.
An embodiment of the present application provides a convolutional layer quantization method. The method includes: obtaining image data, a label value, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value is the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values; processing the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively updating the weight value according to a target loss function until the difference between the detection result and the label value satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to N updated probability values; and performing weight quantization on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values. In the above manner, the expectation over the candidate quantized values is used as the weight value, and the probability distribution of the quantized values is learned. Because this quantization process is differentiable, there is no need to use an STE to approximately compute the derivatives of the network parameters, which improves the update accuracy of the network parameters.
Referring to FIG. 12, FIG. 12 is a schematic flowchart of a convolutional layer quantization method provided by an embodiment of this application. As shown in FIG. 12, the convolutional layer quantization method provided by this application includes the following steps.
1201. Obtain a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value is the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values.
1202. Perform feedforward on the first convolutional neural network, and iteratively update the weight value according to a target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to N updated probability values.
1203. Perform weight quantization on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
Optionally, the weight value may be updated by updating the N hidden variables according to the target loss function.
Optionally, each of the N probability values is obtained by mapping the corresponding hidden variable based on a preset function, where the preset function includes a temperature coefficient, and the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1. Multiple feedforward passes may be performed on the first convolutional neural network, where the multiple feedforward passes include a first feedforward process and a second feedforward process, and the second feedforward process follows the first feedforward process. When the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient; when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient, and the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
Optionally, the first convolutional neural network further includes a first batch normalization (BN) layer, where the first BN layer is connected to the target convolutional layer, and the first BN layer is configured to perform a BN operation on the output feature of the target convolutional layer according to a first mean and a first standard deviation of the output feature of the target convolutional layer.
Optionally, M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, where each of the M fourth convolutional neural networks includes an updated weight value, and the updated weight value corresponds to N updated probability values. Weight value quantization may further be performed on the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; feedforward is performed on each of the M fifth convolutional neural networks to obtain M output features, and the second BN layer is configured to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to a second mean and a second standard deviation of the M output features.
Optionally, the preset function is the following function:
P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), i = 1, …, N
where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
Optionally, the weight value is calculated as follows:
W_q = Σ_{i=1}^{N} P_i · v_i
where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
An embodiment of the present application provides a convolutional layer quantization method. The method includes: obtaining a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value is the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values; performing feedforward on the first convolutional neural network, and iteratively updating the weight value according to a target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to N updated probability values; and performing weight quantization on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values. In the above manner, the expectation over the candidate quantized values is used as the weight value, and the probability distribution of the quantized values is learned. Because this quantization process is differentiable, there is no need to use an STE to approximately compute the derivatives of the network parameters, which improves the update accuracy of the network parameters.
在图1至图12所对应的实施例的基础上,为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关设备。具体参阅图13,图13为本申请实施例提供的卷积层量化装置1300的一种结构示意图,卷积层量化装置1300可以是服务器,卷积层量化装置1300包括:On the basis of the embodiments corresponding to FIG. 1 to FIG. 12, in order to better implement the above solutions of the embodiments of the present application, related equipment for implementing the above solutions is also provided below. For details, refer to FIG. 13, which is a schematic structural diagram of a convolutional layer quantization apparatus 1300 according to an embodiment of the application. The convolutional layer quantization apparatus 1300 may be a server, and the convolutional layer quantization apparatus 1300 includes:
获取模块1301,用于获取图像数据、标注值、第一卷积神经网络以及N个候选量化值,所述第一卷积神经网络包括目标卷积层,所述目标卷积层包括权重值,所述权重值对应于N个概率值,所述N个概率值中的每个概率值对应一个候选量化值,每个概率值表示所述权重值为对应的候选量化值的概率大小,所述权重值为根据所述N个概率值和所述N个候选量化值确定的量化期望值;The obtaining module 1301 is configured to obtain image data, annotated values, a first convolutional neural network, and N candidate quantized values. The first convolutional neural network includes a target convolutional layer, and the target convolutional layer includes a weight value, The weight value corresponds to N probability values, each of the N probability values corresponds to a candidate quantized value, and each probability value represents the probability of the weight value corresponding to the candidate quantized value, and The weight value is a quantization expected value determined according to the N probability values and the N candidate quantization values;
训练模块1302,用于通过所述第一卷积神经网络对所述图像数据进行处理,得到检测结果和目标损失,根据目标损失函数迭代更新所述权重值,直到所述检测结果和所述标注值之间的差异满足预设条件,得到第二卷积神经网络,所述第二卷积神经网络包括更新后的权重值,所述更新后的权重值对应于更新后的N个概率值;The training module 1302 is configured to process the image data through the first convolutional neural network to obtain the detection result and target loss, and iteratively update the weight value according to the target loss function until the detection result and the label The difference between the values satisfies a preset condition, and a second convolutional neural network is obtained. The second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
权重值量化模块1303,用于对所述更新后的权重值进行权重量化,得到第三卷积神经网络,所述第三卷积神经网络包括与所述更新后的权重值对应的目标量化值,所述目标量化值为所述更新后的N个概率值中最大的概率值对应的候选量化值。The weight value quantization module 1303 is configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, and the third convolutional neural network includes a target quantization value corresponding to the updated weight value , The target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
可选地,所述权重值对应于N个隐藏变量,所述N个概率值中的每个概率值对应一个隐藏变量,每个概率值为基于对应的隐藏变量计算得到的,所述训练模块1302,具体用于:Optionally, the weight value corresponds to N hidden variables, each probability value in the N probability values corresponds to a hidden variable, and each probability value is calculated based on the corresponding hidden variable, and the training module 1302, specifically used for:
通过根据目标损失函数更新所述N个隐藏变量来更新所述权重值。The weight value is updated by updating the N hidden variables according to the target loss function.
可选地,所述N个概率值中的每个概率值为通过将对应的隐藏变量基于预设函数映射得到的,所述预设函数包括温度系数,所述预设函数满足如下条件:在进行所述第一卷积神经网络的前馈时,所述温度系数与预设值的差值绝对值越小,所述N个概率值中的一个 概率值与1的差值绝对值越小,所述训练模块1302,具体用于:Optionally, each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function, the preset function includes a temperature coefficient, and the preset function satisfies the following conditions: When performing the feedforward of the first convolutional neural network, the smaller the absolute value of the difference between the temperature coefficient and the preset value, the smaller the absolute value of the difference between one of the N probability values and 1 , The training module 1302 is specifically used for:
通过所述第一卷积神经网络对所述图像数据进行多次前馈处理,其中,所述多次前馈包括第一前馈过程和第二前馈过程,所述第二前馈过程在所述第一前馈过程之后,在对所述第一卷积神经网络进行第一前馈过程时,所述预设函数包括第一温度系数,在对所述第一卷积神经网络进行第二前馈过程时,所述预设函数包括第二温度系数,所述第二温度系数与预设值的差值绝对值小于所述第一温度系数与预设值的差值绝对值。Perform multiple feedforward processing on the image data through the first convolutional neural network, where the multiple feedforwards include a first feedforward process and a second feedforward process, and the second feedforward process is After the first feedforward process, when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient, and the first convolutional neural network is performed on the first feedforward process. In the two feedforward process, the preset function includes a second temperature coefficient, and the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
Optionally, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer. The first BN layer is configured to perform a BN operation on the output features of the target convolutional layer according to the first mean and the first standard deviation of those output features.
Optionally, iteratively updating the weight value according to the target loss function yields M fourth convolutional neural networks, each of which includes an updated weight value corresponding to N updated probability values. The weight value quantization module 1303 is further configured to:
quantize the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and
feed forward each of the M fifth convolutional neural networks to obtain M output features. The second BN layer is configured to perform a BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
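A sketch of the batch-statistics recalibration this passage describes, assuming each of the M fifth convolutional neural networks contributes one output-feature array for the target convolutional layer; the second mean and second standard deviation are then taken over all M arrays:

```python
import numpy as np

def second_bn_statistics(output_features):
    """output_features: list of M arrays, one per fifth-network feedforward."""
    stacked = np.concatenate([f.reshape(-1) for f in output_features])
    return stacked.mean(), stacked.std()   # second mean, second standard deviation

def bn_operation(feature, mean, std, eps=1e-5):
    """BN operation applied to the updated target convolutional layer's output."""
    return (feature - mean) / (std + eps)
```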
Optionally, the preset function is:

$$P_i = \frac{\exp(W_{p_i}/\tau)}{\sum_{j=1}^{N} \exp(W_{p_j}/\tau)}$$

where $P_i$ is the probability value corresponding to the i-th candidate quantized value, $W_{p_i}$ is the hidden variable corresponding to the i-th candidate quantized value, and $\tau$ is the temperature coefficient.
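A quick numeric check of this function, with assumed hidden-variable values, shows the stated property: as the temperature coefficient τ approaches the preset value (0 here), one of the N probability values approaches 1:

```python
import numpy as np

def preset_function(w_p, tau):
    e = np.exp(w_p / tau)
    return e / e.sum()            # probability value P_i per candidate

w_p = np.array([0.5, 1.0, 0.2])   # assumed hidden variables W_p
print(preset_function(w_p, 1.0))  # soft: roughly [0.30, 0.49, 0.22]
print(preset_function(w_p, 0.05)) # sharp: the middle entry is nearly 1
```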
Optionally, the weight value is computed as:

$$W_q = \sum_{i=1}^{N} P_i v_i$$

where $W_q$ is the weight value, $v_i$ is the i-th candidate quantized value, and $P_i$ is the probability value corresponding to the i-th candidate quantized value.
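For example, with three assumed candidate quantized values and probability values, the weight value held during training lies between the candidates rather than on one of them:

```python
import numpy as np

candidates = np.array([-1.0, 0.0, 1.0])  # v_i
probs = np.array([0.2, 0.3, 0.5])        # P_i from the preset function
w_q = float((probs * candidates).sum())  # 0.3, the quantization expected value
```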
An embodiment of this application provides a convolutional layer quantization apparatus 1300. The obtaining module 1301 obtains image data, annotation values, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes that candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values. The training module 1302 processes the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively updates the weight value according to the target loss function until the difference between the detection result and the annotation value satisfies a preset condition, yielding a second convolutional neural network that includes an updated weight value corresponding to N updated probability values. The weight value quantization module 1303 quantizes the updated weight value to obtain a third convolutional neural network that includes a target quantized value corresponding to the updated weight value, the target quantized value being the candidate quantized value corresponding to the largest of the N updated probability values. In this way, the expectation over the candidate quantized values serves as the weight value, and the probability distribution over the quantized values is learned. Because this quantization process is differentiable, the derivatives of the network parameters need not be approximated with a straight-through estimator (STE), which improves the accuracy of parameter updates.
In an embodiment of this application, the convolutional layer quantization apparatus 1300 may further include:
The obtaining module 1301 is configured to obtain a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes that candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values.
The training module 1302 is configured to feed forward the first convolutional neural network and to iteratively update the weight value according to the target loss function until the target loss satisfies a preset condition, yielding a second convolutional neural network. The second convolutional neural network includes an updated weight value, and the updated weight value corresponds to N updated probability values.
The weight value quantization module 1303 is configured to quantize the updated weight value to obtain a third convolutional neural network. The third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
Optionally, the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, and each probability value is computed from its corresponding hidden variable. The training module is specifically configured to:
update the weight value by updating the N hidden variables according to the target loss function.
Optionally, each of the N probability values is obtained by mapping the corresponding hidden variable through a preset function. The preset function includes a temperature coefficient and satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1. The training module is specifically configured to:
perform multiple feedforward passes on the first convolutional neural network, where the multiple passes include a first feedforward process and a second feedforward process, and the second feedforward process occurs after the first feedforward process. When the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient; when the second feedforward process is performed, the preset function includes a second temperature coefficient, and the absolute difference between the second temperature coefficient and the preset value is smaller than the absolute difference between the first temperature coefficient and the preset value.
Optionally, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer. The first BN layer is configured to perform a BN operation on the output features of the target convolutional layer according to the first mean and the first standard deviation of those output features.
Optionally, iteratively updating the weight value according to the target loss function yields M fourth convolutional neural networks, each of which includes an updated weight value corresponding to N updated probability values. The weight value quantization module is further configured to:
quantize the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and feed forward each of the M fifth convolutional neural networks to obtain M output features. The second BN layer is configured to perform a BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
Optionally, the preset function is:

$$P_i = \frac{\exp(W_{p_i}/\tau)}{\sum_{j=1}^{N} \exp(W_{p_j}/\tau)}$$

where $P_i$ is the probability value corresponding to the i-th candidate quantized value, $W_{p_i}$ is the hidden variable corresponding to the i-th candidate quantized value, and $\tau$ is the temperature coefficient.
Optionally, the weight value is computed as:

$$W_q = \sum_{i=1}^{N} P_i v_i$$

where $W_q$ is the weight value, $v_i$ is the i-th candidate quantized value, and $P_i$ is the probability value corresponding to the i-th candidate quantized value.
An embodiment of this application provides a convolutional layer quantization apparatus 1300. The obtaining module 1301 obtains a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes that candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values. The training module 1302 feeds forward the first convolutional neural network and iteratively updates the weight value according to the target loss function until the target loss satisfies a preset condition, yielding a second convolutional neural network that includes an updated weight value corresponding to N updated probability values. The weight value quantization module 1303 quantizes the updated weight value to obtain a third convolutional neural network that includes a target quantized value corresponding to the updated weight value, the target quantized value being the candidate quantized value corresponding to the largest of the N updated probability values. In this way, the expectation over the candidate quantized values serves as the weight value, and the probability distribution over the quantized values is learned. Because this quantization process is differentiable, the derivatives of the network parameters need not be approximated with a straight-through estimator (STE), which improves the accuracy of parameter updates.
An embodiment of this application further provides a training device. Refer to FIG. 14, which is a schematic structural diagram of a training device according to an embodiment of this application. The training device described in the embodiment corresponding to FIG. 13 may be deployed on the training device 1400 to implement the functions of the convolutional layer quantization apparatus in that embodiment. Specifically, the training device 1400 is implemented by one or more servers and may vary considerably depending on configuration or performance. It may include one or more central processing units (CPU) 1414 (for example, one or more processors), a memory 1432, and one or more storage media 1430 (for example, one or more mass storage devices) storing application programs 1442 or data 1444. The memory 1432 and the storage medium 1430 may provide temporary or persistent storage. The program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the training device. Furthermore, the central processing unit 1414 may be configured to communicate with the storage medium 1430 and execute, on the training device 1400, the series of instruction operations in the storage medium 1430.
The training device 1400 may further include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, and one or more input/output interfaces 1458; and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
In this embodiment of the application, the central processing unit 1414 is configured to perform the data processing method performed by the convolutional layer quantization apparatus in the embodiment corresponding to FIG. 12.
Specifically, the central processing unit 1414 may obtain image data, annotation values, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes that candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values;
process the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively update the weight value according to the target loss function until the difference between the detection result and the annotation value satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value corresponding to N updated probability values; and
quantize the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
Optionally, the weight value corresponds to N hidden variables, and each of the N probability values corresponds to one hidden variable. The central processing unit 1414 may:
update the weight value by updating the N hidden variables according to the target loss function.
Optionally, each of the N probability values is obtained by mapping the corresponding hidden variable through a preset function. The preset function includes a temperature coefficient and satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1. The central processing unit 1414 may:
perform multiple feedforward passes on the image data through the first convolutional neural network, where the multiple passes include a first feedforward process and a second feedforward process, and the second feedforward process occurs after the first feedforward process. When the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient; when the second feedforward process is performed, the preset function includes a second temperature coefficient, and the absolute difference between the second temperature coefficient and the preset value is smaller than the absolute difference between the first temperature coefficient and the preset value.
Optionally, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer. The first BN layer is configured to perform a BN operation on the output features of the target convolutional layer according to the first mean and the first standard deviation of those output features.
Optionally, iteratively updating the weight value according to the target loss function yields M fourth convolutional neural networks, each of which includes an updated weight value corresponding to N updated probability values, and the method further includes:
quantizing the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and
feeding forward each of the M fifth convolutional neural networks to obtain M output features, where the second BN layer is configured to perform a BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
Optionally, the preset function is:

$$P_i = \frac{\exp(W_{p_i}/\tau)}{\sum_{j=1}^{N} \exp(W_{p_j}/\tau)}$$

where $P_i$ is the probability value corresponding to the i-th candidate quantized value, $W_{p_i}$ is the hidden variable corresponding to the i-th candidate quantized value, and $\tau$ is the temperature coefficient.
Optionally, the weight value is computed as:

$$W_q = \sum_{i=1}^{N} P_i v_i$$

where $W_q$ is the weight value, $v_i$ is the i-th candidate quantized value, and $P_i$ is the probability value corresponding to the i-th candidate quantized value.
An embodiment of this application further provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform the steps performed by the foregoing training device.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores a program for signal processing. When the program runs on a computer, the computer is caused to perform the following steps:
obtaining a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes that candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values;
feeding forward the first convolutional neural network, and iteratively updating the weight value according to the target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value corresponding to N updated probability values; and
quantizing the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
Optionally, the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, and each probability value is computed from its corresponding hidden variable; the iteratively updating the weight value according to the target loss function includes:
updating the weight value by updating the N hidden variables according to the target loss function.
Optionally, each of the N probability values is obtained by mapping the corresponding hidden variable through a preset function. The preset function includes a temperature coefficient and satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1; the feeding forward the first convolutional neural network includes:
performing multiple feedforward passes on the first convolutional neural network, where the multiple passes include a first feedforward process and a second feedforward process, and the second feedforward process occurs after the first feedforward process. When the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient; when the second feedforward process is performed, the preset function includes a second temperature coefficient, and the absolute difference between the second temperature coefficient and the preset value is smaller than the absolute difference between the first temperature coefficient and the preset value.
Optionally, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer. The first BN layer is configured to perform a BN operation on the output features of the target convolutional layer according to the first mean and the first standard deviation of those output features.
Optionally, iteratively updating the weight value according to the target loss function yields M fourth convolutional neural networks, each of which includes an updated weight value corresponding to N updated probability values, and the method further includes:
quantizing the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and
feeding forward each of the M fifth convolutional neural networks to obtain M output features, where the second BN layer is configured to perform a BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
Optionally, the preset function is:

$$P_i = \frac{\exp(W_{p_i}/\tau)}{\sum_{j=1}^{N} \exp(W_{p_j}/\tau)}$$

where $P_i$ is the probability value corresponding to the i-th candidate quantized value, $W_{p_i}$ is the hidden variable corresponding to the i-th candidate quantized value, and $\tau$ is the temperature coefficient.
Optionally, the weight value is computed as:

$$W_q = \sum_{i=1}^{N} P_i v_i$$

where $W_q$ is the weight value, $v_i$ is the i-th candidate quantized value, and $P_i$ is the probability value corresponding to the i-th candidate quantized value.
The execution device, training device, or terminal device provided in the embodiments of this application may specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that a chip in the execution device performs the data processing method described in the foregoing embodiments, or so that a chip in the training device performs the data processing method described in the foregoing embodiments. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; alternatively, the storage unit may be a storage unit outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, refer to FIG. 15, which is a schematic structural diagram of a chip according to an embodiment of this application. The chip may be embodied as a neural-network processing unit NPU 1500. The NPU 1500 is mounted to a host CPU as a coprocessor, and the host CPU assigns tasks. The core part of the NPU is the arithmetic circuit 1503; the controller 1504 controls the arithmetic circuit 1503 to extract matrix data from memory and perform multiplication operations.
In some implementations, the arithmetic circuit 1503 internally includes multiple processing engines (PE). In some implementations, the arithmetic circuit 1503 is a two-dimensional systolic array. It may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1503 is a general-purpose matrix processor.
For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1502 and caches it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 1501, performs the matrix operation with matrix B, and stores partial or final results of the matrix in the accumulator 1508.
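Functionally, what the PE array computes is an ordinary matrix product whose partial results accumulate; the following plain Python sketch mirrors that data flow (it models the arithmetic only, not the systolic timing or the memory hierarchy):

```python
import numpy as np

def matmul_with_accumulator(A, B):
    rows, inner = A.shape
    _, cols = B.shape
    C = np.zeros((rows, cols))            # accumulator 1508 contents
    for k in range(inner):                # one slice of A against one row of B
        C += np.outer(A[:, k], B[k, :])   # partial results accumulate step by step
    return C                              # final result, equal to A @ B
```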
The unified memory 1506 is configured to store input data and output data. Weight data is transferred to the weight memory 1502 directly through the direct memory access controller (DMAC) 1505. Input data is also transferred to the unified memory 1506 through the DMAC.
BIU stands for Bus Interface Unit, that is, the bus interface unit 1510, which is used for interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1509.
The bus interface unit 1510 (BIU) is used by the instruction fetch buffer 1509 to obtain instructions from external memory, and by the storage unit access controller 1505 to obtain the original data of the input matrix A or the weight matrix B from external memory.
The DMAC is mainly configured to transfer input data in the external memory DDR to the unified memory 1506, to transfer weight data to the weight memory 1502, or to transfer input data to the input memory 1501.
The vector calculation unit 1507 includes multiple arithmetic processing units and, when necessary, further processes the output of the arithmetic circuit, for example, by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for computation of non-convolutional/non-fully-connected layers in a neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.
In some implementations, the vector calculation unit 1507 can store a processed output vector to the unified memory 1506. For example, the vector calculation unit 1507 may apply a linear or nonlinear function to the output of the arithmetic circuit 1503, for example, performing linear interpolation on a feature plane extracted by a convolutional layer, or, for another example, applying it to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1507 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 1503, for example, for use in a subsequent layer of the neural network.
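As an illustration of this post-processing role (the normalization and activation choices here are assumptions, not the chip's fixed behavior), a sketch of the kind of operation the vector calculation unit might apply to the arithmetic circuit's output:

```python
import numpy as np

def vector_postprocess(conv_out, mean, std, eps=1e-5):
    normalized = (conv_out - mean) / (std + eps)  # batch normalization
    return np.maximum(normalized, 0.0)            # nonlinear activation (ReLU assumed)
```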
The instruction fetch buffer 1509 connected to the controller 1504 is configured to store instructions used by the controller 1504.
The unified memory 1506, the input memory 1501, the weight memory 1502, and the instruction fetch buffer 1509 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of the foregoing programs.
In addition, it should be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Moreover, in the drawings of the apparatus embodiments provided in this application, the connections between modules indicate that they have communication connections, which may specifically be implemented as one or more communication buses or signal lines.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, and certainly may also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function completed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to achieve the same function may be diverse, for example, analog circuits, digital circuits, or dedicated circuits. For this application, however, a software implementation is the preferable implementation in most cases. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for instructing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, they may be implemented wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from a website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, by coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, by infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device, such as a training device or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive (SSD)), or the like.

Claims (29)

  1. A convolutional quantization method, wherein the method comprises:
    obtaining image data, annotation values, a first convolutional neural network, and N candidate quantized values, wherein the first convolutional neural network comprises a target convolutional layer, the target convolutional layer comprises a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes the corresponding candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values;
    processing the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively updating the weight value according to the target loss function until the difference between the detection result and the annotation value satisfies a preset condition, to obtain a second convolutional neural network, wherein the second convolutional neural network comprises an updated weight value, and the updated weight value corresponds to N updated probability values; and
    quantizing the updated weight value to obtain a third convolutional neural network, wherein the third convolutional neural network comprises a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
  2. The method according to claim 1, wherein the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, each probability value is computed from its corresponding hidden variable, and the iteratively updating the weight value according to the target loss function comprises:
    updating the weight value by updating the N hidden variables according to the target loss function.
  3. The method according to claim 2, wherein each of the N probability values is obtained by mapping the corresponding hidden variable through a preset function, the preset function comprises a temperature coefficient, and the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1; and the processing the image data through the first convolutional neural network comprises:
    performing multiple feedforward passes on the image data through the first convolutional neural network, wherein the multiple passes comprise a first feedforward process and a second feedforward process, the second feedforward process occurs after the first feedforward process, the preset function comprises a first temperature coefficient during the first feedforward process of the first convolutional neural network and a second temperature coefficient during the second feedforward process, and the absolute difference between the second temperature coefficient and the preset value is smaller than the absolute difference between the first temperature coefficient and the preset value.
  4. The method according to any one of claims 1 to 3, wherein the first convolutional neural network further comprises a first batch normalization (BN) layer connected to the target convolutional layer, and the first BN layer is configured to perform a BN operation on the output features of the target convolutional layer according to the first mean and the first standard deviation of those output features.
  5. The method according to claim 4, wherein iteratively updating the weight value according to the target loss function yields M fourth convolutional neural networks, each of the M fourth convolutional neural networks comprises an updated weight value corresponding to N updated probability values, and the method further comprises:
    quantizing the updated weight values comprised in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and
    feeding forward each of the M fifth convolutional neural networks to obtain M output features, wherein the second BN layer is configured to perform a BN operation on the output features of the updated target convolutional layer comprised in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
  6. The method according to any one of claims 1 to 5, wherein the preset function is:

    $$P_i = \frac{\exp(W_{p_i}/\tau)}{\sum_{j=1}^{N} \exp(W_{p_j}/\tau)}$$

    where $P_i$ is the probability value corresponding to the i-th candidate quantized value, $W_{p_i}$ is the hidden variable corresponding to the i-th candidate quantized value, and $\tau$ is the temperature coefficient.
  7. The method according to any one of claims 1 to 6, wherein the weight value is computed as:

    $$W_q = \sum_{i=1}^{N} P_i v_i$$

    where $W_q$ is the weight value, $v_i$ is the i-th candidate quantized value, and $P_i$ is the probability value corresponding to the i-th candidate quantized value.
  8. A convolutional layer quantization method, wherein the method comprises:
    obtaining a first convolutional neural network and N candidate quantized values, wherein the first convolutional neural network comprises a target convolutional layer, the target convolutional layer comprises a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes the corresponding candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values;
    feeding forward the first convolutional neural network, and iteratively updating the weight value according to the target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, wherein the second convolutional neural network comprises an updated weight value, and the updated weight value corresponds to N updated probability values; and
    quantizing the updated weight value to obtain a third convolutional neural network, wherein the third convolutional neural network comprises a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
  9. The method according to claim 8, wherein the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, each probability value is computed from its corresponding hidden variable, and the iteratively updating the weight value according to the target loss function comprises:
    updating the weight value by updating the N hidden variables according to the target loss function.
  10. The method according to claim 9, wherein each of the N probability values is obtained by mapping the corresponding hidden variable through a preset function, the preset function comprises a temperature coefficient, and the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1; and the feeding forward the first convolutional neural network comprises:
    performing multiple feedforward passes on the first convolutional neural network, wherein the multiple passes comprise a first feedforward process and a second feedforward process, the second feedforward process occurs after the first feedforward process, the preset function comprises a first temperature coefficient during the first feedforward process of the first convolutional neural network and a second temperature coefficient during the second feedforward process, and the absolute difference between the second temperature coefficient and the preset value is smaller than the absolute difference between the first temperature coefficient and the preset value.
  11. The method according to any one of claims 8 to 10, wherein the first convolutional neural network further comprises a first batch normalization (BN) layer connected to the target convolutional layer, and the first BN layer is configured to perform a BN operation on the output features of the target convolutional layer according to the first mean and the first standard deviation of those output features.
  12. The method according to claim 11, wherein iteratively updating the weight value according to the target loss function yields M fourth convolutional neural networks, each of the M fourth convolutional neural networks comprises an updated weight value corresponding to N updated probability values, and the method further comprises:
    quantizing the updated weight values comprised in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and
    feeding forward each of the M fifth convolutional neural networks to obtain M output features, wherein the second BN layer is configured to perform a BN operation on the output features of the updated target convolutional layer comprised in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
  13. The method according to any one of claims 8 to 12, wherein the preset function is:

    $$P_i = \frac{\exp(W_{p_i}/\tau)}{\sum_{j=1}^{N} \exp(W_{p_j}/\tau)}$$

    where $P_i$ is the probability value corresponding to the i-th candidate quantized value, $W_{p_i}$ is the hidden variable corresponding to the i-th candidate quantized value, and $\tau$ is the temperature coefficient.
  14. The method according to any one of claims 8 to 13, wherein the weight value is computed as:

    $$W_q = \sum_{i=1}^{N} P_i v_i$$

    where $W_q$ is the weight value, $v_i$ is the i-th candidate quantized value, and $P_i$ is the probability value corresponding to the i-th candidate quantized value.
  15. A convolutional layer quantization apparatus, wherein the apparatus comprises:
    an acquisition module, configured to acquire image data, an annotated value, a first convolutional neural network, and N candidate quantized values, wherein the first convolutional neural network comprises a target convolutional layer, the target convolutional layer comprises a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values;
    a training module, configured to process the image data through the first convolutional neural network to obtain a detection result and a target loss, and to iteratively update the weight value according to a target loss function until a difference between the detection result and the annotated value satisfies a preset condition, to obtain a second convolutional neural network, wherein the second convolutional neural network comprises an updated weight value, and the updated weight value corresponds to N updated probability values; and
    a weight value quantization module, configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, wherein the third convolutional neural network comprises a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
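(Illustrative note, not part of the claims: the hard quantization step performed by the weight value quantization module, selecting the candidate with the largest updated probability; a minimal sketch with assumed names.)

```python
import numpy as np

def quantize_weight(updated_probs, candidates):
    """Target quantized value = candidate with the largest probability (claim 15 sketch)."""
    return float(candidates[int(np.argmax(updated_probs))])

print(quantize_weight(np.array([0.1, 0.2, 0.7]),
                      np.array([-1.0, 0.0, 1.0])))  # -> 1.0
```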
  16. The apparatus according to claim 15, wherein the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable and is calculated based on the corresponding hidden variable, and the training module is specifically configured to:
    update the weight value by updating the N hidden variables according to the target loss function.
  17. The apparatus according to claim 16, wherein each of the N probability values is obtained by mapping the corresponding hidden variable based on a preset function, the preset function comprises a temperature coefficient, and the preset function satisfies the following condition: during feedforward of the first convolutional neural network, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1; and the training module is specifically configured to:
    perform multiple feedforward passes on the image data through the first convolutional neural network, wherein the multiple feedforward passes comprise a first feedforward process and a second feedforward process, the second feedforward process follows the first feedforward process, the preset function comprises a first temperature coefficient during the first feedforward process and a second temperature coefficient during the second feedforward process, and the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
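(Illustrative note, not part of the claims: a training-loop skeleton showing the temperature schedule of claim 17, where each later feedforward process uses a temperature closer to the preset value, here assumed to be 0, so the probabilities gradually harden toward one-hot. Names and the decay schedule are assumptions; the apparatus itself updates the hidden variables by backpropagating the target loss.)

```python
import numpy as np

def anneal_temperature(step, tau_start=10.0, decay=0.5, preset=0.0):
    """Each later feedforward process uses a temperature closer to the preset value."""
    return preset + (tau_start - preset) * (decay ** step)

hidden = np.array([0.2, 1.0, 0.5])       # hidden variables W_p for one weight
candidates = np.array([-1.0, 0.0, 1.0])  # candidate quantized values v_i
for step in range(4):                    # successive feedforward processes
    tau = anneal_temperature(step)
    exp = np.exp(hidden / tau - (hidden / tau).max())
    probs = exp / exp.sum()              # probabilities sharpen as tau drops
    w_q = float(probs @ candidates)      # expected-value weight used in this pass
    # ... run the feedforward with w_q, compute the target loss, and update
    # `hidden` by gradient descent (omitted in this sketch) ...
    print(step, round(tau, 3), probs.round(3), round(w_q, 3))
```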
  18. The apparatus according to any one of claims 15 to 17, wherein the first convolutional neural network further comprises a first batch normalization (BN) layer, the first BN layer is connected to the target convolutional layer, and the first BN layer is configured to perform a BN operation on an output feature of the target convolutional layer according to a first mean and a first standard deviation of the output feature of the target convolutional layer.
  19. The apparatus according to claim 18, wherein M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, each of the M fourth convolutional neural networks comprises an updated weight value, the updated weight value corresponds to N updated probability values, and the weight value quantization module is further configured to:
    perform weight value quantization on the updated weight value comprised in each fourth convolutional neural network, to obtain M fifth convolutional neural networks; and
    perform feedforward on each of the M fifth convolutional neural networks, to obtain M output features, wherein a second BN layer is configured to perform a BN operation on an output feature of the updated target convolutional layer comprised in the third convolutional neural network according to a second mean and a second standard deviation of the M output features.
  20. The apparatus according to any one of claims 15 to 19, wherein the preset function is the following function:
    $$P_i = \frac{\exp\left(W_{pi}/\tau\right)}{\sum_{j=1}^{N}\exp\left(W_{pj}/\tau\right)}$$
    where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  21. The apparatus according to any one of claims 15 to 20, wherein the weight value is calculated in the following manner:
    $$W_q = \sum_{i=1}^{N} P_i \, v_i$$
    where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  22. A convolutional layer quantization apparatus, wherein the apparatus comprises:
    an acquisition module, configured to acquire a first convolutional neural network and N candidate quantized values, wherein the first convolutional neural network comprises a target convolutional layer, the target convolutional layer comprises a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values;
    a training module, configured to perform feedforward on the first convolutional neural network and to iteratively update the weight value according to a target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, wherein the second convolutional neural network comprises an updated weight value, and the updated weight value corresponds to N updated probability values; and
    a weight value quantization module, configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, wherein the third convolutional neural network comprises a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
  23. The apparatus according to claim 22, wherein the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable and is calculated based on the corresponding hidden variable, and the training module is specifically configured to:
    update the weight value by updating the N hidden variables according to the target loss function.
  24. The apparatus according to claim 23, wherein each of the N probability values is obtained by mapping the corresponding hidden variable based on a preset function, the preset function comprises a temperature coefficient, and the preset function satisfies the following condition: during feedforward of the first convolutional neural network, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1; and the training module is specifically configured to:
    perform multiple feedforward passes on the first convolutional neural network, wherein the multiple feedforward passes comprise a first feedforward process and a second feedforward process, the second feedforward process follows the first feedforward process, the preset function comprises a first temperature coefficient during the first feedforward process and a second temperature coefficient during the second feedforward process, and the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  25. The apparatus according to any one of claims 22 to 24, wherein the first convolutional neural network further comprises a first batch normalization (BN) layer, the first BN layer is connected to the target convolutional layer, and the first BN layer is configured to perform a BN operation on an output feature of the target convolutional layer according to a first mean and a first standard deviation of the output feature of the target convolutional layer.
  26. The apparatus according to claim 25, wherein M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, each of the M fourth convolutional neural networks comprises an updated weight value, the updated weight value corresponds to N updated probability values, and the weight value quantization module is further configured to:
    perform weight value quantization on the updated weight value comprised in each fourth convolutional neural network, to obtain M fifth convolutional neural networks; and
    perform feedforward on each of the M fifth convolutional neural networks, to obtain M output features, wherein a second BN layer is configured to perform a BN operation on an output feature of the updated target convolutional layer comprised in the third convolutional neural network according to a second mean and a second standard deviation of the M output features.
  27. The apparatus according to any one of claims 22 to 26, wherein the preset function is the following function:
    $$P_i = \frac{\exp\left(W_{pi}/\tau\right)}{\sum_{j=1}^{N}\exp\left(W_{pj}/\tau\right)}$$
    where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  28. The apparatus according to any one of claims 22 to 27, wherein the weight value is calculated in the following manner:
    $$W_q = \sum_{i=1}^{N} P_i \, v_i$$
    where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  29. A computer-readable storage medium storing a computer program, wherein the computer program comprises instructions for executing the convolutional layer quantization method according to any one of claims 1 to 14.
PCT/CN2021/076983 2020-02-21 2021-02-20 Method and apparatus for convolutional layer quantization WO2021164750A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010109185.5A CN111368972B (en) 2020-02-21 2020-02-21 Convolutional layer quantization method and device
CN202010109185.5 2020-02-21

Publications (1)

Publication Number Publication Date
WO2021164750A1 true WO2021164750A1 (en) 2021-08-26

Family

ID=71208314

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076983 WO2021164750A1 (en) 2020-02-21 2021-02-20 Method and apparatus for convolutional layer quantization

Country Status (2)

Country Link
CN (1) CN111368972B (en)
WO (1) WO2021164750A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368972B (en) * 2020-02-21 2023-11-10 华为技术有限公司 Convolutional layer quantization method and device
CN112257858B (en) * 2020-09-21 2024-06-14 华为技术有限公司 Model compression method and device
WO2022190195A1 (en) 2021-03-09 2022-09-15 日本電気株式会社 Information processing system, encoding device, decoding device, model learning device, information processing method, encoding method, decoding method, model learning method, and program storage medium
TWI764628B (en) * 2021-03-18 2022-05-11 英業達股份有限公司 Classification system and method of information in image
CN112949599B (en) * 2021-04-07 2022-01-14 青岛民航凯亚系统集成有限公司 Candidate content pushing method based on big data
CN113570033B (en) * 2021-06-18 2023-04-07 北京百度网讯科技有限公司 Neural network processing unit, neural network processing method and device
CN116681110B (en) * 2022-10-24 2024-05-14 荣耀终端有限公司 Extremum algorithm configuration method, electronic device, program product and medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3427195B1 (en) * 2016-03-11 2024-05-01 Telecom Italia S.p.A. Convolutional neural networks, particularly for image analysis
US10936913B2 (en) * 2018-03-20 2021-03-02 The Regents Of The University Of Michigan Automatic filter pruning technique for convolutional neural networks
KR20190125141A (en) * 2018-04-27 2019-11-06 삼성전자주식회사 Method and apparatus for quantizing parameters of neural network
CN110598839A (en) * 2018-06-12 2019-12-20 华为技术有限公司 Convolutional neural network system and method for quantizing convolutional neural network
CN110688502B (en) * 2019-09-09 2022-12-27 重庆邮电大学 Image retrieval method and storage medium based on depth hash and quantization
CN110610166B (en) * 2019-09-18 2022-06-07 北京猎户星空科技有限公司 Text region detection model training method and device, electronic equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388779A (en) * 2017-08-03 2019-02-26 珠海全志科技股份有限公司 Neural network weight quantization method and neural network weight quantization device
CN107644254A (en) * 2017-09-09 2018-01-30 复旦大学 Convolutional neural network weight parameter quantization training method and system
CN108805265A (en) * 2018-05-21 2018-11-13 Oppo广东移动通信有限公司 Neural network model processing method and apparatus, image processing method, mobile terminal
US20190385059A1 (en) * 2018-05-23 2019-12-19 Tusimple, Inc. Method and Apparatus for Training Neural Network and Computer Server
CN110222821A (en) * 2019-05-30 2019-09-10 浙江大学 Convolutional neural network low-bit-width quantization method based on weight distribution
CN111368972A (en) * 2020-02-21 2020-07-03 华为技术有限公司 Convolutional layer quantization method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471398A (en) * 2022-08-31 2022-12-13 北京科技大学 Image super-resolution method, system, terminal device and storage medium
CN115471398B (en) * 2022-08-31 2023-08-15 北京科技大学 Image super-resolution method, system, terminal equipment and storage medium
CN116739050A (en) * 2022-09-30 2023-09-12 荣耀终端有限公司 Cross-layer equalization optimization method, device and storage medium
CN116739050B (en) * 2022-09-30 2024-06-07 荣耀终端有限公司 Cross-layer equalization optimization method, device and storage medium

Also Published As

Publication number Publication date
CN111368972A (en) 2020-07-03
CN111368972B (en) 2023-11-10

Similar Documents

Publication Publication Date Title
WO2021164750A1 (en) Method and apparatus for convolutional layer quantization
WO2020221200A1 (en) Neural network construction method, image processing method and devices
WO2021120719A1 (en) Neural network model update method, and image processing method and device
WO2021190451A1 (en) Method and apparatus for training image processing model
WO2021043112A1 (en) Image classification method and apparatus
US20220165045A1 (en) Object recognition method and apparatus
WO2022042713A1 (en) Deep learning training method and apparatus for use in computing device
WO2021164751A1 (en) Perception network architecture search method and device
WO2021147325A1 (en) Object detection method and apparatus, and storage medium
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
WO2021238366A1 (en) Neural network construction method and apparatus
WO2021155792A1 (en) Processing apparatus, method and storage medium
WO2022052601A1 (en) Neural network model training method, and image processing method and device
WO2021057056A1 (en) Neural architecture search method, image processing method and device, and storage medium
WO2022001805A1 (en) Neural network distillation method and device
WO2021244249A1 (en) Classifier training method, system and device, and data processing method, system and device
WO2021008206A1 (en) Neural architecture search method, and image processing method and device
WO2021129668A1 (en) Neural network training method and device
WO2022111617A1 (en) Model training method and apparatus
WO2021136058A1 (en) Video processing method and device
CN113191241A (en) Model training method and related equipment
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
WO2024160215A1 (en) Data processing method and apparatus
WO2023125628A1 (en) Neural network model optimization method and apparatus, and computing device
WO2022179599A1 (en) Perceptual network and data processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21756654

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21756654

Country of ref document: EP

Kind code of ref document: A1